Re: [PATCH] [RFC] Pass a valid token to rtas_call() in phyp-dump code.
Yes, that is required. It is in the patches that I sent to Ben, Paul & Brad; I was just waiting to post it with the other patches.

Acked-by: Manish Ahuja <mahu...@gmail.com>

Tony Breeds wrote:
> ibm_configure_kernel_dump is passed as the token to rtas_call(), but I
> cannot see where it is initialised. Set it to something sane?
>
> Signed-off-by: Tony Breeds <t...@bakeyournoodle.com>
> ---
>  arch/powerpc/platforms/pseries/phyp_dump.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/phyp_dump.c b/arch/powerpc/platforms/pseries/phyp_dump.c
> index 16e659a..6cf35cd 100644
> --- a/arch/powerpc/platforms/pseries/phyp_dump.c
> +++ b/arch/powerpc/platforms/pseries/phyp_dump.c
> @@ -414,6 +414,8 @@ static int __init phyp_dump_setup(void)
>  		of_node_put(rtas);
>  	}
>  
> +	ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
> +
>  	print_dump_header(dump_header);
>  	dump_area_length = init_dump_header(&phdr);
>  	/* align down */

--
Manish Ahuja
Linux RAS Engineer.
IBM Linux Technology Center
mah...@us.ibm.com
512-838-1928, t/l 678-1928.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
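The bug Tony spotted is a classic one: a file-scope `static int` is zero-initialised, so unless something looks the real token up first, the RTAS call goes out with token 0. The user-space model below is a minimal sketch of the look-up-by-name-then-check pattern. `lookup_token()`, the `fake_rtas` table and its token numbers are hypothetical stand-ins; in the kernel the lookup is `rtas_token()` and the "not present" sentinel is `RTAS_UNKNOWN_SERVICE`. The explicit sentinel check is an addition of this sketch, not part of the one-line patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Models RTAS_UNKNOWN_SERVICE, which the kernel defines as -1. */
#define TOKEN_UNKNOWN (-1)

/* Hypothetical stand-in for the token properties under /rtas. */
struct rtas_prop {
	const char *name;
	int token;
};

static const struct rtas_prop fake_rtas[] = {
	{ "ibm,configure-kernel-dump", 0x2a },	/* made-up token value */
	{ "ibm,set-indicator", 0x10 },		/* made-up token value */
};

/* Hypothetical analogue of rtas_token(): find a service token by name. */
static int lookup_token(const char *service)
{
	assert(service != NULL);
	for (size_t i = 0; i < sizeof(fake_rtas) / sizeof(fake_rtas[0]); i++)
		if (strcmp(fake_rtas[i].name, service) == 0)
			return fake_rtas[i].token;
	return TOKEN_UNKNOWN;
}

/* Starts at 0, which is not necessarily a token the firmware handed out. */
static int configure_dump_token;

/* Look the token up once; return -1 if firmware lacks the service. */
static int setup_dump_token(void)
{
	configure_dump_token = lookup_token("ibm,configure-kernel-dump");
	return configure_dump_token == TOKEN_UNKNOWN ? -1 : 0;
}
```

With the sentinel check in place, a caller can skip registering the dump area on firmware that does not offer the service instead of issuing an RTAS call with a meaningless token.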
Re: [PATCH] Protect against NULL pointer deref in phyp-dump code.
Acked-by: Manish Ahuja <mahu...@gmail.com>

Tony Breeds wrote:
> print_dump_header() will be called at least once with a NULL pointer in
> a normal boot sequence. If DEBUG is defined then we will get a deref;
> add a quick fix to exit early in the NULL pointer case.
>
> Signed-off-by: Tony Breeds <t...@bakeyournoodle.com>
> ---
>  arch/powerpc/platforms/pseries/phyp_dump.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/phyp_dump.c b/arch/powerpc/platforms/pseries/phyp_dump.c
> index edbc012..16e659a 100644
> --- a/arch/powerpc/platforms/pseries/phyp_dump.c
> +++ b/arch/powerpc/platforms/pseries/phyp_dump.c
> @@ -130,6 +130,9 @@ static unsigned long init_dump_header(struct phyp_dump_header *ph)
>  static void print_dump_header(const struct phyp_dump_header *ph)
>  {
>  #ifdef DEBUG
> +	if (ph == NULL)
> +		return;
> +
>  	printk(KERN_INFO "dump header:\n");
>  	/* setup some ph->sections required */
>  	printk(KERN_INFO "version = %d\n", ph->version);
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
Yeah, that makes sense. I will shortly send a documentation patch for all the boot vars that I have added. Thanks for reminding me.

-Manish

Linas Vepstas wrote:
> On 07/04/2008, Manish Ahuja [EMAIL PROTECTED] wrote:
>> A small proposed change in the amount of reserve space we allocate
>> during boot. Currently we reserve 256MB only. The proposed change does
>> one of the 3 things.
>>
>> A. It checks to see if there is a cmdline variable set and if found
>>    sets the value to it. OR
>> B. It computes 5% of total RAM and rounds it down to multiples of
>>    256MB. AND
>> C. Compares the rounded down value and returns the larger of two
>>    values, the new computed value or 256MB.
>>
>> Again, this is for large systems that have excess memory.
>
> [...]
>
>> early_param("phyp_dump", early_phyp_dump_enabled);
>
> I'm pretty sure you will want to document this boot param in the
> documentation, as well as add a few words about why it might be
> interesting to users (i.e. that it's for large systems...)
>
> --linas
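Option A reads the size from the kernel command line, and the patch hands the string to the kernel's `memparse()`, which accepts a number with an optional size suffix. The user-space sketch below approximates that behaviour so the accepted forms of `phyp_dump_reserve_size=` are easy to see; `parse_size()` is a hypothetical name, and it assumes only the K, M and G suffixes.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical user-space approximation of memparse(): parse a number
 * (decimal, or hex with a 0x prefix) with an optional K/M/G suffix and
 * leave *end pointing just past what was consumed.
 */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v;

	assert(s != NULL && end != NULL);
	v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;	/* consume the suffix */
		break;
	default:
		break;
	}
	return v;
}
```

For example, `phyp_dump_reserve_size=512M` and `phyp_dump_reserve_size=0x20000000` would both request the same 512MB reservation.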
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
Paul,

The aim is to have more flex space for the kernel on machines with more resources. Although the dump will be collected pretty fast and the memory released very early on, allowing the machine to have its full memory available, this alleviates any issues that could be caused by having far too little memory on very, very large systems during those few minutes.

-Manish

Paul Mackerras wrote:
> Manish Ahuja writes:
>
>> B. It computes 5% of total RAM and rounds it down to multiples of 256MB.
>> C. Compares the rounded down value and returns the larger of 256MB or
>>    the new computed value.
>
> So if we have 10GB of memory or more we'll reserve more than 256MB.
> What is the advantage of reserving more than 256MB of memory?
>
> Paul.
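Paul's 10GB figure follows directly from rules B and C. Writing M for total memory in MB and R(M) for the reserve in MB, the reserve only grows past the 256MB floor once the rounded-down 5% reaches two 256MB units:

```latex
\begin{aligned}
R(M) &= \max\left(256,\ 256\left\lfloor \frac{M}{20 \cdot 256} \right\rfloor\right) \\
R(M) > 256 &\iff \left\lfloor \frac{M}{5120} \right\rfloor \ge 2
            \iff M \ge 10240\ \text{MB} = 10\ \text{GB}
\end{aligned}
```

For example, 9GB gives 5% = 460.8MB, which rounds down to 256MB, while 10GB gives exactly 512MB.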
[PATCH] pseries: phyp dump: Variable size reserve space.
Reposting patch with the following changes:

1. Changed phyp_dump_reserve_bootvar to just reserve_bootvar.
2. Changed 0x0001fff to 0x0fffUL.

Paulus, if you think this is okay, can you send this upstream?

Many thanks,
Manish

A small proposed change in the amount of reserve space we allocate during boot. Currently we reserve 256MB only. The proposed change does one of the 3 things.

A. It checks to see if there is a boot variable set and if found sets the value to it.
B. It computes 5% of total RAM and rounds it down to multiples of 256MB.
C. Compares the rounded down value and returns the larger of 256MB or the new computed value.

Again, this is for large systems that have excess memory.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c                 | 35 +++--
 arch/powerpc/platforms/pseries/phyp_dump.c |  9 +++
 include/asm-powerpc/phyp_dump.h            |  4 ++-
 3 files changed, 45 insertions(+), 3 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-04-02 23:36:51.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-04-11 23:54:34.0 -0500
@@ -496,3 +496,12 @@ static int __init early_phyp_dump_enable
 }
 early_param("phyp_dump", early_phyp_dump_enabled);
 
+/* Look for phyp_dump_reserve_size= cmdline option */
+static int __init early_phyp_dump_reserve_size(char *p)
+{
+	if (p)
+		phyp_dump_info->reserve_bootvar = memparse(p, &p);
+
+	return 0;
+}
+early_param("phyp_dump_reserve_size", early_phyp_dump_reserve_size);
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h	2008-04-02 23:36:49.0 -0500
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-04-11 23:53:10.0 -0500
@@ -24,8 +24,10 @@ struct phyp_dump {
 	/* Memory that is reserved during very early boot. */
 	unsigned long init_reserve_start;
 	unsigned long init_reserve_size;
-	/* Check status during boot if dump supported, active & present*/
+	/* cmd line options during boot */
+	unsigned long reserve_bootvar;
 	unsigned long phyp_dump_at_boot;
+	/* Check status during boot if dump supported, active & present*/
 	unsigned long phyp_dump_configured;
 	unsigned long phyp_dump_is_active;
 	/* store cpu & hpte size */
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-04-02 23:36:49.0 -0500
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-04-11 23:53:48.0 -0500
@@ -1042,6 +1042,33 @@ static void __init early_reserve_mem(voi
 
 #ifdef CONFIG_PHYP_DUMP
 /**
+ * phyp_dump_calculate_reserve_size() - reserve variable boot area 5% or arg
+ *
+ * Function to find the largest size we need to reserve
+ * during early boot process.
+ *
+ * It either looks for boot param and returns that OR
+ * returns larger of 256 or 5% rounded down to multiples of 256MB.
+ *
+ */
+static inline unsigned long phyp_dump_calculate_reserve_size(void)
+{
+	unsigned long tmp;
+
+	if (phyp_dump_info->reserve_bootvar)
+		return phyp_dump_info->reserve_bootvar;
+
+	/* divide by 20 to get 5% of value */
+	tmp = lmb_end_of_DRAM();
+	do_div(tmp, 20);
+
+	/* round it down in multiples of 256 */
+	tmp = tmp & ~0x0FFFFFFFUL;
+
+	return (tmp > PHYP_DUMP_RMR_END ? tmp : PHYP_DUMP_RMR_END);
+}
+
+/**
  * phyp_dump_reserve_mem() - reserve all not-yet-dumped mmemory
  *
  * This routine may reserve memory regions in the kernel only
@@ -1054,6 +1081,8 @@ static void __init early_reserve_mem(voi
 static void __init phyp_dump_reserve_mem(void)
 {
 	unsigned long base, size;
+	unsigned long variable_reserve_size;
+
 	if (!phyp_dump_info->phyp_dump_configured) {
 		printk(KERN_ERR "Phyp-dump not supported on this hardware\n");
 		return;
@@ -1064,9 +1093,11 @@ static void __init phyp_dump_reserve_mem
 		return;
 	}
 
+	variable_reserve_size = phyp_dump_calculate_reserve_size();
+
 	if (phyp_dump_info->phyp_dump_is_active) {
 		/* Reserve *everything* above RMR.Area freed by userland tools*/
-		base = PHYP_DUMP_RMR_END;
+		base = variable_reserve_size;
 		size = lmb_end_of_DRAM() - base;
 
 		/* XXX crashed_ram_end is wrong, since it may be beyond
@@ -1078,7 +1109,7 @@ static void __init phyp_dump_reserve_mem
 	} else {
 		size = phyp_dump_info->cpu_state_size +
 			phyp_dump_info->hpte_region_size +
-			PHYP_DUMP_RMR_END
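Rules A to C of phyp_dump_calculate_reserve_size() can be exercised outside the kernel. The sketch below is a user-space model, not the kernel function: `calc_reserve_size()` and its parameters are hypothetical names, plain 64-bit division stands in for `do_div()`, and the caller passes in the boot-variable value and the end of DRAM in bytes. It uses the 256MB mask `~(256MB - 1)`, i.e. `~0x0FFFFFFF`.

```c
#include <assert.h>
#include <stdint.h>

#define MB		(1ULL << 20)
#define RESERVE_MIN	(256 * MB)		/* PHYP_DUMP_RMR_END, 1UL << 28 */
#define ALIGN_MASK	(~(RESERVE_MIN - 1))	/* same bits as ~0x0FFFFFFF */

/* The mask trick only rounds correctly for a power-of-two granule. */
static_assert((RESERVE_MIN & (RESERVE_MIN - 1)) == 0, "granule must be 2^n");

/* bootvar: phyp_dump_reserve_size= value (0 if unset); dram_end in bytes */
static uint64_t calc_reserve_size(uint64_t bootvar, uint64_t dram_end)
{
	uint64_t tmp;

	/* A. an explicit boot parameter wins, taken as-is */
	if (bootvar)
		return bootvar;

	/* B. 5% of memory, rounded down to a multiple of 256MB */
	tmp = dram_end / 20;
	tmp &= ALIGN_MASK;

	/* C. never reserve less than 256MB */
	return tmp > RESERVE_MIN ? tmp : RESERVE_MIN;
}
```

With this model, 4GB gives 256MB (5% is only 204.8MB), 16GB gives 768MB and 64GB gives 3072MB. Note that, as posted, the boot-variable path is returned unrounded and unclamped, so whatever `phyp_dump_reserve_size=` says is used directly.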
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
Olof Johansson wrote:
> These make for some really long variable names and lines. I know from
> experience, since I've picked unnecessarily long driver names in the
> past myself. :)
>
> How about just naming the new variables reserve_bootvar, etc? The name
> of the struct they're in makes it obvious what they're for.

Yeah, I guess that's a good suggestion. Will truncate it.

>> +static inline unsigned long phyp_dump_calculate_reserve_size(void)
>> +{
>> +	unsigned long tmp;
>> +
>> +	if (phyp_dump_info->phyp_dump_reserve_bootvar)
>> +		return phyp_dump_info->phyp_dump_reserve_bootvar;
>> +
>> +	/* divide by 20 to get 5% of value */
>> +	tmp = lmb_end_of_DRAM();
>> +	do_div(tmp, 20);
>> +
>> +	/* round it down in multiples of 256 */
>> +	tmp = tmp & ~0x1FFFFFFF;
>
> That's 512MB, isn't it?

No, it's 5% of memory and then rounded down to 256MB multiples. So if you have 4GB it's 256MB; if you have 8GB it's 512MB, etc.

> -Olof
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
Olof Johansson wrote:
>> +static inline unsigned long phyp_dump_calculate_reserve_size(void)
>> +{
>> +	unsigned long tmp;
>> +
>> +	if (phyp_dump_info->phyp_dump_reserve_bootvar)
>> +		return phyp_dump_info->phyp_dump_reserve_bootvar;
>> +
>> +	/* divide by 20 to get 5% of value */
>> +	tmp = lmb_end_of_DRAM();
>> +	do_div(tmp, 20);
>> +
>> +	/* round it down in multiples of 256 */
>> +	tmp = tmp & ~0x1FFFFFFF;
>
> That's 512MB, isn't it?

My calculations in the example I gave in the last email were wrong. I mentally did 10% instead of 5%. But the premise is the same. So assuming 5% of some memory is 400MB, it rounds it down to 256MB, etc.
Re: [PATCH] pseries: phyp dump: Variable size reserve space.
Hmmm, you are possibly right. Okay, I can check and fix that.

-Manish

Olof Johansson wrote:
>>> That's 512MB, isn't it?
>>
>> My calculations in the example I gave in the last email were wrong. I
>> mentally did 10% instead of 5%. But the premise is the same. So
>> assuming 5% of some memory is 400MB, it rounds it down to 256MB, etc.
>
> But 0x1fffffff is 512MB, not 256MB. So you're rounding it down to a
> multiple of 512MB.
>
> -Olof
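Olof's point in numbers: `x & ~(N - 1)` rounds `x` down to a multiple of `N` only when `N` is a power of two, and `0x1FFFFFFF` is 512MB − 1, so that mask keeps 512MB multiples. The 256MB mask is `~0x0FFFFFFF`. The helper name `round_down_pow2()` is hypothetical; the checks after the block replay Manish's 400MB example against both masks.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of granule (granule must be a power of two). */
static uint64_t round_down_pow2(uint64_t x, uint64_t granule)
{
	assert(granule != 0 && (granule & (granule - 1)) == 0);
	return x & ~(granule - 1);
}
```

Deriving the mask from the granule, as `~(granule - 1)`, avoids having to count hex digits by hand, which is exactly where the original constant went wrong.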
[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
The following series of patches implements a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is. A list of open issues / todo list is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still incomplete; this work is also pending.

The following is a list of changes from the previous version:

- Deleted ifdef CONFIG_PHYP_DUMP from early_init_dt_scan_phyp_dump function.
- Changed reserve_crashed_mem() to phyp_dump_reserve_mem() as suggested.
- Added #ifdef CONFIG_PHYP_DUMP around of_scan_flat_dt call, removed empty function from header file.
- Changed phyp_dump_global to phyp_dump_vars.
- Fixed style issues in several places.

Manish & Linas.
[PATCH 1/8] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.0 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel. In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For example: the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+-----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the ram is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to
+a new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead
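On the userspace side, the release protocol described above amounts to writing one line of the form "start length" (hex addresses) to /sys/kernel/release_region per saved chunk. A minimal sketch of such a tool helper follows; `format_release_cmd()` and `release_region()` are hypothetical names, and the path is passed in rather than hard-coded so the helper can be pointed at a scratch file while testing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the "<start> <length>" line that
 * /sys/kernel/release_region expects. Returns the number of
 * characters written, or -1 if buf is too small.
 */
static int format_release_cmd(char *buf, size_t len,
			      unsigned long long start,
			      unsigned long long length)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "0x%llx 0x%llx\n", start, length);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Hypothetical: release one saved chunk; path is normally
 * /sys/kernel/release_region. Returns 0 on success, -1 on error. */
static int release_region(const char *path,
			  unsigned long long start, unsigned long long length)
{
	char line[64];
	size_t n;
	FILE *f;
	int ok;

	if (format_release_cmd(line, sizeof(line), start, length) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	n = strlen(line);
	ok = fwrite(line, 1, n, f) == n;
	/* With stdio buffering, a store() error such as -EINVAL
	 * surfaces when the data is flushed, i.e. at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Called as `release_region("/sys/kernel/release_region", 0x40000000, 0x10000000)`, this is equivalent to the `echo` example: it releases 256MB at the 1GB boundary.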
[PATCH 2/8] pseries: phyp dump: reserve-release
Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c                 |   52 ++
 arch/powerpc/platforms/pseries/Makefile    |    1
 arch/powerpc/platforms/pseries/phyp_dump.c |  103 +
 include/asm-powerpc/phyp_dump.h            |   41 +++
 4 files changed, 197 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-03-21 23:37:11.0 -0500
@@ -0,0 +1,41 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_configured;
+	unsigned long phyp_dump_is_active;
+	/* store cpu & hpte size */
+	unsigned long cpu_state_size;
+	unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 23:37:12.0 -0500
@@ -0,0 +1,103 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Variables, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_vars;
+struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for genreal use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i <= end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+	const unsigned int *sizes;
+
+	phyp_dump_info->phyp_dump_configured = 0;
+	phyp_dump_info->phyp_dump_is_active = 0;
+
+	if (depth != 1 || strcmp(uname, "rtas") != 0)
+		return
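The loop condition in release_memory_range() appears to be `i <= end_pfn` (the archive dropped the `<`), with `end_pfn = start_pfn + nr_pages`, so as posted it visits nr_pages + 1 frames. The toy model below sketches the half-open form `[start_pfn, start_pfn + nr_pages)` instead. It is a model, not kernel code: `reserved[]` and `total_pages` are hypothetical stand-ins for `PageReserved()` and `totalram_pages`, and "freeing" is just clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define NPFN 16	/* size of the toy physical memory, in frames */

static bool reserved[NPFN];	/* models PageReserved() per frame */
static unsigned long total_pages;	/* models totalram_pages */

/*
 * Release frames in the half-open range [start_pfn, start_pfn + nr_pages).
 * Only frames still marked reserved are released, so repeating or
 * overlapping a request does not double-count pages.
 */
static void release_range(unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;

	assert(end_pfn <= NPFN);
	for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (reserved[pfn]) {
			reserved[pfn] = false;
			total_pages++;
		}
	}
}
```

The `PageReserved()` test in the original keeps a repeated release harmless; the half-open bound additionally keeps a release from touching the frame just past the requested range.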
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously crashed kernel waiting. If so, allow user-space tools to grab the data (by reading /proc/kcore). When user-space finishes dumping a section, it must release that memory by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   81 +++--
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 00:10:15.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:39:21.0 -0500
@@ -12,19 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
+#include <asm/rtas.h>
 
 /* Variables, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_vars;
 struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
 
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -54,18 +59,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory assed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					 NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+						&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+								rc);
+		return 0;
+	}
 
 	return 0;
 }
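The range check in store_release_region() can be modelled in user space. The sketch below, built around a hypothetical `parse_and_clamp()`, differs from the posted code in two places, offered as suggestions rather than something the thread agreed on: when the start is raised to the window start, the length is shortened by the same amount, and a start at or beyond the end of the window is rejected, which keeps `end - start` from underflowing. `struct reserve` stands in for `init_reserve_start` / `init_reserve_size`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the window reserved at early boot. */
struct reserve {
	unsigned long long start, size;
};

/*
 * Parse "<start> <length>" (hex, 0x prefix optional) and clamp it to
 * the reserved window. Returns 0 and fills *out_start / *out_len, or
 * -1 if the input is malformed or misses the window entirely.
 */
static int parse_and_clamp(const char *buf, const struct reserve *r,
			   unsigned long long *out_start,
			   unsigned long long *out_len)
{
	unsigned long long start, len, end;

	assert(buf && r && out_start && out_len);
	end = r->start + r->size;

	if (sscanf(buf, "%llx %llx", &start, &len) != 2)
		return -1;

	/* Clip the front, shortening the length by the same amount. */
	if (start < r->start) {
		unsigned long long gap = r->start - start;

		if (len <= gap)
			return -1;	/* entirely below the window */
		len -= gap;
		start = r->start;
	}
	/* Starting past the window would make end - start underflow. */
	if (start >= end)
		return -1;
	/* Clip the tail. */
	if (len > end - start)
		len = end - start;

	*out_start = start;
	*out_len = len;
	return 0;
}
```

In the kernel the caller would return `-EINVAL` on failure, as the posted code already does for a malformed line.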
[PATCH 4/8] pseries: phyp dump: register dump area.
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:39:21.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:52:53.0 -0500
@@ -29,6 +29,117 @@ static struct phyp_dump phyp_dump_vars;
 struct phyp_dump *phyp_dump_info = &phyp_dump_vars;
 
+static int ibm_configure_kernel_dump;
+/* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS	3
+#define DUMP_HEADER_VERSION	0x1
+#define DUMP_REQUEST_FLAG	0x1
+#define DUMP_SOURCE_CPU		0x0001
+#define DUMP_SOURCE_HPTE	0x0002
+#define DUMP_SOURCE_RMO		0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+						"register\n", rc);
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -107,7 +218,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+	const struct phyp_dump_header *dump_header = NULL;
+	unsigned long dump_area_start;
+	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -119,7 +232,13 @@ static int __init phyp_dump_setup(void
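init_dump_header() lays the three saved sections (CPU state, HPTEs, then the low 256MB) back to back starting at offset 0 and returns the total length; register_dump_area() later adds the base address of the save area to each destination. The user-space model below reproduces just that arithmetic. The struct and function names are hypothetical simplifications of `struct phyp_dump_header`, and the page-alignment check on the base is an assumption of the sketch (the patch aligns `dump_area_start` with `PAGE_MASK`).

```c
#include <assert.h>
#include <stdint.h>

#define RMR_END (1ULL << 28)	/* PHYP_DUMP_RMR_END: 256MB */

/* Only the fields that matter for the layout. */
struct section {
	uint64_t source_length;
	uint64_t destination_address;
};

struct header {
	struct section cpu, hpte, kernel;
};

/* Place the sections consecutively from offset 0; return the total size. */
static uint64_t layout(struct header *h, uint64_t cpu_size, uint64_t hpte_size)
{
	uint64_t off = 0;

	h->cpu.source_length = cpu_size;
	h->cpu.destination_address = off;
	off += cpu_size;

	h->hpte.source_length = hpte_size;
	h->hpte.destination_address = off;
	off += hpte_size;

	h->kernel.source_length = RMR_END;
	h->kernel.destination_address = off;
	off += RMR_END;

	return off;
}

/* Turn offsets into absolute addresses within the save area at base. */
static void relocate(struct header *h, uint64_t base)
{
	assert(base % 4096 == 0);	/* assumed page-aligned */
	h->cpu.destination_address += base;
	h->hpte.destination_address += base;
	h->kernel.destination_address += base;
}
```

So the save area must be at least 256MB plus the CPU-state and HPTE sizes, and the copy of the low 256MB starts right after those two smaller sections.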
[PATCH 5/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   61 -
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:52:53.0 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:54:44.0 -0500
@@ -123,6 +123,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+		ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO " CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+		ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+		ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+		ph->cpu_data.length_copied);
+
+	printk(KERN_INFO " HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+		ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+		ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+		ph->hpte_data.length_copied);
+
+	printk(KERN_INFO " SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+		ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+		ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+		ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -135,9 +190,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* - */
@@ -246,8 +303,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
Routines to
 a. invalidate dump
 b. calculate the region that is reserved and needs to be freed. This is
    exported through the sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83 ++---
 include/asm-powerpc/phyp_dump.h            |    3 +
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-20 21:52:59.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-20 21:55:52.000000000 -0500
@@ -70,6 +70,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU		0x0001
 #define DUMP_SOURCE_HPTE	0x0002
 #define DUMP_SOURCE_RMO		0x0011
+#define DUMP_ERROR_FLAG		0x2000
+#define DUMP_TRIGGERED		0x4000
+#define DUMP_PERFORMED		0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -181,9 +185,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -197,6 +207,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+				"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -207,8 +241,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for general use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static void release_memory_range(unsigned long start_pfn,
+					unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -269,8 +303,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					NULL, store_release_region);
+					show_release_region,
+					store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -313,6 +368,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0
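The setup hunk above decides whether to invalidate and re-register by testing `dump_header->status & DUMP_ERROR_FLAG`. The following is a small user-space sketch of that decision, assuming only the three status bits defined in this patch; `struct hdr_view`, `needs_reregister()` and `describe_dump_status()` are hypothetical names, not part of the patch or of any kernel API.

```c
#include <stdint.h>
#include <stdio.h>

/* Status bits, copied from the #defines in the patch above. */
#define DUMP_ERROR_FLAG	0x2000
#define DUMP_TRIGGERED	0x4000
#define DUMP_PERFORMED	0x8000

/* Hypothetical stand-in for the one header field used here. */
struct hdr_view {
	uint16_t status;
};

/* Mirrors "(dump_header) && (dump_header->status & DUMP_ERROR_FLAG)":
 * a missing header means there is nothing to re-register. */
static int needs_reregister(const struct hdr_view *h)
{
	return h != NULL && (h->status & DUMP_ERROR_FLAG) != 0;
}

/* Renders the set bits as space-terminated words, for debug output. */
static void describe_dump_status(uint16_t status, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s",
		 (status & DUMP_TRIGGERED) ? "triggered " : "",
		 (status & DUMP_PERFORMED) ? "performed " : "",
		 (status & DUMP_ERROR_FLAG) ? "error " : "");
}
```

For example, a status of 0x6000 describes as "triggered error " and would cause the dump area to be invalidated and registered again.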
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple, rudimentary
calculation of the ranges freed. The idea is to keep it simple at the
external shell-script level and send in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:14:00.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-21 22:14:05.000000000 -0500
@@ -261,6 +261,39 @@ static void release_memory_range(unsigne
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -285,6 +318,8 @@ static ssize_t store_release_region(stru
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)
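The accounting in `track_freed_range()` only fires when both running totals land exactly on the reserved sizes. Below is a user-space sketch of the same arithmetic, assuming the totals are kept in an explicit struct instead of function-local statics and that the "re-register" step is reported as a return value; `struct reserve_info`, `struct freed_totals` and `track_freed()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct phyp_dump fields used here. */
struct reserve_info {
	unsigned long init_reserve_start;
	unsigned long init_reserve_size;
	unsigned long reserved_scratch_addr;
	unsigned long reserved_scratch_size;
};

/* Running totals; the patch keeps these as static locals. */
struct freed_totals {
	unsigned long reserved_area_size;
	unsigned long scratch_area_size;
};

/* Same range checks as track_freed_range(). Returns true at the point
 * where the patch would invalidate the old dump and re-register. */
static bool track_freed(const struct reserve_info *info,
			struct freed_totals *t,
			unsigned long addr, unsigned long length)
{
	if (addr < info->init_reserve_start)
		return false;

	if (addr >= info->init_reserve_start &&
	    addr <= info->init_reserve_start + info->init_reserve_size)
		t->reserved_area_size += length;

	if (addr >= info->reserved_scratch_addr &&
	    addr <= info->reserved_scratch_addr + info->reserved_scratch_size)
		t->scratch_area_size += length;

	/* Exact equality, as in the patch: an overshooting release
	 * (length larger than what remains) never triggers. */
	return t->reserved_area_size == info->init_reserve_size &&
	       t->scratch_area_size == info->reserved_scratch_size;
}
```

With a reservation of 0x10000000-0x40000000 whose top 256MB is the scratch area, releasing the three 256MB chunks in order returns true only on the last call, which is the "send in large chunks" usage the commit message describes.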
[PATCH 8/8] pseries: phyp dump: config file
Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig |   10 ++
 1 file changed, 10 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig	2008-03-20 20:53:33.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/Kconfig	2008-03-20 21:06:29.000000000 -0500
@@ -306,6 +306,16 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say "N"
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
[PATCH 1/2] pseries: phyp dump: Disable phyp-dump through boot-var.
The goal of these 2 patches is to ensure that there is only one dumping
mechanism enabled at any given time.

These patches depend upon the phyp-dump patches posted earlier.

Patch 1:
Addition of the boot variable phyp_dump, which takes values [0/1] for
disabling/enabling phyp_dump at boot time. Kdump can use this on the
cmdline (phyp_dump=0) to disable phyp-dump during boot when enabling
itself. This will ensure only one dumping mechanism is active at any
given time.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/kernel/prom.c                 |    5 +
 arch/powerpc/platforms/pseries/phyp_dump.c |   18 ++
 include/asm-powerpc/phyp_dump.h            |    1 +
 3 files changed, 24 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-22 00:42:02.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-22 01:07:43.000000000 -0500
@@ -460,3 +460,21 @@ int __init early_init_dt_scan_phyp_dump(
 			*((unsigned long *)&sizes[4]);
 	return 1;
 }
+
+/* Look for phyp_dump= cmdline option */
+static int __init early_phyp_dump_enabled(char *p)
+{
+	phyp_dump_info->phyp_dump_at_boot = 1;
+
+	if (!p)
+		return 0;
+
+	if (strncmp(p, "1", 1) == 0)
+		phyp_dump_info->phyp_dump_at_boot = 1;
+	else if (strncmp(p, "0", 1) == 0)
+		phyp_dump_info->phyp_dump_at_boot = 0;
+
+	return 0;
+}
+early_param("phyp_dump", early_phyp_dump_enabled);
+
Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.25-rc1.orig/include/asm-powerpc/phyp_dump.h	2008-03-22 00:42:02.000000000 -0500
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-03-22 00:42:08.000000000 -0500
@@ -25,6 +25,7 @@ struct phyp_dump {
 	unsigned long init_reserve_start;
 	unsigned long init_reserve_size;
 	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_at_boot;
 	unsigned long phyp_dump_configured;
 	unsigned long phyp_dump_is_active;
 	/* store cpu & hpte size */
Index: 2.6.25-rc1/arch/powerpc/kernel/prom.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/kernel/prom.c	2008-03-22 00:42:02.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/kernel/prom.c	2008-03-22 00:42:54.000000000 -0500
@@ -1059,6 +1059,11 @@ static void __init phyp_dump_reserve_mem
 		return;
 	}
 
+	if (!phyp_dump_info->phyp_dump_at_boot) {
+		printk(KERN_INFO "Phyp-dump disabled at boot time\n");
+		return;
+	}
+
 	if (phyp_dump_info->phyp_dump_is_active) {
 		/* Reserve *everything* above RMR.Area freed by userland tools*/
 		base = PHYP_DUMP_RMR_END;
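The parsing rule in `early_phyp_dump_enabled()` can be checked in isolation. This is a user-space sketch, assuming the handler's effect can be modelled as a pure function of its argument; `parse_phyp_dump_arg()` is a hypothetical name. Only the first character is compared, so any value starting with '0' disables phyp-dump and anything else leaves it enabled.

```c
#include <string.h>

/* Hypothetical mirror of early_phyp_dump_enabled(): returns the value
 * phyp_dump_at_boot holds after the handler runs with argument p. */
static unsigned long parse_phyp_dump_arg(const char *p)
{
	unsigned long at_boot = 1;	/* the handler sets 1 before checking p */

	if (!p)				/* bare option with no value */
		return at_boot;

	if (strncmp(p, "1", 1) == 0)
		at_boot = 1;
	else if (strncmp(p, "0", 1) == 0)
		at_boot = 0;

	return at_boot;			/* e.g. "off" leaves it enabled */
}
```

One consequence worth checking against the prom.c hunk: an early_param handler is only called when the option appears on the command line, so if nothing else initialises phyp_dump_at_boot, the zero-initialised static structure leaves phyp-dump disabled unless phyp_dump= is given.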
[PATCH 2/2] pseries: phyp dump: inform kdump, phyp-dump is loaded.
Patch 2:
Addition of /sys/kernel/phyp_dump_active so that kdump init scripts may
look for it and take appropriate action if this file is found. This
file is only created when phyp_dump has been registered.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   18 ++
 1 file changed, 18 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-22 01:07:43.000000000 -0500
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-03-22 01:08:56.000000000 -0500
@@ -182,6 +182,18 @@ static void print_dump_header(const stru
 #endif
 }
 
+static ssize_t show_phyp_dump_active(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+
+	/* create filesystem entry so kdump is phyp-dump aware */
+	return sprintf(buf, "%lx\n", phyp_dump_info->phyp_dump_at_boot);
+}
+
+static struct kobj_attribute pdl = __ATTR(phyp_dump_active, 0600,
+					show_phyp_dump_active,
+					NULL);
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -204,7 +216,13 @@ static void register_dump_area(struct ph
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n",
 			rc);
 		print_dump_header(ph);
+		return;
 	}
+
+	rc = sysfs_create_file(kernel_kobj, &pdl.attr);
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unable to create sysfs"
+				" file (%d)\n", rc);
 }
 
 static
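On the kdump side, the check this patch enables can be sketched as below, assuming a C helper rather than an init script; `phyp_dump_active()` is a hypothetical name and the path is passed in so the helper can be exercised outside sysfs. It distinguishes "file absent" (phyp-dump never registered) from the value the attribute prints with "%lx\n".

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the attribute file exists and holds
 * a non-zero value, 0 if it holds zero, and -1 if it cannot be opened
 * or parsed (for example because the file was never created). On a
 * real system the path would be /sys/kernel/phyp_dump_active, which is
 * created with mode 0600, so the caller must be root. */
static int phyp_dump_active(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned long val;
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%lx", &val);	/* the attribute prints "%lx\n" */
	fclose(f);
	if (n != 1)
		return -1;
	return val != 0;
}
```

A kdump init tool could skip loading its crash kernel whenever this returns 1, which is the "only one dumping mechanism" goal stated for this pair of patches.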
Re: [PATCH 8/8] pseries: phyp dump: config file
Thanks for the review. I will try and make the recommended changes and
repost it soon.

Manish

Paul Mackerras wrote:
> Manish Ahuja writes:
>
>> +config PHYP_DUMP
>> +	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
>> +	depends on PPC_PSERIES && EXPERIMENTAL
>> +	default y
>
> I think this should default to n for now (i.e. leave out the default
> line entirely). We can make it default to y later.
>
> Paul.
Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
If Mike and Paul are okay, then I will leave this bit as is and fix all
other issues and comments.

Thanks,
Manish

Linas Vepstas wrote:
> On 10/03/2008, Michael Ellerman [EMAIL PROTECTED] wrote:
>> On Thu, 2008-02-28 at 18:24 -0600, Manish Ahuja wrote:
>>> +
>>> +/* Global, used to communicate data between early boot and late boot */
>>> +static struct phyp_dump phyp_dump_global;
>>> +struct phyp_dump *phyp_dump_info = &phyp_dump_global;
>>
>> I don't see the point of this. You have a static (ie. non-global) struct
>> called phyp_dump_global, then you create a pointer to it and pass that
>> around.
>
> I did this. This is a style used to minimize disruption due to future
> design changes. Basically, the idea is that, at some later time, for
> some unknown reason, we decide that this structure shouldn't be global,
> or maybe shouldn't be statically allocated, or maybe should be per-cpu,
> or who knows. By creating a pointer, and just passing that around, you
> isolate other code from this change.
>
> I learned this trick after spending too many months of my life hunting
> down globals and replacing them by dynamically allocated structs. It's
> a long and painful process, on many levels, often requiring major code
> restructuring. Code that touches globals directly is often poorly
> thought out & designed.
>
> But going in the opposite direction is easy: if your code always passes
> everything it needs as args to subroutines, then you are free & clear
> ... if one of those args just happens to be a pointer to a global,
> there's no loss (not even a performance loss -- the arg passing overhead
> is about the same as a global TOC lookup!)
>
> So it may look weird if you're not used to seeing it; but the
> alternative is almost always worse.
>
> --linas
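The style Linas describes can be reduced to a few lines. This is a minimal sketch with hypothetical names (`dump_info`, `dump_info_storage`, `reserve_end`), not code from the series: the storage stays file-local, the rest of the code only ever sees a pointer, and helpers take the structure as an argument rather than naming the global.

```c
/* Hypothetical structure standing in for struct phyp_dump. */
struct dump_info {
	unsigned long reserve_start;
	unsigned long reserve_size;
};

/* Storage is file-local; nothing outside this file can name it. */
static struct dump_info dump_info_storage;

/* The only handle the rest of the code uses. Switching to kmalloc'd or
 * per-CPU storage later only changes how this pointer is set up. */
struct dump_info *dump_info = &dump_info_storage;

/* Helpers receive the structure as an argument instead of reaching
 * for the global, so they work unchanged for any instance. */
static unsigned long reserve_end(const struct dump_info *info)
{
	return info->reserve_start + info->reserve_size;
}
```

Because `reserve_end()` never mentions the global, the same helper works on a stack-allocated instance in a unit test, which is the kind of isolation the reply argues for.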
Re: [PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
Changes from previous version:

The only changes are in patch 2:
 - moved early_init_dt_scan_phyp_dump from rtas.c to phyp_dump.c
 - added a dummy function in phyp_dump.h

Patch 3 required repatching due to the changes to patch 2.

Resubmitting all patches to avoid confusion.

Thanks,
Manish

Michael Ellerman wrote:
> On Sun, 2008-02-17 at 22:53 -0600, Manish Ahuja wrote:
>> The following series of patches implement a basic framework for
>> hypervisor-assisted dump. The very first patch provides documentation
>> explaining what this is :-) . Yes, it's supposed to be an improvement
>> over kdump.
>>
>> A list of open issues / todo list is included in the documentation.
>> It also appears that the not-yet-released firmware versions this was
>> tested on are still, ahem, incomplete; this work is also pending.
>>
>> I have included most of the changes requested. Although, I did find
>> one or two, fixed in a later patch file rather than the first location
>> they appeared at.
>
> This series still doesn't build on !CONFIG_RTAS configs:
> http://kisskb.ellerman.id.au/kisskb/head/629/
>
> This solution is to move early_init_dt_scan_phyp_dump() into
> arch/powerpc/platforms/pseries/phyp_dump.c and provide a dummy
> implementation in asm-powerpc/phyp_dump.c for the !CONFIG_PHYP_DUMP
> case.
>
> cheers
[PATCH 1/8] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

 Documentation/powerpc/phyp-assisted-dump.txt |  127 +++
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.000000000 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel. In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTEs.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, NAS, SAN,
+   iSCSI, etc. as desired.
+
+   For example, the values in /sys/kernel/release_region
+   would look something like this (address-range pairs):
+   CPU:0x177fee000-0x10000: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x10000000, 0x10000000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes,
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once we collect a dump from user
+land scripts that are run. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the RAM is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, "open issues" below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+----------------
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead
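The release step described above ("they echo an offset and size to /sys/kernel/release_region") can also be done from a C tool. This is a sketch under stated assumptions: `format_release()` and `release_region()` are hypothetical helpers, and the output format relies on the kernel side parsing the write with `sscanf(buf, "%lx %lx", ...)`, which is how store_release_region() in PATCH 3/8 reads it; `%lx` accepts an optional 0x prefix.

```c
#include <stdio.h>

/* Hypothetical helper: formats one "<start> <length>" request. */
static int format_release(char *buf, size_t len,
			  unsigned long start, unsigned long length)
{
	return snprintf(buf, len, "0x%lx 0x%lx\n", start, length);
}

/* Hypothetical helper: writes one request to the file at path and
 * returns 0 on success, -1 on failure. On a real system the path is
 * /sys/kernel/release_region (mode 0600, so root only). */
static int release_region(const char *path,
			  unsigned long start, unsigned long length)
{
	char buf[64];
	FILE *f;
	int ok;
	int n = format_release(buf, sizeof(buf), start, length);

	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;		/* formatting failed or truncated */
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(buf, f) >= 0;
	if (fclose(f) != 0)		/* sysfs store errors surface here */
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling `release_region("/sys/kernel/release_region", 0x40000000, 0x10000000)` corresponds to the echo example above and releases 256MB at the 1GB boundary.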
[PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would
contain a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/kernel/prom.c                 |   49 +
 arch/powerpc/platforms/pseries/Makefile    |    1
 arch/powerpc/platforms/pseries/phyp_dump.c |  105 +
 include/asm-powerpc/phyp_dump.h            |   44
 4 files changed, 199 insertions(+)

Index: 2.6.25-rc1/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/include/asm-powerpc/phyp_dump.h	2008-02-28 22:05:25.000000000 -0600
@@ -0,0 +1,44 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+	/* Check status during boot if dump supported, active & present*/
+	unsigned long phyp_dump_configured;
+	unsigned long phyp_dump_is_active;
+	/* store cpu & hpte size */
+	unsigned long cpu_state_size;
+	unsigned long hpte_region_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data);
+#else /* CONFIG_PHYP_DUMP */
+int early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data) { return 0; }
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 21:57:52.000000000 -0600
@@ -0,0 +1,105 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2008
+ * Copyright 2008 IBM Corp.
+ *
+ *      This program is free software; you can redistribute it and/or
+ *      modify it under the terms of the GNU General Public License
+ *      as published by the Free Software Foundation; either version
+ *      2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+#include <asm/machdep.h>
+#include <asm/prom.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i = start_pfn; i <= end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+machine_subsys_initcall(pseries, phyp_dump_setup);
+
+int __init early_init_dt_scan_phyp_dump(unsigned long node,
+		const char *uname, int depth, void *data)
+{
+#ifdef CONFIG_PHYP_DUMP
+	const unsigned int
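The loop in `release_memory_range()` runs while `i <= end_pfn`, with `end_pfn = start_pfn + nr_pages`. The arithmetic sketch below (hypothetical helper `frames_visited()`, counting iterations only, no page handling) shows that an inclusive bound visits nr_pages + 1 frames, one past the requested range, whereas a half-open `<` bound visits exactly nr_pages. In the patch the extra frame is only freed if it happens to be marked reserved.

```c
/* Counts the frames a loop shaped like release_memory_range() visits.
 * inclusive != 0 models "i <= end_pfn"; inclusive == 0 models the
 * half-open "i < end_pfn". */
static unsigned long frames_visited(unsigned long start_pfn,
				    unsigned long nr_pages, int inclusive)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long i, n = 0;

	if (inclusive) {
		for (i = start_pfn; i <= end_pfn; i++)
			n++;
	} else {
		for (i = start_pfn; i < end_pfn; i++)
			n++;
	}
	return n;
}
```

For start_pfn = 16 and nr_pages = 4 the inclusive form visits 5 frames (16 to 20) and the half-open form visits 4 (16 to 19).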
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously crashed kernel
waiting. If so, allow user-space tools to grab the data (by reading
/proc/kcore). When user-space finishes dumping a section, it must
release that memory by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory becomes free
for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |   82 +++--
 1 file changed, 77 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 21:57:52.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:01.000000000 -0600
@@ -12,19 +12,25 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
 #include <asm/prom.h>
+#include <asm/rtas.h>
+
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -54,18 +60,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+						&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+			rc);
+		return 0;
+	}
 
 	return 0;
 }
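The range check in `store_release_region()` can be exercised in user space. The sketch below assumes a 4K page size (`SKETCH_PAGE_SHIFT`; pseries kernels may be built with 64K pages) and uses hypothetical names (`clamp_release`, `struct release_req`). It reproduces the patch's clamping and adds one guard the patch does not have: a start address at or beyond the end of the reservation yields an empty request instead of an unsigned wrap in `end_addr - start_addr`.

```c
#define SKETCH_PAGE_SHIFT	12	/* assumption: 4K pages */
#define SKETCH_PFN_DOWN(x)	((x) >> SKETCH_PAGE_SHIFT)

/* Hypothetical result type: what release_memory_range() would get. */
struct release_req {
	unsigned long start_pfn;
	unsigned long nr_pages;
};

static struct release_req clamp_release(unsigned long reserve_start,
					unsigned long reserve_size,
					unsigned long start_addr,
					unsigned long length)
{
	unsigned long end_addr = reserve_start + reserve_size;
	struct release_req r;

	/* Same as the patch: raise the start, but keep the length. */
	if (start_addr < reserve_start)
		start_addr = reserve_start;

	/* Guard added in this sketch only. */
	if (start_addr >= end_addr) {
		r.start_pfn = SKETCH_PFN_DOWN(start_addr);
		r.nr_pages = 0;
		return r;
	}

	/* Same as the patch: trim a request that runs past the end. */
	if (start_addr + length > end_addr)
		length = end_addr - start_addr;

	r.start_pfn = SKETCH_PFN_DOWN(start_addr);
	r.nr_pages = SKETCH_PFN_DOWN(length);
	return r;
}
```

Note that raising `start_addr` without shrinking `length` shifts the window upward: with a reservation starting at 0x10000000, a request for 0x0 + 0x20000000 releases 0x10000000-0x30000000.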
[PATCH 4/8] pseries: phyp dump: register dump area.
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

--
 arch/powerpc/platforms/pseries/phyp_dump.c |  137 +++--
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:01.000000000 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:42.000000000 -0600
@@ -30,6 +30,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS	3
+#define DUMP_HEADER_VERSION	0x1
+#define DUMP_REQUEST_FLAG	0x1
+#define DUMP_SOURCE_CPU		0x0001
+#define DUMP_SOURCE_HPTE	0x0002
+#define DUMP_SOURCE_RMO		0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+			"register\n", rc);
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -108,7 +219,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+	const struct phyp_dump_header *dump_header = NULL;
+	unsigned long dump_area_start;
+	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -120,7 +233,13 @@ static int __init phyp_dump_setup(void)
 	if (!phyp_dump_info->phyp_dump_configured
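`init_dump_header()` lays the three sections out back to back starting at offset 0, and `register_dump_area()` later adds the base address of the save area to each destination. The user-space sketch below reproduces just that arithmetic; `struct dump_layout`, `layout_dump_area()` and `rebase_layout()` are hypothetical names, and the example sizes (0x10000 of CPU state, 0x1000 of HPTE) are illustrative inputs, not values read from firmware.

```c
#include <stdint.h>

/* Hypothetical summary of the three destination offsets. */
struct dump_layout {
	uint64_t cpu_dest;
	uint64_t hpte_dest;
	uint64_t kernel_dest;
	uint64_t total;		/* what init_dump_header() returns */
};

/* Same ordering as init_dump_header(): CPU state, then HPTE, then the
 * copy of the low RMR region (source_length = PHYP_DUMP_RMR_END). */
static struct dump_layout layout_dump_area(uint64_t cpu_state_size,
					   uint64_t hpte_region_size,
					   uint64_t rmr_end)
{
	struct dump_layout l;
	uint64_t off = 0;

	l.cpu_dest = off;
	off += cpu_state_size;
	l.hpte_dest = off;
	off += hpte_region_size;
	l.kernel_dest = off;
	off += rmr_end;
	l.total = off;
	return l;
}

/* Same step as register_dump_area(): turn offsets into addresses. */
static void rebase_layout(struct dump_layout *l, uint64_t base)
{
	l->cpu_dest += base;
	l->hpte_dest += base;
	l->kernel_dest += base;
}
```

With those example sizes and a 256MB RMR, the save area is 0x10011000 bytes long, and rebasing at 0x177fee000 places the HPTE copy at 0x177ffe000 and the kernel copy at 0x177fff000.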
[PATCH 5/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   61
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:42.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:45.0 -0600
@@ -124,6 +124,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+		ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO "CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+		ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+		ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+		ph->cpu_data.length_copied);
+
+	printk(KERN_INFO "HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+		ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+		ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+		ph->hpte_data.length_copied);
+
+	printk(KERN_INFO "SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+		ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+		ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+		ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -136,9 +191,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* ------------------------------------------------- */
@@ -247,8 +304,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
Routines to
a. invalidate dump
b. calculate region that is reserved and needs to be freed. This is
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83
 include/asm-powerpc/phyp_dump.h            |    3
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:45.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:47.0 -0600
@@ -71,6 +71,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO 0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -182,9 +186,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -198,6 +208,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+						"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -208,8 +242,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -270,8 +304,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					 NULL, store_release_region);
+					 show_release_region,
+					 store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -314,6 +369,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0;
+	}
+
+	if (dump_header) {
+		phyp_dump_info->reserved_scratch_addr
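The status test in phyp_dump_setup() and the text produced by show_release_region() are sketched below as two small userspace helpers. Both helper names are hypothetical, and the formatter uses a bounded snprintf() rather than the patch's sprintf(). Note that although each pair is printed with a "-" separator, the second value in every pair is a length (length_copied, or init_reserve_size minus reserved_scratch_size), not an end address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Status bits as defined in patch 6/8. */
#define DUMP_ERROR_FLAG 0x2000
#define DUMP_TRIGGERED  0x4000
#define DUMP_PERFORMED  0x8000

/* Hypothetical helper: nonzero when the previous dump was flagged
 * as bad, the case phyp_dump_setup() handles by invalidating and
 * re-registering the dump area. */
static int dump_needs_reregister(uint16_t status)
{
	return (status & DUMP_ERROR_FLAG) != 0;
}

/* Hypothetical helper: build the release_region text using the
 * same format string as show_release_region(). Each pair is
 * (address, length). */
static int format_release_region(char *buf, size_t len,
				 unsigned long cpu_dst, unsigned long cpu_len,
				 unsigned long hpte_dst, unsigned long hpte_len,
				 unsigned long kern_dst, unsigned long kern_len,
				 unsigned long rsv_start, unsigned long rsv_len)
{
	assert(buf != NULL);
	return snprintf(buf, len, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
			" DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
			cpu_dst, cpu_len, hpte_dst, hpte_len,
			kern_dst, kern_len, rsv_start, rsv_len);
}
```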
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is to
keep it simple at the external shell script level and send
in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35
 1 file changed, 35 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:47.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-28 23:36:49.0 -0600
@@ -262,6 +262,39 @@ void release_memory_range(unsigned long
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -286,6 +319,8 @@ static ssize_t store_release_region(stru
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)
[PATCH 8/8] pseries: phyp dump: config file
Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig |   11
 1 file changed, 11 insertions(+)

Index: 2.6.25-rc1/arch/powerpc/Kconfig
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/Kconfig	2008-02-18 03:22:06.0 -0600
+++ 2.6.25-rc1/arch/powerpc/Kconfig	2008-02-18 03:22:45.0 -0600
@@ -306,6 +306,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say Y
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
The following series of patches implements a basic framework
for hypervisor-assisted dump. The very first patch provides
documentation explaining what this is :-). Yes, it's supposed
to be an improvement over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was
tested on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested. Note that one or two
of them are fixed in a later patch file rather than at the first
location where they appeared.

Also, it now blocks (reserves) memory only on Power6 boxes that
have the requisite firmware, and not on any other machine. This
is from a Power5 box:

from jal-lp6 a power5 machine.
...
Phyp-dump not supported on this hardware
Using pSeries machine description
console [udbg-1] enabled
...

I think I have incorporated everyone's comments so far.

-- Manish & Linas.
[PATCH 1/8] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 Documentation/powerpc/phyp-assisted-dump.txt |  127
 1 file changed, 127 insertions(+)

Index: 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.0 +0000
+++ 2.6.25-rc1/Documentation/powerpc/phyp-assisted-dump.txt	2008-02-18 03:22:33.0 -0600
@@ -0,0 +1,127 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, NAS, SAN,
+   iSCSI, etc. as desired.
+
+   For example: the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+
+During boot, a check is made to see if firmware supports
+this feature on this particular machine. If it does, then
+we check to see if an active dump is waiting for us. If yes,
+then everything but 256 MB of RAM is reserved during early
+boot. This area is released once the dump has been collected
+by the userland scripts. If there is dump data, then
+the /sys/kernel/release_region file is created, and
+the reserved memory is held.
+
+If there is no waiting dump data, then only the highest
+256MB of the RAM is reserved as a scratch area. This area
+is *not* released: this region will be kept permanently
+reserved, so that it can act as a receptacle for a copy
+of the low 256MB in the case a crash does occur. See,
+however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be the same as the ones
+used for kdump.
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+
+ o The various code paths that tell the hypervisor that a crash
+   occurred, vs. it simply being a normal reboot, should be
+   reviewed, and possibly clarified/fixed.
+
+ o Instead of using /sys/kernel, should there be a /sys/dump
+   instead
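On the userspace side, the release step described above boils down to writing "<start> <length>" in hex to /sys/kernel/release_region after a chunk has been saved. The sketch below assumes only what the series shows, namely that the kernel parses the string with sscanf("%lx %lx"). The function name is hypothetical, and the path is a parameter so the helper can be exercised against an ordinary file. On a real system the file is created with mode 0600, so writing to it requires root.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: hand one saved chunk of the reserved dump
 * memory back to the kernel. Both values are written in hex with
 * a 0x prefix, which sscanf("%lx") accepts. Returns 0 on success. */
static int release_region(const char *path, unsigned long start,
			  unsigned long length)
{
	FILE *f;
	int rc = 0;

	assert(path != NULL);
	f = fopen(path, "w");
	if (f == NULL)
		return -1;
	if (fprintf(f, "0x%lx 0x%lx\n", start, length) < 0)
		rc = -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

For the example in the text, release_region("/sys/kernel/release_region", 0x40000000UL, 0x10000000UL) frees 256MB starting at 1GB, because 0x40000000 = 2^30 bytes (1GB) and 0x10000000 = 2^28 bytes (256MB).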
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at the 1GB boundary. The released
memory becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   81
 1 file changed, 76 insertions(+), 5 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:23:47.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:32:13.0 -0600
@@ -12,18 +12,23 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
 #include <asm/machdep.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -53,18 +58,84 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   "echo <start addr> <length> > /sys/kernel/release_region"
+ *
+ * Example:
+ *   "echo 0x40000000 0x10000000 > /sys/kernel/release_region"
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t store_release_region(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range(start_pfn, nr_pages);
+
+	return count;
+}
+
+static struct kobj_attribute rr = __ATTR(release_region, 0600,
+					 NULL, store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header = NULL;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured)
+		return -ENOSYS;
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	if (rtas) {
+		dump_header = of_get_property(rtas, "ibm,kernel-dump",
+						&header_len);
+		of_node_put(rtas);
+	}
+
+	if (dump_header == NULL)
+		return 0;
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = sysfs_create_file(kernel_kobj, &rr.attr);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n",
+								rc);
+		return 0;
+	}
 
 	return 0;
 }
[PATCH 4/8] pseries: phyp dump: register dump area.
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  137
 1 file changed, 131 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:26:56.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:30:28.0 -0600
@@ -28,6 +28,117 @@
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+/* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS 3
+#define DUMP_HEADER_VERSION 0x1
+#define DUMP_REQUEST_FLAG 0x1
+#define DUMP_SOURCE_CPU 0x0001
+#define DUMP_SOURCE_HPTE 0x0002
+#define DUMP_SOURCE_RMO 0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
+						"register\n", rc);
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -106,7 +217,9 @@ static struct kobj_attribute rr = __ATTR
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header = NULL;
+	const struct phyp_dump_header *dump_header = NULL;
+	unsigned long dump_area_start;
+	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -118,7 +231,13 @@ static int __init phyp_dump_setup(void)
 	if (!phyp_dump_info->phyp_dump_configured
[PATCH 5/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   61
 1 file changed, 59 insertions(+), 2 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 03:30:53.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.0 -0600
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+		ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO "CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+		ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+		ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+		ph->cpu_data.length_copied);
+
+	printk(KERN_INFO "HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+		ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+		ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+		ph->hpte_data.length_copied);
+
+	printk(KERN_INFO "SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+		ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+		ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+		ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -134,9 +189,11 @@ static void register_dump_area(struct ph
 				1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
+	if (rc) {
 		printk(KERN_ERR "phyp-dump: unexpected error (%d) on "
 						"register\n", rc);
+		print_dump_header(ph);
+	}
 }
 
 /* ------------------------------------------------- */
@@ -245,8 +302,8 @@ static int __init phyp_dump_setup(void)
 		of_node_put(rtas);
 	}
 
+	print_dump_header(dump_header);
 	dump_area_length = init_dump_header(&phdr);
-
 	/* align down */
 	dump_area_start = phyp_dump_info->init_reserve_start & PAGE_MASK;
[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
Routines to
a. invalidate dump
b. calculate region that is reserved and needs to be freed. This is
   exported through sysfs interface.

Unregister has been removed for now as it wasn't being used.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   83
 include/asm-powerpc/phyp_dump.h            |    3
 2 files changed, 80 insertions(+), 6 deletions(-)

Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:19.0 -0600
+++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-18 04:25:32.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU 0x0001
 #define DUMP_SOURCE_HPTE 0x0002
 #define DUMP_SOURCE_RMO 0x0011
+#define DUMP_ERROR_FLAG 0x2000
+#define DUMP_TRIGGERED 0x4000
+#define DUMP_PERFORMED 0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -196,6 +206,30 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) "
+						"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -206,8 +240,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -268,8 +302,29 @@ static ssize_t store_release_region(stru
 	return count;
 }
 
+static ssize_t show_release_region(struct kobject *kobj,
+			struct kobj_attribute *attr, char *buf)
+{
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx:"
+			    " DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address,
+		phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address,
+		phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start,
+		second_addr_range);
+}
+
 static struct kobj_attribute rr = __ATTR(release_region, 0600,
-					 NULL, store_release_region);
+					 show_release_region,
+					 store_release_region);
 
 static int __init phyp_dump_setup(void)
 {
@@ -312,6 +367,22 @@ static int __init phyp_dump_setup(void)
 		return 0;
 	}
 
+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
+		invalidate_last_dump(&phdr, dump_area_start);
+		register_dump_area(&phdr, dump_area_start);
+		return 0;
+	}
+
+	if (dump_header) {
+		phyp_dump_info->reserved_scratch_addr
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple, rudimentary calculation of the ranges freed. The idea is to keep it simple at the external shell script level and send in large chunks for now. Signed-off-by: Manish Ahuja [EMAIL PROTECTED] - --- arch/powerpc/platforms/pseries/phyp_dump.c | 35 + 1 file changed, 35 insertions(+) Index: 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c === --- 2.6.25-rc1.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-18 03:31:22.0 -0600 +++ 2.6.25-rc1/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-18 03:31:30.0 -0600 @@ -260,6 +260,39 @@ void release_memory_range(unsigned long } } +/** + * track_freed_range -- Counts the range being freed. + * Once the counter goes to zero, it re-registers dump for + * future use. + */ +static void +track_freed_range(unsigned long addr, unsigned long length) +{ + static unsigned long scratch_area_size, reserved_area_size; + + if (addr < phyp_dump_info->init_reserve_start) + return; + + if ((addr >= phyp_dump_info->init_reserve_start) && + (addr <= phyp_dump_info->init_reserve_start + +phyp_dump_info->init_reserve_size)) + reserved_area_size += length; + + if ((addr >= phyp_dump_info->reserved_scratch_addr) && + (addr <= phyp_dump_info->reserved_scratch_addr + +phyp_dump_info->reserved_scratch_size)) + scratch_area_size += length; + + if ((reserved_area_size == phyp_dump_info->init_reserve_size) && + (scratch_area_size == phyp_dump_info->reserved_scratch_size)) { + + invalidate_last_dump(&phdr, + phyp_dump_info->reserved_scratch_addr); + register_dump_area(&phdr, + phyp_dump_info->reserved_scratch_addr); + } +} + /* - */ /** * sysfs_release_region -- sysfs interface to release memory range.
@@ -284,6 +317,8 @@ static ssize_t store_release_region(stru if (ret != 2) return -EINVAL; + track_freed_range(start_addr, length); + /* Range-check - don't free any reserved memory that * wasn't reserved for phyp-dump */ if (start_addr < phyp_dump_info->init_reserve_start)
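The bookkeeping in track_freed_range() can be exercised outside the kernel. This sketch is an assumption-laden model, not the patch itself: struct sketch_reserve and struct sketch_tracker are hypothetical, the counters live in a struct rather than in function-level statics so they can be reset, and the bounds test uses a half-open interval [start, start + size) where the patch compares with <=.

```c
#include <stdbool.h>

/* Hypothetical copy of the phyp_dump_info fields the tracker reads. */
struct sketch_reserve {
	unsigned long init_reserve_start, init_reserve_size;
	unsigned long reserved_scratch_addr, reserved_scratch_size;
};

/* Running totals; the patch keeps these as static locals instead. */
struct sketch_tracker {
	unsigned long reserved_freed, scratch_freed;
};

/* Half-open test: addr in [start, start + size). */
static bool in_region(unsigned long addr, unsigned long start,
		      unsigned long size)
{
	return addr >= start && addr < start + size;
}

/* Account for one released chunk.  Returns true once both regions have
 * been released in full, which is the point where the patch calls
 * invalidate_last_dump() and register_dump_area(). */
static bool sketch_track_freed(struct sketch_tracker *t,
			       const struct sketch_reserve *r,
			       unsigned long addr, unsigned long length)
{
	if (addr < r->init_reserve_start)
		return false;
	if (in_region(addr, r->init_reserve_start, r->init_reserve_size))
		t->reserved_freed += length;
	if (in_region(addr, r->reserved_scratch_addr, r->reserved_scratch_size))
		t->scratch_freed += length;
	return t->reserved_freed == r->init_reserve_size &&
	       t->scratch_freed == r->reserved_scratch_size;
}
```

Like the patch, only the start address of each chunk is checked, so a chunk that begins inside a region but runs past its end is counted in full; the totals are only exact if the userspace script releases the regions in whole pieces.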
[PATCH 8/8] pseries: phyp dump: config file
Add hypervisor-assisted dump to kernel config Signed-off-by: Linas Vepstas [EMAIL PROTECTED] - arch/powerpc/Kconfig | 11 +++ 1 file changed, 11 insertions(+) Index: 2.6.25-rc1/arch/powerpc/Kconfig === --- 2.6.25-rc1.orig/arch/powerpc/Kconfig 2008-02-18 03:22:06.0 -0600 +++ 2.6.25-rc1/arch/powerpc/Kconfig 2008-02-18 03:22:45.0 -0600 @@ -306,6 +306,17 @@ config CRASH_DUMP Don't change this unless you know what you are doing. +config PHYP_DUMP + bool "Hypervisor-assisted dump (EXPERIMENTAL)" + depends on PPC_PSERIES && EXPERIMENTAL + default y + help + Hypervisor-assisted dump is meant to be a kdump replacement + offering robustness and speed not possible without system + hypervisor assistance. + + If unsure, say Y. + config PPCBUG_NVRAM bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC default y if PPC_PREP
Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
Olof, I will run it through checkpatch before resubmitting. Thanks, Manish Olof Johansson wrote: On Thu, Feb 14, 2008 at 02:46:21PM +1100, Tony Breeds wrote: Hi Manish, Sorry for the minor nits, but this should be: --- * Linas Vepstas, Manish Ahuja 2008 * Copyright 2008 IBM Corp. --- You can optionally use the '©' symbol after the word 'Copyright', but you shouldn't use '(c)' anymore. Also, in at least one place you've misspelt Copyright. If we're going to nitpick, then I'd like to point out that the whole series needs to be run through checkpatch, and at least the whitespace issues should be taken care of. I'm still not convinced that this is a useful feature compared to hardening kdump, especially now that ehea can handle kexec/kdump (patch posted the other day). But in the end it's up to Paul if he wants to take it or not, not me. -Olof
Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Tony Breeds wrote: On Tue, Feb 12, 2008 at 01:11:58AM -0600, Manish Ahuja wrote: <snip> +static ssize_t +show_release_region(struct kset * kset, char *buf) +{ +return sprintf(buf, "ola\n"); +} + +static struct subsys_attribute rr = __ATTR(release_region, 0600, + show_release_region, + store_release_region); Any reason this sysfs attribute can't be write-only? The show method doesn't seem to be needed. Yes, it's used later in the code. +static int __init phyp_dump_setup(void) +{ <snip> +/* Is there dump data waiting for us? */ +rtas = of_find_node_by_path("/rtas"); +dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); Hmm, this isn't good. You need to check rtas != NULL. Yes, will fix this as well. +if (dump_header == NULL) { +release_all(); +return 0; +} + +/* Should we create a dump_subsys, analogous to s390/ipl.c ? */ +rc = subsys_create_file(&kernel_subsys, &rr); +if (rc) { +printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc); +release_all(); +return 0; +} return 0; } - subsys_initcall(phyp_dump_setup); Hmm, I think this really should be a: machine_subsys_initcall(pseries, phyp_dump_setup) Yours Tony
[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
Changed asm to asm-powerpc. Hopefully this was the last of them. -Manish Routines to a. invalidate dump b. Calculate region that is reserved and needs to be freed. This is exported through sysfs interface. Unregister has been removed for now as it wasn't being used. Signed-off-by: Manish Ahuja [EMAIL PROTECTED] - --- arch/powerpc/platforms/pseries/phyp_dump.c | 85 + include/asm-powerpc/phyp_dump.h|3 + 2 files changed, 77 insertions(+), 11 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c === --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-13 21:21:00.0 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-13 21:21:48.0 -0600 @@ -69,6 +69,10 @@ static struct phyp_dump_header phdr; #define DUMP_SOURCE_CPU 0x0001 #define DUMP_SOURCE_HPTE 0x0002 #define DUMP_SOURCE_RMO 0x0011 +#define DUMP_ERROR_FLAG 0x2000 +#define DUMP_TRIGGERED 0x4000 +#define DUMP_PERFORMED 0x8000 + /** * init_dump_header() - initialize the header declaring a dump @@ -180,9 +184,15 @@ static void print_dump_header(const stru static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; - ph-cpu_data.destination_address += addr; - ph-hpte_data.destination_address += addr; - ph-kernel_data.destination_address += addr; + + /* Add addr value if not initialized before */ + if (ph-cpu_data.destination_address == 0) { + ph-cpu_data.destination_address += addr; + ph-hpte_data.destination_address += addr; + ph-kernel_data.destination_address += addr; + } + + /* ToDo Invalidate kdump and free memory range. 
*/ do { rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, @@ -195,6 +205,30 @@ static void register_dump_area(struct ph } } +static +void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + + /* Add addr value if not initialized before */ + if (ph-cpu_data.destination_address == 0) { + ph-cpu_data.destination_address += addr; + ph-hpte_data.destination_address += addr; + ph-kernel_data.destination_address += addr; + } + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 2, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) { + printk (KERN_ERR phyp-dump: unexpected error (%d) + on invalidate\n, rc); + print_dump_header(ph); + } +} + /* - */ /** * release_memory_range -- release memory previously lmb_reserved @@ -205,8 +239,8 @@ static void register_dump_area(struct ph * lmb_reserved in early boot. The released memory becomes * available for genreal use. */ -static void -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +static +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) { struct page *rpage; unsigned long end_pfn; @@ -237,8 +271,8 @@ release_memory_range(unsigned long start * * will release 256MB starting at 1GB. 
*/ -static ssize_t -store_release_region(struct kset *kset, const char *buf, size_t count) +static +ssize_t store_release_region(struct kset *kset, const char *buf, size_t count) { unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; @@ -266,10 +300,23 @@ store_release_region(struct kset *kset, return count; } -static ssize_t -show_release_region(struct kset * kset, char *buf) +static ssize_t show_release_region(struct kset * kset, char *buf) { - return sprintf(buf, ola\n); + u64 second_addr_range; + + /* total reserved size - start of scratch area */ + second_addr_range = phyp_dump_info-init_reserve_size - + phyp_dump_info-reserved_scratch_size; + return sprintf(buf, CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: +DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n, + phdr.cpu_data.destination_address, + phdr.cpu_data.length_copied, + phdr.hpte_data.destination_address, + phdr.hpte_data.length_copied, + phdr.kernel_data.destination_address, + phdr.kernel_data.length_copied, + phyp_dump_info-init_reserve_start, + second_addr_range); } static struct subsys_attribute rr = __ATTR(release_region, 0600, @@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info-phyp_dump_configured) { return -ENOSYS
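show_release_region() packs three address/length pairs from the dump header, plus the start and size of the remaining reserved area, into one line. The sketch below reproduces that layout with snprintf() so it can be checked in user space; sketch_format_release_region() and struct sketch_region are hypothetical names, and the test values assume a 64-bit unsigned long as on ppc64. Despite the dash, the second number in each pair is a length (length_copied, or reserved size minus scratch size), not an end address.

```c
#include <stdio.h>

/* Hypothetical pair mirroring destination_address / length_copied. */
struct sketch_region {
	unsigned long addr;
	unsigned long len;
};

/* Format the release_region line; returns what snprintf() returns. */
static int sketch_format_release_region(char *buf, size_t n,
					struct sketch_region cpu,
					struct sketch_region hpte,
					struct sketch_region kernel,
					unsigned long reserve_start,
					unsigned long reserve_size,
					unsigned long scratch_size)
{
	/* total reserved size minus the scratch area kept for the next dump */
	unsigned long rest_len = reserve_size - scratch_size;

	return snprintf(buf, n,
			"CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
			"DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
			cpu.addr, cpu.len, hpte.addr, hpte.len,
			kernel.addr, kernel.len, reserve_start, rest_len);
}
```

Using snprintf() with an explicit size is a user-space choice here; a sysfs show method is handed a page-sized buffer, which is why the patch can use plain sprintf().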
[PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
Michael, Fixed. -Manish -- Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. Signed-off-by: Manish Ahuja [EMAIL PROTECTED] Signed-off-by: Linas Vepstas [EMAIL PROTECTED] arch/powerpc/kernel/prom.c | 50 arch/powerpc/kernel/rtas.c | 32 + arch/powerpc/platforms/pseries/Makefile|1 arch/powerpc/platforms/pseries/phyp_dump.c | 71 + include/asm-powerpc/phyp_dump.h| 38 +++ include/asm-powerpc/rtas.h |3 + 6 files changed, 195 insertions(+) Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h 2008-02-12 16:12:45.0 -0600 @@ -0,0 +1,38 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyright (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + */ + +#ifndef _PPC64_PHYP_DUMP_H +#define _PPC64_PHYP_DUMP_H + +#ifdef CONFIG_PHYP_DUMP + +/* The RMR region will be saved for later dumping + * whenever the kernel crashes. Set this to 256MB. */ +#define PHYP_DUMP_RMR_START 0x0 +#define PHYP_DUMP_RMR_END (1UL28) + +struct phyp_dump { + /* Memory that is reserved during very early boot. 
*/ + unsigned long init_reserve_start; + unsigned long init_reserve_size; + /* Check status during boot if dump supported, active present*/ + unsigned long phyp_dump_configured; + unsigned long phyp_dump_is_active; + /* store cpu hpte size */ + unsigned long cpu_state_size; + unsigned long hpte_region_size; +}; + +extern struct phyp_dump *phyp_dump_info; + +#endif /* CONFIG_PHYP_DUMP */ +#endif /* _PPC64_PHYP_DUMP_H */ Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 16:12:45.0 -0600 @@ -0,0 +1,71 @@ +/* + * Hypervisor-assisted dump + * + * Linas Vepstas, Manish Ahuja 2007 + * Copyrhgit (c) 2007 IBM Corp. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + */ + +#include linux/init.h +#include linux/mm.h +#include linux/pfn.h +#include linux/swap.h + +#include asm/page.h +#include asm/phyp_dump.h + +/* Global, used to communicate data between early boot and late boot */ +static struct phyp_dump phyp_dump_global; +struct phyp_dump *phyp_dump_info = phyp_dump_global; + +/** + * release_memory_range -- release memory previously lmb_reserved + * @start_pfn: starting physical frame number + * @nr_pages: number of pages to free. + * + * This routine will release memory that had been previously + * lmb_reserved in early boot. The released memory becomes + * available for genreal use. 
+ */ +static void +release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +{ + struct page *rpage; + unsigned long end_pfn; + long i; + + end_pfn = start_pfn + nr_pages; + + for (i=start_pfn; i = end_pfn; i++) { + rpage = pfn_to_page(i); + if (PageReserved(rpage)) { + ClearPageReserved(rpage); + init_page_count(rpage); + __free_page(rpage); + totalram_pages++; + } + } +} + +static int __init phyp_dump_setup(void) +{ + unsigned long start_pfn, nr_pages; + + /* If no memory was reserved in early boot, there is nothing to do */ + if (phyp_dump_info-init_reserve_size == 0) + return 0; + + /* Release memory that was reserved in early boot */ + start_pfn = PFN_DOWN(phyp_dump_info-init_reserve_start); + nr_pages = PFN_DOWN(phyp_dump_info-init_reserve_size); + release_memory_range(start_pfn, nr_pages); + + return 0; +} + +subsys_initcall(phyp_dump_setup); Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile === --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/Makefile 2008-02-12 16:11:44.0 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/Makefile 2008-02-12 16:12:45.0 -0600 @@ -18,3 +18,4
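release_memory_range() is meant to walk the page-frame range [start_pfn, start_pfn + nr_pages) and hand back only pages still marked reserved. In this archived copy the loop condition reads `i = end_pfn`; since the archive drops '<' and '>' characters, the original was most likely `i <= end_pfn`, which would also visit the page at end_pfn, one past the requested range. A user-space model follows, assuming 4 KiB pages and a small boolean array in place of struct page flags; every name in it is hypothetical.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)
#define SKETCH_NR_PFNS 64			/* size of the simulated memory map */

static bool page_reserved[SKETCH_NR_PFNS];	/* stands in for PageReserved() */
static unsigned long total_ram_pages;		/* stands in for totalram_pages */

/* Release [start_pfn, start_pfn + nr_pages); the upper bound is exclusive. */
static void sketch_release_range(unsigned long start_pfn,
				 unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long i;

	for (i = start_pfn; i < end_pfn && i < SKETCH_NR_PFNS; i++) {
		if (page_reserved[i]) {
			/* kernel: ClearPageReserved(), init_page_count(), __free_page() */
			page_reserved[i] = false;
			total_ram_pages++;
		}
	}
}
```

Releasing the same range twice is harmless in this model because already-freed pages are skipped, mirroring the PageReserved() check in the patch.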
Re: [PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
As noted, it's fixed in patch 4. If it's okay this time, I would prefer to leave it there. -Manish Stephen Rothwell wrote: Hi Manish, Just a small comment. On Tue, 12 Feb 2008 01:11:58 -0600 Manish Ahuja [EMAIL PROTECTED] wrote: +/* Is there dump data waiting for us? */ +rtas = of_find_node_by_path("/rtas"); +dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); You need an of_node_put(rtas) here. +if (dump_header == NULL) { +release_all(); +return 0; +}
Re: [PATCH 4/8] pseries: phyp dump: register dump area.
For now, if we can leave this patch as is, that would be great. That move would require me to rework all the remaining patches, as they apply uncleanly after it. From next time onwards I will group those two changes together functionally. Thanks, Manish Stephen Rothwell wrote: Hi Manish, -/* Is there dump data waiting for us? */ +/* Is there dump data waiting for us? If there isn't, + * then register a new dump area, and release all of + * the rest of the reserved ram. + * + * The /rtas/ibm,kernel-dump rtas node is present only + * if there is dump data waiting for us. + */ rtas = of_find_node_by_path("/rtas"); dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len); +of_node_put(rtas); Oh, here is the of_node_put() - you should move that to patch 3.
Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
Sorry, I think I sent the wrong patch file; it shouldn't have my printk statement in there. Let me re-send the correct file, and let me test it once more to make sure it does the right thing. -Manish Paul Mackerras wrote: Manish Ahuja writes: Initial patch for reserving memory in early boot, and freeing it later. If the previous boot had ended with a crash, the reserved memory would contain a copy of the crashed kernel data. [snip] +static void __init reserve_crashed_mem(void) +{ +unsigned long base, size; + +if (phyp_dump_info->phyp_dump_is_active) { +/* Reserve *everything* above RMR. We'll free this real soon.*/ +base = PHYP_DUMP_RMR_END; +size = lmb_end_of_DRAM() - base; + +/* XXX crashed_ram_end is wrong, since it may be beyond +* the memory_limit, it will need to be adjusted. */ +lmb_reserve(base, size); + +phyp_dump_info->init_reserve_start = base; +phyp_dump_info->init_reserve_size = size; +} +else { +size = phyp_dump_info->cpu_state_size + +phyp_dump_info->hpte_region_size + +PHYP_DUMP_RMR_END; +base = lmb_end_of_DRAM() - size; +printk(KERN_ERR "Manish reserve regular kernel space is %ld %ld\n", base, size); +lmb_reserve(base, size); This is still reserving memory even on systems that aren't running on pHyp at all. Please rework this so that no memory is reserved if the system doesn't support phyp-assisted dump. Paul.
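Paul's request amounts to a guard in front of the two reservation cases quoted above. The following is a hypothetical user-space model of that decision, not the code that was eventually merged: it takes the end of DRAM and the CPU-state and HPTE sizes as plain parameters (lmb_end_of_DRAM() and lmb_reserve() are not modelled) and returns the region that would be reserved, or an empty one when firmware support is absent.

```c
#include <stdbool.h>

#define SKETCH_RMR_END (1UL << 28)	/* 256 MiB, like PHYP_DUMP_RMR_END */

/* Hypothetical result: the range that would be passed to lmb_reserve(). */
struct sketch_plan {
	unsigned long base;
	unsigned long size;
};

static struct sketch_plan sketch_plan_reserve(bool configured, bool active,
					      unsigned long end_of_dram,
					      unsigned long cpu_state_size,
					      unsigned long hpte_region_size)
{
	struct sketch_plan p = { 0, 0 };

	if (!configured)		/* no phyp-dump support: reserve nothing */
		return p;
	if (active) {
		/* a dump is waiting: hold everything above the RMR */
		p.base = SKETCH_RMR_END;
		p.size = end_of_dram - p.base;
	} else {
		/* no dump: keep a scratch area at the top of memory */
		p.size = cpu_state_size + hpte_region_size + SKETCH_RMR_END;
		p.base = end_of_dram - p.size;
	}
	return p;
}
```

The cover letter for the series later reports exactly this behaviour on a Power5 box: the feature is detected as unsupported and no memory is held back.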
[PATCH 1/8] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump. Signed-off-by: Linas Vepstas [EMAIL PROTECTED] Signed-off-by: Manish Ahuja [EMAIL PROTECTED] Documentation/powerpc/phyp-assisted-dump.txt | 127 +++ 1 file changed, 127 insertions(+) Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt === --- /dev/null 1970-01-01 00:00:00.0 + +++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt 2008-02-12 06:38:25.0 -0600 @@ -0,0 +1,127 @@ + + Hypervisor-Assisted Dump + + November 2007 + +The goal of hypervisor-assisted dump is to enable the dump of +a crashed system, and to do so from a fully-reset system, and +to minimize the total elapsed time until the system is back +in production use. + +As compared to kdump or other strategies, hypervisor-assisted +dump offers several strong, practical advantages: + +-- Unlike kdump, the system has been reset, and loaded + with a fresh copy of the kernel. In particular, + PCI and I/O devices have been reinitialized and are + in a clean, consistent state. +-- As the dump is performed, the dumped memory becomes + immediately available to the system for normal use. +-- After the dump is completed, no further reboots are + required; the system will be fully usable, and running + in its normal production mode on its normal kernel. + +The above can only be accomplished by coordination with, +and assistance from the hypervisor. The procedure is +as follows: + +-- When a system crashes, the hypervisor will save + the low 256MB of RAM to a previously registered + save region. It will also save system state, system + registers, and hardware PTEs. + +-- After the low 256MB area has been saved, the + hypervisor will reset PCI and other hardware state. + It will *not* clear RAM. It will then launch the + bootloader, as normal. + +-- The freshly booted kernel will notice that there + is a new node (ibm,kernel-dump) in the device tree, + indicating that there is crash data available from + a previous boot.
It will boot into only 256MB of RAM, + reserving the rest of system memory. + +-- Userspace tools will parse /sys/kernel/release_region + and read /proc/vmcore to obtain the contents of memory, + which holds the previously crashed kernel. The userspace + tools may copy this info to disk, or network, NAS, SAN, + iSCSI, etc. as desired. + + For example, the values in /sys/kernel/release_region + would look something like this (address-length pairs): + CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: / + DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A + +-- As the userspace tools complete saving a portion of + dump, they echo an offset and size to + /sys/kernel/release_region to release the reserved + memory back to general use. + + An example of this is: + echo 0x40000000 0x10000000 > /sys/kernel/release_region + which will release 256MB at the 1GB boundary. + +Please note that the hypervisor-assisted dump feature +is only available on Power6-based systems with recent +firmware versions. + +Implementation details: +-- + +During boot, a check is made to see if firmware supports +this feature on this particular machine. If it does, then +we check to see if an active dump is waiting for us. If so, +then everything but 256 MB of RAM is reserved during early +boot. This area is released once we collect a dump from user +land scripts that are run. If there is dump data, then +the /sys/kernel/release_region file is created, and +the reserved memory is held. + +If there is no waiting dump data, then only the highest +256MB of RAM is reserved as a scratch area. This area +is *not* released: this region will be kept permanently +reserved, so that it can act as a receptacle for a copy +of the low 256MB in case a crash does occur. See, +however, open issues below, as to whether +such a reserved region is really needed. + +Currently the dump will be copied from /proc/vmcore to a +new file upon user intervention.
The starting address +to be read and the range for each data point is provided +in /sys/kernel/release_region. + +The tools to examine the dump will be the same as the ones +used for kdump. + +General notes: +-- +Security: please note that there are potential security issues +with any sort of dump mechanism. In particular, plaintext +(unencrypted) data, and possibly passwords, may be present in +the dump data. Userspace tools must take adequate precautions to +preserve security. + +Open issues/ToDo: + + o The various code paths that tell the hypervisor that a crash + occurred, vs. it simply being a normal reboot, should be + reviewed, and possibly clarified/fixed. + + o Instead of using /sys/kernel, should there be a /sys/dump + instead
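A userspace tool following the procedure described in the documentation would read the release_region line, copy the ranges it names, and then write an address/length pair back. The parser below is a sketch that assumes the exact layout produced by show_release_region() in the 6/8 patch (without the ' / ' continuation mark used in the documentation example). The function and struct names are hypothetical, and the test values assume a 64-bit unsigned long as on ppc64.

```c
#include <stdio.h>

/* Hypothetical container for the four address/length pairs. */
struct sketch_release_info {
	unsigned long cpu_addr, cpu_len;
	unsigned long hpte_addr, hpte_len;
	unsigned long dump_addr, dump_len;
	unsigned long rest_addr, rest_len;
};

/* Parse one release_region line; returns 0 if all eight numbers were read. */
static int sketch_parse_release_region(const char *line,
				       struct sketch_release_info *ri)
{
	int n = sscanf(line,
		       "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
		       "DUMP:0x%lx-0x%lx, 0x%lx-0x%lx",
		       &ri->cpu_addr, &ri->cpu_len,
		       &ri->hpte_addr, &ri->hpte_len,
		       &ri->dump_addr, &ri->dump_len,
		       &ri->rest_addr, &ri->rest_len);

	return n == 8 ? 0 : -1;
}
```

After saving a range, the tool would release it by writing the pair back in the same hex notation, for example with fprintf(f, "0x%lx 0x%lx\n", ri.rest_addr, ri.rest_len) on a FILE * opened for writing on /sys/kernel/release_region.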
[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
The following series of patches implements a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is :-). Yes, it's supposed to be an improvement over kdump. A list of open issues / todo items is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still, ahem, incomplete; this work is also pending. I have included most of the changes requested, although I did find one or two that were fixed in a later patch file rather than at the first location they appeared. Also, it now does not block any memory on machines other than Power6 boxes that have the requisite firmware. This is from jal-lp6, a Power5 machine: ... Phyp-dump not supported on this hardware Using pSeries machine description console [udbg-1] enabled ... Since I changed a few more things, I am reposting all the patches. -- Manish Linas.
[PATCH 5/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support. Signed-off-by: Manish Ahuja [EMAIL PROTECTED] Signed-off-by: Linas Vepsts [EMAIL PROTECTED] - arch/powerpc/platforms/pseries/phyp_dump.c | 64 +++-- 1 file changed, 60 insertions(+), 4 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c === --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:01.0 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:06.0 -0600 @@ -2,7 +2,7 @@ * Hypervisor-assisted dump * * Linas Vepstas, Manish Ahuja 2007 - * Copyrhgit (c) 2007 IBM Corp. + * Copyright (c) 2007 IBM Corp. * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public License @@ -122,6 +122,61 @@ static unsigned long init_dump_header(st return addr_offset; } +static void print_dump_header(const struct phyp_dump_header *ph) +{ +#ifdef DEBUG + printk(KERN_INFO dump header:\n); + /* setup some ph-sections required */ + printk(KERN_INFO version = %d\n, ph-version); + printk(KERN_INFO Sections = %d\n, ph-num_of_sections); + printk(KERN_INFO Status = 0x%x\n, ph-status); + + /* No ph-disk, so all should be set to 0 */ + printk(KERN_INFO Offset to first section 0x%x\n, + ph-first_offset_section); + printk(KERN_INFO dump disk sections should be zero\n); + printk(KERN_INFO dump disk section = %d\n, ph-dump_disk_section); + printk(KERN_INFO block num = %ld\n, ph-block_num_dd); + printk(KERN_INFO number of blocks = %ld\n, ph-num_of_blocks_dd); + printk(KERN_INFO dump disk offset = %d\n, ph-offset_dd); + printk(KERN_INFO Max auto time= %d\n, ph-maxtime_to_auto); + + /*set cpu state and hpte states as well scratch pad area */ + printk(KERN_INFO CPU AREA \n); + printk(KERN_INFO cpu dump_flags =%d\n, ph-cpu_data.dump_flags); + printk(KERN_INFO cpu source_type =%d\n, ph-cpu_data.source_type); + printk(KERN_INFO cpu error_flags =%d\n, ph-cpu_data.error_flags); + printk(KERN_INFO cpu source_address =%lx\n, + 
ph-cpu_data.source_address); + printk(KERN_INFO cpu source_length =%lx\n, + ph-cpu_data.source_length); + printk(KERN_INFO cpu length_copied =%lx\n, + ph-cpu_data.length_copied); + + printk(KERN_INFO HPTE AREA \n); + printk(KERN_INFO HPTE dump_flags =%d\n, ph-hpte_data.dump_flags); + printk(KERN_INFO HPTE source_type =%d\n, ph-hpte_data.source_type); + printk(KERN_INFO HPTE error_flags =%d\n, ph-hpte_data.error_flags); + printk(KERN_INFO HPTE source_address =%lx\n, + ph-hpte_data.source_address); + printk(KERN_INFO HPTE source_length =%lx\n, + ph-hpte_data.source_length); + printk(KERN_INFO HPTE length_copied =%lx\n, + ph-hpte_data.length_copied); + + printk(KERN_INFO SRSD AREA \n); + printk(KERN_INFO SRSD dump_flags =%d\n, ph-kernel_data.dump_flags); + printk(KERN_INFO SRSD source_type =%d\n, ph-kernel_data.source_type); + printk(KERN_INFO SRSD error_flags =%d\n, ph-kernel_data.error_flags); + printk(KERN_INFO SRSD source_address =%lx\n, + ph-kernel_data.source_address); + printk(KERN_INFO SRSD source_length =%lx\n, + ph-kernel_data.source_length); + printk(KERN_INFO SRSD length_copied =%lx\n, + ph-kernel_data.length_copied); +#endif +} + static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; @@ -134,9 +189,9 @@ static void register_dump_area(struct ph 1, ph, sizeof(struct phyp_dump_header)); } while (rtas_busy_delay(rc)); - if (rc) - { - printk (KERN_ERR phyp-dump: unexpected error (%d) on register\n, rc); + if (rc) { + printk(KERN_ERR phyp-dump: unexpected error (%d) on register\n, rc); + print_dump_header(ph); } } @@ -238,6 +293,7 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info-phyp_dump_configured) { return -ENOSYS; } + print_dump_header(dump_header); /* Is there dump data waiting for us? If there isn't, * then register a new dump area, and release all of ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
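This patch also calls print_dump_header(dump_header) in phyp_dump_setup(), where dump_header is NULL on a normal boot with no dump waiting, so with DEBUG defined the routine dereferences NULL; Tony Breeds' follow-up patch in this thread adds an early return for that case. A user-space sketch of the guarded routine, writing into a buffer instead of calling printk() so the result can be checked; struct sketch_hdr is a hypothetical three-field subset of phyp_dump_header.

```c
#include <stdio.h>

/* Hypothetical subset of struct phyp_dump_header. */
struct sketch_hdr {
	int version;
	int num_of_sections;
	unsigned int status;
};

/* Returns the number of characters written, or 0 when there is no header. */
static int sketch_format_header(char *buf, size_t n,
				const struct sketch_hdr *ph)
{
	if (ph == NULL)		/* no ibm,kernel-dump property: nothing to print */
		return 0;
	return snprintf(buf, n,
			"dump header: version = %d, sections = %d, status = 0x%x\n",
			ph->version, ph->num_of_sections, ph->status);
}
```

Checking for NULL at the top of the helper, rather than at each call site, keeps every caller (setup, register and invalidate error paths) safe with one test.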
[PATCH 6/8] pseries: phyp dump: Invalidate and print dump areas.
Routines to a. invalidate dump b. Calculate region that is reserved and needs to be freed. This is exported through sysfs interface. Unregister has been removed for now as it wasn't being used. Signed-off-by: Manish Ahuja [EMAIL PROTECTED] - --- arch/powerpc/platforms/pseries/phyp_dump.c | 85 + include/asm/phyp_dump.h|6 +- 2 files changed, 79 insertions(+), 12 deletions(-) Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c === --- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:06.0 -0600 +++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c 2008-02-12 06:13:17.0 -0600 @@ -69,6 +69,10 @@ static struct phyp_dump_header phdr; #define DUMP_SOURCE_CPU 0x0001 #define DUMP_SOURCE_HPTE 0x0002 #define DUMP_SOURCE_RMO 0x0011 +#define DUMP_ERROR_FLAG 0x2000 +#define DUMP_TRIGGERED 0x4000 +#define DUMP_PERFORMED 0x8000 + /** * init_dump_header() - initialize the header declaring a dump @@ -180,9 +184,15 @@ static void print_dump_header(const stru static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr) { int rc; - ph-cpu_data.destination_address += addr; - ph-hpte_data.destination_address += addr; - ph-kernel_data.destination_address += addr; + + /* Add addr value if not initialized before */ + if (ph-cpu_data.destination_address == 0) { + ph-cpu_data.destination_address += addr; + ph-hpte_data.destination_address += addr; + ph-kernel_data.destination_address += addr; + } + + /* ToDo Invalidate kdump and free memory range. 
*/ do { rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, @@ -195,6 +205,30 @@ static void register_dump_area(struct ph } } +static +void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr) +{ + int rc; + + /* Add addr value if not initialized before */ + if (ph-cpu_data.destination_address == 0) { + ph-cpu_data.destination_address += addr; + ph-hpte_data.destination_address += addr; + ph-kernel_data.destination_address += addr; + } + + do { + rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL, + 2, ph, sizeof(struct phyp_dump_header)); + } while (rtas_busy_delay(rc)); + + if (rc) { + printk (KERN_ERR phyp-dump: unexpected error (%d) + on invalidate\n, rc); + print_dump_header(ph); + } +} + /* - */ /** * release_memory_range -- release memory previously lmb_reserved @@ -205,8 +239,8 @@ static void register_dump_area(struct ph * lmb_reserved in early boot. The released memory becomes * available for genreal use. */ -static void -release_memory_range(unsigned long start_pfn, unsigned long nr_pages) +static +void release_memory_range(unsigned long start_pfn, unsigned long nr_pages) { struct page *rpage; unsigned long end_pfn; @@ -237,8 +271,8 @@ release_memory_range(unsigned long start * * will release 256MB starting at 1GB. 
*/ -static ssize_t -store_release_region(struct kset *kset, const char *buf, size_t count) +static +ssize_t store_release_region(struct kset *kset, const char *buf, size_t count) { unsigned long start_addr, length, end_addr; unsigned long start_pfn, nr_pages; @@ -266,10 +300,23 @@ store_release_region(struct kset *kset, return count; } -static ssize_t -show_release_region(struct kset * kset, char *buf) +static ssize_t show_release_region(struct kset * kset, char *buf) { - return sprintf(buf, ola\n); + u64 second_addr_range; + + /* total reserved size - start of scratch area */ + second_addr_range = phyp_dump_info-init_reserve_size - + phyp_dump_info-reserved_scratch_size; + return sprintf(buf, CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: +DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n, + phdr.cpu_data.destination_address, + phdr.cpu_data.length_copied, + phdr.hpte_data.destination_address, + phdr.hpte_data.length_copied, + phdr.kernel_data.destination_address, + phdr.kernel_data.length_copied, + phyp_dump_info-init_reserve_start, + second_addr_range); } static struct subsys_attribute rr = __ATTR(release_region, 0600, @@ -293,7 +340,6 @@ static int __init phyp_dump_setup(void) if (!phyp_dump_info-phyp_dump_configured) { return -ENOSYS; } - print_dump_header(dump_header); /* Is there dump data waiting for us? If there isn't, * then register a new dump
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is to
keep it simple at the external shell script level and send
in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:13:17.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:13:21.0 -0600
@@ -259,6 +259,39 @@ void release_memory_range(unsigned long
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area (&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -282,6 +315,8 @@ ssize_t store_release_region(struct kset
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)
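The bookkeeping in `track_freed_range()` can be modelled outside the kernel. What follows is a minimal user-space sketch, not the patch itself: `struct dump_window`, `struct freed_tally` and `track_freed()` are hypothetical names, the totals live in a caller-owned struct rather than in function-level statics, and the upper bounds are half-open (`<`) where the patch compares with `<=`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the phyp_dump_info fields the patch reads. */
struct dump_window {
	unsigned long reserve_start;  /* init_reserve_start */
	unsigned long reserve_size;   /* init_reserve_size */
	unsigned long scratch_addr;   /* reserved_scratch_addr */
	unsigned long scratch_size;   /* reserved_scratch_size */
};

/* Running totals; the patch keeps these as static locals instead. */
struct freed_tally {
	unsigned long reserved_freed;
	unsigned long scratch_freed;
};

/*
 * Account for one released range. Returns true once both the whole
 * reserved area and the whole scratch area have been released, which is
 * the point at which the patch invalidates the old dump and re-registers.
 */
static bool track_freed(const struct dump_window *w, struct freed_tally *t,
			unsigned long addr, unsigned long length)
{
	assert(w != NULL && t != NULL);

	/* Ranges below the reservation are ignored, as in the patch. */
	if (addr < w->reserve_start)
		return false;

	if (addr >= w->reserve_start && addr < w->reserve_start + w->reserve_size)
		t->reserved_freed += length;

	if (addr >= w->scratch_addr && addr < w->scratch_addr + w->scratch_size)
		t->scratch_freed += length;

	return t->reserved_freed == w->reserve_size &&
	       t->scratch_freed == w->scratch_size;
}
```

As in the patch, a release that starts before the scratch area but runs into it is credited only to the reserved total, so the re-register condition relies on user space freeing the scratch area in chunks that begin inside it.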
[PATCH 8/8] pseries: phyp dump: config file
Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/Kconfig
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/Kconfig	2008-02-12 06:12:08.0 -0600
+++ 2.6.24-rc5/arch/powerpc/Kconfig	2008-02-12 06:13:24.0 -0600
@@ -266,6 +266,17 @@ config CRASH_DUMP
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say Y
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory
becomes free for general use.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   88 +++--
 1 file changed, 82 insertions(+), 6 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:12:37.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:12:55.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,20 +59,89 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
+
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range (start_pfn, nr_pages);
+
+	return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+	return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+					 show_release_region,
+					 store_release_region);
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header;
+	int header_len = 0;
+	int rc;
 
 	/* If no memory was reserved in early boot, there is nothing to do */
 	if (phyp_dump_info->init_reserve_size == 0)
 		return 0;
 
-	/* Release memory that was reserved in early boot */
-	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
-	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
-	release_memory_range(start_pfn, nr_pages);
+	/* Return if phyp dump not supported */
+	if (!phyp_dump_info->phyp_dump_configured) {
+		return -ENOSYS;
+	}
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+	if (dump_header == NULL) {
+		release_all();
+		return 0;
+	}
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = subsys_create_file(&kernel_subsys, &rr);
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
+		release_all();
+		return 0;
+	}
 
 	return 0;
 }
-
 subsys_initcall(phyp_dump_setup);
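The `store_release_region()` handler boils down to parse, clamp, and convert to page frames. The sketch below reproduces that arithmetic in user space under stated assumptions: a 4K page size (`SKETCH_PAGE_SHIFT`; ppc64 kernels are often built with 64K pages) and a hypothetical helper name. Unlike the patch, which raises `start_addr` to the window start without shrinking `length`, the sketch clips both ends, so a request that begins below the reservation cannot release memory beyond what was asked for.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption: 4K pages. ppc64 kernels are often built with 64K pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PFN_DOWN(x) ((x) >> SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: parse "<start> <length>" (hex, "0x" optional),
 * clip it to [win_start, win_start + win_size) and convert to page frames.
 * Returns 0 on success, -1 on malformed input or if nothing overlaps.
 */
static int parse_release_request(const char *buf,
				 unsigned long win_start, unsigned long win_size,
				 unsigned long *start_pfn, unsigned long *nr_pages)
{
	unsigned long start, length, end;
	unsigned long win_end = win_start + win_size;

	assert(buf != NULL && start_pfn != NULL && nr_pages != NULL);
	if (sscanf(buf, "%lx %lx", &start, &length) != 2)
		return -1;
	if (length > ULONG_MAX - start)   /* reject wrap-around */
		return -1;

	end = start + length;
	if (start < win_start)
		start = win_start;        /* clip the front ... */
	if (end > win_end)
		end = win_end;            /* ... and the back */
	if (end <= start)
		return -1;

	*start_pfn = SKETCH_PFN_DOWN(start);
	*nr_pages = SKETCH_PFN_DOWN(end - start);
	return 0;
}
```

With `win_start = 0x10000000` and `win_size = 0x70000000`, the changelog's request (`0x40000000 0x10000000`) yields `start_pfn = 0x40000` and `nr_pages = 0x10000` under the 4K assumption.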
[PATCH 4/8] pseries: phyp dump: register dump area.
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  136 +++--
 1 file changed, 129 insertions(+), 7 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:12:55.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-02-12 06:13:01.0 -0600
@@ -30,6 +30,117 @@ struct phyp_dump *phyp_dump_info = &phyp
 static int ibm_configure_kernel_dump;
 
 /* - */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS	3
+#define DUMP_HEADER_VERSION	0x1
+#define DUMP_REQUEST_FLAG	0x1
+#define DUMP_SOURCE_CPU		0x0001
+#define DUMP_SOURCE_HPTE	0x0002
+#define DUMP_SOURCE_RMO		0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	unsigned long addr_offset = 0;
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32)offsetof(struct phyp_dump_header, cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = phyp_dump_info->cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = phyp_dump_info->hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += phyp_dump_info->hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+			       1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+	{
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+	}
+}
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -113,7 +224,9 @@ static struct subsys_attribute rr = __AT
 static int __init phyp_dump_setup(void)
 {
 	struct device_node *rtas;
-	const int *dump_header;
+	const struct phyp_dump_header *dump_header;
+	unsigned long dump_area_start;
+	unsigned long dump_area_length;
 	int header_len = 0;
 	int rc;
 
@@ -126,22 +239,31 @@ static int __init phyp_dump_setup(void)
 		return -ENOSYS;
 	}
 
-	/* Is there dump data waiting for us
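`init_dump_header()` packs the three sections back to back starting at offset 0, and `register_dump_area()` later rebases those offsets onto the address it is given. A minimal user-space model of that layout follows; the struct and function names are hypothetical and only the fields needed for the arithmetic are kept. One detail worth noting: the patch passes `PHYP_DUMP_RMR_END` as the kernel section's length, which equals the region size only if `PHYP_DUMP_RMR_START` is 0, so the sketch takes the size directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of struct dump_section. */
struct section_layout {
	unsigned long length;       /* source_length */
	unsigned long destination;  /* destination_address */
};

struct header_layout {
	struct section_layout cpu, hpte, kernel;
};

/* Like init_dump_header(): assign offsets, return the save-area length. */
static unsigned long layout_sections(struct header_layout *h,
				     unsigned long cpu_size,
				     unsigned long hpte_size,
				     unsigned long rmr_size)
{
	unsigned long off = 0;

	assert(h != NULL);
	h->cpu.length = cpu_size;
	h->cpu.destination = off;
	off += cpu_size;

	h->hpte.length = hpte_size;
	h->hpte.destination = off;
	off += hpte_size;

	h->kernel.length = rmr_size;
	h->kernel.destination = off;
	off += rmr_size;

	return off;
}

/* Like register_dump_area(): turn the offsets into absolute addresses. */
static void rebase_sections(struct header_layout *h, unsigned long base)
{
	assert(h != NULL);
	h->cpu.destination += base;
	h->hpte.destination += base;
	h->kernel.destination += base;
}
```

Calling the rebase step twice would shift every address twice, which is what the `destination_address == 0` check added in patch 6/8 is meant to prevent.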
[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
The following series of patches implements a basic framework for
hypervisor-assisted dump. The very first patch provides documentation
explaining what this is :-). Yes, it's supposed to be an improvement
over kdump.

A list of open issues / todo list is included in the documentation.
It also appears that the not-yet-released firmware versions this was
tested on are still, ahem, incomplete; this work is also pending.

I have included most of the changes requested, although I found that
one or two were fixed in a later patch file rather than at the first
location where they appeared.

-- Manish & Linas.
[PATCH 3/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  102 +++--
 1 file changed, 96 insertions(+), 6 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-18 07:37:33.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-18 22:43:00.0 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,20 +59,103 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* - */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
 
-	/* If no memory was reserved in early boot, there is nothing to do */
-	if (phyp_dump_info->init_reserve_size == 0)
-		return 0;
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range (start_pfn, nr_pages);
 
-	/* Release memory that was reserved in early boot */
+	return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+	return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+					 show_release_region,
+					 store_release_region);
+
+/* - */
+
+static void release_all (void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* Release all memory that was reserved in early boot */
 	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
 	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
 	release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header;
+	int header_len = 0;
+	int rc;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Return if phyp dump not supported */
+	ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+	if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+		release_all();
+		return -ENOSYS;
+	}
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+	if (dump_header == NULL) {
+		release_all();
+		return 0;
+	}
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = subsys_create_file(&kernel_subsys, &rr);
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unable to create sysfs file (%d)\n", rc);
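In this earlier revision, `phyp_dump_setup()` makes a short chain of decisions before exposing the sysfs file. The sketch below restates that chain as a pure function so each outcome can be checked in isolation. `struct setup_inputs`, `decide_setup()` and the `SETUP_*` values are hypothetical; the booleans stand in for the results of `rtas_token()`, `of_get_property()` and `subsys_create_file()`. It assumes the truncated error path above releases the reservation, as the later revision of this patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what phyp_dump_setup() looks at. */
struct setup_inputs {
	unsigned long reserve_size;  /* bytes reserved during early boot */
	bool token_known;            /* rtas_token() != RTAS_UNKNOWN_SERVICE */
	bool dump_present;           /* "ibm,kernel-dump" property found */
	bool sysfs_ok;               /* sysfs file created successfully */
};

enum setup_action {
	SETUP_NOTHING,        /* nothing was reserved */
	SETUP_RELEASE_ALL,    /* give the whole reservation back */
	SETUP_KEEP_FOR_DUMP,  /* keep it until user space releases it */
};

static enum setup_action decide_setup(const struct setup_inputs *in, int *rc)
{
	assert(in != NULL && rc != NULL);
	*rc = 0;

	if (in->reserve_size == 0)
		return SETUP_NOTHING;

	if (!in->token_known) {
		*rc = -ENOSYS;
		return SETUP_RELEASE_ALL;
	}

	if (!in->dump_present || !in->sysfs_ok)
		return SETUP_RELEASE_ALL;

	return SETUP_KEEP_FOR_DUMP;
}
```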
Re: [PATCH 2/8] pseries: phyp dump: reserve-release proof-of-concept
Reposted this one; I got the email ID wrong in the original. Sorry about that.

Manish
[PATCH 5/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   64 +++--
 1 file changed, 60 insertions(+), 4 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 02:51:54.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 02:58:41.0 -0600
@@ -2,7 +2,7 @@
  * Hypervisor-assisted dump
  *
  * Linas Vepstas, Manish Ahuja 2007
- * Copyrhgit (c) 2007 IBM Corp.
+ * Copyright (c) 2007 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -122,6 +122,61 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+#ifdef DEBUG
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph->sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph->disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n",
+		ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n", ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n", ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n", ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n", ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n", ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO "CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n", ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n", ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n", ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",
+		ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",
+		ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",
+		ph->cpu_data.length_copied);
+
+	printk(KERN_INFO "HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n", ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n", ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n", ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",
+		ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",
+		ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",
+		ph->hpte_data.length_copied);
+
+	printk(KERN_INFO "SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n", ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n", ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n", ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",
+		ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",
+		ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",
+		ph->kernel_data.length_copied);
+#endif
+}
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -134,9 +189,9 @@ static void register_dump_area(struct ph
 			       1, ph, sizeof(struct phyp_dump_header));
 	} while (rtas_busy_delay(rc));
 
-	if (rc)
-	{
-		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+	if (rc) {
+		printk(KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+		print_dump_header(ph);
 	}
 }
 
@@ -249,6 +304,7 @@ static int __init phyp_dump_setup(void)
 		release_all();
 		return -ENOSYS;
 	}
+	print_dump_header(dump_header);
 
 	/* Is there dump data waiting for us? If there isn't,
 	 * then register a new dump area, and release all of
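Two details of `print_dump_header()` are easy to get wrong: `phyp_dump_setup()` calls it with `dump_header` before checking that pointer for NULL (a follow-up fix on this list adds an early return for exactly that case), and the patch prints `u64` fields with `%lx`/`%ld`, which matches only where `u64` is `unsigned long`. The user-space sketch below shows both points for one section; `struct sketch_section` and `format_section()` are hypothetical names, and casting to `unsigned long long` with `%llx` is a portable alternative rather than what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of struct dump_section. */
struct sketch_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t length_copied;
};

/*
 * Format one section into buf. Returns snprintf()'s result, or 0 when
 * s is NULL (mirroring the early return added by the NULL-deref fix).
 */
static int format_section(char *buf, size_t len, const char *name,
			  const struct sketch_section *s)
{
	assert(buf != NULL && name != NULL);
	if (s == NULL)
		return 0;

	/* Cast each field so the conversion specifier always matches. */
	return snprintf(buf, len,
			"%s dump_flags=%u source_type=%u error_flags=%u length_copied=%llx\n",
			name, (unsigned int)s->dump_flags,
			(unsigned int)s->source_type,
			(unsigned int)s->error_flags,
			(unsigned long long)s->length_copied);
}
```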
[PATCH 6/8] pseries: phyp dump: Unregister and print dump areas.
Routines to invalidate and unregister dump routines. Unregister has
not been used yet; I will release another patch for that at a later
stage with the kdump integration patches.

There is also a routine which calculates the regions to be freed and
exports that through sysfs.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +
 include/asm/phyp_dump.h                    |    3
 2 files changed, 93 insertions(+), 11 deletions(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 23:06:20.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 23:49:10.0 -0600
@@ -69,6 +69,10 @@ static struct phyp_dump_header phdr;
 #define DUMP_SOURCE_CPU		0x0001
 #define DUMP_SOURCE_HPTE	0x0002
 #define DUMP_SOURCE_RMO		0x0011
+#define DUMP_ERROR_FLAG		0x2000
+#define DUMP_TRIGGERED		0x4000
+#define DUMP_PERFORMED		0x8000
+
 
 /**
  * init_dump_header() - initialize the header declaring a dump
@@ -180,9 +184,15 @@ static void print_dump_header(const stru
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
-	ph->cpu_data.destination_address += addr;
-	ph->hpte_data.destination_address += addr;
-	ph->kernel_data.destination_address += addr;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	/* ToDo Invalidate kdump and free memory range. */
 
 	do {
 		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
@@ -195,6 +205,46 @@ static void register_dump_area(struct ph
 	}
 }
 
+static
+void invalidate_last_dump(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+
+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;
+		ph->hpte_data.destination_address += addr;
+		ph->kernel_data.destination_address += addr;
+	}
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+			       2, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+			"on invalidate\n", rc);
+		print_dump_header(ph);
+	}
+}
+
+static void unregister_dump_area(struct phyp_dump_header *ph)
+{
+	int rc;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+			       3, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) "
+			"on unregister\n", rc);
+		print_dump_header(ph);
+	}
+}
+
 /* - */
 /**
  * release_memory_range -- release memory previously lmb_reserved
@@ -205,8 +255,8 @@ static void register_dump_area(struct ph
  * lmb_reserved in early boot. The released memory becomes
  * available for genreal use.
  */
-static void
-release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+static
+void release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
 {
 	struct page *rpage;
 	unsigned long end_pfn;
@@ -237,8 +287,8 @@ release_memory_range(unsigned long start
  *
  * will release 256MB starting at 1GB.
  */
-static ssize_t
-store_release_region(struct kset *kset, const char *buf, size_t count)
+static
+ssize_t store_release_region(struct kset *kset, const char *buf, size_t count)
 {
 	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
@@ -266,10 +316,23 @@ store_release_region(struct kset *kset,
 	return count;
 }
 
-static ssize_t
-show_release_region(struct kset * kset, char *buf)
+static ssize_t show_release_region(struct kset * kset, char *buf)
 {
-	return sprintf(buf, "ola\n");
+	u64 second_addr_range;
+
+	/* total reserved size - start of scratch area */
+	second_addr_range = phyp_dump_info->init_reserve_size -
+				phyp_dump_info->reserved_scratch_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
+			"DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address,
+		phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address
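With this patch there are three callers of `ibm,configure-kernel-dump` that differ only in the request code passed as the first input (1 register, 2 invalidate, 3 unregister) and share the same busy-retry loop. The sketch below models that loop against a fake firmware so the retry behaviour can be exercised in user space. Everything named `fake_*` or `SKETCH_*` is hypothetical; the busy status value of -2 is an assumption labelled in the code, and the real `rtas_busy_delay()` also sleeps (including for RTAS's extended-delay status codes), which the sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Request codes the patches pass as the first ibm,configure-kernel-dump input. */
enum dump_request {
	DUMP_REQ_REGISTER = 1,
	DUMP_REQ_INVALIDATE = 2,
	DUMP_REQ_UNREGISTER = 3,
};

#define SKETCH_DUMP_ERROR_FLAG 0x2000  /* DUMP_ERROR_FLAG in the patch */
#define SKETCH_RTAS_BUSY (-2)          /* assumption: RTAS "busy, call again" */

/* Hypothetical firmware model standing in for rtas_call(). */
struct fake_firmware {
	int busy_left;     /* calls that report busy before one succeeds */
	int last_request;
	int calls;
};

static int fake_rtas_call(struct fake_firmware *fw, enum dump_request req)
{
	fw->calls++;
	fw->last_request = req;
	if (fw->busy_left > 0) {
		fw->busy_left--;
		return SKETCH_RTAS_BUSY;
	}
	return 0;
}

/* Same shape as do { rtas_call() } while (rtas_busy_delay(rc)), minus the sleep. */
static int configure_dump(struct fake_firmware *fw, enum dump_request req)
{
	int rc;

	assert(fw != NULL);
	do {
		rc = fake_rtas_call(fw, req);
	} while (rc == SKETCH_RTAS_BUSY);
	return rc;
}

/* A previous dump whose status carries the error bit should be re-registered. */
static bool dump_needs_reregister(unsigned int status)
{
	return (status & SKETCH_DUMP_ERROR_FLAG) != 0;
}
```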
[PATCH 7/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is to
keep it simple at the external shell script level and send
in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   35 +
 1 file changed, 35 insertions(+)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 23:30:18.0 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-21 23:42:04.0 -0600
@@ -275,6 +275,39 @@ void release_memory_range(unsigned long
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->reserved_scratch_addr) &&
+	    (addr <= phyp_dump_info->reserved_scratch_addr +
+	     phyp_dump_info->reserved_scratch_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_size) &&
+	    (scratch_area_size == phyp_dump_info->reserved_scratch_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+		register_dump_area (&phdr,
+				phyp_dump_info->reserved_scratch_addr);
+	}
+}
+
 /* - */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -298,6 +331,8 @@ ssize_t store_release_region(struct kset
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)
[PATCH 8/8] pseries: phyp dump: config file
To: linuxppc-dev@ozlabs.org

Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig	2007-11-14 16:39:20.0 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig	2007-11-15 14:27:33.0 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say Y
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
Re: [PATCH 1/8] pseries: phyp dump: Documentation
I used the word "actually". I already know that it is intended to be faster. :)

it should blow it away, as, after all, it requires one less reboot!

There's more than rebooting going on during system dump processing. Depending on the system type, booting may not be where most time is spent.

As a side effect, the system is in production *while* the dump is being taken;

A dubious feature IMO. It seems that the design potentially trades reliability of first-failure data capture for availability. E.g. the system crashes, reboots, resumes processing while copying the dump, and crashes again before the dump procedure is complete. How is that handled, if at all?

This is a simple version; the intent was not to have a complex dump-taking mechanism in version 1. Subsequent versions will improve the way pages are tracked and freed. It is also already quite possible to register for another dump as soon as the scratch area has been copied to a user-designated region, but for now only this simple implementation exists. It could be extended further to preserve only kernel pages and to free pages that are not required, such as user/data pages. That would reduce the amount of memory preserved and would prevent the issues caused by reserving everything in memory except the first 256MB. Improvements and future versions are planned to make this efficient, but for now the intent is to get this off the ground and handle simple cases.

with kdump, you can't go into production until after the dump is finished, and the system has been rebooted a second time. On systems with terabytes of RAM, the time difference can be hours.

The difference in time it takes to resume the normal workload may be significant, yes. But the time it takes to get a usable dump image would seem to be basically the same.

Since you bring up large systems... a system with terabytes of RAM is practically guaranteed to be a NUMA configuration with dozens of cpus. When processing a dump on such a system, I wonder how well we fare: can we successfully boot with (say) 128 cpus and 256MB of usable memory? Do we have to hot-online nodes as system memory is freed up (and does that even work)? We need to be able to restore the system to its optimal topology when the dump is finished; if the best we can do is a degraded configuration, the workload will suffer and the system admin is likely to just reboot the machine again so the kernel will have the right NUMA topology.

+Implementation details:
+-----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is a crash data waiting.

I don't think this bit about early device tree access is correct. By the time your code is reserving memory (from early_init_devtree(), I think), RTAS has been instantiated and you are able to test for the existence of /rtas/ibm,dump-kernel.

If I remember right, it was still too early to look up this token directly, so we wrote some code to crawl the flat device tree to find it. Not only was that a lot of work, but I somehow decided that doing this to the flat tree was wrong, as otherwise someone would surely have written the access code. If this can be made to work, that would be great, but we couldn't make it work at the time.

+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.

So I think these gymnastics are unneeded -- unless I'm misunderstanding something, you should be able to determine very early whether to reserve that memory.

Only if you can get at rtas, but you can't get at rtas at that point.

Sorry, but I think you are mistaken (see Michael's earlier reply).
Re: [PATCH 1/8] pseries: phyp dump: Documentation
It's in production with 256MB of RAM? Err. Sure, as the dump progresses more RAM will be freed, but that's hardly production. I think Nathan's right: any sysadmin who wants predictability will probably double-reboot anyway.

That's a changeable parameter; it is something we chose for now and is by no means set in stone. It's not a design parameter: if you'd like to allocate 1GB, we can. We expect this to be a variable value that depends on the size of the system, so if you have a 128GB system and can spare 10GB, you should be able to boot with 10GB.
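If the boot allowance becomes a tunable, as suggested above, the early-boot reservation still follows directly from total RAM: everything above the allowance is held back, as the documentation patch describes for the fixed 256MB case. `compute_reserve()` and `struct reserve_window` are hypothetical names for illustration, not code from this series; 64-bit arithmetic is used so multi-terabyte sizes fit.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MB (UINT64_C(1) << 20)
#define SKETCH_GB (UINT64_C(1) << 30)

struct reserve_window {
	uint64_t start;  /* first reserved byte */
	uint64_t size;   /* bytes held back until user space releases them */
};

/*
 * Hypothetical helper: reserve everything above boot_mem bytes.
 * The patches hard-code boot_mem at 256MB; this thread suggests scaling it.
 */
static struct reserve_window compute_reserve(uint64_t ram_size, uint64_t boot_mem)
{
	struct reserve_window w = { 0, 0 };

	assert(boot_mem > 0);
	if (ram_size > boot_mem) {
		w.start = boot_mem;
		w.size = ram_size - boot_mem;
	}
	return w;
}
```

For the example in the reply, a 128GB system with a 10GB allowance would reserve 118GB starting at the 10GB mark.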
Re: [PATCH 7/8] pseries: phyp dump: Unregister and print dump areas.
Stephen,

+	/* Add addr value if not initialized before */
+	if (ph->cpu_data.destination_address == 0) {
+		ph->cpu_data.destination_address += addr;

Could be just '=' like further down, right?

Actually, the one below should be += as well. Thanks for catching it.

+	/* total reserved size - start of scratch area */
+	second_addr_range = phdr.cpu_data.destination_address -
+			phyp_dump_info->init_reserve_size;
+	return sprintf(buf, "CPU:0x%lx-0x%lx: HPTE:0x%lx-0x%lx: "
+		"DUMP:0x%lx-0x%lx, 0x%lx-0x%lx:\n",
+		phdr.cpu_data.destination_address, phdr.cpu_data.length_copied,
+		phdr.hpte_data.destination_address, phdr.hpte_data.length_copied,
+		phdr.kernel_data.destination_address, phdr.kernel_data.length_copied,
+		phyp_dump_info->init_reserve_start, second_addr_range);

This indentation should be (probably) two tabs.

I kept it at one tab plus a few spaces because otherwise the lines exceeded 80 columns. I guess I can just put one argument per line, and that should take care of it.

+	/* re-register the dump area, if old dump was invalid */
+	if ((dump_header) && (dump_header->status & DUMP_ERROR_FLAG)) {
^ ^
Extra parentheses.

They are just for clarity; I would prefer to keep them, if that's okay.

+		invalidate_last_dump (&phdr, dump_area_start);
+		register_dump_area (&phdr, dump_area_start);

No spaces after function names.

Yeah, I will take those out here and in the other files as well.

Thanks,
Manish
[PATCH 0/8] pseries: phyp dump: hypervisor-assisted dump
The following series of patches implements a basic framework for hypervisor-assisted dump. The very first patch provides documentation explaining what this is :-) . Yes, it's supposed to be an improvement over kdump.

The patches mostly work; a list of open issues / todo list is included in the documentation. It also appears that the not-yet-released firmware versions this was tested on are still, ahem, incomplete; this work is also pending.

-- Linas & Manish
[PATCH 1/8] pseries: phyp dump: Documentation
Basic documentation for hypervisor-assisted dump.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 Documentation/powerpc/phyp-assisted-dump.txt |  129 +++
 1 file changed, 129 insertions(+)

Index: 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ 2.6.24-rc5/Documentation/powerpc/phyp-assisted-dump.txt	2008-01-07 18:05:46.000000000 -0600
@@ -0,0 +1,129 @@
+
+                   Hypervisor-Assisted Dump
+                   ------------------------
+                       November 2007
+
+The goal of hypervisor-assisted dump is to enable the dump of
+a crashed system, and to do so from a fully-reset system, and
+to minimize the total elapsed time until the system is back
+in production use.
+
+As compared to kdump or other strategies, hypervisor-assisted
+dump offers several strong, practical advantages:
+
+-- Unlike kdump, the system has been reset, and loaded
+   with a fresh copy of the kernel.  In particular,
+   PCI and I/O devices have been reinitialized and are
+   in a clean, consistent state.
+-- As the dump is performed, the dumped memory becomes
+   immediately available to the system for normal use.
+-- After the dump is completed, no further reboots are
+   required; the system will be fully usable, and running
+   in its normal, production mode on its normal kernel.
+
+The above can only be accomplished by coordination with,
+and assistance from the hypervisor. The procedure is
+as follows:
+
+-- When a system crashes, the hypervisor will save
+   the low 256MB of RAM to a previously registered
+   save region. It will also save system state, system
+   registers, and hardware PTE's.
+
+-- After the low 256MB area has been saved, the
+   hypervisor will reset PCI and other hardware state.
+   It will *not* clear RAM. It will then launch the
+   bootloader, as normal.
+
+-- The freshly booted kernel will notice that there
+   is a new node (ibm,dump-kernel) in the device tree,
+   indicating that there is crash data available from
+   a previous boot. It will boot into only 256MB of RAM,
+   reserving the rest of system memory.
+
+-- Userspace tools will parse /sys/kernel/release_region
+   and read /proc/vmcore to obtain the contents of memory,
+   which holds the previous crashed kernel. The userspace
+   tools may copy this info to disk, or network, nas, san,
+   iscsi, etc. as desired.
+
+   For Example: the values in /sys/kernel/release_region
+   would look something like this (address-range pairs).
+   CPU:0x177fee000-0x1: HPTE:0x177ffe020-0x1000: /
+   DUMP:0x177fff020-0x1000, 0x1000-0x16F1D370A
+
+-- As the userspace tools complete saving a portion of
+   dump, they echo an offset and size to
+   /sys/kernel/release_region to release the reserved
+   memory back to general use.
+
+   An example of this is:
+     echo 0x40000000 0x10000000 > /sys/kernel/release_region
+   which will release 256MB at the 1GB boundary.
+
+Please note that the hypervisor-assisted dump feature
+is only available on Power6-based systems with recent
+firmware versions.
+
+Implementation details:
+----------------------
+In order for this scheme to work, memory needs to be reserved
+quite early in the boot cycle. However, access to the device
+tree this early in the boot cycle is difficult, and device-tree
+access is needed to determine if there is crash data waiting.
+To work around this problem, all but 256MB of RAM is reserved
+during early boot. A short while later in boot, a check is made
+to determine if there is dump data waiting. If there isn't,
+then the reserved memory is released to general kernel use.
+If there is dump data, then the /sys/kernel/release_region
+file is created, and the reserved memory is held.
+
+If there is no waiting dump data, then all but 256MB of the
+reserved ram will be released for general kernel use. The
+highest 256 MB of RAM will *not* be released: this region
+will be kept permanently reserved, so that it can act as
+a receptacle for a copy of the low 256MB in the case a crash
+does occur. See, however, open issues below, as to whether
+such a reserved region is really needed.
+
+Currently the dump will be copied from /proc/vmcore to a
+new file upon user intervention. The starting address
+to be read and the range for each data point is provided
+in /sys/kernel/release_region.
+
+The tools to examine the dump will be same as the ones
+used for kdump.
+
+
+General notes:
+--------------
+Security: please note that there are potential security issues
+with any sort of dump mechanism. In particular, plaintext
+(unencrypted) data, and possibly passwords, may be present in
+the dump data. Userspace tools must take adequate precautions to
+preserve security.
+
+Open issues/ToDo:
+----------------
+ o The various code paths that tell the hypervisor
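The reservation scheme in the implementation details above amounts to a little address arithmetic: boot in the low 256MB, hold everything above it, and, if no dump is waiting, give back all but the highest 256MB. This sketch assumes at least 512MB of RAM and a flat physical address space; `plan_reservation()` and `struct reserve_plan` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

#define RMR_SIZE (256ULL << 20)	/* low 256MB saved by the hypervisor */

struct reserve_plan {
	uint64_t reserve_start;	/* first byte held back in early boot */
	uint64_t reserve_size;	/* bytes held back in early boot */
	uint64_t keep_start;	/* top 256MB kept as a receptacle */
	uint64_t release_size;	/* bytes returned if no dump is waiting */
};

/* Assumes total_ram >= 2 * RMR_SIZE. */
static struct reserve_plan plan_reservation(uint64_t total_ram)
{
	struct reserve_plan p;

	assert(total_ram >= 2 * RMR_SIZE);

	/* Early boot: run in the low 256MB, reserve everything above it. */
	p.reserve_start = RMR_SIZE;
	p.reserve_size = total_ram - RMR_SIZE;

	/* No dump waiting: free all of the reservation but the top 256MB. */
	p.keep_start = total_ram - RMR_SIZE;
	p.release_size = p.keep_start - p.reserve_start;
	return p;
}
```

On a 4GB machine this reserves 3840MB at boot and, when no dump is waiting, returns 3584MB while keeping [3840MB, 4GB) as the receptacle.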
[PATCH 2/8] pseries: phyp dump: config file
Add hypervisor-assisted dump to kernel config

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/Kconfig |   11 +++
 1 file changed, 11 insertions(+)

Index: linux-2.6.24-rc2-git4/arch/powerpc/Kconfig
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/Kconfig	2007-11-14 16:39:20.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/Kconfig	2007-11-15 14:27:33.000000000 -0600
@@ -261,6 +261,17 @@ config CRASH_DUMP
 
 	  Don't change this unless you know what you are doing.
 
+config PHYP_DUMP
+	bool "Hypervisor-assisted dump (EXPERIMENTAL)"
+	depends on PPC_PSERIES && EXPERIMENTAL
+	default y
+	help
+	  Hypervisor-assisted dump is meant to be a kdump replacement
+	  offering robustness and speed not possible without system
+	  hypervisor assistance.
+
+	  If unsure, say Y
+
 config PPCBUG_NVRAM
 	bool "Enable reading PPCBUG NVRAM during boot" if PPLUS || LOPEC
 	default y if PPC_PREP
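Code that depends on this option is guarded with `#ifdef CONFIG_PHYP_DUMP`, as the header in patch 3/8 does. A common kernel idiom is to pair the real declaration with a no-op inline stub so callers need no `#ifdef`s of their own. The function name below is hypothetical, and compiling the snippet on its own (without `-DCONFIG_PHYP_DUMP`) exercises the stub branch.

```c
#include <assert.h>

/*
 * phyp_dump_reserve_mem() is a hypothetical hook name. With the option
 * enabled the real implementation lives elsewhere; otherwise callers
 * get a no-op inline stub.
 */
#ifdef CONFIG_PHYP_DUMP
extern int phyp_dump_reserve_mem(void);
#else
static inline int phyp_dump_reserve_mem(void)
{
	return 0;	/* feature compiled out: nothing is reserved */
}
#endif
```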
[PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept
Initial patch for reserving memory in early boot, and freeing it later.
If the previous boot had ended with a crash, the reserved memory would contain
a copy of the crashed kernel data.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

 arch/powerpc/kernel/prom.c                 |   33 +
 arch/powerpc/platforms/pseries/Makefile    |    1 
 arch/powerpc/platforms/pseries/phyp_dump.c |   71 +
 include/asm-powerpc/phyp_dump.h            |   32 +
 4 files changed, 137 insertions(+)

Index: linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/include/asm-powerpc/phyp_dump.h	2007-11-19 17:44:21.000000000 -0600
@@ -0,0 +1,32 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyright (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _PPC64_PHYP_DUMP_H
+#define _PPC64_PHYP_DUMP_H
+
+#ifdef CONFIG_PHYP_DUMP
+
+/* The RMR region will be saved for later dumping
+ * whenever the kernel crashes. Set this to 256MB. */
+#define PHYP_DUMP_RMR_START 0x0
+#define PHYP_DUMP_RMR_END   (1UL<<28)
+
+struct phyp_dump {
+	/* Memory that is reserved during very early boot. */
+	unsigned long init_reserve_start;
+	unsigned long init_reserve_size;
+};
+
+extern struct phyp_dump *phyp_dump_info;
+
+#endif /* CONFIG_PHYP_DUMP */
+#endif /* _PPC64_PHYP_DUMP_H */
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-19 19:07:49.000000000 -0600
@@ -0,0 +1,71 @@
+/*
+ * Hypervisor-assisted dump
+ *
+ * Linas Vepstas, Manish Ahuja 2007
+ * Copyrhgit (c) 2007 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/pfn.h>
+#include <linux/swap.h>
+
+#include <asm/page.h>
+#include <asm/phyp_dump.h>
+
+/* Global, used to communicate data between early boot and late boot */
+static struct phyp_dump phyp_dump_global;
+struct phyp_dump *phyp_dump_info = &phyp_dump_global;
+
+/**
+ * release_memory_range -- release memory previously lmb_reserved
+ * @start_pfn: starting physical frame number
+ * @nr_pages: number of pages to free.
+ *
+ * This routine will release memory that had been previously
+ * lmb_reserved in early boot. The released memory becomes
+ * available for general use.
+ */
+static void
+release_memory_range(unsigned long start_pfn, unsigned long nr_pages)
+{
+	struct page *rpage;
+	unsigned long end_pfn;
+	long i;
+
+	end_pfn = start_pfn + nr_pages;
+
+	for (i=start_pfn; i <= end_pfn; i++) {
+		rpage = pfn_to_page(i);
+		if (PageReserved(rpage)) {
+			ClearPageReserved(rpage);
+			init_page_count(rpage);
+			__free_page(rpage);
+			totalram_pages++;
+		}
+	}
+}
+
+static int __init phyp_dump_setup(void)
+{
+	unsigned long start_pfn, nr_pages;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Release memory that was reserved in early boot */
+	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
+	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
+	release_memory_range(start_pfn, nr_pages);
+
+	return 0;
+}
+
+subsys_initcall(phyp_dump_setup);
Index: linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile
===================================================================
--- linux-2.6.24-rc2-git4.orig/arch/powerpc/platforms/pseries/Makefile	2007-11-19 17:43:52.000000000 -0600
+++ linux-2.6.24-rc2-git4/arch/powerpc/platforms/pseries/Makefile	2007-11-19 17:44:21.000000000 -0600
@@ -18,3 +18,4 @@ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug-cpu
 obj-$(CONFIG_HVC_CONSOLE)	+= hvconsole.o
 obj-$(CONFIG_HVCS)		+= hvcserver.o
 obj-$(CONFIG_HCALL_STATS)	+= hvCall_inst.o
+obj-$(CONFIG_PHYP_DUMP)	+= phyp_dump.o
Index: linux-2.6.24-rc2-git4/arch/powerpc/kernel/prom.c
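A user-space model of `release_memory_range()` and the `PFN_DOWN()` conversion can help check the page arithmetic. It assumes 4K pages and replaces `struct page` and `PageReserved()` with a small array. Note that, as posted, the loop runs `i <= end_pfn` with `end_pfn = start_pfn + nr_pages`, which appears to visit one frame past the requested range; the sketch uses a half-open `[start_pfn, end_pfn)` range instead.

```c
#include <assert.h>

#define PAGE_SHIFT	12			/* assumption: 4K pages */
#define PFN_DOWN(x)	((x) >> PAGE_SHIFT)
#define NR_FRAMES	64

/* Stand-ins for the kernel's page flags and RAM counter. */
static int page_reserved[NR_FRAMES];
static unsigned long totalram_pages;

/* Release nr_pages reserved frames: the half-open range [start, end). */
static void release_memory_range(unsigned long start_pfn,
				 unsigned long nr_pages)
{
	unsigned long end_pfn = start_pfn + nr_pages;
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn && pfn < NR_FRAMES; pfn++) {
		if (page_reserved[pfn]) {
			page_reserved[pfn] = 0;	/* ClearPageReserved() */
			totalram_pages++;	/* __free_page() + accounting */
		}
	}
}
```

Releasing `PFN_DOWN(0x4000)` for `PFN_DOWN(0x8000)` pages frees frames 4 through 11, exactly eight pages.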
[PATCH 4/8] pseries: phyp dump: use sysfs to release reserved mem
Check to see if there actually is data from a previously
crashed kernel waiting. If so, allow user-space tools to
grab the data (by reading /proc/kcore). When user-space
finishes dumping a section, it must release that memory
by writing to sysfs. For example,

  echo 0x40000000 0x10000000 > /sys/kernel/release_region

will release 256MB starting at 1GB. The released memory
becomes free for general use.

Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  101 +++--
 1 file changed, 96 insertions(+), 5 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 13:15:05.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 13:24:30.000000000 -0600
@@ -12,17 +12,24 @@
  */
 
 #include <linux/init.h>
+#include <linux/kobject.h>
 #include <linux/mm.h>
+#include <linux/of.h>
 #include <linux/pfn.h>
 #include <linux/swap.h>
+#include <linux/sysfs.h>
 
 #include <asm/page.h>
 #include <asm/phyp_dump.h>
+#include <asm/rtas.h>
 
 /* Global, used to communicate data between early boot and late boot */
 static struct phyp_dump phyp_dump_global;
 struct phyp_dump *phyp_dump_info = &phyp_dump_global;
 
+static int ibm_configure_kernel_dump;
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously lmb_reserved
  * @start_pfn: starting physical frame number
@@ -52,18 +59,102 @@ release_memory_range(unsigned long start
 	}
 }
 
-static int __init phyp_dump_setup(void)
+/* ------------------------------------------------- */
+/**
+ * sysfs_release_region -- sysfs interface to release memory range.
+ *
+ * Usage:
+ *   echo <start addr> <length> > /sys/kernel/release_region
+ *
+ * Example:
+ *   echo 0x40000000 0x10000000 > /sys/kernel/release_region
+ *
+ * will release 256MB starting at 1GB.
+ */
+static ssize_t
+store_release_region(struct kset *kset, const char *buf, size_t count)
 {
+	unsigned long start_addr, length, end_addr;
 	unsigned long start_pfn, nr_pages;
+	ssize_t ret;
 
-	/* If no memory was reserved in early boot, there is nothing to do */
-	if (phyp_dump_info->init_reserve_size == 0)
-		return 0;
+	ret = sscanf(buf, "%lx %lx", &start_addr, &length);
+	if (ret != 2)
+		return -EINVAL;
+
+	/* Range-check - don't free any reserved memory that
+	 * wasn't reserved for phyp-dump */
+	if (start_addr < phyp_dump_info->init_reserve_start)
+		start_addr = phyp_dump_info->init_reserve_start;
+
+	end_addr = phyp_dump_info->init_reserve_start +
+			phyp_dump_info->init_reserve_size;
+	if (start_addr+length > end_addr)
+		length = end_addr - start_addr;
+
+	/* Release the region of memory passed in by user */
+	start_pfn = PFN_DOWN(start_addr);
+	nr_pages = PFN_DOWN(length);
+	release_memory_range (start_pfn, nr_pages);
+
+	return count;
+}
+
+static ssize_t
+show_release_region(struct kset * kset, char *buf)
+{
+	return sprintf(buf, "ola\n");
+}
+
+static struct subsys_attribute rr = __ATTR(release_region, 0600,
+					 show_release_region,
+					 store_release_region);
+
+/* ------------------------------------------------- */
+
+static void release_all (void)
+{
+	unsigned long start_pfn, nr_pages;
 
-	/* Release memory that was reserved in early boot */
+	/* Release all memory that was reserved in early boot */
 	start_pfn = PFN_DOWN(phyp_dump_info->init_reserve_start);
 	nr_pages = PFN_DOWN(phyp_dump_info->init_reserve_size);
 	release_memory_range(start_pfn, nr_pages);
+}
+
+static int __init phyp_dump_setup(void)
+{
+	struct device_node *rtas;
+	const int *dump_header;
+	int header_len = 0;
+	int rc;
+
+	/* If no memory was reserved in early boot, there is nothing to do */
+	if (phyp_dump_info->init_reserve_size == 0)
+		return 0;
+
+	/* Return if phyp dump not supported */
+	ibm_configure_kernel_dump = rtas_token("ibm,configure-kernel-dump");
+	if (ibm_configure_kernel_dump == RTAS_UNKNOWN_SERVICE) {
+		release_all();
+		return -ENOSYS;
+	}
+
+	/* Is there dump data waiting for us? */
+	rtas = of_find_node_by_path("/rtas");
+	dump_header = of_get_property(rtas, "ibm,kernel-dump", &header_len);
+	if (dump_header == NULL) {
+		release_all();
+		return 0;
+	}
+
+	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
+	rc = subsys_create_file(&kernel_subsys, &rr);
+	if (rc) {
+		printk (KERN_ERR "phyp-dump: unable to create
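The parsing and range clamping in `store_release_region()` can be exercised outside the kernel. This sketch mirrors the patch's logic on a simplified `struct phyp_dump`; `parse_release_region()` is a hypothetical helper name, and the check for a start address beyond the reserved window is an addition not present in the patch (without it, the length would wrap around).

```c
#include <assert.h>
#include <stdio.h>

/* Simplified copy of the patch's struct phyp_dump. */
struct phyp_dump {
	unsigned long init_reserve_start;
	unsigned long init_reserve_size;
};

/*
 * Parse "<start> <length>" in hex and clamp it to the phyp-dump window,
 * following store_release_region(). Returns 0, or -1 on bad input.
 */
static int parse_release_region(const char *buf, const struct phyp_dump *info,
				unsigned long *start, unsigned long *length)
{
	unsigned long start_addr, len, end_addr;

	if (sscanf(buf, "%lx %lx", &start_addr, &len) != 2)
		return -1;

	/* Don't free reserved memory that wasn't reserved for phyp-dump. */
	if (start_addr < info->init_reserve_start)
		start_addr = info->init_reserve_start;

	end_addr = info->init_reserve_start + info->init_reserve_size;
	if (start_addr >= end_addr)
		len = 0;		/* addition: start lies past the window */
	else if (start_addr + len > end_addr)
		len = end_addr - start_addr;

	*start = start_addr;
	*length = len;
	return 0;
}
```

From the shell, the corresponding request is `echo 0x40000000 0x10000000 > /sys/kernel/release_region`, which asks for 256MB starting at 1GB.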
[PATCH 5/8] pseries: phyp dump: register dump area.
Set up the actual dump header, register it with the hypervisor.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |  169 +++--
 1 file changed, 163 insertions(+), 6 deletions(-)

Index: linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- linux-2.6.24-rc3-git1.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 15:55:37.000000000 -0600
+++ linux-2.6.24-rc3-git1/arch/powerpc/platforms/pseries/phyp_dump.c	2007-11-21 16:06:52.000000000 -0600
@@ -30,6 +30,134 @@ struct phyp_dump *phyp_dump_info = &phyp
 static int ibm_configure_kernel_dump;
 
 /* ------------------------------------------------- */
+/* RTAS interfaces to declare the dump regions */
+
+struct dump_section {
+	u32 dump_flags;
+	u16 source_type;
+	u16 error_flags;
+	u64 source_address;
+	u64 source_length;
+	u64 length_copied;
+	u64 destination_address;
+};
+
+struct phyp_dump_header {
+	u32 version;
+	u16 num_of_sections;
+	u16 status;
+
+	u32 first_offset_section;
+	u32 dump_disk_section;
+	u64 block_num_dd;
+	u64 num_of_blocks_dd;
+	u32 offset_dd;
+	u32 maxtime_to_auto;
+	/* No dump disk path string used */
+
+	struct dump_section cpu_data;
+	struct dump_section hpte_data;
+	struct dump_section kernel_data;
+};
+
+/* The dump header *must be* in low memory, so .bss it */
+static struct phyp_dump_header phdr;
+
+#define NUM_DUMP_SECTIONS	3
+#define DUMP_HEADER_VERSION	0x1
+#define DUMP_REQUEST_FLAG	0x1
+#define DUMP_SOURCE_CPU	0x0001
+#define DUMP_SOURCE_HPTE	0x0002
+#define DUMP_SOURCE_RMO	0x0011
+
+/**
+ * init_dump_header() - initialize the header declaring a dump
+ * Returns: length of dump save area.
+ *
+ * When the hypervisor saves crashed state, it needs to put
+ * it somewhere. The dump header tells the hypervisor where
+ * the data can be saved.
+ */
+static unsigned long init_dump_header(struct phyp_dump_header *ph)
+{
+	struct device_node *rtas;
+	const unsigned int *sizes;
+	int len;
+	unsigned long cpu_state_size = 0;
+	unsigned long hpte_region_size = 0;
+	unsigned long addr_offset = 0;
+
+	/* Get the required dump region sizes */
+	rtas = of_find_node_by_path("/rtas");
+	sizes = of_get_property(rtas, "ibm,configure-kernel-dump-sizes", &len);
+	if (!sizes || len < 20)
+		return 0;
+
+	if (sizes[0] == 1)
+		cpu_state_size = *((unsigned long *) &sizes[1]);
+
+	if (sizes[3] == 2)
+		hpte_region_size = *((unsigned long *) &sizes[4]);
+
+	/* Set up the dump header */
+	ph->version = DUMP_HEADER_VERSION;
+	ph->num_of_sections = NUM_DUMP_SECTIONS;
+	ph->status = 0;
+
+	ph->first_offset_section =
+		(u32) &(((struct phyp_dump_header *) 0)->cpu_data);
+	ph->dump_disk_section = 0;
+	ph->block_num_dd = 0;
+	ph->num_of_blocks_dd = 0;
+	ph->offset_dd = 0;
+
+	ph->maxtime_to_auto = 0; /* disabled */
+
+	/* The first two sections are mandatory */
+	ph->cpu_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->cpu_data.source_type = DUMP_SOURCE_CPU;
+	ph->cpu_data.source_address = 0;
+	ph->cpu_data.source_length = cpu_state_size;
+	ph->cpu_data.destination_address = addr_offset;
+	addr_offset += cpu_state_size;
+
+	ph->hpte_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->hpte_data.source_type = DUMP_SOURCE_HPTE;
+	ph->hpte_data.source_address = 0;
+	ph->hpte_data.source_length = hpte_region_size;
+	ph->hpte_data.destination_address = addr_offset;
+	addr_offset += hpte_region_size;
+
+	/* This section describes the low kernel region */
+	ph->kernel_data.dump_flags = DUMP_REQUEST_FLAG;
+	ph->kernel_data.source_type = DUMP_SOURCE_RMO;
+	ph->kernel_data.source_address = PHYP_DUMP_RMR_START;
+	ph->kernel_data.source_length = PHYP_DUMP_RMR_END;
+	ph->kernel_data.destination_address = addr_offset;
+	addr_offset += ph->kernel_data.source_length;
+
+	return addr_offset;
+}
+
+static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
+{
+	int rc;
+	ph->cpu_data.destination_address += addr;
+	ph->hpte_data.destination_address += addr;
+	ph->kernel_data.destination_address += addr;
+
+	do {
+		rc = rtas_call(ibm_configure_kernel_dump, 3, 1, NULL,
+				1, ph, sizeof(struct phyp_dump_header));
+	} while (rtas_busy_delay(rc));
+
+	if (rc)
+	{
+		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+	}
+}
+
+/* ------------------------------------------------- */
 /**
  * release_memory_range -- release memory previously
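The layout `init_dump_header()` builds, with the CPU, HPTE and RMR sections placed back to back as relative offsets and later shifted by `register_dump_area()`, can be checked with plain arithmetic. The sketch uses the standard `offsetof()` macro in place of the `&((struct phyp_dump_header *) 0)->cpu_data` idiom; `layout_sections()` and `relocate_sections()` are illustrative names, and the section sizes and base address in the usage are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct dump_section {
	uint32_t dump_flags;
	uint16_t source_type;
	uint16_t error_flags;
	uint64_t source_address;
	uint64_t source_length;
	uint64_t length_copied;
	uint64_t destination_address;
};

struct phyp_dump_header {
	uint32_t version;
	uint16_t num_of_sections;
	uint16_t status;
	uint32_t first_offset_section;
	uint32_t dump_disk_section;
	uint64_t block_num_dd;
	uint64_t num_of_blocks_dd;
	uint32_t offset_dd;
	uint32_t maxtime_to_auto;
	struct dump_section cpu_data;
	struct dump_section hpte_data;
	struct dump_section kernel_data;
};

/* Place the sections back to back; returns the total save-area size. */
static uint64_t layout_sections(struct phyp_dump_header *ph, uint64_t cpu_size,
				uint64_t hpte_size, uint64_t rmr_size)
{
	uint64_t off = 0;

	ph->first_offset_section = offsetof(struct phyp_dump_header, cpu_data);
	ph->cpu_data.destination_address = off;
	off += cpu_size;
	ph->hpte_data.destination_address = off;
	off += hpte_size;
	ph->kernel_data.destination_address = off;
	off += rmr_size;
	return off;
}

/* As in register_dump_area(): turn relative offsets into addresses. */
static void relocate_sections(struct phyp_dump_header *ph, uint64_t base)
{
	ph->cpu_data.destination_address += base;
	ph->hpte_data.destination_address += base;
	ph->kernel_data.destination_address += base;
}
```

Because every section starts as an offset from zero, `+=` in the relocation step is correct for all three sections, which matches the point made in the 7/8 review.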
[PATCH 6/8] pseries: phyp dump: debugging print routines.
Provide some basic debugging support.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
Signed-off-by: Linas Vepstas [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-01 23:24:10.000000000 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-01 23:24:27.000000000 -0600
@@ -2,7 +2,7 @@
  * Hypervisor-assisted dump
  *
  * Linas Vepstas, Manish Ahuja 2007
- * Copyrhgit (c) 2007 IBM Corp.
+ * Copyright (c) 2007 IBM Corp.
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -139,6 +139,51 @@ static unsigned long init_dump_header(st
 	return addr_offset;
 }
 
+#ifdef DEBUG
+static void print_dump_header(const struct phyp_dump_header *ph)
+{
+	printk(KERN_INFO "dump header:\n");
+	/* setup some ph-sections required */
+	printk(KERN_INFO "version = %d\n", ph->version);
+	printk(KERN_INFO "Sections = %d\n", ph->num_of_sections);
+	printk(KERN_INFO "Status = 0x%x\n", ph->status);
+
+	/* No ph-disk, so all should be set to 0 */
+	printk(KERN_INFO "Offset to first section 0x%x\n", ph->first_offset_section);
+	printk(KERN_INFO "dump disk sections should be zero\n");
+	printk(KERN_INFO "dump disk section = %d\n",ph->dump_disk_section);
+	printk(KERN_INFO "block num = %ld\n",ph->block_num_dd);
+	printk(KERN_INFO "number of blocks = %ld\n",ph->num_of_blocks_dd);
+	printk(KERN_INFO "dump disk offset = %d\n",ph->offset_dd);
+	printk(KERN_INFO "Max auto time= %d\n",ph->maxtime_to_auto);
+
+	/*set cpu state and hpte states as well scratch pad area */
+	printk(KERN_INFO "CPU AREA \n");
+	printk(KERN_INFO "cpu dump_flags =%d\n",ph->cpu_data.dump_flags);
+	printk(KERN_INFO "cpu source_type =%d\n",ph->cpu_data.source_type);
+	printk(KERN_INFO "cpu error_flags =%d\n",ph->cpu_data.error_flags);
+	printk(KERN_INFO "cpu source_address =%lx\n",ph->cpu_data.source_address);
+	printk(KERN_INFO "cpu source_length =%lx\n",ph->cpu_data.source_length);
+	printk(KERN_INFO "cpu length_copied =%lx\n",ph->cpu_data.length_copied);
+
+	printk(KERN_INFO "HPTE AREA \n");
+	printk(KERN_INFO "HPTE dump_flags =%d\n",ph->hpte_data.dump_flags);
+	printk(KERN_INFO "HPTE source_type =%d\n",ph->hpte_data.source_type);
+	printk(KERN_INFO "HPTE error_flags =%d\n",ph->hpte_data.error_flags);
+	printk(KERN_INFO "HPTE source_address =%lx\n",ph->hpte_data.source_address);
+	printk(KERN_INFO "HPTE source_length =%lx\n",ph->hpte_data.source_length);
+	printk(KERN_INFO "HPTE length_copied =%lx\n",ph->hpte_data.length_copied);
+
+	printk(KERN_INFO "SRSD AREA \n");
+	printk(KERN_INFO "SRSD dump_flags =%d\n",ph->kernel_data.dump_flags);
+	printk(KERN_INFO "SRSD source_type =%d\n",ph->kernel_data.source_type);
+	printk(KERN_INFO "SRSD error_flags =%d\n",ph->kernel_data.error_flags);
+	printk(KERN_INFO "SRSD source_address =%lx\n",ph->kernel_data.source_address);
+	printk(KERN_INFO "SRSD source_length =%lx\n",ph->kernel_data.source_length);
+	printk(KERN_INFO "SRSD length_copied =%lx\n",ph->kernel_data.length_copied);
+}
+#endif
+
 static void register_dump_area(struct phyp_dump_header *ph, unsigned long addr)
 {
 	int rc;
@@ -154,6 +199,9 @@ static void register_dump_area(struct ph
 	if (rc)
 	{
 		printk (KERN_ERR "phyp-dump: unexpected error (%d) on register\n", rc);
+#ifdef DEBUG
+		print_dump_header (ph);
+#endif
 	}
 }
 
@@ -271,6 +319,9 @@ static int __init phyp_dump_setup(void)
 		release_all();
 		return -ENOSYS;
 	}
+#ifdef DEBUG
+	print_dump_header (dump_header);
+#endif
 
 	/* Is there dump data waiting for us? If there isn't,
 	 * then register a new dump area, and release all of
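The three near-identical printk blocks could be folded into one per-section helper. This sketch formats into a buffer with `snprintf()` so it can run in user space; the trimmed `struct dump_section` and the `format_section()` name are assumptions, and the NULL check mirrors the early-return fix later acked for `print_dump_header()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Trimmed copy of the patch's struct dump_section. */
struct dump_section {
	unsigned int dump_flags;
	unsigned short source_type;
	unsigned short error_flags;
	unsigned long source_address;
	unsigned long source_length;
	unsigned long length_copied;
};

/* Format one section; replaces three copy-pasted printk blocks. */
static int format_section(char *buf, size_t len, const char *name,
			  const struct dump_section *s)
{
	if (s == NULL)		/* early exit, as in the NULL-pointer fix */
		return snprintf(buf, len, "%s: <none>\n", name);

	return snprintf(buf, len,
			"%s dump_flags=%u source_type=%u error_flags=%u"
			" source_address=%lx source_length=%lx"
			" length_copied=%lx\n",
			name, s->dump_flags,
			(unsigned int)s->source_type,
			(unsigned int)s->error_flags,
			s->source_address, s->source_length,
			s->length_copied);
}
```

In the kernel the same helper would call `printk(KERN_INFO ...)` once per section, invoked with "CPU", "HPTE" and "SRSD".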
[PATCH 8/8] pseries: phyp dump: Tracking memory range freed.
This patch tracks the size freed. For now it does a simple
rudimentary calculation of the ranges freed. The idea is to
keep it simple at the external shell script level and send
in large chunks for now.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

---
 arch/powerpc/platforms/pseries/phyp_dump.c |   36 +
 include/asm-powerpc/phyp_dump.h            |    3 ++
 2 files changed, 39 insertions(+)

Index: 2.6.24-rc5/include/asm-powerpc/phyp_dump.h
===================================================================
--- 2.6.24-rc5.orig/include/asm-powerpc/phyp_dump.h	2008-01-07 22:55:28.000000000 -0600
+++ 2.6.24-rc5/include/asm-powerpc/phyp_dump.h	2008-01-07 22:58:02.000000000 -0600
@@ -24,6 +24,9 @@ struct phyp_dump {
 	/* Memory that is reserved during very early boot. */
 	unsigned long init_reserve_start;
 	unsigned long init_reserve_size;
+	/* Scratch area memory details */
+	unsigned long scratch_reserve_start;
+	unsigned long scratch_reserve_size;
 };
 
 extern struct phyp_dump *phyp_dump_info;
Index: 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c
===================================================================
--- 2.6.24-rc5.orig/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-07 22:57:27.000000000 -0600
+++ 2.6.24-rc5/arch/powerpc/platforms/pseries/phyp_dump.c	2008-01-07 22:58:02.000000000 -0600
@@ -287,6 +287,39 @@ release_memory_range(unsigned long start
 	}
 }
 
+/**
+ * track_freed_range -- Counts the range being freed.
+ * Once the counter goes to zero, it re-registers dump for
+ * future use.
+ */
+static void
+track_freed_range(unsigned long addr, unsigned long length)
+{
+	static unsigned long scratch_area_size, reserved_area_size;
+
+	if (addr < phyp_dump_info->init_reserve_start)
+		return;
+
+	if ((addr >= phyp_dump_info->init_reserve_start) &&
+	    (addr <= phyp_dump_info->init_reserve_start +
+	     phyp_dump_info->init_reserve_size))
+		reserved_area_size += length;
+
+	if ((addr >= phyp_dump_info->scratch_reserve_start) &&
+	    (addr <= phyp_dump_info->scratch_reserve_start +
+	     phyp_dump_info->scratch_reserve_size))
+		scratch_area_size += length;
+
+	if ((reserved_area_size == phyp_dump_info->init_reserve_start) &&
+	    (scratch_area_size == phyp_dump_info->scratch_reserve_size)) {
+
+		invalidate_last_dump(&phdr,
+				phyp_dump_info->scratch_reserve_start);
+		register_dump_area (&phdr,
+				phyp_dump_info->scratch_reserve_start);
+	}
+}
+
 /* ------------------------------------------------- */
 /**
  * sysfs_release_region -- sysfs interface to release memory range.
@@ -310,6 +343,8 @@ store_release_region(struct kset *kset, 
 	if (ret != 2)
 		return -EINVAL;
 
+	track_freed_range(start_addr, length);
+
 	/* Range-check - don't free any reserved memory that
 	 * wasn't reserved for phyp-dump */
 	if (start_addr < phyp_dump_info->init_reserve_start)
@@ -414,6 +449,7 @@ static int __init phyp_dump_setup(void)
 	}
 
 	/* Don't allow user to release the 256MB scratch area */
+	/* this might be wrong */
 	phyp_dump_info->init_reserve_size = free_area_length;
 
 	/* Should we create a dump_subsys, analogous to s390/ipl.c ? */
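A sketch of the accounting in `track_freed_range()`, written as a pure function so it can be tested. As posted, the final check compares `reserved_area_size` with `init_reserve_start`; comparing a byte count with an address looks like a slip for `init_reserve_size`, so the sketch compares sizes with sizes. It also uses half-open ranges rather than the patch's inclusive upper bound, and keeps the counters in a caller-owned `struct free_tracker` (a hypothetical type) instead of function-local statics.

```c
#include <assert.h>
#include <stdbool.h>

struct phyp_dump {
	unsigned long init_reserve_start, init_reserve_size;
	unsigned long scratch_reserve_start, scratch_reserve_size;
};

struct free_tracker {
	unsigned long reserved_freed;	/* bytes released from the dump area */
	unsigned long scratch_freed;	/* bytes released from the scratch area */
};

static bool in_range(unsigned long addr, unsigned long start,
		     unsigned long size)
{
	return addr >= start && addr < start + size;
}

/* Returns true once both areas are fully released (time to re-register). */
static bool track_freed_range(struct free_tracker *t,
			      const struct phyp_dump *d,
			      unsigned long addr, unsigned long length)
{
	if (in_range(addr, d->init_reserve_start, d->init_reserve_size))
		t->reserved_freed += length;
	if (in_range(addr, d->scratch_reserve_start, d->scratch_reserve_size))
		t->scratch_freed += length;

	/* Compare accumulated sizes with sizes, not with start addresses. */
	return t->reserved_freed == d->init_reserve_size &&
	       t->scratch_freed == d->scratch_reserve_size;
}
```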
Re: [PATCH 3/8] pseries: phyp dump: reserve-release proof-of-concept
Arnd,

Sorry, this patch ended up out of sequence. I reposted it properly in the other thread.

We did talk about using the gmail address, but Linas was more comfortable using this one, since it is where he was when we did this work. Hence the use of the Austin address, with the gmail address on the cc list. I am sure he will chime in with more details when he gets the opportunity.

Thanks,
Manish

Arnd Bergmann wrote:
> On Tuesday 08 January 2008, Manish Ahuja wrote:
>> Initial patch for reserving memory in early boot, and freeing it later.
>> If the previous boot had ended with a crash, the reserved memory would contain
>> a copy of the crashed kernel data.
>>
>> Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
>> Signed-off-by: Linas Vepstas [EMAIL PROTECTED]
>
> I think the signed-off-by chain needs to be modified. The way it appears,
> you handled the patch first, then sent it to Linas, who forwarded it to
> whoever will take the patches from the list. This obviously isn't true,
> since you are actually the one who is sending out the patches.
>
> Moreover, I believe that the [EMAIL PROTECTED] address is now dead, and
> shouldn't be used for this any more. So, depending on which of you two
> wrote the majority of a patch, I think it should be either
>
> | Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
> | Acked-by: Linas Vepstas [EMAIL PROTECTED]
>
> or
>
> | From: Linas Vepstas [EMAIL PROTECTED]
> | Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
>
> Arnd
Re: [PATCH] Infinite loop/always true check possible with unsigned counter.
Paul Mackerras wrote:
> Andreas Schwab writes:
>
>> ??? There is no rgn->cnt involved in the comparison.
>>
>> Look further down in lmb_add_region; there is a second for loop that does
>>
>> 	for (i = rgn->cnt-1; i >= 0; i--)
>>
>> Which is exactly the one quoted above. I still don't see your point.
>
> You're right - my mistake.
>
> Paul.

I presume the patch is good then. Do I need to change anything?

Thanks,
Manish
[PATCH] Infinite loop/always true check possible with unsigned counter.
Fix a possible infinite loop (an always-true check) caused by the unsigned long counter i in the following for loop in lmb_add_region():

for (i = rgn->cnt-1; i >= 0; i--)

An unsigned counter can never be negative, so i >= 0 is always true. Declaring i as a signed long lets the test fail once i drops below zero.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
---
 arch/powerpc/mm/lmb.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.22-rc4/arch/powerpc/mm/lmb.c
===
--- 2.6.22-rc4.orig/arch/powerpc/mm/lmb.c	2007-06-11 21:10:46.0 -0500
+++ 2.6.22-rc4/arch/powerpc/mm/lmb.c	2007-07-06 21:47:40.0 -0500
@@ -138,8 +138,8 @@ void __init lmb_analyze(void)
 static long __init lmb_add_region(struct lmb_region *rgn, unsigned long base,
 				  unsigned long size)
 {
-	unsigned long i, coalesced = 0;
-	long adjacent;
+	unsigned long coalesced = 0;
+	long adjacent, i;
 
 	/* First try and coalesce this LMB with another. */
 	for (i=0; i < rgn->cnt; i++) {
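A minimal sketch of the failure mode described above, assuming a hypothetical struct region_sketch that models only the cnt field of struct lmb_region. count_backwards_signed() uses a signed long counter, as the patched code does, so the loop ends once i reaches -1. unsigned_step_below_zero() shows that an unsigned counter wraps to ULONG_MAX instead of going negative, which is why i >= 0 can never be false for it.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for struct lmb_region: only the count is modelled. */
struct region_sketch {
	unsigned long cnt;
};

/*
 * Backwards walk with a signed counter, the shape of the loop after the
 * patch. Returns how many regions were visited.
 */
static long count_backwards_signed(const struct region_sketch *rgn)
{
	long i, visited = 0;

	/* A signed counter can only index up to LONG_MAX regions. */
	assert(rgn->cnt <= LONG_MAX);

	/* Start at the last index; the test fails once i drops to -1. */
	for (i = (long)rgn->cnt - 1; i >= 0; i--)
		visited++;
	return visited;
}

/*
 * With an unsigned counter, stepping below index 0 wraps around to
 * ULONG_MAX (well-defined modular arithmetic), so "i >= 0" stays true
 * and the loop condition alone never stops the walk.
 */
static unsigned long unsigned_step_below_zero(void)
{
	unsigned long i = 0;

	i--;
	return i;
}
```

The sketch casts cnt to long before subtracting, so an empty region list yields -1 and the body never runs. The kernel loop subtracts first in unsigned arithmetic and then converts the result to long; for cnt == 0 that conversion of ULONG_MAX is implementation-defined in ISO C, though GCC yields -1.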
Re: [PATCH] Infinite loop/always true check possible with unsigned counter.
Repost to fix my email ID.

Fix a possible infinite loop (an always-true check) caused by the unsigned long counter i in the following for loop in lmb_add_region():

for (i = rgn->cnt-1; i >= 0; i--)

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]
---
 arch/powerpc/mm/lmb.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: 2.6.22-rc4/arch/powerpc/mm/lmb.c
===
--- 2.6.22-rc4.orig/arch/powerpc/mm/lmb.c	2007-06-11 21:10:46.0 -0500
+++ 2.6.22-rc4/arch/powerpc/mm/lmb.c	2007-07-06 21:47:40.0 -0500
@@ -138,8 +138,8 @@ void __init lmb_analyze(void)
 static long __init lmb_add_region(struct lmb_region *rgn, unsigned long base,
 				  unsigned long size)
 {
-	unsigned long i, coalesced = 0;
-	long adjacent;
+	unsigned long coalesced = 0;
+	long adjacent, i;
 
 	/* First try and coalesce this LMB with another. */
 	for (i=0; i < rgn->cnt; i++) {
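For comparison, a common C idiom keeps the counter unsigned and tests it before decrementing. This is not what the patch does; it is a sketch of an alternative, using a hypothetical helper fill_indices_backwards() that records the indices it visits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: store the indices n-1, n-2, ..., 0 into out[],
 * walking backwards with an unsigned counter. "i-- > 0" tests the old
 * value before decrementing, so the body sees i == 0 last and the loop
 * exits without ever using a wrapped-around index.
 * Returns the number of indices written.
 */
static size_t fill_indices_backwards(unsigned long n, unsigned long *out,
				     size_t out_len)
{
	unsigned long i;
	size_t written = 0;

	assert(n <= out_len);	/* caller must supply enough room */

	for (i = n; i-- > 0; )
		out[written++] = i;
	return written;
}
```

When n is 0 the first test (0 > 0) fails immediately, so the body never runs even though i itself wraps after the final decrement. Changing the counter to a signed long, as the patch does, is the smaller change to the existing kernel loop; the idiom above suits code where the index type has to stay unsigned.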
Re: [PATCH] pseries: Re: Minor: Removed double return.
Ah yes, my mistake. Does it require a repost, then?

Thanks,
Manish

Linas Vepstas wrote:
You want to say it's a patch in the subject line.

--linas

On Fri, Jul 06, 2007 at 04:59:55PM -0500, Manish Ahuja wrote:
Found two return statements, one right after the other, in arch_add_memory(). This minor patch removes the second, unreachable one.

Signed-off-by: Manish Ahuja [EMAIL PROTECTED]

Index: 2.6.22-rc4/arch/powerpc/mm/mem.c
===
--- 2.6.22-rc4.orig/arch/powerpc/mm/mem.c	2007-06-11 21:10:46.0 -0500
+++ 2.6.22-rc4/arch/powerpc/mm/mem.c	2007-06-29 22:52:42.0 -0500
@@ -129,8 +129,6 @@
 	zone = pgdata->node_zones;
 
 	return __add_pages(zone, start_pfn, nr_pages);
-
-	return 0;
 }
 
 /*