Re: [PATCH] perf: Power7: Make CPI stack events available in sysfs
On Sat, Apr 06, 2013 at 09:48:03AM -0700, Sukadev Bhattiprolu wrote:
> From bdeacf7175241f6c79b5b2be0fa6b20b0d0b7d1c Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
> Date: Sat, 6 Apr 2013 08:48:26 -0700
> Subject: [PATCH] perf: Power7: Make CPI stack events available in sysfs
>
> A set of Power7 events are often used for Cycles Per Instruction (CPI) stack
> analysis. Make these events available in sysfs (/sys/devices/cpu/events/) so
> they can be identified using their symbolic names:
>
>         perf stat -e 'cpu/PM_CMPLU_STALL_DCACHE_MISS/' /bin/ls

Should we take these two via the powerpc tree? Or do you want to take
them Arnaldo?

cheers
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge
On Fri, Apr 12, 2013 at 10:16:57AM +0800, Li Zhong wrote:
> It seems that rflags don't get changed in the repeating loop, so move
> it out of the loop.

You've also changed the way new_pte is handled on repeat, but I think
that's OK too.

cheers

> diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
> index cecad34..edb4129 100644
> --- a/arch/powerpc/mm/hugetlbpage-hash64.c
> +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
> @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
>
>          pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
>
> -repeat:
> -        hpte_group = ((hash & htab_hash_mask) *
> -                      HPTES_PER_GROUP) & ~0x7UL;
> -
>          /* clear HPTE slot informations in new PTE */
>  #ifdef CONFIG_PPC_64K_PAGES
>          new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;

ie. here new_pte was updated on repeat, but now it's not.

> @@ -101,6 +97,10 @@ repeat:
>          rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
>                                _PAGE_COHERENT | _PAGE_GUARDED));
>
> +repeat:
> +        hpte_group = ((hash & htab_hash_mask) *
> +                      HPTES_PER_GROUP) & ~0x7UL;
> +
>          /* Insert into the hash table, primary slot */
>          slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
>                                    mmu_psize, ssize);
Re: [RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
On Fri, Apr 12, 2013 at 10:17:00AM +0800, Li Zhong wrote:
> This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC is
> enabled:

Please include a better changelog.

This patch does fix (I hope?) the following oops, caused by xxx.
Reproducible by doing yyy.

cheers

> [ 543.075675] ------------[ cut here ]------------
> [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
> [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
> [ 543.075741] Modules linked in: binfmt_misc ehea
> [ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
> [ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222)
> [ 543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI> CR: 4482 XER:
> [ 543.075816] SOFTE: 0
> [ 543.075823] CFAR: c004c200
> [ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1
> GPR00: 0001 c000a9083540 c0c600a8
> GPR04: 0050 fffa c000a90834e0 004ff594
> GPR08: 0001 9592d4d8 c0c86854
> GPR12: 0002 c6ead300 00a51000 0001
> GPR16: f3354380 ff80
> GPR20: 0001 c0c600a8 0001 0001
> GPR24: 03354380 c000 c0b65950
> GPR28: 0020 000cd50e 00bf50d9 c0c7c230
> [ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
> [ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
> [ 543.076025] Call Trace:
> [ 543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
> [ 543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc
> [ 543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c
> [ 543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4
> [ 543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8
> [ 543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0
> [ 543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
> [ 543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
> [ 543.076155] Instruction dump:
> [ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 e84a0008
> [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 6000
> [ 543.076224] ---[ end trace bd5807e8d6ae186b ]---
>
> Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
> ---
>  arch/powerpc/mm/hash_utils_64.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index a7f54f0..4b449a0 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
>          unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
>          unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
>          unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
> -        int ret;
> +        long ret;
>
>          hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
>          hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
> @@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
>          /* Don't create HPTE entries for bad address */
>          if (!vsid)
>                  return;
> -        ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
> -                                 mode, 0,
> -                                 mmu_linear_psize, mmu_kernel_ssize);
> +
> +        ret = hpte_insert_repeating(hash, vpn, __pa(vaddr),
> +                                    mode, mmu_linear_psize,
> +                                    mmu_kernel_ssize);
> +
>          BUG_ON (ret < 0);
>          spin_lock(&linear_map_hash_lock);
>          BUG_ON(linear_map_hash_slots[lmi] & 0x80);
> --
> 1.7.9.5
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 13:50 +1000, Paul Mackerras wrote:
> On Fri, Apr 12, 2013 at 10:16:59AM +0800, Li Zhong wrote:
> > It seems that in kernel_unmap_linear_page(), it only checks whether there
> > is a map in the linear_map_hash_slots array, so seems we don't need to
> > bolt the hpte.
>
> I don't exactly understand your rationale here, but I don't think it's
> safe not to have linear mapping pages bolted. Basically, if a page
> will be used in the process of calling hash_page to demand-fault an
> HPTE into the hash table, then that page needs to be bolted, otherwise
> we can get an infinite recursion of HPT misses. That includes all
> kernel stack pages, among other things, so I think we need to leave
> the HPTE_V_BOLTED in there.

I suspect Li's confusion comes from the fact that he doesn't realize
that we might evict random hash slots. If the linear mapping hash
entries could only be thrown out via kernel_unmap_linear_page() then his
comment would make sense. However this isn't the case.

Li: When faulting something in, if both the primary and secondary
buckets are full, we somewhat randomly evict the content of a slot and
replace it. However we only do that on non-bolted slots.

This is why the linear mapping (and the vmemmap) must be bolted.

Cheers,
Ben.
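The eviction rule Ben describes can be sketched in userspace C. This is only an illustrative model, not the kernel's hpte_remove() implementation: the names toy_hpte and toy_evict, the group size of 8, and the caller-supplied starting offset are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS_PER_GROUP 8   /* assumed group size for this sketch */

/* Hypothetical, simplified stand-in for one hash page table entry. */
struct toy_hpte {
    bool valid;
    bool bolted;
    unsigned long vpn;
};

/*
 * Pick a victim in a full group, scanning from a caller-supplied
 * pseudo-random offset. Bolted entries are never chosen. Returns the
 * index of the evicted slot, or -1 if no valid non-bolted entry exists.
 */
static int toy_evict(struct toy_hpte group[SLOTS_PER_GROUP], unsigned int start)
{
    for (size_t i = 0; i < SLOTS_PER_GROUP; i++) {
        size_t slot = (start + i) % SLOTS_PER_GROUP;

        if (group[slot].valid && !group[slot].bolted) {
            group[slot].valid = false;  /* entry is thrown out */
            return (int)slot;
        }
    }
    return -1;
}
```

In this model an unbolted linear-mapping entry is as likely to be picked as any other, which is the situation the bolting is meant to rule out.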
Re: [PATCH 1/8] Remove syslog prefix in uncompressed oops text
On Wed, Apr 10, 2013 at 12:51:00PM +0530, Aruna Balakrishnaiah wrote:
> Removal of syslog prefix in the uncompressed oops text will
> help in capturing more oops data.

Why does it help?

Does this affect any existing tools?

cheers
Re: [PATCH] powerpc/perf: Power8 PMU support
On Mon, 2013-04-15 at 14:17 +1000, Michael Ellerman wrote:
> This patch adds preliminary support for the power8 PMU to perf.

Might be worthwhile to have a small blurb explaining roughly what you
mean by preliminary :-)

Cheers,
Ben.

> Signed-off-by: Michael Ellerman mich...@ellerman.id.au
> ---
>  arch/powerpc/perf/Makefile     |    3 +-
>  arch/powerpc/perf/power8-pmu.c |  454
>  2 files changed, 456 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/perf/power8-pmu.c
>
> diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
> index af3fac2..472db18 100644
> --- a/arch/powerpc/perf/Makefile
> +++ b/arch/powerpc/perf/Makefile
> @@ -4,7 +4,8 @@ obj-$(CONFIG_PERF_EVENTS) += callchain.o
>
>  obj-$(CONFIG_PPC_PERF_CTRS) += core-book3s.o
>  obj64-$(CONFIG_PPC_PERF_CTRS) += power4-pmu.o ppc970-pmu.o power5-pmu.o \
> -                                 power5+-pmu.o power6-pmu.o power7-pmu.o
> +                                 power5+-pmu.o power6-pmu.o power7-pmu.o \
> +                                 power8-pmu.o
>  obj32-$(CONFIG_PPC_PERF_CTRS) += mpc7450-pmu.o
>
>  obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
> diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
> new file mode 100644
> index 0000000..106ae0b
> --- /dev/null
> +++ b/arch/powerpc/perf/power8-pmu.c
> @@ -0,0 +1,454 @@
> +/*
> + * Performance counter support for POWER8 processors.
> + *
> + * Copyright 2009 Paul Mackerras, IBM Corporation.
> + * Copyright 2013 Michael Ellerman, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/perf_event.h>
> +#include <asm/firmware.h>
> +
> +
> +/*
> + * Some power8 event codes.
> + */
> +#define PM_CYC                  0x0001e
> +#define PM_GCT_NOSLOT_CYC       0x100f8
> +#define PM_CMPLU_STALL          0x4000a /* or 0x1e054 */
> +#define PM_INST_CMPL            0x2
> +#define PM_BRU_FIN              0x10068
> +#define PM_BR_MPRED_CMPL        0x400f6
> +
> +
> +/*
> + * Raw event encoding for POWER8:
> + *
> + *        60        56        52        48        44        40        36        32
> + * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
> + *                                    [      thresh_cmp       ] [   thresh_ctl    ]
> + *                                                                       |
> + *                                       thresh start/stop OR FAB match -*
> + *
> + *        28        24        20        16        12         8         4         0
> + * | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - | - - - - |
> + *   [   ] [  sample ]   [cache]   [ pmc ]   [unit ]   c     m   [    pmcxsel    ]
> + *     |        |           |                          |     |
> + *     |        |           |                          |     *- mark
> + *     |        |           *- L1/L2/L3 cache_sel      |
> + *     |        |                                      |
> + *     |        *- sampling mode for marked events     *- combine
> + *     |
> + *     *- thresh_sel
> + *
> + * Below uses IBM bit numbering.
> + *
> + * MMCR1[x:y] = unit    (PMCxUNIT)
> + * MMCR1[x]   = combine (PMCxCOMB)
> + *
> + * if pmc == 3 and unit == 0 and pmcxsel[0:6] == 0b0101011
> + *      # PM_MRK_FAB_RSP_MATCH
> + *      MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
> + * else if pmc == 4 and unit == 0xf and pmcxsel[0:6] == 0b0101001
> + *      # PM_MRK_FAB_RSP_MATCH_CYC
> + *      MMCR1[20:27] = thresh_ctl   (FAB_CRESP_MATCH / FAB_TYPE_MATCH)
> + * else
> + *      MMCRA[48:55] = thresh_ctl   (THRESH START/END)
> + *
> + * if thresh_sel:
> + *      MMCRA[45:47] = thresh_sel
> + *
> + * if thresh_cmp:
> + *      MMCRA[22:24] = thresh_cmp[0:2]
> + *      MMCRA[25:31] = thresh_cmp[3:9]
> + *
> + * if unit == 6 or unit == 7
> + *      MMCRC[53:55] = cache_sel[1:3]      (L2EVENT_SEL)
> + * else if unit == 8 or unit == 9:
> + *      if cache_sel[0] == 0: # L3 bank
> + *              MMCRC[47:49] = cache_sel[1:3]  (L3EVENT_SEL0)
> + *      else if cache_sel[0] == 1:
> + *              MMCRC[50:51] = cache_sel[2:3]  (L3EVENT_SEL1)
> + * else if cache_sel[1]: # L1 event
> + *      MMCR1[16] = cache_sel[2]
> + *      MMCR1[17] = cache_sel[3]
> + *
> + * if mark:
> + *      MMCRA[63]    = 1                (SAMPLE_ENABLE)
> + *      MMCRA[57:59] = sample[0:2]      (RAND_SAMP_ELIG)
> + *      MMCRA[61:62] = sample[3:4]      (RAND_SAMP_MODE)
> + *
> + */
> +
> +#define EVENT_THR_CMP_SHIFT     40      /* Threshold CMP value */
> +#define EVENT_THR_CMP_MASK      0x3ff
> +#define EVENT_THR_CTL_SHIFT     32      /* Threshold control value (start/stop) */
> +#define EVENT_THR_CTL_MASK      0xffull
> +#define EVENT_THR_SEL_SHIFT     29      /* Threshold select value */
> +#define
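The shift and mask pairs quoted above can be exercised with a small userspace sketch that pulls the three threshold fields out of a raw 64-bit event code. The EVENT_THR_* shifts and the first two masks are the values from the patch; TOY_THR_SEL_MASK (three bits, inferred from the MMCRA[45:47] note), the struct and the function name are assumptions for illustration, since the quoted patch is cut off before the remaining defines.

```c
#include <stdint.h>

#define EVENT_THR_CMP_SHIFT 40
#define EVENT_THR_CMP_MASK  0x3ffULL
#define EVENT_THR_CTL_SHIFT 32
#define EVENT_THR_CTL_MASK  0xffULL
#define EVENT_THR_SEL_SHIFT 29
#define TOY_THR_SEL_MASK    0x7ULL  /* assumed: thresh_sel is three bits wide */

/* Hypothetical container for the decoded threshold fields. */
struct toy_thresh {
    unsigned int cmp;   /* thresh_cmp, 10 bits */
    unsigned int ctl;   /* thresh_ctl, 8 bits */
    unsigned int sel;   /* thresh_sel, 3 bits (assumed width) */
};

/* Extract each field by shifting it down and masking off the rest. */
static struct toy_thresh toy_decode_thresh(uint64_t event)
{
    struct toy_thresh t;

    t.cmp = (unsigned int)((event >> EVENT_THR_CMP_SHIFT) & EVENT_THR_CMP_MASK);
    t.ctl = (unsigned int)((event >> EVENT_THR_CTL_SHIFT) & EVENT_THR_CTL_MASK);
    t.sel = (unsigned int)((event >> EVENT_THR_SEL_SHIFT) & TOY_THR_SEL_MASK);
    return t;
}
```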
Re: [PATCH 2/8] Add version and timestamp to oops header
On Wed, Apr 10, 2013 at 12:51:12PM +0530, Aruna Balakrishnaiah wrote:
> Introduce version and timestamp information in the oops header.
> oops_log_info (oops header) holds version (to distinguish between old
> and new format oops header), length of the oops text
> (compressed or uncompressed) and timestamp.

This needs a much more detailed explanation.

I think what you're doing is you're overlaying the new information so
that the version field in oops_log_info sits in the same location as the
length field in the old format. And then you're defining the version to
be a value that is an illegal length.

So existing tools will refuse to dump new style partitions, because
they'll think the length is too large. You've tested that?

Updated tools will know about both formats, so will be able to handle
either old or new style partitions. Is that correct?

And we're adding the timestamp just because we can and it'd be nice to
have?

cheers
Re: [PATCH 3/8] Introduce generic read function to read nvram-partitions
On Wed, Apr 10, 2013 at 12:51:25PM +0530, Aruna Balakrishnaiah wrote:
> Introduce generic read function to read nvram partitions other than rtas.
> nvram_read_error_log will be retained which is used to read rtas partition
> from rtasd. nvram_read_partition is the generic read function to read from
> any nvram partition.
>
> Signed-off-by: Aruna Balakrishnaiah ar...@linux.vnet.ibm.com
> Reviewed-by: Jim Keniston jkeni...@us.ibm.com
> ---
>  arch/powerpc/platforms/pseries/nvram.c | 34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
> index 742735a..6701b71 100644
> --- a/arch/powerpc/platforms/pseries/nvram.c
> +++ b/arch/powerpc/platforms/pseries/nvram.c
> @@ -293,34 +293,37 @@ int nvram_write_error_log(char * buff, int length,
>          return rc;
>  }
>
> -/* nvram_read_error_log
> +/* nvram_read_partition
>   *
> - * Reads nvram for error log for at most 'length'
> + * Reads nvram partition for at most 'length'
>   */
> -int nvram_read_error_log(char * buff, int length,
> -                         unsigned int * err_type, unsigned int * error_log_cnt)
> +int nvram_read_partition(struct nvram_os_partition *part, char *buff,
> +                        int length, unsigned int *err_type,
> +                        unsigned int *error_log_cnt)
>  {
>          int rc;
>          loff_t tmp_index;
>          struct err_log_info info;
>
> -        if (rtas_log_partition.index == -1)
> +        if (part->index == -1)
>                  return -1;
>
> -        if (length > rtas_log_partition.size)
> -                length = rtas_log_partition.size;
> +        if (length > part->size)
> +                length = part->size;
>
> -        tmp_index = rtas_log_partition.index;
> +        tmp_index = part->index;
>
>          rc = ppc_md.nvram_read((char *)&info, sizeof(struct err_log_info), &tmp_index);
>          if (rc <= 0) {
> -                printk(KERN_ERR "nvram_read_error_log: Failed nvram_read (%d)\n", rc);
> +                printk(KERN_ERR "nvram_read_partition: "
> +                                "Failed nvram_read (%d)\n", rc);

Should be:
        pr_err("%s: Failed ..\n", __FUNCTION__, ..)

cheers
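The point of the "%s" plus __FUNCTION__ suggestion is that the message follows the function name automatically, so a rename like the one in this patch cannot leave a stale string behind. A minimal userspace sketch, assuming snprintf() as a stand-in for the kernel's pr_err() and using the standard __func__ spelling; toy_format_error is a hypothetical name:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Format the error text into 'out'. The function name comes from
 * __func__, so renaming this function updates the message too.
 * Returns what snprintf() returns.
 */
static int toy_format_error(char *out, size_t len, int rc)
{
    return snprintf(out, len, "%s: Failed nvram_read (%d)\n", __func__, rc);
}
```

Calling toy_format_error(buf, sizeof(buf), -5) fills buf with a line prefixed by the function's own name.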
Re: [PATCH 0/8] Nvram-to-pstore
On Wed, Apr 10, 2013 at 12:50:47PM +0530, Aruna Balakrishnaiah wrote:
> Currently the kernel provides the contents of p-series NVRAM only as a
> simple stream of bytes via /dev/nvram, which must be interpreted in user
> space by the nvram command in the powerpc-utils package. This patch set
> exploits the pstore subsystem to expose each partition in NVRAM as a
> separate file in /dev/pstore. For instance Oops messages will be stored
> in a file named [dmesg-nvram-2].

Please try to fold some of that info into the commit messages for actual
patches. The 0th patch is lost when we commit the series into git.

Also all your patches should have a subject starting with
"powerpc/pseries:".

cheers
Re: [PATCH 1/8] Remove syslog prefix in uncompressed oops text
On Monday 15 April 2013 12:50 PM, Michael Ellerman wrote:
> On Wed, Apr 10, 2013 at 12:51:00PM +0530, Aruna Balakrishnaiah wrote:
> > Removal of syslog prefix in the uncompressed oops text will
> > help in capturing more oops data.
>
> Why does it help?
>
> Does this affect any existing tools?
>
> cheers

By setting the (2nd) syslog argument of kmsg_dump_get_buffer() to false,
we omit <n> line prefixes and thereby capture more of the printk buffer.

No, this should not affect any existing tools.

Regards,
Aruna
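To get a feel for the saving described here, the sketch below counts how many bytes of a text buffer are taken up by "<n>" prefixes at the start of lines. It is a rough userspace illustration only: it assumes a three-character prefix with a single digit 0 to 7, and toy_prefix_bytes is a hypothetical helper, not part of the kernel or of kmsg_dump.

```c
#include <stddef.h>

/*
 * Count the bytes used by "<n>" severity prefixes (n = 0..7) that
 * appear at the start of a line in a NUL-terminated text buffer.
 */
static size_t toy_prefix_bytes(const char *text)
{
    size_t bytes = 0;
    int at_line_start = 1;

    for (const char *p = text; *p; p++) {
        /* Short-circuit evaluation keeps p[1] and p[2] inside the string. */
        if (at_line_start && p[0] == '<' &&
            p[1] >= '0' && p[1] <= '7' && p[2] == '>')
            bytes += 3;
        at_line_start = (*p == '\n');
    }
    return bytes;
}
```

Every byte counted this way is room that can hold oops text instead once the prefixes are left out of a fixed-size partition.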
Re: [PATCH 4/8] Read/Write oops nvram partition via pstore
On Wed, Apr 10, 2013 at 12:53:03PM +0530, Aruna Balakrishnaiah wrote:
> This patch exploits pstore infrastructure in power systems.
> IBM's system p machines provide persistent storage for LPARs

In the kernel we use "pseries" instead of "system p".

> through NVRAM. NVRAM's lnx,oops-log partition is used to log oops messages.
> In case pstore registration fails it will fall back to kmsg_dump mechanism.

What are the implications of falling back to kmsg_dump()?

Is there any reason we would not want to enable CONFIG_PSTORE ? ie.
should the pseries platform just select it?

> diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
> index 6701b71..82d32a2 100644
> --- a/arch/powerpc/platforms/pseries/nvram.c
> +++ b/arch/powerpc/platforms/pseries/nvram.c
> @@ -18,6 +18,7 @@
>  #include <linux/spinlock.h>
>  #include <linux/slab.h>
>  #include <linux/kmsg_dump.h>
> +#include <linux/pstore.h>
>  #include <linux/ctype.h>
>  #include <linux/zlib.h>
>  #include <asm/uaccess.h>
> @@ -87,6 +88,25 @@ static struct kmsg_dumper nvram_kmsg_dumper = {
>          .dump = oops_to_nvram
>  };
>
> +static int nvram_pstore_open(struct pstore_info *psi);
> +
> +static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
> +                                int *count, struct timespec *time, char **buf,
> +                                struct pstore_info *psi);
> +
> +static int nvram_pstore_write(enum pstore_type_id type,
> +                                enum kmsg_dump_reason reason, u64 *id,
> +                                unsigned int part, int count, size_t size,
> +                                struct pstore_info *psi);

I think you should be able to rearrange this so that you don't need the
forward declarations.

> +
> +static struct pstore_info nvram_pstore_info = {
> +        .owner = THIS_MODULE,
> +        .name = "nvram",
> +        .open = nvram_pstore_open,
> +        .read = nvram_pstore_read,
> +        .write = nvram_pstore_write,
> +};
> +
>  /* See clobbering_unread_rtas_event() */
>  #define NVRAM_RTAS_READ_TIMEOUT 5               /* seconds */
>  static unsigned long last_unread_rtas_event;    /* timestamp */
> @@ -121,6 +141,13 @@ static char *big_oops_buf, *oops_buf;
>  static char *oops_data;
>  static size_t oops_data_sz;
>
> +#ifdef CONFIG_PSTORE

If we are going to have CONFIG_PSTORE #ifdefs in this file, I don't see
why there can't be just a single block of code that is #ifdef'ed, rather
than several like you have.

> +static enum pstore_type_id nvram_type_ids[] = {
> +        PSTORE_TYPE_DMESG,
> +        -1
> +};
> +static int read_type;

I don't understand what you're doing with read_type. It looks fishy.

> +#endif
>  /* Compression parameters */
>  #define COMPR_LEVEL 6
>  #define WINDOW_BITS 12
> @@ -455,6 +482,23 @@ static void __init nvram_init_oops_partition(int rtas_partition_exists)
>          oops_data = oops_buf + sizeof(struct oops_log_info);
>          oops_data_sz = oops_log_partition.size - sizeof(struct oops_log_info);
>
> +        nvram_pstore_info.buf = oops_data;
> +        nvram_pstore_info.bufsize = oops_data_sz;
> +
> +        rc = pstore_register(&nvram_pstore_info);
> +
> +        if (rc != 0) {
> +                pr_err("nvram: pstore_register() failed, defaults to "
> +                                "kmsg_dump; returned %d\n", rc);
> +                goto kmsg_dump;

You don't need the goto.

> +        } else {
> +                /*TODO: Support compression when pstore is configured */

What is the issue here?

> +                pr_info("nvram: Compression of oops text supported only when "
> +                                "pstore is not configured");
> +                return;
> +        }
> +
> +kmsg_dump:
>          /*
>           * Figure compression (preceded by elimination of each line's <n>
>           * severity prefix) will reduce the oops/panic report to at most
> @@ -663,3 +707,104 @@ static void oops_to_nvram(struct kmsg_dumper *dumper,
>
>          spin_unlock_irqrestore(&lock, flags);
>  }
> +
> +#ifdef CONFIG_PSTORE

Same comment about too many ifdefs.

> +static int nvram_pstore_open(struct pstore_info *psi)
> +{
> +        read_type = -1;

Locking?

> +        return 0;
> +}
> +
> +/*

Make it a kernel-doc style comment.

> + * Called by pstore_dump() when an oops or panic report is logged to the printk
> + * buffer. @size bytes have been written to oops_buf, starting after the
> + * oops_log_info header.

@size bytes have, or @size bytes should be written?

> + */
> +static int nvram_pstore_write(enum pstore_type_id type,
> +                                enum kmsg_dump_reason reason,
> +                                u64 *id, unsigned int part, int count,
> +                                size_t size, struct pstore_info *psi)
> +{
> +        struct oops_log_info *oops_hdr = (struct oops_log_info *) oops_buf;
> +
> +        /* part 1 has the recent messages from printk buffer */
> +        if (part > 1 || clobbering_unread_rtas_event())
> +                return -1;
> +
> +        BUG_ON(type != PSTORE_TYPE_DMESG);
> +        BUG_ON(sizeof(*oops_hdr) + size > oops_log_partition.size);

Why would we be called with the wrong type? Would it be better to
Re: [PATCH 2/8] Add version and timestamp to oops header
On Monday 15 April 2013 01:01 PM, Michael Ellerman wrote:
> On Wed, Apr 10, 2013 at 12:51:12PM +0530, Aruna Balakrishnaiah wrote:
> > Introduce version and timestamp information in the oops header.
> > oops_log_info (oops header) holds version (to distinguish between old
> > and new format oops header), length of the oops text
> > (compressed or uncompressed) and timestamp.
>
> This needs a much more detailed explanation.
>
> I think what you're doing is you're overlaying the new information so
> that the version field in oops_log_info sits in the same location as the
> length field in the old format. And then you're defining the version to
> be a value that is an illegal length.

That's right.

> So existing tools will refuse to dump new style partitions, because
> they'll think the length is too large. You've tested that?

Yeah, I have tested that.

> Updated tools will know about both formats, so will be able to handle
> either old or new style partitions. Is that correct?

Yeah, that's correct.

> And we're adding the timestamp just because we can and it'd be nice to
> have?

That's right. Also, the main reason behind adding the timestamp is that
it will be used when we create a pstore file for oops messages. The
pstore file's timestamp will reflect the timestamp in the oops header
added during the crash.

> cheers
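The overlay scheme confirmed in this exchange can be sketched in userspace C. The layouts and constants below are assumptions chosen for illustration (a 16-bit first field, a 4000-byte partition, a version value of 5000); only the idea that the version occupies the old length field and is deliberately an illegal length comes from the thread. All toy_ names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PARTITION_SIZE 4000u  /* assumed partition size */
#define TOY_HDR_VERSION    5000u  /* assumed: larger than any legal length */

/* Old layout: the first 16 bits hold the report length. */
struct toy_old_hdr {
    uint16_t report_length;
};

/* New layout: the version sits where the old length used to be. */
struct toy_new_hdr {
    uint16_t version;
    uint16_t report_length;
    uint64_t timestamp;
};

/* An old tool reads the first field as a length and rejects oversized values. */
static int toy_old_tool_accepts(const void *buf)
{
    struct toy_old_hdr hdr;

    memcpy(&hdr, buf, sizeof(hdr));
    return hdr.report_length <= TOY_PARTITION_SIZE;
}

/* An updated tool checks the same field for the version marker first. */
static int toy_is_new_format(const void *buf)
{
    uint16_t first;

    memcpy(&first, buf, sizeof(first));
    return first == TOY_HDR_VERSION;
}
```

Because the marker can never be a valid length, an old tool refuses a new-style partition rather than misreading it, while an updated tool can tell the two formats apart from the first field alone.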
Re: [PATCH 5/8] Read rtas partition via pstore
On Wed, Apr 10, 2013 at 12:53:27PM +0530, Aruna Balakrishnaiah wrote:
> This patch exploits pstore infrastructure to read the details from NVRAM's
> rtas partition.

Does that mean it's exposed in the pstore filesystem?

> Signed-off-by: Aruna Balakrishnaiah ar...@linux.vnet.ibm.com
> Reviewed-by: Jim Keniston jkeni...@us.ibm.com
> ---
>  arch/powerpc/platforms/pseries/nvram.c | 33 ++++++++++++++++++++++++++-------
>  fs/pstore/inode.c                      |  3 +++
>  include/linux/pstore.h                 |  2 ++
>  3 files changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/nvram.c b/arch/powerpc/platforms/pseries/nvram.c
> index 82d32a2..d420b1d 100644
> --- a/arch/powerpc/platforms/pseries/nvram.c
> +++ b/arch/powerpc/platforms/pseries/nvram.c
> @@ -144,9 +144,11 @@ static size_t oops_data_sz;
>  #ifdef CONFIG_PSTORE
>  static enum pstore_type_id nvram_type_ids[] = {
>          PSTORE_TYPE_DMESG,
> +        PSTORE_TYPE_RTAS,
>          -1
>  };
>  static int read_type;
> +static unsigned long last_rtas_event;
>  #endif
>  /* Compression parameters */
>  #define COMPR_LEVEL 6
> @@ -315,8 +317,13 @@ int nvram_write_error_log(char * buff, int length,
>  {
>          int rc = nvram_write_os_partition(&rtas_log_partition, buff, length,
>                                                  err_type, error_log_cnt);
> -        if (!rc)
> +        if (!rc) {
>                  last_unread_rtas_event = get_seconds();
> +#ifdef CONFIG_PSTORE
> +                last_rtas_event = get_seconds();
> +#endif
> +        }
> +
>          return rc;
>  }
>
> @@ -745,7 +752,7 @@ static int nvram_pstore_write(enum pstore_type_id type,
>  }
>
>  /*
> - * Reads the oops/panic report.
> + * Reads the oops/panic report and ibm,rtas-log partition.
>   * Returns the length of the data we read from each partition.
>   * Returns 0 if we've been called before.
>   */
> @@ -765,6 +772,12 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
>                  part = &oops_log_partition;
>                  *type = PSTORE_TYPE_DMESG;
>                  break;
> +        case PSTORE_TYPE_RTAS:
> +                part = &rtas_log_partition;
> +                *type = PSTORE_TYPE_RTAS;
> +                time->tv_sec = last_rtas_event;
> +                time->tv_nsec = 0;
> +                break;
>          default:
>                  return 0;
>          }
> @@ -781,11 +794,17 @@ static ssize_t nvram_pstore_read(u64 *id, enum pstore_type_id *type,
>
>          *count = 0;
>          *id = id_no;
> -        oops_hdr = (struct oops_log_info *)buff;
> -        *buf = buff + sizeof(*oops_hdr);
> -        time->tv_sec = oops_hdr->timestamp;
> -        time->tv_nsec = 0;
> -        return oops_hdr->report_length;
> +
> +        if (nvram_type_ids[read_type] == PSTORE_TYPE_DMESG) {
> +                oops_hdr = (struct oops_log_info *)buff;
> +                *buf = buff + sizeof(*oops_hdr);
> +                time->tv_sec = oops_hdr->timestamp;
> +                time->tv_nsec = 0;
> +                return oops_hdr->report_length;
> +        }
> +
> +        *buf = buff;
> +        return part->size;
>  }
>  #else
>  static int nvram_pstore_open(struct pstore_info *psi)
> diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
> index e4bcb2c..59b1454 100644
> --- a/fs/pstore/inode.c
> +++ b/fs/pstore/inode.c
> @@ -324,6 +324,9 @@ int pstore_mkfile(enum pstore_type_id type, char *psname, u64 id, int count,
>          case PSTORE_TYPE_MCE:
>                  sprintf(name, "mce-%s-%lld", psname, id);
>                  break;
> +        case PSTORE_TYPE_RTAS:
> +                sprintf(name, "rtas-%s-%lld", psname, id);
> +                break;
>          case PSTORE_TYPE_UNKNOWN:
>                  sprintf(name, "unknown-%s-%lld", psname, id);
>                  break;
> diff --git a/include/linux/pstore.h b/include/linux/pstore.h
> index 75d0176..4eb94c9 100644
> --- a/include/linux/pstore.h
> +++ b/include/linux/pstore.h
> @@ -35,6 +35,8 @@ enum pstore_type_id {
>          PSTORE_TYPE_MCE         = 1,
>          PSTORE_TYPE_CONSOLE     = 2,
>          PSTORE_TYPE_FTRACE      = 3,
> +        /* PPC64 partition types */
> +        PSTORE_TYPE_RTAS        = 10,
>          PSTORE_TYPE_UNKNOWN     = 255

I think you should probably just continue at 4, and call it
PSTORE_TYPE_PPC_RTAS.

But you must get an ACK from the pstore maintainers for this and the
previous hunk, and I don't see them on CC.

cheers
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 08:56 +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2013-04-15 at 13:50 +1000, Paul Mackerras wrote:
> > On Fri, Apr 12, 2013 at 10:16:59AM +0800, Li Zhong wrote:
> > > It seems that in kernel_unmap_linear_page(), it only checks whether there
> > > is a map in the linear_map_hash_slots array, so seems we don't need to
> > > bolt the hpte.

Hi Paul, Ben

Thank you both for the comments and detailed information. I'll keep it
bolted in the next version. If you have time, please help to check
whether my understanding below is correct.

Thanks, Zhong

> > I don't exactly understand your rationale here, but I don't think it's
> > safe not to have linear mapping pages bolted. Basically, if a page
> > will be used in the process of calling hash_page to demand-fault an
> > HPTE into the hash table, then that page needs to be bolted, otherwise
> > we can get an infinite recursion of HPT misses.

So the infinite recursion happens like below?

    fault for PAGE A
      hash_page for PAGE A
        some page B needed by hash_page processing removed by others,
        before inserting the HPTE
          fault for PAGE B
            hash_page for PAGE B
              and recursion for ever

> > That includes all kernel stack pages, among other things, so I think we
> > need to leave the HPTE_V_BOLTED in there.
>
> I suspect Li's confusion comes from the fact that he doesn't realize
> that we might evict random hash slots. If the linear mapping hash
> entries could only be thrown out via kernel_unmap_linear_page() then his
> comment would make sense. However this isn't the case.
>
> Li: When faulting something in, if both the primary and secondary
> buckets are full, we somewhat randomly evict the content of a slot and
> replace it. However we only do that on non-bolted slots.

So the code is implemented in ppc_md.hpte_remove(), may be called by
__hash_page_huge(), and asm code htab_call_hpte_remove?

> This is why the linear mapping (and the vmemmap) must be bolted.

If not, it would result in the infinite recursion like above?

> Cheers,
> Ben.
Re: [RFC PATCH v2 1/4] powerpc: Move the setting of rflags out of loop in __hash_page_huge
On Mon, 2013-04-15 at 16:32 +1000, Michael Ellerman wrote:
> On Fri, Apr 12, 2013 at 10:16:57AM +0800, Li Zhong wrote:
> > It seems that rflags don't get changed in the repeating loop, so move
> > it out of the loop.
>
> You've also changed the way new_pte is handled on repeat, but I think
> that's OK too.

OK, I'll add it in the description :)

Thanks, Zhong

> cheers
>
> > diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
> > index cecad34..edb4129 100644
> > --- a/arch/powerpc/mm/hugetlbpage-hash64.c
> > +++ b/arch/powerpc/mm/hugetlbpage-hash64.c
> > @@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
> >
> >          pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;
> >
> > -repeat:
> > -        hpte_group = ((hash & htab_hash_mask) *
> > -                      HPTES_PER_GROUP) & ~0x7UL;
> > -
> >          /* clear HPTE slot informations in new PTE */
> >  #ifdef CONFIG_PPC_64K_PAGES
> >          new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
>
> ie. here new_pte was updated on repeat, but now it's not.
>
> > @@ -101,6 +97,10 @@ repeat:
> >          rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
> >                                _PAGE_COHERENT | _PAGE_GUARDED));
> >
> > +repeat:
> > +        hpte_group = ((hash & htab_hash_mask) *
> > +                      HPTES_PER_GROUP) & ~0x7UL;
> > +
> >          /* Insert into the hash table, primary slot */
> >          slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
> >                                    mmu_psize, ssize);
Re: [RFC PATCH v2 4/4] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
On Mon, 2013-04-15 at 16:36 +1000, Michael Ellerman wrote:
> On Fri, Apr 12, 2013 at 10:17:00AM +0800, Li Zhong wrote:
> > This patch tries to fix following issue when CONFIG_DEBUG_PAGEALLOC is
> > enabled:
>
> Please include a better changelog.

OK, I'll use the following as a template, thank you for the suggestion.

> This patch does fix (I hope?) the following oops, caused by xxx.
> Reproducible by doing yyy.
>
> cheers
>
> > [ 543.075675] ------------[ cut here ]------------
> > [ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
> > [ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
> > [ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
> > [ 543.075741] Modules linked in: binfmt_misc ehea
> > [ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
> > [ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222)
> > [ 543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI> CR: 4482 XER:
> > [ 543.075816] SOFTE: 0
> > [ 543.075823] CFAR: c004c200
> > [ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1
> > GPR00: 0001 c000a9083540 c0c600a8
> > GPR04: 0050 fffa c000a90834e0 004ff594
> > GPR08: 0001 9592d4d8 c0c86854
> > GPR12: 0002 c6ead300 00a51000 0001
> > GPR16: f3354380 ff80
> > GPR20: 0001 c0c600a8 0001 0001
> > GPR24: 03354380 c000 c0b65950
> > GPR28: 0020 000cd50e 00bf50d9 c0c7c230
> > [ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
> > [ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
> > [ 543.076025] Call Trace:
> > [ 543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
> > [ 543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc
> > [ 543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c
> > [ 543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4
> > [ 543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8
> > [ 543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0
> > [ 543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
> > [ 543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
> > [ 543.076155] Instruction dump:
> > [ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 e84a0008
> > [ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 6000
> > [ 543.076224] ---[ end trace bd5807e8d6ae186b ]---
> >
> > Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
> > ---
> >  arch/powerpc/mm/hash_utils_64.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> > index a7f54f0..4b449a0 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -1272,7 +1272,7 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
> >          unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
> >          unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
> >          unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
> > -        int ret;
> > +        long ret;
> >
> >          hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
> >          hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);
> > @@ -1280,9 +1280,11 @@ static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
> >          /* Don't create HPTE entries for bad address */
> >          if (!vsid)
> >                  return;
> > -        ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
> > -                                 mode, 0,
> > -                                 mmu_linear_psize, mmu_kernel_ssize);
> > +
> > +        ret = hpte_insert_repeating(hash, vpn, __pa(vaddr),
> > +                                    mode, mmu_linear_psize,
> > +                                    mmu_kernel_ssize);
> > +
> >          BUG_ON (ret < 0);
> >          spin_lock(&linear_map_hash_lock);
> >          BUG_ON(linear_map_hash_slots[lmi] & 0x80);
> > --
> > 1.7.9.5
Re: [PATCH] powerpc/perf: Power8 PMU support
On Mon, Apr 15, 2013 at 09:31:26AM +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2013-04-15 at 14:17 +1000, Michael Ellerman wrote:
> > This patch adds preliminary support for the power8 PMU to perf.
>
> Might be worthwhile to have a small blurb explaining roughly what you
> mean by preliminary :-)

True. There's no alternative handling, and no cache events. I need to
work with the HW folks on both of those.

Also missing is EBB support. I will hopefully post that in the next few
days.

cheers
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 16:15 +0800, Li Zhong wrote:
> So the code is implemented in ppc_md.hpte_remove(), may be called by
> __hash_huge_page(), and asm code htab_call_hpte_remove?
>
> > This is why the linear mapping (and the vmemmap) must be bolted.
>
> If not, it would result the infinite recursion like above?

Potentially. We don't expect to fault linear mapping or vmemmap entries
on demand. We aren't equipped to do it, and we occasionally have code
paths that access the linear mapping and cannot afford to have SRR0 and
SRR1 clobbered by a page fault.

Cheers,
Ben.
Re: [PATCHv3 1/2] ppc64: perform proper max_bus_speed detection
On Thu, Apr 11, 2013 at 10:13:13AM -0300, Lucas Kannebley Tavares wrote:
> On pseries machines the detection for max_bus_speed should be done
> through an OpenFirmware property. This patch adds a function to perform
> this detection and a hook to perform dynamic adding of the function only
> for pseries.

This fails to build for me on ppc64_defconfig, with:

  arch/powerpc/include/asm/machdep.h:111:5: error: 'struct pci_host_bridge' declared inside parameter list [-Werror]

Presumably you tested it using some other defconfig?

cheers
Re: [PATCH 1/2] powerpc: remove section changes from _GLOBAL() and friends
On Thu, Nov 29, 2012 at 10:55:25AM +1100, Stephen Rothwell wrote:
> These sometimes produce unexpected results and make it hard to put the
> start up code (for 64 bit) into the .head.text section.

...

> diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
> index 6f62a73..4ec5625 100644
> --- a/arch/powerpc/kernel/head_fsl_booke.S
> +++ b/arch/powerpc/kernel/head_fsl_booke.S
> @@ -776,6 +776,8 @@ tlb_write_entry:
>  	mfspr	r10, SPRN_SPRG_RSCRATCH0
>  	rfi	/* Force context change */
>
> +	,text
> +

I'm assuming this should be .text ?

cheers
Re: [PATCH 1/2] powerpc: remove section changes from _GLOBAL() and friends
Hi Michael,

On Mon, 15 Apr 2013 22:00:17 +1000 Michael Ellerman mich...@ellerman.id.au wrote:
>
> On Thu, Nov 29, 2012 at 10:55:25AM +1100, Stephen Rothwell wrote:
> > These sometimes produce unexpected results and make it hard to put the
> > start up code (for 64 bit) into the .head.text section.
>
> ...
>
> > diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
> > index 6f62a73..4ec5625 100644
> > --- a/arch/powerpc/kernel/head_fsl_booke.S
> > +++ b/arch/powerpc/kernel/head_fsl_booke.S
> > @@ -776,6 +776,8 @@ tlb_write_entry:
> >  	mfspr	r10, SPRN_SPRG_RSCRATCH0
> >  	rfi	/* Force context change */
> >
> > +	,text
> > +
>
> I'm assuming this should be .text ?

Indeed. Oops :-)

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.
On Mon, Apr 15, 2013 at 12:42:00AM +0530, Varun Sethi wrote:
> swap_pci_ref function is used by the IOMMU API code for swapping pci
> device pointers, while determining the iommu group for the device.
> Currently this function was being implemented for different IOMMU
> drivers. This patch moves the function to pci.h so that the
> implementation can be shared across various IOMMU drivers.

The function is only used in IOMMU code, so I think it's fine to keep it
there (unless Bjorn disagrees and wants it in PCI code).

	Joerg
Re: [PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.
On Mon, Apr 15, 2013 at 8:58 AM, Joerg Roedel j...@8bytes.org wrote:
> On Mon, Apr 15, 2013 at 12:42:00AM +0530, Varun Sethi wrote:
> > swap_pci_ref function is used by the IOMMU API code for swapping pci
> > device pointers, while determining the iommu group for the device.
> > Currently this function was being implemented for different IOMMU
> > drivers. This patch moves the function to pci.h so that the
> > implementation can be shared across various IOMMU drivers.
>
> The function is only used in IOMMU code, so I think it's fine to keep it
> there (unless Bjorn disagrees and wants it in PCI code).

I agree; I don't think there's much benefit in putting something under
#ifdef CONFIG_IOMMU_API into pci.h. Maybe there is or could be a shared
iommu header file?

Bjorn
Re: [PATCH 1/2] powerpc: remove section changes from _GLOBAL() and friends
Hi Michael,

On Mon, 15 Apr 2013 23:30:40 +1000 Stephen Rothwell s...@canb.auug.org.au wrote:
>
> On Mon, 15 Apr 2013 22:00:17 +1000 Michael Ellerman mich...@ellerman.id.au wrote:
> >
> > On Thu, Nov 29, 2012 at 10:55:25AM +1100, Stephen Rothwell wrote:
> > > These sometimes produce unexpected results and make it hard to put the
> > > start up code (for 64 bit) into the .head.text section.
> >
> > ...
> >
> > > diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
> > > index 6f62a73..4ec5625 100644
> > > --- a/arch/powerpc/kernel/head_fsl_booke.S
> > > +++ b/arch/powerpc/kernel/head_fsl_booke.S
> > > @@ -776,6 +776,8 @@ tlb_write_entry:
> > >  	mfspr	r10, SPRN_SPRG_RSCRATCH0
> > >  	rfi	/* Force context change */
> > >
> > > +	,text
> > > +
> >
> > I'm assuming this should be .text ?
>
> Indeed. Oops :-)

BTW, those patches are almost certainly stale by now and would need to
be redone before being included in the kernel proper.

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [PATCH] powerpc: fix usage of setup_pci_atmu()
On Mon, 15 Apr 2013 15:42:01 +1000
Michael Neuling mi...@neuling.org wrote:

> Linux next is currently failing to compile mpc85xx_defconfig with:
>   arch/powerpc/sysdev/fsl_pci.c:944:2: error: too many arguments to function 'setup_pci_atmu'
>
> This is caused by (from Kumar's next branch):
>   commit 34642bbb3d12121333efcf4ea7dfe66685e403a1
>   Author: Kumar Gala ga...@kernel.crashing.org
>   powerpc/fsl-pci: Keep PCI SoC controller registers in pci_controller
>
> Which changed definition of setup_pci_atmu() but didn't update one of
> the callers. Below fixes this.
>
> Signed-off-by: Michael Neuling mi...@neuling.org
> ---
> Kumar: this is for your next tree

Reviewed-by: Kim Phillips kim.phill...@freescale.com

Thanks,

Kim
RE: [PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.
Fine, I will move it to iommu.h.

-Varun

> -----Original Message-----
> From: Joerg Roedel [mailto:j...@8bytes.org]
> Sent: Monday, April 15, 2013 8:29 PM
> To: Sethi Varun-B16395
> Cc: Yoder Stuart-B08248; Wood Scott-B07421;
> iommu@lists.linux-foundation.org; linuxppc-dev@lists.ozlabs.org;
> linux-ker...@vger.kernel.org; ga...@kernel.crashing.org;
> b...@kernel.crashing.org; bhelg...@google.com
> Subject: Re: [PATCH 1/3] iommu: Move swap_pci_ref function to pci.h.
>
> On Mon, Apr 15, 2013 at 12:42:00AM +0530, Varun Sethi wrote:
> > swap_pci_ref function is used by the IOMMU API code for swapping pci
> > device pointers, while determining the iommu group for the device.
> > Currently this function was being implemented for different IOMMU
> > drivers. This patch moves the function to pci.h so that the
> > implementation can be shared across various IOMMU drivers.
>
> The function is only used in IOMMU code, so I think it's fine to keep it
> there (unless Bjorn disagrees and wants it in PCI code).
>
> 	Joerg
Re: [PATCH v2 2/11] Add PRRN Event Handler
On 04/10/2013 03:30 AM, Michael Ellerman wrote:
> On Mon, Mar 25, 2013 at 01:52:32PM -0500, Nathan Fontenot wrote:
> > From: Jesse Larrew jlar...@linux.vnet.ibm.com
> >
> > A PRRN event is signaled via the RTAS event-scan mechanism, which
> > returns a Hot Plug Event message "fixed part" indicating "Platform
> > Resource Reassignment". In response to the Hot Plug Event message,
> > we must call ibm,update-nodes to determine which resources were
> > reassigned and then ibm,update-properties to obtain the new affinity
> > information about those resources.
>
> ..
>
> > Index: powerpc/arch/powerpc/kernel/rtasd.c
> > ===================================================================
> > --- powerpc.orig/arch/powerpc/kernel/rtasd.c 2013-03-20 08:24:14.0 -0500
> > +++ powerpc/arch/powerpc/kernel/rtasd.c 2013-03-20 08:52:08.0 -0500
> > @@ -87,6 +87,8 @@
> >  			return "Resource Deallocation Event";
> >  		case RTAS_TYPE_DUMP:
> >  			return "Dump Notification Event";
> > +		case RTAS_TYPE_PRRN:
> > +			return "Platform Resource Reassignment Event";
> >  	}
> >
> >  	return rtas_type[0];
> > @@ -265,7 +267,38 @@
> >  		spin_unlock_irqrestore(&rtasd_log_lock, s);
> >  		return;
> >  	}
> > +}
> > +
> > +static s32 update_scope;
> > +
> > +static void prrn_work_fn(struct work_struct *work)
> > +{
> > +	/*
> > +	 * For PRRN, we must pass the negative of the scope value in
> > +	 * the RTAS event.
> > +	 */
> > +	pseries_devicetree_update(-update_scope);
> > +}
> > +static DECLARE_WORK(prrn_work, prrn_work_fn);
>
> This breaks the 32-bit build (ppc6xx_defconfig):
>
> arch/powerpc/kernel/rtasd.c:280: undefined reference to `pseries_devicetree_update'

I'm not seeing this error. rtasd.c compiles fine, but I am hitting
another error later in the build that keeps it from finishing.

arch/powerpc/platforms/52xx/mpc52xx_pic.c: In function ‘mpc52xx_irqhost_map’:
arch/powerpc/platforms/52xx/mpc52xx_pic.c:343: error: ‘irqchip’ may be used uninitialized in this function

-Nathan
Re: [RFC PATCH v2 3/4] powerpc: Don't bolt the hpte in kernel_map_linear_page()
On Mon, 2013-04-15 at 13:27 +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2013-04-15 at 16:15 +0800, Li Zhong wrote:
> > So the code is implemented in ppc_md.hpte_remove(), may be called by
> > __hash_huge_page(), and asm code htab_call_hpte_remove?
> >
> > > This is why the linear mapping (and the vmemmap) must be bolted.
> >
> > If not, it would result the infinite recursion like above?
>
> Potentially, we don't expect to fault linear mapping or vmemmap entries
> on demand. We aren't equipped to do it and we occasionally have code
> path that access the linear mapping and cannot afford to have SRR0 and
> SRR1 clobbered by a page fault.

Thank you for the education :)

Thanks, Zhong

> Cheers,
> Ben.
[RFC PATCH v3 1/3] powerpc: Move the setting of rflags out of loop in __hash_page_huge
It seems that new_pte and rflags don't get changed in the repeating loop, so
move their assignment out of the loop.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/mm/hugetlbpage-hash64.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index cecad34..edb4129 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -87,10 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,

 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;

-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
 		/* clear HPTE slot informations in new PTE */
 #ifdef CONFIG_PPC_64K_PAGES
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | _PAGE_HPTE_SUB0;
@@ -101,6 +97,10 @@ repeat:
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
 				      _PAGE_COHERENT | _PAGE_GUARDED));

+repeat:
+		hpte_group = ((hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+
 		/* Insert into the hash table, primary slot */
 		slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
 					  mmu_psize, ssize);
--
1.7.9.5
[RFC PATCH v3 2/3] powerpc: Split the code trying to insert hpte repeatedly as an helper function
Move the logic trying to insert hpte in __hash_page_huge() to a helper
function, so it could also be used by others.

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/mm/hash_utils_64.c      | 35 +++++++++++++++++++++++++++++++++++
 arch/powerpc/mm/hugetlbpage-hash64.c | 31 ++++++-------------------------
 2 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index f410c3e..ead9fa8 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1230,6 +1230,41 @@ void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
 		bad_page_fault(regs, address, SIGBUS);
 }

+long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+			   unsigned long pa, unsigned long rflags,
+			   unsigned long vflags, int psize, int ssize)
+{
+	unsigned long hpte_group;
+	long slot;
+
+repeat:
+	hpte_group = ((hash & htab_hash_mask) *
+		       HPTES_PER_GROUP) & ~0x7UL;
+
+	/* Insert into the hash table, primary slot */
+	slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, vflags,
+				  psize, ssize);
+
+	/* Primary is full, try the secondary */
+	if (unlikely(slot == -1)) {
+		hpte_group = ((~hash & htab_hash_mask) *
+			      HPTES_PER_GROUP) & ~0x7UL;
+		slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
+					  vflags | HPTE_V_SECONDARY,
+					  psize, ssize);
+		if (slot == -1) {
+			if (mftb() & 0x1)
+				hpte_group = ((hash & htab_hash_mask) *
+					      HPTES_PER_GROUP)&~0x7UL;
+
+			ppc_md.hpte_remove(hpte_group);
+			goto repeat;
+		}
+	}
+
+	return slot;
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
diff --git a/arch/powerpc/mm/hugetlbpage-hash64.c b/arch/powerpc/mm/hugetlbpage-hash64.c
index edb4129..b913f41 100644
--- a/arch/powerpc/mm/hugetlbpage-hash64.c
+++ b/arch/powerpc/mm/hugetlbpage-hash64.c
@@ -14,6 +14,10 @@
 #include <asm/cacheflush.h>
 #include <asm/machdep.h>

+extern long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
+				  unsigned long pa, unsigned long rlags,
+				  unsigned long vflags, int psize, int ssize);
+
 int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		     pte_t *ptep, unsigned long trap, int local, int ssize,
 		     unsigned int shift, unsigned int mmu_psize)
@@ -83,7 +87,6 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,

 	if (likely(!(old_pte & _PAGE_HASHPTE))) {
 		unsigned long hash = hpt_hash(vpn, shift, ssize);
-		unsigned long hpte_group;

 		pa = pte_pfn(__pte(old_pte)) << PAGE_SHIFT;

@@ -97,30 +100,8 @@ int __hash_page_huge(unsigned long ea, unsigned long access, unsigned long vsid,
 		rflags |= (new_pte & (_PAGE_WRITETHRU | _PAGE_NO_CACHE |
 				      _PAGE_COHERENT | _PAGE_GUARDED));

-repeat:
-		hpte_group = ((hash & htab_hash_mask) *
-			      HPTES_PER_GROUP) & ~0x7UL;
-
-		/* Insert into the hash table, primary slot */
-		slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags, 0,
-					  mmu_psize, ssize);
-
-		/* Primary is full, try the secondary */
-		if (unlikely(slot == -1)) {
-			hpte_group = ((~hash & htab_hash_mask) *
-				      HPTES_PER_GROUP) & ~0x7UL;
-			slot = ppc_md.hpte_insert(hpte_group, vpn, pa, rflags,
-						  HPTE_V_SECONDARY,
-						  mmu_psize, ssize);
-			if (slot == -1) {
-				if (mftb() & 0x1)
-					hpte_group = ((hash & htab_hash_mask) *
-						      HPTES_PER_GROUP)&~0x7UL;
-
-				ppc_md.hpte_remove(hpte_group);
-				goto repeat;
-			}
-		}
+		slot = hpte_insert_repeating(hash, vpn, pa, rflags, 0,
+					     mmu_psize, ssize);

 		/*
 		 * Hypervisor failure. Restore old pte and return -1
--
1.7.9.5
[RFC PATCH v3 3/3] powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page()
This patch fixes the following oops, which could be triggered by building
the kernel with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.

hpte_insert() might return -1, indicating that the bucket (primary here)
is full. We do not necessarily need to report a BUG in this case.
Instead, we could try repeatedly (try secondary, remove and try again)
until we find a slot.

[ 543.075675] ------------[ cut here ]------------
[ 543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
[ 543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
[ 543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[ 543.075741] Modules linked in: binfmt_misc ehea
[ 543.075759] NIP: c0036eb0 LR: c0036ea4 CTR: c005a594
[ 543.075771] REGS: c000a90832c0 TRAP: 0700 Not tainted (3.8.0-next-20130222)
[ 543.075781] MSR: 80029032 <SF,EE,ME,IR,DR,RI> CR: 4482 XER:
[ 543.075816] SOFTE: 0
[ 543.075823] CFAR: c004c200
[ 543.075830] TASK = c000e506b750[23934] 'cc1' THREAD: c000a908 CPU: 1
GPR00: 0001 c000a9083540 c0c600a8
GPR04: 0050 fffa c000a90834e0 004ff594
GPR08: 0001 9592d4d8 c0c86854
GPR12: 0002 c6ead300 00a51000 0001
GPR16: f3354380 ff80
GPR20: 0001 c0c600a8 0001 0001
GPR24: 03354380 c000 c0b65950
GPR28: 0020 000cd50e 00bf50d9 c0c7c230
[ 543.076005] NIP [c0036eb0] .kernel_map_pages+0x1e0/0x3f8
[ 543.076016] LR [c0036ea4] .kernel_map_pages+0x1d4/0x3f8
[ 543.076025] Call Trace:
[ 543.076033] [c000a9083540] [c0036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
[ 543.076053] [c000a9083640] [c0167638] .get_page_from_freelist+0x6cc/0x8dc
[ 543.076067] [c000a9083800] [c0167a48] .__alloc_pages_nodemask+0x200/0x96c
[ 543.076082] [c000a90839c0] [c01ade44] .alloc_pages_vma+0x160/0x1e4
[ 543.076098] [c000a9083a80] [c018ce04] .handle_pte_fault+0x1b0/0x7e8
[ 543.076113] [c000a9083b50] [c018d5a8] .handle_mm_fault+0x16c/0x1a0
[ 543.076129] [c000a9083c00] [c07bf1dc] .do_page_fault+0x4d0/0x7a4
[ 543.076144] [c000a9083e30] [c00090e8] handle_page_fault+0x10/0x30
[ 543.076155] Instruction dump:
[ 543.076163] 7c630038 78631d88 e80a f8410028 7c0903a6 e91f01de e96a0010 e84a0008
[ 543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 0b00 7f63db78 48785781 6000
[ 543.076224] ---[ end trace bd5807e8d6ae186b ]---

Signed-off-by: Li Zhong zh...@linux.vnet.ibm.com
---
 arch/powerpc/mm/hash_utils_64.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index ead9fa8..1ed4419 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -1268,21 +1268,22 @@ repeat:
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
-	unsigned long hash, hpteg;
+	unsigned long hash;
 	unsigned long vsid = get_kernel_vsid(vaddr, mmu_kernel_ssize);
 	unsigned long vpn = hpt_vpn(vaddr, vsid, mmu_kernel_ssize);
 	unsigned long mode = htab_convert_pte_flags(PAGE_KERNEL);
-	int ret;
+	long ret;

 	hash = hpt_hash(vpn, PAGE_SHIFT, mmu_kernel_ssize);
-	hpteg = ((hash & htab_hash_mask) * HPTES_PER_GROUP);

 	/* Don't create HPTE entries for bad address */
 	if (!vsid)
 		return;
-	ret = ppc_md.hpte_insert(hpteg, vpn, __pa(vaddr),
-				 mode, HPTE_V_BOLTED,
-				 mmu_linear_psize, mmu_kernel_ssize);
+
+	ret = hpte_insert_repeating(hash, vpn, __pa(vaddr), mode,
+				    HPTE_V_BOLTED,
+				    mmu_linear_psize, mmu_kernel_ssize);
+
 	BUG_ON (ret < 0);
 	spin_lock(&linear_map_hash_lock);
 	BUG_ON(linear_map_hash_slots[lmi] & 0x80);
--
1.7.9.5
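For readers following the changelog above, the "try primary, try
secondary, remove and try again" strategy can be illustrated with a small
user-space sketch. This is not the kernel code: the table layout, the
names (table, group_insert, group_evict, insert_repeating, SECONDARY_FLAG)
and the eviction policy are all hypothetical stand-ins, and a simple
counter replaces the timebase bit the real helper uses to pick a group to
evict from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy table: NOT the kernel's hashed page table layout. */
#define SLOTS_PER_GROUP 8u
#define NUM_GROUPS      4u                 /* must be a power of two */
#define HASH_MASK       (NUM_GROUPS - 1u)
#define SECONDARY_FLAG  0x8L               /* set when the secondary group was used */

static uint64_t table[NUM_GROUPS * SLOTS_PER_GROUP];  /* 0 means "free" */
static unsigned evict_cursor;              /* stand-in for the timebase coin flip */

/* Try one group; return the slot index within the group, or -1 if it is full. */
static long group_insert(unsigned long group, uint64_t key)
{
	for (unsigned i = 0; i < SLOTS_PER_GROUP; i++) {
		if (table[group + i] == 0) {
			table[group + i] = key;
			return (long)i;
		}
	}
	return -1;
}

/* Free one slot in the group so that the next attempt can succeed. */
static void group_evict(unsigned long group)
{
	table[group + (evict_cursor++ % SLOTS_PER_GROUP)] = 0;
}

/* Primary group, then secondary group, then evict and start over. */
long insert_repeating(unsigned long hash, uint64_t key)
{
	assert(key != 0);                  /* 0 is reserved as the "free" marker */

	for (;;) {
		unsigned long primary = (hash & HASH_MASK) * SLOTS_PER_GROUP;
		unsigned long secondary = (~hash & HASH_MASK) * SLOTS_PER_GROUP;
		long slot = group_insert(primary, key);

		if (slot != -1)
			return slot;

		slot = group_insert(secondary, key);
		if (slot != -1)
			return slot | SECONDARY_FLAG;

		/* Both groups are full: evict from one of them and retry. */
		group_evict((evict_cursor & 1u) ? primary : secondary);
	}
}
```

Because a full pair of groups leads to an eviction followed by a retry,
the function in this sketch never returns -1, which is the property the
changelog relies on when it argues that a full primary bucket alone does
not have to end in a BUG.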
Re: [PATCHv3 2/2] radeon: use max_bus_speed to activate gen2 speeds
> > diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
> > index 305a657..3291f62 100644
> > --- a/drivers/gpu/drm/radeon/evergreen.c
> > +++ b/drivers/gpu/drm/radeon/evergreen.c
> > @@ -3855,8 +3855,7 @@ void evergreen_fini(struct radeon_device *rdev)
> >
> >  void evergreen_pcie_gen2_enable(struct radeon_device *rdev)
> >  {
> > -	u32 link_width_cntl, speed_cntl, mask;
> > -	int ret;
> > +	u32 link_width_cntl, speed_cntl;
> >
> >  	if (radeon_pcie_gen2 == 0)
> >  		return;
> > @@ -3871,11 +3870,7 @@ void evergreen_pcie_gen2_enable(struct radeon_device *rdev)
> >  	if (ASIC_IS_X2(rdev))
> >  		return;
> >
> > -	ret = drm_pcie_get_speed_cap_mask(rdev->ddev, &mask);
> > -	if (ret != 0)
> > -		return;
> > -
> > -	if (!(mask & DRM_PCIE_SPEED_50))
> > +	if (rdev->pdev->bus->max_bus_speed < PCIE_SPEED_5_0GT)
>
> For devices on a root bus, we previously dereferenced a NULL pointer
> in drm_pcie_get_speed_cap_mask() because pdev->bus->self is NULL on a
> root bus. (I think this is the original problem you tripped over,
> Lucas.) These patches fix that problem.
>
> On pseries, where the device *is* on a root bus, your patches set
> max_bus_speed so this will work as expected. On most other systems,
> max_bus_speed for root buses will be PCI_SPEED_UNKNOWN (set in
> pci_alloc_bus() and never updated because most arches don't have code
> like the pseries code you're adding). PCI_SPEED_UNKNOWN = 0xff, so if
> we see another machine with a GPU on the root bus, we'll attempt to
> enable Gen2 on the device even though we have no idea what the bus
> will support.
>
> That's why I originally suggested skipping the Gen2 stuff if
> max_bus_speed == PCI_SPEED_UNKNOWN. I was just being conservative,
> thinking that it's better to have a functional but slow GPU rather
> than the unknown (to me) effects of enabling Gen2 on a link that might
> not support it. But I'm fine with this being either way.
>
> It would be nice if we could get rid of drm_pcie_get_speed_cap_mask()
> altogether. It is exported, but I have no idea if anybody else uses
> it. Maybe it could at least be marked __deprecated now?
>
> I don't know who should take these patches. They don't touch
> drivers/pci, but I'd be happy to push them, given the appropriate ACKs
> from DRM and powerpc folks.

Acked-by: Dave Airlie airl...@redhat.com

I'm happy to see these go via pci tree to avoid interdependent trees.

Dave.
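The two behaviours weighed in the review above (proceed unless the bus
is known to be slower than 5.0GT/s, versus also bailing out when the
speed is unknown) can be compared with a small stand-alone sketch. The
type and function names here are hypothetical, and the numeric values
other than 0xff for the unknown speed are assumptions chosen to mirror
the ordering in the kernel's speed enum; only PCI_SPEED_UNKNOWN = 0xff
is stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's bus speed enum. */
enum bus_speed {
	SPEED_2_5GT   = 0x14,   /* assumed value */
	SPEED_5_0GT   = 0x15,   /* assumed value */
	SPEED_UNKNOWN = 0xff,   /* value quoted in the thread */
};

/* The unknown marker compares greater than every real speed. */
static_assert(SPEED_UNKNOWN > SPEED_5_0GT, "unknown must sort above 5.0GT");

/* Check as in the patch: only a known, too-slow bus skips Gen2. */
bool gen2_allowed_permissive(enum bus_speed max)
{
	return !(max < SPEED_5_0GT);
}

/* Conservative variant suggested in review: unknown also skips Gen2. */
bool gen2_allowed_conservative(enum bus_speed max)
{
	if (max == SPEED_UNKNOWN)
		return false;
	return max >= SPEED_5_0GT;
}
```

The two functions only disagree for SPEED_UNKNOWN, which is exactly the
root-bus case on arches that never fill in max_bus_speed.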
Re: [PATCH] powerpc: Fix audit crash due to save/restore PPR changes
On 04/14/2013 06:44 PM, Alistair Popple wrote:
> The current mainline crashes when hitting userspace with the following:
>
> kernel BUG at /home/alistair/Source/linux-stable/kernel/auditsc.c:1769!
> cpu 0x1: Vector: 700 (Program Check) at [c00023883a60]
>     pc: c01047a8: .__audit_syscall_entry+0x38/0x130
>     lr: c000ed64: .do_syscall_trace_enter+0xc4/0x270
>     sp: c00023883ce0
>    msr: 80029032
>   current = 0xc0002380
>   paca    = 0xcf080380   softe: 0   irq_happened: 0x01
>     pid   = 1629, comm = start_udev
> kernel BUG at /home/alistair/Source/linux-stable/kernel/auditsc.c:1769!
> enter ? for help
> [c00023883d80] c000ed64 .do_syscall_trace_enter+0xc4/0x270
> [c00023883e30] c0009b08 syscall_dotrace+0xc/0x38
> --- Exception: c00 (System Call) at 008010ec50dc
>
> Bisecting found the following patch caused it:
>
> commit 44e9309f1f357794b7ae93d5f3e3e6f11d2b8a7f
> Author: Haren Myneni ha...@linux.vnet.ibm.com
>
>     powerpc: Implement PPR save/restore
>
> It was found this patch corrupted r9 when calling
> SET_DEFAULT_THREAD_PPR()
>
> Using r10 as a scratch register instead of r9 solved the problem.

Thanks for fixing. Sorry I missed it

Acked-by: Haren Myneni ha...@us.ibm.com

> Signed-off-by: Alistair Popple alist...@popple.id.au
> Acked-by: Michael Neuling mi...@neuling.org
> ---
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 256c5bf..3acb1a0 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -304,7 +304,7 @@ syscall_exit_work:
>  	subi	r12,r12,TI_FLAGS
>
>  4:	/* Anything else left to do? */
> -	SET_DEFAULT_THREAD_PPR(r3, r9)		/* Set thread.ppr = 3 */
> +	SET_DEFAULT_THREAD_PPR(r3, r10)		/* Set thread.ppr = 3 */
>  	andi.	r0,r9,(_TIF_SYSCALL_T_OR_A|_TIF_SINGLESTEP)
>  	beq	.ret_from_except_lite