RE: [PATCH v9 01/13] KVM: PPC: POWERNV: move iommu_add_device earlier
Hi Alex,

Looks like this patch has not been picked up by anyone. Are you going to pick it up? My vfio/iommu patches depend on this patch (I have already tested it).

Thanks
-Bharat

-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey Kardashevskiy
Sent: Wednesday, August 28, 2013 2:08 PM
To: linuxppc-dev@lists.ozlabs.org
Cc: k...@vger.kernel.org; Gleb Natapov; Alexey Kardashevskiy; Alexander Graf; kvm-...@vger.kernel.org; linux-ker...@vger.kernel.org; linux...@kvack.org; Paul Mackerras; Paolo Bonzini; David Gibson
Subject: [PATCH v9 01/13] KVM: PPC: POWERNV: move iommu_add_device earlier

The current implementation of IOMMU on sPAPR does not use iommu_ops and therefore does not call IOMMU API's bus_set_iommu(), which
1) sets iommu_ops for a bus
2) registers a bus notifier
Instead, PCI devices are added to IOMMU groups from subsys_initcall_sync(tce_iommu_init), which does basically the same thing without using iommu_ops callbacks.

However, the Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158) implements iommu_ops, and by the time tce_iommu_init is called every PCI device has already been added to some group, so there is a conflict.

This patch does 2 things:
1. removes the loop in which PCI devices were added to groups and adds explicit iommu_add_device() calls to add devices as soon as they get the iommu_table pointer assigned to them.
2. moves a bus notifier to powernv code in order to avoid conflict with the notifier from the Freescale driver.

iommu_add_device() and iommu_del_device() are public now.
Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
---
Changes:
v8:
* added the check for iommu_group!=NULL before removing device from a group
as suggested by Wei Yang <weiy...@linux.vnet.ibm.com>

v2:
* added a helper - set_iommu_table_base_and_group - which does
set_iommu_table_base() and iommu_add_device()
---
 arch/powerpc/include/asm/iommu.h            |  9 +++
 arch/powerpc/kernel/iommu.c                 | 41 +++--
 arch/powerpc/platforms/powernv/pci-ioda.c   |  8 +++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  2 +-
 arch/powerpc/platforms/powernv/pci.c        | 33 ++-
 arch/powerpc/platforms/pseries/iommu.c      |  8 +++---
 6 files changed, 55 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index c34656a..19ad77f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -103,6 +103,15 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 					    int nid);
 extern void iommu_register_group(struct iommu_table *tbl,
 				 int pci_domain_number, unsigned long pe_num);
+extern int iommu_add_device(struct device *dev);
+extern void iommu_del_device(struct device *dev);
+
+static inline void set_iommu_table_base_and_group(struct device *dev,
+						  void *base)
+{
+	set_iommu_table_base(dev, base);
+	iommu_add_device(dev);
+}
 
 extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
 			struct scatterlist *sglist, int nelems,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index b20ff17..15f8ca8 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
-static int iommu_add_device(struct device *dev)
+int iommu_add_device(struct device *dev)
 {
 	struct iommu_table *tbl;
 	int ret = 0;
@@ -1134,46 +1134,13 @@ static int iommu_add_device(struct device *dev)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_add_device);
 
-static void iommu_del_device(struct device *dev)
+void iommu_del_device(struct device *dev)
 {
 	iommu_group_remove_device(dev);
 }
-
-static int iommu_bus_notifier(struct notifier_block *nb,
-			      unsigned long action, void *data)
-{
-	struct device *dev = data;
-
-	switch (action) {
-	case BUS_NOTIFY_ADD_DEVICE:
-		return iommu_add_device(dev);
-	case BUS_NOTIFY_DEL_DEVICE:
-		iommu_del_device(dev);
-		return 0;
-	default:
-		return 0;
-	}
-}
-
-static struct notifier_block tce_iommu_bus_nb = {
-	.notifier_call = iommu_bus_notifier,
-};
-
-static int __init tce_iommu_init(void)
-{
-	struct pci_dev *pdev = NULL;
-
-	BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
-
-	for_each_pci_dev(pdev)
-		iommu_add_device(&pdev->dev);
-
-	bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
-	return 0;
-}
-
-subsys_initcall_sync(tce_iommu_init);
+EXPORT_SYMBOL_GPL(iommu_del_device);
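The ordering change the patch makes (a device joins its IOMMU group at the moment its table pointer is assigned, rather than in a later initcall loop that collides with a group another driver already set up) can be modelled in user space. This is a hypothetical sketch: every `toy_*` name is invented for illustration, the error codes are illustrative choices rather than what the kernel returns, and unlike the real `set_iommu_table_base_and_group()` the helper returns the add result so it can be checked.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these toy_* names are kernel API. */
struct toy_group  { int ndevs; };
struct toy_table  { struct toy_group *group; };
struct toy_device { struct toy_table *tbl; struct toy_group *group; };

static int toy_iommu_add_device(struct toy_device *dev)
{
	if (dev->group)
		return -EBUSY;	/* already grouped, e.g. by another IOMMU driver */
	if (!dev->tbl || !dev->tbl->group)
		return -ENODEV;	/* no table (or no group behind it) yet */
	dev->group = dev->tbl->group;
	dev->group->ndevs++;
	return 0;
}

static void toy_iommu_del_device(struct toy_device *dev)
{
	if (!dev->group)	/* in the spirit of the v8 iommu_group != NULL check */
		return;
	dev->group->ndevs--;
	dev->group = NULL;
}

/* Toy counterpart of set_iommu_table_base_and_group(): table first, then group. */
static int toy_set_table_base_and_group(struct toy_device *dev,
					struct toy_table *tbl)
{
	dev->tbl = tbl;
	return toy_iommu_add_device(dev);
}
```

Because grouping happens at table-assignment time, there is no global pass left to trip over devices that were already placed in a group elsewhere.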
RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Friday, October 18, 2013 8:07 AM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wood Scott-B07421
Sent: Friday, October 18, 2013 12:52 AM
To: Wang Dongsheng-B40534
Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

On Thu, 2013-10-17 at 00:51 -0500, Wang Dongsheng-B40534 wrote:

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 11:20 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 8:16 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 1:01 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Tuesday, October 15, 2013 2:51 PM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

+static ssize_t show_pw20_wait_time(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	u64 tb_cycle;
+	s64 time;
+
+	unsigned int cpu = dev->id;
+
+	if (!pw20_wt) {
+		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		value = (value & PWRMGTCR0_PW20_ENT) >>
+					PWRMGTCR0_PW20_ENT_SHIFT;
+
+		tb_cycle = (1 << (MAX_BIT - value)) * 2;

Are value = 0 and value = 1 legal? These will make tb_cycle = 0,

+		time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;

and time = -1;

Please look at the end of the function, :)
	return sprintf(buf, "%llu\n", time > 0 ? time : 0);

I know you return 0 if value = 0/1; my question was whether this is correct as per the specification.
Ahh, also for values up to 7 you will return 0, no?

If value = 0, MAX_BIT - value = 63
tb_cycle = 0x_,

Actually, tb_cycle will be undefined because you shifted a 32-bit value (1) by more than 31 bits. s/1/1ULL/

What Scott is saying is that a left shift of 1 by more than 31 bits is undefined.
Scott, this will be sign-extended, right?
-Bharat

Actually, the situation we have been discussing cannot happen. See the !pw20_wt branch: this branch reads the default wait bit. The default wait bit is 50, which is about 1ms. The default wait bit cannot be less than 50, which means the wait entry time cannot be greater than 1ms. We have already begun benchmark tests and have preliminary results: bits 55, 56 and 57 look good, but we need more benchmarks to choose the default bit.

	if (!pw20_wt) {
		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
		value = (value & PWRMGTCR0_PW20_ENT) >>
					PWRMGTCR0_PW20_ENT_SHIFT;

		tb_cycle = (1 << (MAX_BIT - value)) * 2;
		time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;
	} else {
		time = pw20_wt;
	}

If it caused confusion, we can add a comment.

As I discussed with Bharat, tb_cycle * 1000 will overflow, but this situation is not possible, because value = 0 means this feature is disabled. Now the default wait bit is 50 (MAX_BIT - value, value = 13); the PW20/Altivec Idle wait entry time is about 1ms, which is already a very long idle wait time, and it cannot be increased (meaning (MAX_BIT - value) cannot be greater than 50).

Why can it not be increased?

See above, :)
-dongsheng

-Scott

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
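Scott's point (shifting the 32-bit `int` constant `1` by 32 or more is undefined) and the `tb_cycle * 1000` overflow can both be handled in the conversion itself. The following is a hypothetical user-space sketch of the `!pw20_wt` branch, not the patch's code: `pw20_entry_to_ns()` is an invented name, `value` is assumed to be the already masked and shifted PW20_ENT field, and saturating at `UINT64_MAX` is one possible design choice (rejecting the value would be another).

```c
#include <stdint.h>

#define MAX_BIT 63

/*
 * Hypothetical sketch of the !pw20_wt branch of show_pw20_wait_time().
 * Returns the wait-entry time in ns, saturating at UINT64_MAX instead of
 * relying on undefined behaviour or silent wrap-around.
 */
static uint64_t pw20_entry_to_ns(uint32_t value, uint64_t tb_ticks_per_usec)
{
	unsigned int shift;
	uint64_t tb_cycle, ns;

	if (value > MAX_BIT || tb_ticks_per_usec == 0)
		return 0;			/* not a valid TB bit number */

	shift = MAX_BIT - value;
	/* 2^(shift + 1) ticks must fit in 64 bits, so shift may be at most 62. */
	if (shift > 62)
		return UINT64_MAX;
	/* 1ULL, not 1: shifting a 32-bit int by 32 or more is undefined. */
	tb_cycle = (1ULL << shift) * 2;

	/* tb_cycle * 1000 must not wrap either. */
	if (tb_cycle > UINT64_MAX / 1000)
		return UINT64_MAX;

	ns = tb_cycle * 1000 / tb_ticks_per_usec;
	return ns ? ns - 1 : 0;			/* the patch's "- 1", without underflow */
}
```

With a 41 MHz timebase (41 ticks per microsecond), a field value of 50 gives 2^14 ticks, i.e. 399608 ns, while values 0 and 1 saturate instead of producing 0 or -1.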
RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-----Original Message-----
From: Wood Scott-B07421
Sent: Saturday, October 19, 2013 12:52 AM
To: Wang Dongsheng-B40534
Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

On Thu, 2013-10-17 at 22:02 -0500, Wang Dongsheng-B40534 wrote:

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 2:46 PM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 11:22 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 11:20 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 8:16 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 1:01 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Tuesday, October 15, 2013 2:51 PM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

From: Wang Dongsheng <dongsheng.w...@freescale.com>

Add a sys interface to enable/disable pw20 state or altivec idle, and
control the wait entry time.

Enable/Disable interface:
0, disable. 1, enable.
/sys/devices/system/cpu/cpuX/pw20_state
/sys/devices/system/cpu/cpuX/altivec_idle

Set wait time interface: (Nanosecond)
/sys/devices/system/cpu/cpuX/pw20_wait_time
/sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example: Based on a TB frequency of 41MHz.
1~48(ns): TB[63]
49~97(ns): TB[62]
98~195(ns): TB[61]
196~390(ns): TB[60]
391~780(ns): TB[59]
781~1560(ns): TB[58]
...

Signed-off-by: Wang Dongsheng <dongsheng.w...@freescale.com>
---
*v5:
Change get_idle_ticks_bit function implementation.

*v4:
Move code from 85xx/common.c to kernel/sysfs.c.

Remove has_pw20_altivec_idle function.

Change wait entry_bit to wait time.

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 27a90b9..10d1128 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -85,6 +85,284 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay);
 
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_FSL_SOC
+#define MAX_BIT		63
+
+static u64 pw20_wt;
+static u64 altivec_idle_wt;
+
+static unsigned int get_idle_ticks_bit(u64 ns)
+{
+	u64 cycle;
+
+	if (ns = 1)
+		cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
+	else
+		cycle = div_u64(ns * tb_ticks_per_usec, 1000);
+
+	if (!cycle)
+		return 0;
+
+	return ilog2(cycle);
+}
+
+static void do_show_pwrmgtcr0(void *val)
+{
+	u32 *value = val;
+
+	*value = mfspr(SPRN_PWRMGTCR0);
+}
+
+static ssize_t show_pw20_state(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	unsigned int cpu = dev->id;
+
+	smp_call_function_single(cpu, do_show_pwrmgtcr0
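The nanosecond-to-TB-bit table in the commit message follows from computing the number of timebase ticks, taking its floor log2, and subtracting from MAX_BIT. Below is a hypothetical user-space sketch of that mapping; `ns_to_tb_bit()` and `ilog2_u64()` are invented names. Instead of the patch's two branches, it splits `ns` into whole microseconds plus a remainder, which yields exactly floor(ns * tb_ticks_per_usec / 1000) without ever forming the full `ns * tb_ticks_per_usec` product.

```c
#include <stdint.h>

#define MAX_BIT 63

/* floor(log2(x)) for x > 0, a user-space stand-in for the kernel's ilog2() */
static unsigned int ilog2_u64(uint64_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical sketch: map a wait time in ns to the TB bit number that ends
 * up in PW20_ENT (MAX_BIT - ilog2(cycles)); zero cycles maps to TB[63].
 */
static unsigned int ns_to_tb_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	uint64_t cycle = (ns / 1000) * tb_ticks_per_usec +
			 (ns % 1000) * tb_ticks_per_usec / 1000;

	return MAX_BIT - (cycle ? ilog2_u64(cycle) : 0);
}
```

With 41 ticks per microsecond, the boundaries of this function reproduce the 41MHz table above (for example 48 ns maps to TB[63] and 49 ns to TB[62]).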
RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 11:22 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 11:20 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 8:16 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 1:01 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Tuesday, October 15, 2013 2:51 PM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

From: Wang Dongsheng <dongsheng.w...@freescale.com>

Add a sys interface to enable/disable pw20 state or altivec idle, and
control the wait entry time.

Enable/Disable interface:
0, disable. 1, enable.
/sys/devices/system/cpu/cpuX/pw20_state
/sys/devices/system/cpu/cpuX/altivec_idle

Set wait time interface: (Nanosecond)
/sys/devices/system/cpu/cpuX/pw20_wait_time
/sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example: Based on a TB frequency of 41MHz.
1~48(ns): TB[63]
49~97(ns): TB[62]
98~195(ns): TB[61]
196~390(ns): TB[60]
391~780(ns): TB[59]
781~1560(ns): TB[58]
...

Signed-off-by: Wang Dongsheng <dongsheng.w...@freescale.com>
---
*v5:
Change get_idle_ticks_bit function implementation.

*v4:
Move code from 85xx/common.c to kernel/sysfs.c.

Remove has_pw20_altivec_idle function.

Change wait entry_bit to wait time.

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 27a90b9..10d1128 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -85,6 +85,284 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay);
 
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_FSL_SOC
+#define MAX_BIT		63
+
+static u64 pw20_wt;
+static u64 altivec_idle_wt;
+
+static unsigned int get_idle_ticks_bit(u64 ns)
+{
+	u64 cycle;
+
+	if (ns = 1)
+		cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
+	else
+		cycle = div_u64(ns * tb_ticks_per_usec, 1000);
+
+	if (!cycle)
+		return 0;
+
+	return ilog2(cycle);
+}
+
+static void do_show_pwrmgtcr0(void *val)
+{
+	u32 *value = val;
+
+	*value = mfspr(SPRN_PWRMGTCR0);
+}
+
+static ssize_t show_pw20_state(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	unsigned int cpu = dev->id;
+
+	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+
+	value &= PWRMGTCR0_PW20_WAIT;
+
+	return sprintf(buf, "%u\n", value ? 1 : 0);
+}
+
+static void do_store_pw20_state(void *val)
+{
+	u32 *value = val;
+	u32 pw20_state;
+
+	pw20_state = mfspr(SPRN_PWRMGTCR0);
+
+	if (*value)
+		pw20_state |= PWRMGTCR0_PW20_WAIT;
+	else
+		pw20_state &= ~PWRMGTCR0_PW20_WAIT;
+
+	mtspr(SPRN_PWRMGTCR0, pw20_state);
+}
+
+static ssize_t store_pw20_state(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 value;
+	unsigned int cpu = dev->id;
+
+	if (kstrtou32(buf, 0, &value))
+		return -EINVAL;
+
+	if (value > 1)
+		return -EINVAL;
+
+	smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
+
+	return count;
+}
+
+static ssize_t show_pw20_wait_time(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	u64 tb_cycle;
+	s64 time;
+
+	unsigned int
RE: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices
-----Original Message-----
From: Sethi Varun-B16395
Sent: Wednesday, October 16, 2013 4:53 PM
To: j...@8bytes.org; io...@lists.linux-foundation.org; linuxppc-d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Yoder Stuart-B08248; Wood Scott-B07421; alex.william...@redhat.com; Bhushan Bharat-R65777
Cc: Sethi Varun-B16395
Subject: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices

Once the PCIe device assigned to a guest VM (via VFIO) gets detached from the iommu domain (when the guest terminates), its PAMU table entry is disabled. This prevents the device from being used once it's assigned back to the host.

This patch allows for creation of a default DMA window corresponding to the device and subsequently enabling the PAMU table entry. Before we enable the entry, we ensure that the device's bus master capability is disabled (device quiesced).

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 drivers/iommu/fsl_pamu.c        | 43
 drivers/iommu/fsl_pamu.h        |  1 +
 drivers/iommu/fsl_pamu_domain.c | 46 ---
 3 files changed, 78 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
index cba0498..fb4a031 100644
--- a/drivers/iommu/fsl_pamu.c
+++ b/drivers/iommu/fsl_pamu.c
@@ -225,6 +225,21 @@ static struct paace *pamu_get_spaace(struct paace *paace, u32 wnum)
 	return spaace;
 }
 
+/*
+ * Default PPAACE settings for an LIODN.
+ */
+static void setup_default_ppaace(struct paace *ppaace)
+{
+	pamu_init_ppaace(ppaace);
+	/* window size is 2^(WSE+1) bytes */
+	set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
+	ppaace->wbah = 0;
+	set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
+	set_bf(ppaace->impl_attr, PAACE_IA_ATM,
+		PAACE_ATM_NO_XLATE);
+	set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
+		PAACE_AP_PERMS_ALL);
+}
 /**
  * pamu_get_fspi_and_allocate() - Allocates fspi index and reserves subwindows
  *                                required for primary PAACE in the secondary
@@ -253,6 +268,24 @@ static unsigned long pamu_get_fspi_and_allocate(u32 subwin_cnt)
 	return (spaace_addr - (unsigned long)spaact) / (sizeof(struct paace));
 }
 
+/* Reset the PAACE entry to the default state */
+void enable_default_dma_window(int liodn)
+{
+	struct paace *ppaace;
+
+	ppaace = pamu_get_ppaace(liodn);
+	if (!ppaace) {
+		pr_debug("Invalid liodn entry\n");
+		return;
+	}
+
+	memset(ppaace, 0, sizeof(struct paace));
+
+	setup_default_ppaace(ppaace);
+	mb();
+	pamu_enable_liodn(liodn);
+}
+
 /* Release the subwindows reserved for a particular LIODN */
 void pamu_free_subwins(int liodn)
 {
@@ -752,15 +785,7 @@ static void __init setup_liodns(void)
 				continue;
 			}
 			ppaace = pamu_get_ppaace(liodn);
-			pamu_init_ppaace(ppaace);
-			/* window size is 2^(WSE+1) bytes */
-			set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
-			ppaace->wbah = 0;
-			set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
-			set_bf(ppaace->impl_attr, PAACE_IA_ATM,
-				PAACE_ATM_NO_XLATE);
-			set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
-				PAACE_AP_PERMS_ALL);
+			setup_default_ppaace(ppaace);
 			if (of_device_is_compatible(node, "fsl,qman-portal"))
 				setup_qbman_paace(ppaace, QMAN_PORTAL_PAACE);
 			if (of_device_is_compatible(node, "fsl,qman"))
diff --git a/drivers/iommu/fsl_pamu.h b/drivers/iommu/fsl_pamu.h
index 8fc1a12..0edc 100644
--- a/drivers/iommu/fsl_pamu.h
+++ b/drivers/iommu/fsl_pamu.h
@@ -406,5 +406,6 @@ void get_ome_index(u32 *omi_index, struct device *dev);
 int pamu_update_paace_stash(int liodn, u32 subwin, u32 value);
 int pamu_disable_spaace(int liodn, u32 subwin);
 u32 pamu_get_max_subwin_cnt(void);
+void enable_default_dma_window(int liodn);
 
 #endif /* __FSL_PAMU_H */
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 966ae70..dd6cafc 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -340,17 +340,57 @@ static inline struct device_domain_info *find_domain(struct device *dev)
 	return dev->archdata.iommu_domain;
 }
 
+/* Disable device DMA capability and enable default DMA window */
+static void disable_device_dma(struct device_domain_info *info,
+				int enable_dma_window)
+{
+#ifdef CONFIG_PCI
+	if (info->dev->bus == &pci_bus_type) {
+		struct pci_dev *pdev = NULL;
+		pdev = to_pci_dev(info->dev
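The sequence in enable_default_dma_window() (wipe the entry, fill in the defaults, barrier, then mark the LIODN valid) and the "window size is 2^(WSE+1) bytes" rule can be illustrated with a simplified user-space model. This is a hypothetical sketch: `struct toy_paace` and the `TOY_*` encodings are invented and do not match the real packed `struct paace` layout. The C11 release store is only a CPU-side analogue of the patch's `mb()`; it does not model ordering toward the PAMU hardware.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified PAACE; not the real hardware layout. */
struct toy_paace_cfg {
	uint32_t wse;		/* window size encoding: 2^(wse + 1) bytes */
	uint64_t wba;		/* window base address */
	uint32_t atm;		/* address translation mode */
	uint32_t ap;		/* access permissions */
};

struct toy_paace {
	struct toy_paace_cfg cfg;
	atomic_uint valid;	/* stands in for the entry's valid bit */
};

#define TOY_ATM_NO_XLATE	0u	/* hypothetical encodings */
#define TOY_AP_PERMS_ALL	3u

static uint64_t toy_window_bytes(const struct toy_paace_cfg *cfg)
{
	return 1ULL << (cfg->wse + 1);
}

/* Same order as enable_default_dma_window(): wipe, fill defaults, publish. */
static void toy_enable_default_window(struct toy_paace *p)
{
	atomic_store_explicit(&p->valid, 0, memory_order_relaxed);
	memset(&p->cfg, 0, sizeof(p->cfg));
	p->cfg.wse = 35;			/* 2^36 bytes = 64 GiB */
	p->cfg.wba = 0;
	p->cfg.atm = TOY_ATM_NO_XLATE;
	p->cfg.ap = TOY_AP_PERMS_ALL;
	/* Release store: all field writes become visible before valid == 1. */
	atomic_store_explicit(&p->valid, 1, memory_order_release);
}
```

With WSE = 35 the default window spans 2^36 bytes (64 GiB) starting at address 0, with no translation, which is why the host can reuse the device once the entry is re-enabled.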
RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-----Original Message-----
From: Wang Dongsheng-B40534
Sent: Tuesday, October 15, 2013 2:51 PM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

From: Wang Dongsheng <dongsheng.w...@freescale.com>

Add a sys interface to enable/disable pw20 state or altivec idle, and
control the wait entry time.

Enable/Disable interface:
0, disable. 1, enable.
/sys/devices/system/cpu/cpuX/pw20_state
/sys/devices/system/cpu/cpuX/altivec_idle

Set wait time interface: (Nanosecond)
/sys/devices/system/cpu/cpuX/pw20_wait_time
/sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example: Based on a TB frequency of 41MHz.
1~48(ns): TB[63]
49~97(ns): TB[62]
98~195(ns): TB[61]
196~390(ns): TB[60]
391~780(ns): TB[59]
781~1560(ns): TB[58]
...

Signed-off-by: Wang Dongsheng <dongsheng.w...@freescale.com>
---
*v5:
Change get_idle_ticks_bit function implementation.

*v4:
Move code from 85xx/common.c to kernel/sysfs.c.

Remove has_pw20_altivec_idle function.

Change wait entry_bit to wait time.

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 27a90b9..10d1128 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -85,6 +85,284 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay);
 
 #endif /* CONFIG_PPC64 */
 
+#ifdef CONFIG_FSL_SOC
+#define MAX_BIT		63
+
+static u64 pw20_wt;
+static u64 altivec_idle_wt;
+
+static unsigned int get_idle_ticks_bit(u64 ns)
+{
+	u64 cycle;
+
+	if (ns = 1)
+		cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
+	else
+		cycle = div_u64(ns * tb_ticks_per_usec, 1000);
+
+	if (!cycle)
+		return 0;
+
+	return ilog2(cycle);
+}
+
+static void do_show_pwrmgtcr0(void *val)
+{
+	u32 *value = val;
+
+	*value = mfspr(SPRN_PWRMGTCR0);
+}
+
+static ssize_t show_pw20_state(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	unsigned int cpu = dev->id;
+
+	smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+
+	value &= PWRMGTCR0_PW20_WAIT;
+
+	return sprintf(buf, "%u\n", value ? 1 : 0);
+}
+
+static void do_store_pw20_state(void *val)
+{
+	u32 *value = val;
+	u32 pw20_state;
+
+	pw20_state = mfspr(SPRN_PWRMGTCR0);
+
+	if (*value)
+		pw20_state |= PWRMGTCR0_PW20_WAIT;
+	else
+		pw20_state &= ~PWRMGTCR0_PW20_WAIT;
+
+	mtspr(SPRN_PWRMGTCR0, pw20_state);
+}
+
+static ssize_t store_pw20_state(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 value;
+	unsigned int cpu = dev->id;
+
+	if (kstrtou32(buf, 0, &value))
+		return -EINVAL;
+
+	if (value > 1)
+		return -EINVAL;
+
+	smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
+
+	return count;
+}
+
+static ssize_t show_pw20_wait_time(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	u32 value;
+	u64 tb_cycle;
+	s64 time;
+
+	unsigned int cpu = dev->id;
+
+	if (!pw20_wt) {
+		smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
+		value = (value & PWRMGTCR0_PW20_ENT) >>
+					PWRMGTCR0_PW20_ENT_SHIFT;
+
+		tb_cycle = (1 << (MAX_BIT - value)) * 2;

Are value = 0 and value = 1 legal? These will make tb_cycle = 0,

+		time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;

and time = -1;

+	} else {
+		time = pw20_wt;
+	}
+
+	return sprintf(buf, "%llu\n", time > 0 ? time : 0);
+}
+
+static void set_pw20_wait_entry_bit(void *val)
+{
+	u32 *value = val;
+	u32 pw20_idle;
+
+	pw20_idle = mfspr(SPRN_PWRMGTCR0);
+
+	/* Set Automatic PW20 Core Idle Count */
+	/* clear count */
+	pw20_idle &= ~PWRMGTCR0_PW20_ENT;
+
+	/* set count */
+	pw20_idle |= ((MAX_BIT - *value) << PWRMGTCR0_PW20_ENT_SHIFT);
+
+	mtspr(SPRN_PWRMGTCR0, pw20_idle);
+}
+
+static ssize_t store_pw20_wait_time(struct device *dev,
+				struct device_attribute *attr,
+				const char *buf, size_t count)
+{
+	u32 entry_bit;
+	u64 value;
+
+	unsigned int cpu = dev->id;
+
+	if (kstrtou64(buf, 0, &value))
+		return -EINVAL;
+
+	if (!value)
+		return -EINVAL;
+
+	entry_bit = get_idle_ticks_bit(value);
+	if (entry_bit > MAX_BIT)
+		return -EINVAL;
+
+	pw20_wt = value;
+	smp_call_function_single(cpu, set_pw20_wait_entry_bit,
+				&entry_bit, 1);
+
+	return count;
+}
+
+static ssize_t
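set_pw20_wait_entry_bit() performs a read-modify-write of one register field: clear the old count, then OR in the new one. The sketch below shows the same pattern in user space, with one addition: the new value is masked before insertion so an out-of-range count cannot spill into neighbouring bits. The shift and mask are hypothetical placeholders, not the real PWRMGTCR0 layout, and the helper names are invented.

```c
#include <stdint.h>

/* Hypothetical field layout, NOT the real PWRMGTCR0 encoding. */
#define TOY_PW20_ENT_SHIFT	8u
#define TOY_PW20_ENT_MASK	(0x3fu << TOY_PW20_ENT_SHIFT)	/* 6 bits: 0..63 */

/* Clear the old count, then insert the new one, masked to the field width. */
static uint32_t toy_set_pw20_ent(uint32_t reg, uint32_t tb_bit)
{
	reg &= ~TOY_PW20_ENT_MASK;
	reg |= (tb_bit << TOY_PW20_ENT_SHIFT) & TOY_PW20_ENT_MASK;
	return reg;
}

/* Inverse: mirrors the mask-and-shift read in show_pw20_wait_time(). */
static uint32_t toy_get_pw20_ent(uint32_t reg)
{
	return (reg & TOY_PW20_ENT_MASK) >> TOY_PW20_ENT_SHIFT;
}
```

In the patch itself, store_pw20_wait_time() already rejects entry_bit > MAX_BIT, so MAX_BIT - entry_bit stays within 0..63; the mask here is a defensive choice rather than a required fix.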
RE: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices
-----Original Message----- From: Sethi Varun-B16395 Sent: Wednesday, October 16, 2013 4:53 PM To: j...@8bytes.org; io...@lists.linux-foundation.org; linuxppc-d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Yoder Stuart-B08248; Wood Scott-B07421; alex.william...@redhat.com; Bhushan Bharat-R65777 Cc: Sethi Varun-B16395 Subject: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices Once a PCIe device assigned to a guest VM (via VFIO) is detached from the IOMMU domain (when the guest terminates), its PAMU table entry is disabled. This prevents the device from being used once it is assigned back to the host. This patch creates a default DMA window for the device and then enables its PAMU table entry. Before enabling the entry, we ensure that the device's bus master capability is disabled (device quiesced). Signed-off-by: Varun Sethi varun.se...@freescale.com --- drivers/iommu/fsl_pamu.c| 43 --- - drivers/iommu/fsl_pamu.h|1 + drivers/iommu/fsl_pamu_domain.c | 46 --- 3 files changed, 78 insertions(+), 12 deletions(-) diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c index cba0498..fb4a031 100644 --- a/drivers/iommu/fsl_pamu.c +++ b/drivers/iommu/fsl_pamu.c @@ -225,6 +225,21 @@ static struct paace *pamu_get_spaace(struct paace *paace, u32 wnum) return spaace; } +/* + * Default PPAACE settings for an LIODN.
+ */ +static void setup_default_ppaace(struct paace *ppaace) { + pamu_init_ppaace(ppaace); + /* window size is 2^(WSE+1) bytes */ + set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35); + ppaace->wbah = 0; + set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0); + set_bf(ppaace->impl_attr, PAACE_IA_ATM, + PAACE_ATM_NO_XLATE); + set_bf(ppaace->addr_bitfields, PAACE_AF_AP, + PAACE_AP_PERMS_ALL); +} /** * pamu_get_fspi_and_allocate() - Allocates fspi index and reserves subwindows *required for primary PAACE in the secondary @@ -253,6 +268,24 @@ static unsigned long pamu_get_fspi_and_allocate(u32 subwin_cnt) return (spaace_addr - (unsigned long)spaact) / (sizeof(struct paace)); } +/* Reset the PAACE entry to the default state */ void +enable_default_dma_window(int liodn) { + struct paace *ppaace; + + ppaace = pamu_get_ppaace(liodn); + if (!ppaace) { + pr_debug("Invalid liodn entry\n"); + return; + } + + memset(ppaace, 0, sizeof(struct paace)); + + setup_default_ppaace(ppaace); + mb(); + pamu_enable_liodn(liodn); +} + /* Release the subwindows reserved for a particular LIODN */ void pamu_free_subwins(int liodn) { @@ -752,15 +785,7 @@ static void __init setup_liodns(void) continue; } ppaace = pamu_get_ppaace(liodn); - pamu_init_ppaace(ppaace); - /* window size is 2^(WSE+1) bytes */ - set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35); - ppaace->wbah = 0; - set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0); - set_bf(ppaace->impl_attr, PAACE_IA_ATM, - PAACE_ATM_NO_XLATE); - set_bf(ppaace->addr_bitfields, PAACE_AF_AP, - PAACE_AP_PERMS_ALL); + setup_default_ppaace(ppaace); if (of_device_is_compatible(node, "fsl,qman-portal")) setup_qbman_paace(ppaace, QMAN_PORTAL_PAACE); if (of_device_is_compatible(node, "fsl,qman")) diff --git a/drivers/iommu/fsl_pamu.h b/drivers/iommu/fsl_pamu.h index 8fc1a12..0edc 100644 --- a/drivers/iommu/fsl_pamu.h +++ b/drivers/iommu/fsl_pamu.h @@ -406,5 +406,6 @@ void get_ome_index(u32 *omi_index, struct device *dev); int pamu_update_paace_stash(int liodn, u32 subwin, u32
value); int pamu_disable_spaace(int liodn, u32 subwin); u32 pamu_get_max_subwin_cnt(void); +void enable_default_dma_window(int liodn); #endif /* __FSL_PAMU_H */ diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c index 966ae70..dd6cafc 100644 --- a/drivers/iommu/fsl_pamu_domain.c +++ b/drivers/iommu
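The comment in setup_default_ppaace() states that the window size is 2^(WSE+1) bytes, so the default WSE of 35 gives a 2^36-byte (64 GiB) window. A small sketch of that arithmetic follows, plus a hypothetical inverse helper; both names are illustrative only, and any hardware minimum or maximum WSE value is not modelled.

```c
#include <stdint.h>

/* Window size encoded by a WSE field: 2^(WSE+1) bytes. Valid for wse <= 62. */
static uint64_t model_window_size(unsigned int wse)
{
	return UINT64_C(1) << (wse + 1);
}

/* Hypothetical inverse: smallest WSE whose window covers `bytes` (capped at 62). */
static unsigned int model_wse_for_size(uint64_t bytes)
{
	unsigned int wse = 0;

	while (wse < 62 && model_window_size(wse) < bytes)
		wse++;
	return wse;
}
```

For example, model_window_size(35) is 68719476736 bytes, and a 4 KiB region needs WSE = 11.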
RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-----Original Message----- From: Wang Dongsheng-B40534 Sent: Thursday, October 17, 2013 8:16 AM To: Bhushan Bharat-R65777; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -----Original Message----- From: Bhushan Bharat-R65777 Sent: Thursday, October 17, 2013 1:01 AM To: Wang Dongsheng-B40534; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -----Original Message----- From: Wang Dongsheng-B40534 Sent: Tuesday, October 15, 2013 2:51 PM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle From: Wang Dongsheng dongsheng.w...@freescale.com
Add a sysfs interface to enable/disable the pw20 state or altivec idle, and to control the wait entry time.
Enable/disable interface (0: disable, 1: enable):
/sys/devices/system/cpu/cpuX/pw20_state
/sys/devices/system/cpu/cpuX/altivec_idle
Set wait time interface (nanoseconds):
/sys/devices/system/cpu/cpuX/pw20_wait_time
/sys/devices/system/cpu/cpuX/altivec_idle_wait_time
Example, based on a TB frequency of 41 MHz:
1~48 (ns): TB[63]
49~97 (ns): TB[62]
98~195 (ns): TB[61]
196~390 (ns): TB[60]
391~780 (ns): TB[59]
781~1560 (ns): TB[58]
...
Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v5: Change get_idle_ticks_bit function implementation. *v4: Move code from 85xx/common.c to kernel/sysfs.c. Remove has_pw20_altivec_idle function. Change wait entry_bit to wait time.
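The table above can be reproduced from the `ns * tb_ticks_per_usec / 1000` branch of get_idle_ticks_bit() followed by ilog2(), assuming the TB bit written to the hardware field is MAX_BIT minus that ilog2 result and that tb_ticks_per_usec is 41 for a 41 MHz timebase. The sketch below is a user-space model with hypothetical names; it models only that one branch of the patch, not the rounding branch.

```c
#include <stdint.h>

#define MODEL_MAX_BIT 63

/* floor(log2(x)) for x > 0; stands in for the kernel's ilog2(). */
static unsigned int model_ilog2(uint64_t x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Wait time in ns -> timebase ticks -> TB bit (MAX_BIT - ilog2(ticks)). */
static unsigned int model_ns_to_tb_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	uint64_t cycle = ns * tb_ticks_per_usec / 1000;
	unsigned int log = cycle ? model_ilog2(cycle) : 0;	/* 0 ticks -> bit 63 */

	return MODEL_MAX_BIT - log;
}
```

With 41 ticks per microsecond, 48 ns is 1 tick (bit 63) and 49 ns is 2 ticks (bit 62), matching the first two rows of the table.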
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 27a90b9..10d1128 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -85,6 +85,284 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_FSL_SOC +#define MAX_BIT 63 + +static u64 pw20_wt; +static u64 altivec_idle_wt; + +static unsigned int get_idle_ticks_bit(u64 ns) { + u64 cycle; + + if (ns = 1) + cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec; + else + cycle = div_u64(ns * tb_ticks_per_usec, 1000); + + if (!cycle) + return 0; + + return ilog2(cycle); +} + +static void do_show_pwrmgtcr0(void *val) { + u32 *value = val; + + *value = mfspr(SPRN_PWRMGTCR0); +} + +static ssize_t show_pw20_state(struct device *dev, + struct device_attribute *attr, char *buf) { + u32 value; + unsigned int cpu = dev->id; + + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1); + + value &= PWRMGTCR0_PW20_WAIT; + + return sprintf(buf, "%u\n", value ? 1 : 0); } + +static void do_store_pw20_state(void *val) { + u32 *value = val; + u32 pw20_state; + + pw20_state = mfspr(SPRN_PWRMGTCR0); + + if (*value) + pw20_state |= PWRMGTCR0_PW20_WAIT; + else + pw20_state &= ~PWRMGTCR0_PW20_WAIT; + + mtspr(SPRN_PWRMGTCR0, pw20_state); } + +static ssize_t store_pw20_state(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + u32 value; + unsigned int cpu = dev->id; + + if (kstrtou32(buf, 0, &value)) + return -EINVAL; + + if (value > 1) + return -EINVAL; + + smp_call_function_single(cpu, do_store_pw20_state, &value, 1); + + return count; +} + +static ssize_t show_pw20_wait_time(struct device *dev, + struct device_attribute *attr, char *buf) { + u32 value; + u64 tb_cycle; + s64 time; + + unsigned int cpu = dev->id; + + if (!pw20_wt) { + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1); + value = (value & PWRMGTCR0_PW20_ENT) >> + PWRMGTCR0_PW20_ENT_SHIFT; + + tb_cycle = (1 << (MAX_BIT - value)) * 2; Is value = 0 and value = 1
legal? These will make tb_cycle = 0, + time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1; and time = -1. Please look at the end of the function :) return sprintf(buf, "%llu\n", time > 0 ? time : 0); I know you return 0 if value = 0/1; my question was whether this is correct per the specification. Ahh, also for values up to 7 you will return 0, no? -Bharat -dongsheng + } else { + time = pw20_wt; + } + + return sprintf(buf, "%llu\n", time > 0 ? time : 0); } + ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
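To make the question about small field values concrete, the sketch below redoes the arithmetic of show_pw20_wait_time() in user space. Two assumptions: it shifts UINT64_C(1) (the patch shifts a plain int literal 1, which is undefined behaviour in C for shift counts of 31 or more), and it assumes a 41 MHz timebase (tb_ticks_per_usec = 41) as in the commit message. Under 64-bit unsigned arithmetic, tb_cycle * 1000 wraps for every field value below 10, and values 0 through 3 wrap to exactly 0, so the clamp reports 0 ns. The helper names are hypothetical.

```c
#include <stdint.h>

#define MODEL_MAX_BIT 63

/*
 * Same arithmetic as show_pw20_wait_time(), with a 64-bit shift.
 * Assumes value <= 63 and tb_ticks_per_usec >= 2 (quotient fits in int64_t).
 */
static int64_t model_decode_ns(uint32_t value, uint64_t tb_ticks_per_usec)
{
	uint64_t tb_cycle = (UINT64_C(1) << (MODEL_MAX_BIT - value)) * 2;
	int64_t time = (int64_t)(tb_cycle * 1000 / tb_ticks_per_usec) - 1;

	return time > 0 ? time : 0;	/* same clamp as the patch */
}

/* True when tb_cycle * 1000 cannot be represented in 64 bits. */
static int model_decode_would_overflow(uint32_t value)
{
	uint32_t shift = MODEL_MAX_BIT - value + 1;	/* tb_cycle == 2^shift */

	return shift >= 64 || (UINT64_C(1) << shift) > UINT64_MAX / 1000;
}
```

For the large field values the result lines up with the table in the commit message: value 63 decodes to 47 ns and value 62 to 96 ns.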
RE: [PATCH 1/7] powerpc: Add interface to get msi region information
-----Original Message----- From: j...@8bytes.org [mailto:j...@8bytes.org] Sent: Tuesday, October 08, 2013 10:32 PM To: Bjorn Helgaas Cc: Bhushan Bharat-R65777; alex.william...@redhat.com; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc-d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; io...@lists.linux-foundation.org Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information On Tue, Oct 08, 2013 at 10:47:49AM -0600, Bjorn Helgaas wrote: I still have no idea what an aperture type IOMMU is, other than that it is different. An aperture-based IOMMU is basically any GART-like IOMMU which can only remap a small window (the aperture) of the DMA address space. DMA outside of that window is either blocked completely or passed through untranslated. It is completely blocked for Freescale PAMU. So for this type of IOMMU, we have to create an MSI mapping just after the guest physical memory. For example, if the guest has 512M of memory, we create a window of 1G (because of the power-of-2 requirement) and then fit the MSI region just after the guest's 512M. For that we need: 1) the physical address of the MSIs in the interrupt controller (which is what this patch is about); and 2) when the guest enables an MSI interrupt, to write the MSI address and MSI data into the device. The discussion with Alex Williamson is about that interface. Thanks -Bharat Joerg
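The placement described above (512M of guest memory, a 1G window because of the power-of-2 requirement, MSI region right after guest RAM) reduces to a small calculation. This sketch assumes the MSI region is mapped immediately after guest RAM and that the guest RAM size is already suitably aligned; the 4 KiB MSI size in the usage example is a hypothetical value, and real PAMU alignment and subwindow constraints are not modelled.

```c
#include <stdint.h>

/* Round up to the next power of two (returns x if it already is one). */
static uint64_t model_roundup_pow_of_two(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

struct model_layout {
	uint64_t window_size;	/* size of the single aperture window */
	uint64_t msi_iova;	/* where the MSI bank appears in that window */
};

/* Place the MSI region right after guest RAM and size the window to fit both. */
static struct model_layout model_place_msi(uint64_t guest_mem, uint64_t msi_size)
{
	struct model_layout l;

	l.msi_iova = guest_mem;
	l.window_size = model_roundup_pow_of_two(guest_mem + msi_size);
	return l;
}
```

With 512M of RAM and a 4 KiB MSI region, the window becomes 1G and the MSI region sits at IOVA 512M; a guest with exactly 1G of RAM would need a 2G window.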
RE: [PATCH 1/7] powerpc: Add interface to get msi region information
-----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, October 09, 2013 4:27 AM To: Bhushan Bharat-R65777 Cc: alex.william...@redhat.com; j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc-d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de; io...@lists.linux-foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: @@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev) int len; u32 offset; static const u32 all_avail[] = { 0, NR_MSI_IRQS }; + static int bank_index; match = of_match_device(fsl_of_msi_ids, &dev->dev); if (!match) @@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev) dev->dev.of_node->full_name); goto error_out; } - msi->msiir_offset = - features->msiir_offset + (res.start & 0xf); + msi->msiir = res.start + features->msiir_offset; + printk("msi->msiir = %llx\n", msi->msiir); dev_dbg or remove Oops, sorry, it was a leftover from debugging :( } msi->feature = features->fsl_pic_ip; @@ -470,6 +500,7 @@ static int fsl_of_msi_probe(struct platform_device *dev) } } + msi->bank_index = bank_index++; What if multiple MSIs are being probed in parallel? Oh, I had not considered that it could be called in parallel; bank_index is not atomic. I will declare bank_index as atomic_t and use atomic_inc_return(&bank_index). diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h index 8225f86..6bd5cfc 100644 --- a/arch/powerpc/sysdev/fsl_msi.h +++ b/arch/powerpc/sysdev/fsl_msi.h @@ -29,12 +29,19 @@ struct fsl_msi { struct irq_domain *irqhost; unsigned long cascade_irq; - - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */ + dma_addr_t msiir; /* MSIIR Address in CCSR */ Are you sure dma_addr_t is right here, versus phys_addr_t? It implies that it's the output of the DMA API, but I don't think the DMA API is used in the MSI driver.
Perhaps it should be, but we still want the raw physical address to pass on to VFIO. Looking through the conversation, I will make this phys_addr_t. void __iomem *msi_regs; u32 feature; int msi_virqs[NR_MSI_REG]; + /* +* During probe each bank is assigned an index number. +* index number ranges from 0 to 2^32. +* Example MSI bank 1 = 0 +* MSI bank 2 = 1, and so on. +*/ + int bank_index; 2^32 doesn't fit in int (nor does 2^32 - 1). Right :( Just say that indices start at 0. Will correct this. Thanks -Bharat -Scott
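The concern with `bank_index++` on a shared static counter is that it is a read-modify-write which two concurrent probes could interleave; an atomic fetch-and-add removes that race. One detail worth noting: the kernel's atomic_inc_return() returns the value after the increment, so to keep indices starting at 0 the driver would need `atomic_inc_return(&bank_index) - 1`. The sketch below uses C11 atomic_fetch_add(), which returns the value before the increment; the counter and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the static counter in fsl_of_msi_probe(). */
static atomic_int model_bank_counter;	/* static storage: starts at 0 */

/* Returns 0 for the first bank probed, 1 for the second, and so on. */
static int model_next_bank_index(void)
{
	/* atomic_fetch_add returns the value *before* the increment. */
	return atomic_fetch_add(&model_bank_counter, 1);
}
```

Each caller gets a distinct index even if probes run concurrently, because the increment and the read happen as one indivisible operation.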
RE: [PATCH 1/4] powerpc: Added __cmpdi2 for signed 64bit comparision
Oops, it went out as 1/4; I am sorry, please ignore this. Thanks -Bharat -----Original Message----- From: Bhushan Bharat-R65777 Sent: Wednesday, October 09, 2013 10:39 AM To: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org; b...@kernel.crashing.org Cc: Bhushan Bharat-R65777; Bhushan Bharat-R65777 Subject: [PATCH 1/4] powerpc: Added __cmpdi2 for signed 64bit comparision This was missing on powerpc and I am getting a compilation error: drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2' drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2' Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kernel/misc_32.S | 14 ++ arch/powerpc/kernel/ppc_ksyms.c | 2 ++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index 777d999..7c0eec2 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -644,6 +644,20 @@ _GLOBAL(__lshrdi3) blr /* + * 64-bit comparison: __cmpdi2(s64 a, s64 b) + * Returns 0 if a < b, 1 if a == b, 2 if a > b. + */ +_GLOBAL(__cmpdi2) + cmpw r3,r5 + li r3,1 + bne 1f + cmplw r4,r6 + beqlr +1: li r3,0 + bltlr + li r3,2 + blr +/* * 64-bit comparison: __ucmpdi2(u64 a, u64 b) * Returns 0 if a < b, 1 if a == b, 2 if a > b. */ diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c index 21646db..5674c00 100644 --- a/arch/powerpc/kernel/ppc_ksyms.c +++ b/arch/powerpc/kernel/ppc_ksyms.c @@ -143,6 +143,8 @@ EXPORT_SYMBOL(__ashldi3); EXPORT_SYMBOL(__lshrdi3); int __ucmpdi2(unsigned long long, unsigned long long); EXPORT_SYMBOL(__ucmpdi2); +int __cmpdi2(long long, long long); +EXPORT_SYMBOL(__cmpdi2); #endif long long __bswapdi2(long long); EXPORT_SYMBOL(__bswapdi2); -- 1.7.0.4
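The return convention in the patch comment (0 if a < b, 1 if a == b, 2 if a > b) and the two-step compare in the assembly (signed cmpw on the high words, then unsigned cmplw on the low words only when the high words are equal) can be checked against a plain C reference model. The function name is hypothetical, chosen so it does not clash with libgcc's own __cmpdi2. It relies on `>>` of a negative int64_t being an arithmetic shift, which GCC and Clang guarantee but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/*
 * Reference model of __cmpdi2 on a 32-bit target.
 * Returns 0 if a < b, 1 if a == b, 2 if a > b.
 */
static int model_cmpdi2(int64_t a, int64_t b)
{
	int64_t ahi = a >> 32, bhi = b >> 32;	/* signed high words (cmpw) */
	uint32_t alo = (uint32_t)a, blo = (uint32_t)b;	/* unsigned low words (cmplw) */

	if (ahi != bhi)
		return ahi < bhi ? 0 : 2;
	if (alo != blo)
		return alo < blo ? 0 : 2;
	return 1;
}
```

The low words must be compared unsigned: for a = 0x80000000 and b = 1 the high words are both 0, and a signed low-word compare would wrongly report a < b.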
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
Do you really want module dependencies between vfio and your core kernel MSI setup? Look at the vfio external user interface that we've already defined. That allows other components of the kernel to get a proper reference to a vfio group. From there you can work out how to get what you want. Another alternative is that vfio could register an MSI to IOVA mapping with architecture code when the mapping is created. The MSI setup path could then do a lookup in architecture code for the mapping. You could even store the MSI to IOVA mapping in VFIO and create an interface where SET_IRQ passes that mapping into setup code. OK, what I want is to get the IOVA associated with a physical address (the physical address of the MSI bank). Currently I do not see a way to find the IOVA of a physical address other than getting the domain and then searching through all of that domain's IOMMU windows. What if we add an IOMMU API that returns the IOVA mapping of a physical address? The current use case is setting up MSIs for an aperture-type IOMMU; also, a phys_to_iova() mapping is independent of VFIO. Your thoughts? A physical address can be mapped to multiple IOVAs, so the interface seems flawed by design. It also has the same problem as above: it's a backdoor that can be called asynchronously to the owner of the domain, so what reason is there to believe the result? It just replaces an iommu_domain pointer with an IOVA. VFIO knows this mapping, so why are we trying to go behind its back and ask the IOMMU? The IOMMU is the final place where a mapping is created, so maybe today it is called on behalf of VFIO, but tomorrow it could be for normal Linux or some other interface. But I am fine with talking to VFIO directly, and I will not try to solve a problem that does not exist today. The MSI subsystem knows the pdev (PCI device) and the physical address; what interface would it use to get the IOVA from VFIO?
Thanks -Bharat Thanks, Alex
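The objection that a physical address can be mapped at several IOVAs is easy to see with a toy window table. The table layout and addresses below are hypothetical; the point is only that a reverse lookup from a physical address can have more than one answer, so a phys_to_iova() helper would have to pick one arbitrarily, whereas the code that created the mapping (VFIO, in this thread) knows which IOVA it chose.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window: maps [iova, iova + size) to [phys, phys + size). */
struct model_window {
	uint64_t iova;
	uint64_t phys;
	uint64_t size;
};

/* Count how many windows translate some IOVA to the given physical address. */
static size_t model_count_iovas_for_phys(const struct model_window *w, size_t n,
					 uint64_t phys)
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++)
		if (phys >= w[i].phys && phys - w[i].phys < w[i].size)
			hits++;
	return hits;
}
```

With the same 4 KiB MSI page mapped at two different IOVAs, the count for an address inside that page is 2, and there is no principled way to choose between them from the physical address alone.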
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 11:42 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 17:23 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 10:43 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 16:47 +, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 9:15 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 09:54 +, Bhushan Bharat-R65777 wrote: -Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:16 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 
2/7] iommu: add api to get iommu_domain of a device On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This API returns the IOMMU domain to which the device is attached. The iommu_domain is required for making IOMMU-related API calls. Follow-up patches use this API to learn the IOMMU mapping. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/iommu/iommu.c | 10 ++ include/linux/iommu.h | 7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbe9ca7..6ac5f50 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_detach_device); +struct iommu_domain *iommu_get_dev_domain(struct device *dev) { + struct iommu_ops *ops = dev->bus->iommu_ops; + + if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL)) + return NULL; + + return ops->get_dev_iommu_domain(dev); } +EXPORT_SYMBOL_GPL(iommu_get_dev_domain); What prevents this from racing iommu_domain_free()? No references are acquired, so there's no reason for the caller to assume the pointer is valid. Sorry for the late query; somehow this email went into a folder and escaped me. Just to be sure: there is no lock in the generic struct iommu_domain, but the IP-specific structure (like the FSL domain) linked in iommu_domain->priv has a lock, so we need to handle this race in the FSL IOMMU code (say drivers/iommu/fsl_pamu_domain.c), right? No, it's not sufficient to make sure that your use of the interface is race free. The interface itself needs to be designed so that it's difficult to use incorrectly. So we can define iommu_get_dev_domain()/iommu_put_dev_domain(); iommu_get_dev_domain() will return the domain with the lock held, and iommu_put_dev_domain() will release the lock? And iommu_get_dev_domain() must always be followed by iommu_put_dev_domain(). What lock?
get/put are generally used for reference counting, not locking in the kernel. That's not the case here. This is a backdoor to get the iommu domain from the iommu driver regardless of who is using it or how. The iommu domain is created and managed by vfio, so shouldn't we be looking at how to do this through vfio? Let me
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
-Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:16 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This api return the iommu domain to which the device is attached. The iommu_domain is required for making API calls related to iommu. Follow up patches which use this API to know iommu maping. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/iommu/iommu.c | 10 ++ include/linux/iommu.h |7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbe9ca7..6ac5f50 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_detach_device); +struct iommu_domain *iommu_get_dev_domain(struct device *dev) { + struct iommu_ops *ops = dev-bus-iommu_ops; + + if (unlikely(ops == NULL || ops-get_dev_iommu_domain == NULL)) + return NULL; + + return ops-get_dev_iommu_domain(dev); } +EXPORT_SYMBOL_GPL(iommu_get_dev_domain); What prevents this from racing iommu_domain_free()? There's no references acquired, so there's no reason for the caller to assume the pointer is valid. Sorry for late query, somehow this email went into a folder and escaped; Just to be sure, there is not lock at generic struct iommu_domain, but IP specific structure (link FSL domain) linked in iommu_domain-priv have a lock, so we need to ensure this race in FSL iommu code (say drivers/iommu/fsl_pamu_domain.c), right? 
Thanks -Bharat /* * IOMMU groups are really the natrual working unit of the IOMMU, but * the IOMMU API works on domains and devices. Bridge that gap by diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7ea319e..fa046bd 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -127,6 +127,7 @@ struct iommu_ops { int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count); /* Get the numer of window per domain */ u32 (*domain_get_windows)(struct iommu_domain *domain); + struct iommu_domain *(*get_dev_iommu_domain)(struct device *dev); unsigned long pgsize_bitmap; }; @@ -190,6 +191,7 @@ extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr, phys_addr_t offset, u64 size, int prot); extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr); +extern struct iommu_domain *iommu_get_dev_domain(struct device *dev); /** * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework * @domain: the iommu domain where the fault has happened @@ -284,6 +286,11 @@ static inline void iommu_domain_window_disable(struct iommu_domain *domain, { } +static inline struct iommu_domain *iommu_get_dev_domain(struct device +*dev) { + return NULL; +} + static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) { return 0; -- To unsubscribe from this list: send the line unsubscribe linux-pci in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
-Original Message- From: Bhushan Bharat-R65777 Sent: Friday, October 04, 2013 3:24 PM To: 'Alex Williamson' Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device -Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:16 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This api return the iommu domain to which the device is attached. The iommu_domain is required for making API calls related to iommu. Follow up patches which use this API to know iommu maping. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/iommu/iommu.c | 10 ++ include/linux/iommu.h |7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbe9ca7..6ac5f50 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_detach_device); +struct iommu_domain *iommu_get_dev_domain(struct device *dev) { + struct iommu_ops *ops = dev-bus-iommu_ops; + + if (unlikely(ops == NULL || ops-get_dev_iommu_domain == NULL)) + return NULL; + + return ops-get_dev_iommu_domain(dev); } +EXPORT_SYMBOL_GPL(iommu_get_dev_domain); What prevents this from racing iommu_domain_free()? 
No references are acquired, so there's no reason for the caller to assume the pointer is valid. Sorry for the late query; somehow this email went into a folder and escaped me. Just to be sure: there is no lock in the generic struct iommu_domain, but the IP-specific structure (like the FSL domain) linked in iommu_domain->priv has a lock, so we need to handle this race in the FSL IOMMU code (say drivers/iommu/fsl_pamu_domain.c), right? Thinking about this further, there are more problems here:
- The MSI subsystem will call iommu_get_dev_domain(), which will take a lock, find the domain pointer, release the lock, and return the domain.
- Meanwhile, the domain is freed.
- The MSI subsystem then tries to work on the freed domain (get_attribute/set_attribute, etc.).
So can we make iommu_get_dev_domain() return the domain with the lock held, and have iommu_put_dev_domain() release the lock? iommu_get_dev_domain() must then always be followed by iommu_put_dev_domain(). Thanks -Bharat Thanks -Bharat /* * IOMMU groups are really the natrual working unit of the IOMMU, but * the IOMMU API works on domains and devices.
Bridge that gap by diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7ea319e..fa046bd 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -127,6 +127,7 @@ struct iommu_ops { int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count); /* Get the numer of window per domain */ u32 (*domain_get_windows)(struct iommu_domain *domain); + struct iommu_domain *(*get_dev_iommu_domain)(struct device *dev); unsigned long pgsize_bitmap; }; @@ -190,6 +191,7 @@ extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr, phys_addr_t offset, u64 size, int prot); extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr); +extern struct iommu_domain *iommu_get_dev_domain(struct device +*dev); /** * report_iommu_fault() - report about an IOMMU fault to the IOMMU framework * @domain: the iommu domain where the fault has happened @@ -284,6 +286,11 @@ static inline void iommu_domain_window_disable(struct iommu_domain *domain, { } +static inline struct iommu_domain *iommu_get_dev_domain(struct +device +*dev) { + return NULL; +} + static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) { return 0;
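In kernel convention a get/put pair usually takes and drops a reference (as kref_get()/kref_put() do) rather than holding a lock across the caller's work. The user-space sketch below shows that shape with C11 atomics; `struct model_domain` and the release counter are hypothetical, the reference would still have to be taken while whatever protects the device-to-domain pointer is held, and the sketch does not address the separate point that the domain is created and owned by VFIO.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for an iommu_domain. */
struct model_domain {
	atomic_int refs;
};

static int model_releases;	/* counts how often the release path ran */

static void model_domain_get(struct model_domain *d)
{
	atomic_fetch_add_explicit(&d->refs, 1, memory_order_relaxed);
}

/* Drops one reference; returns true if it was the last one. */
static bool model_domain_put(struct model_domain *d)
{
	if (atomic_fetch_sub_explicit(&d->refs, 1, memory_order_acq_rel) == 1) {
		model_releases++;	/* a real object would be freed here */
		return true;
	}
	return false;
}
```

A caller that holds a reference can use the domain without any lock held, and the object cannot be freed underneath it, because the free only happens on the final put.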
RE: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in lookup_linux_pte
-----Original Message----- From: Alexander Graf [mailto:ag...@suse.de] Sent: Friday, October 04, 2013 6:57 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; pau...@samba.org; k...@vger.kernel.org; kvm-p...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in lookup_linux_pte On 19.09.2013, at 08:02, Bharat Bhushan wrote: lookup_linux_pte() was searching for a pte and also setting access flags if writable. This function now only searches for the pte, while access flag setting is done explicitly. This pte lookup is not KVM-specific, so it is moved to common code (asm/pgtable.h). My follow-up patch will use this on BookE. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v4->v5 - No change arch/powerpc/include/asm/pgtable.h | 24 +++ arch/powerpc/kvm/book3s_hv_rm_mmu.c | 36 +++--- 2 files changed, 36 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 7d6eacf..3a5de5c 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -223,6 +223,30 @@ extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, unsigned *shift); + +static inline pte_t *lookup_linux_pte(pgd_t *pgdir, unsigned long hva, +unsigned long *pte_sizep) +{ + pte_t *ptep; + unsigned long ps = *pte_sizep; + unsigned int shift; + + ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); + if (!ptep) + return __pte(0); This returns a struct pte_t, but your return value of the function is a struct pte_t *. So this code will fail to compile with STRICT_MM_TYPECHECKS set. Any reason you don't just return NULL here? I want to return the ptep (pte pointer), so yes, this should be NULL. Will correct this. Thanks -Bharat That way callers could simply check if (ptep) ... or you leave the return value as struct pte_t.
Alex + if (shift) + *pte_sizep = 1ul << shift; + else + *pte_sizep = PAGE_SIZE; + + if (ps > *pte_sizep) + return __pte(0); + + if (!pte_present(*ptep)) + return __pte(0); + + return ptep; +} #endif /* __ASSEMBLY__ */ #endif /* __KERNEL__ */ diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 45e30d6..74fa7f8 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -134,25 +134,6 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index, unlock_rmap(rmap); } -static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva, - int writing, unsigned long *pte_sizep) -{ - pte_t *ptep; - unsigned long ps = *pte_sizep; - unsigned int hugepage_shift; - - ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift); - if (!ptep) - return __pte(0); - if (hugepage_shift) - *pte_sizep = 1ul << hugepage_shift; - else - *pte_sizep = PAGE_SIZE; - if (ps > *pte_sizep) - return __pte(0); - return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift); -} - static inline void unlock_hpte(unsigned long *hpte, unsigned long hpte_v) { asm volatile(PPC_RELEASE_BARRIER : : : "memory"); @@ -173,6 +154,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, unsigned long is_io; unsigned long *rmap; pte_t pte; + pte_t *ptep; unsigned int writing; unsigned long mmu_seq; unsigned long rcbits; @@ -231,8 +213,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, /* Look up the Linux PTE for the backing page */ pte_size = psize; - pte = lookup_linux_pte(pgdir, hva, writing, &pte_size); - if (pte_present(pte)) { + ptep = lookup_linux_pte(pgdir, hva, &pte_size); + if (pte_present(pte_val(*ptep))) { + pte = kvmppc_read_update_linux_pte(ptep, writing); if (writing && !pte_write(pte)) /* make the actual HPTE be read-only */ ptel = hpte_make_readonly(ptel); @@ -661,15 +644,20 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags, struct kvm_memory_slot *memslot; pgd_t *pgdir = 
vcpu->arch.pgdir;
pte_t pte; + pte_t *ptep; psize = hpte_page_size(v, r); gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT; memslot
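The refactored lookup keeps the old size check. The backing Linux PTE covers `1ul << shift` bytes for a huge page, or PAGE_SIZE otherwise, and the lookup fails when the caller asks for more than that. Below is a user-space sketch of just that check, assuming a 4 KiB base page. `pte_size_from_shift()` and `pte_covers()` are hypothetical helpers, not kernel functions:

```c
#include <stdbool.h>

/* Assumed base page size for this sketch; the kernel's PAGE_SIZE is per-config. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: bytes covered by the backing Linux PTE, mirroring
 * "if (shift) *pte_sizep = 1ul << shift; else *pte_sizep = PAGE_SIZE;". */
static unsigned long pte_size_from_shift(unsigned int shift)
{
	return shift ? 1UL << shift : SKETCH_PAGE_SIZE;
}

/* Hypothetical helper: the lookup only succeeds when the requested size ps
 * fits inside the backing PTE, i.e. the negation of "if (ps > *pte_sizep)". */
static bool pte_covers(unsigned long ps, unsigned int shift)
{
	return ps <= pte_size_from_shift(shift);
}
```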
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 9:15 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 09:54 +0000, Bhushan Bharat-R65777 wrote: -Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:16 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This API returns the iommu domain to which the device is attached. The iommu_domain is required for making API calls related to the iommu. Follow-up patches use this API to learn the iommu mapping.
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/iommu/iommu.c | 10 ++ include/linux/iommu.h |7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbe9ca7..6ac5f50 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_detach_device); +struct iommu_domain *iommu_get_dev_domain(struct device *dev) { + struct iommu_ops *ops = dev->bus->iommu_ops; + + if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL)) + return NULL; + + return ops->get_dev_iommu_domain(dev); } +EXPORT_SYMBOL_GPL(iommu_get_dev_domain); What prevents this from racing iommu_domain_free()? There are no references acquired, so there's no reason for the caller to assume the pointer is valid. Sorry for the late query; somehow this email went into a folder and escaped my notice. Just to be sure: there is no lock in the generic struct iommu_domain, but the IP-specific structure (like the FSL domain) linked in iommu_domain->priv has a lock, so we need to prevent this race in the FSL iommu code (say drivers/iommu/fsl_pamu_domain.c), right? No, it's not sufficient to make sure that your use of the interface is race free. The interface itself needs to be designed so that it's difficult to use incorrectly. So we can define iommu_get_dev_domain()/iommu_put_dev_domain(); iommu_get_dev_domain() will return the domain with the lock held, and iommu_put_dev_domain() will release the lock? And iommu_get_dev_domain() must always be followed by iommu_put_dev_domain(). That's not the case here. This is a backdoor to get the iommu domain from the iommu driver regardless of who is using it or how. The iommu domain is created and managed by vfio, so shouldn't we be looking at how to do this through vfio?
Let me first describe what we are doing here. During initialization: - vfio queries the MSI subsystem for the MSI page and its size - vfio then asks the iommu to map the MSI page (the IOVA is decided by userspace and the physical address is the MSI page) - so the IOVA subwindow mapping is created in the iommu, and yes, VFIO knows about this mapping. Then, on the SET_IRQ (MSI/MSI-X) ioctl: - it calls pci_enable_msix()/pci_enable_msi_block(), which is supposed to set the MSI address/data in the device - in the current implementation (this patchset) the MSI subsystem gets the IOVA from the iommu via this new interface. Are you saying that rather than getting this from the iommu, we should get it from vfio? What difference does this make? Thanks -Bharat It seems like you'd want to use your device to get a vfio group reference, from which you could do something with the vfio external user interface and get the iommu domain reference. Thanks, Alex /* * IOMMU groups are really the natrual working unit of the IOMMU, but * the IOMMU API works on domains and devices. Bridge that gap by diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 7ea319e..fa046bd 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -127,6 +127,7 @@ struct iommu_ops { int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count); /* Get the numer of window per domain */ u32 (*domain_get_windows)(struct iommu_domain *domain); + struct iommu_domain *(*get_dev_iommu_domain
RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 10:43 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 16:47 +0000, Bhushan Bharat-R65777 wrote: -Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Friday, October 04, 2013 9:15 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Fri, 2013-10-04 at 09:54 +0000, Bhushan Bharat-R65777 wrote: -Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:16 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This API returns the iommu domain to which the device is attached. The iommu_domain is required for making API calls related to the iommu. Follow-up patches use this API to learn the iommu mapping.
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/iommu/iommu.c | 10 ++ include/linux/iommu.h |7 +++ 2 files changed, 17 insertions(+), 0 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index fbe9ca7..6ac5f50 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev) } EXPORT_SYMBOL_GPL(iommu_detach_device); +struct iommu_domain *iommu_get_dev_domain(struct device *dev) { + struct iommu_ops *ops = dev->bus->iommu_ops; + + if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL)) + return NULL; + + return ops->get_dev_iommu_domain(dev); } +EXPORT_SYMBOL_GPL(iommu_get_dev_domain); What prevents this from racing iommu_domain_free()? There are no references acquired, so there's no reason for the caller to assume the pointer is valid. Sorry for the late query; somehow this email went into a folder and escaped my notice. Just to be sure: there is no lock in the generic struct iommu_domain, but the IP-specific structure (like the FSL domain) linked in iommu_domain->priv has a lock, so we need to prevent this race in the FSL iommu code (say drivers/iommu/fsl_pamu_domain.c), right? No, it's not sufficient to make sure that your use of the interface is race free. The interface itself needs to be designed so that it's difficult to use incorrectly. So we can define iommu_get_dev_domain()/iommu_put_dev_domain(); iommu_get_dev_domain() will return the domain with the lock held, and iommu_put_dev_domain() will release the lock? And iommu_get_dev_domain() must always be followed by iommu_put_dev_domain(). What lock? get/put are generally used for reference counting, not locking in the kernel. That's not the case here. This is a backdoor to get the iommu domain from the iommu driver regardless of who is using it or how. The iommu domain is created and managed by vfio, so shouldn't we be looking at how to do this through vfio?
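A small sketch can illustrate Alex's point about get/put. In kernel usage, "get" takes a reference that keeps the object alive and "put" drops it. The last put frees the object, and no lock is held in between. The code below shows that pattern with C11 atomics in user space. `struct dom`, `dom_alloc()`, `dom_get()` and `dom_put()` are hypothetical names, not a proposed IOMMU API. In-kernel code would typically build on `struct kref` instead.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an iommu domain. */
struct dom {
	atomic_int refs;
	int id;
};

static struct dom *dom_alloc(int id)
{
	struct dom *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refs, 1);	/* the creator holds the first reference */
	d->id = id;
	return d;
}

/* Take an extra reference: the object stays valid until the matching put. */
static struct dom *dom_get(struct dom *d)
{
	atomic_fetch_add(&d->refs, 1);
	return d;
}

/* Drop a reference; returns true if this call released the last one. */
static bool dom_put(struct dom *d)
{
	if (atomic_fetch_sub(&d->refs, 1) == 1) {
		free(d);
		return true;
	}
	return false;
}
```

A caller that obtained the pointer through a get can keep using it even after the creator drops its own reference. This property is what the proposed lookup interface lacks.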
Let me first describe what we are doing here. During initialization: - vfio queries the MSI subsystem for the MSI page and its size - vfio then asks the iommu to map the MSI page (the IOVA is decided by userspace and the physical address is the MSI page) - so the IOVA subwindow mapping is created in the iommu, and yes, VFIO knows about this mapping. Then, on the SET_IRQ (MSI/MSI-X) ioctl: - it calls pci_enable_msix()/pci_enable_msi_block(), which is supposed to set the MSI address/data in the device - in the current implementation (this patchset) the MSI subsystem gets the IOVA from the iommu via this new interface. Are you saying that rather than getting this from the iommu, we should get it from vfio? What difference does this make? Yes, you just said above
RE: [PATCH 1/7] powerpc: Add interface to get msi region information
-Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Bjorn Helgaas Sent: Wednesday, September 25, 2013 5:28 AM To: Bhushan Bharat-R65777 Cc: alex.william...@redhat.com; j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; io...@lists.linux-foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information On Thu, Sep 19, 2013 at 12:59:17PM +0530, Bharat Bhushan wrote: This patch adds an interface to get the following information: - Number of MSI regions (which is the number of MSI banks for powerpc). - The region address range: the physical page which holds the address(es) used for generating MSI interrupts, and the size of that page. These are required to create the IOMMU (Freescale PAMU) mapping for devices which are directly assigned using VFIO. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/machdep.h |8 +++ arch/powerpc/include/asm/pci.h |2 + arch/powerpc/kernel/msi.c | 18 arch/powerpc/sysdev/fsl_msi.c | 39 +-- arch/powerpc/sysdev/fsl_msi.h | 11 - drivers/pci/msi.c | 26 include/linux/msi.h|8 +++ include/linux/pci.h| 13 8 files changed, 120 insertions(+), 5 deletions(-) ... diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index aca7578..6d85c15 100644 --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -30,6 +30,20 @@ static int pci_msi_enable = 1; /* Arch hooks */ +#ifndef arch_msi_get_region_count +int arch_msi_get_region_count(void) +{ + return 0; +} +#endif + +#ifndef arch_msi_get_region +int arch_msi_get_region(int region_num, struct msi_region *region) { + return 0; +} +#endif This #define strategy is gone; see 4287d824 (PCI: use weak functions for MSI arch-specific functions). Please use the weak function strategy for your new MSI region functions.
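Here is a minimal sketch of the weak-function strategy Bjorn refers to, assuming GCC/Clang's `__attribute__((weak))` (the kernel wraps this as `__weak`). The generic file defines weak defaults. An architecture that needs different behaviour provides an ordinary definition with the same name in its own file, and the linker prefers that over the weak one, so no `#ifndef` is needed. The `msi_region` layout is copied from this patch series, with `uint64_t` standing in for `dma_addr_t`. The default return values mirror the patch's stubs.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout from the patch series; uint64_t stands in for dma_addr_t. */
struct msi_region {
	int region_num;
	uint64_t addr;
	size_t size;
};

/* Generic defaults: weak, so an arch file may supply strong overrides. */
__attribute__((weak)) int arch_msi_get_region_count(void)
{
	return 0;	/* generic code knows of no MSI regions */
}

__attribute__((weak)) int arch_msi_get_region(int region_num,
					      struct msi_region *region)
{
	(void)region_num;
	(void)region;
	return 0;
}

/* Generic wrapper: calls whichever definition the linker selected. */
int msi_get_region_count(void)
{
	return arch_msi_get_region_count();
}
```

Built alone, the weak defaults are used. Linking in another object file that defines `int arch_msi_get_region_count(void)` normally would replace the default without any change to the generic file.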
ok + #ifndef arch_msi_check_device int arch_msi_check_device(struct pci_dev *dev, int nvec, int type) { @@ -903,6 +917,18 @@ void pci_disable_msi(struct pci_dev *dev) } EXPORT_SYMBOL(pci_disable_msi); +int msi_get_region_count(void) +{ + return arch_msi_get_region_count(); +} +EXPORT_SYMBOL(msi_get_region_count); + +int msi_get_region(int region_num, struct msi_region *region) { + return arch_msi_get_region(region_num, region); } +EXPORT_SYMBOL(msi_get_region); Please split these interface additions, i.e., the drivers/pci/msi.c, include/linux/msi.h, and include/linux/pci.h changes, into a separate patch. ok I don't know enough about VFIO to understand why these new interfaces are needed. Is this the first VFIO IOMMU driver? I see vfio_iommu_spapr_tce.c and vfio_iommu_type1.c but I don't know if they're comparable to the Freescale PAMU. Do other VFIO IOMMU implementations support MSI? If so, do they handle the problem of mapping the MSI regions in a different way? PAMU is an aperture-type IOMMU while the others are paging-type, so they are completely different from PAMU and handle this differently. /** * pci_msix_table_size - return the number of device's MSI-X table entries * @dev: pointer to the pci_dev data structure of MSI-X device function diff --git a/include/linux/msi.h b/include/linux/msi.h index ee66f3a..ae32601 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -50,6 +50,12 @@ struct msi_desc { struct kobject kobj; }; +struct msi_region { + int region_num; + dma_addr_t addr; + size_t size; +}; This needs some sort of explanatory comment.
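Below is one possible shape for the explanatory comment Bjorn asks for. It is based only on the patch description: the physical page holding the MSI address(es) and its size, with one region per MSI bank on powerpc. The comment wording, `uint64_t` in place of `dma_addr_t`, and the `msi_region_contains()` helper are illustrative assumptions, not part of the patch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/**
 * struct msi_region - address range that devices write to raise an MSI
 * @region_num: index of the region (one per MSI bank on powerpc)
 * @addr:       base of the physical page holding the MSI address(es)
 * @size:       size in bytes of that page
 *
 * An IOMMU such as the Freescale PAMU needs a mapping for this range
 * before a device assigned through VFIO can deliver MSIs.
 */
struct msi_region {
	int region_num;
	uint64_t addr;	/* dma_addr_t in the kernel */
	size_t size;
};

/* Hypothetical helper: does an MSI target address fall inside @r? */
static bool msi_region_contains(const struct msi_region *r, uint64_t msi_addr)
{
	return msi_addr >= r->addr && msi_addr - r->addr < r->size;
}
```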
Ok -Bharat /* * The arch hook for setup up msi irqs */ @@ -58,5 +64,7 @@ void arch_teardown_msi_irq(unsigned int irq); int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); void arch_teardown_msi_irqs(struct pci_dev *dev); int arch_msi_check_device(struct pci_dev* dev, int nvec, int type); +int arch_msi_get_region_count(void); +int arch_msi_get_region(int region_num, struct msi_region *region); #endif /* LINUX_MSI_H */ diff --git a/include/linux/pci.h b/include/linux/pci.h index 186540d..2b26a59 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1126,6 +1126,7 @@ struct msix_entry { u16 entry; /* driver uses to specify entry, OS writes */ }; +struct msi_region; #ifndef CONFIG_PCI_MSI static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec) @@ -1168,6 +1169,16 @@ static inline int pci_msi_enabled(void) { return 0; } + +static inline int
RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng Wang Sent: Tuesday, September 24, 2013 2:59 PM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle From: Wang Dongsheng dongsheng.w...@freescale.com Add a sysfs interface to enable/disable the pw20 state or altivec idle, and to control the wait entry time. Enable/Disable interface: 0, disable. 1, enable. /sys/devices/system/cpu/cpuX/pw20_state /sys/devices/system/cpu/cpuX/altivec_idle Set wait time interface (nanoseconds): /sys/devices/system/cpu/cpuX/pw20_wait_time /sys/devices/system/cpu/cpuX/altivec_idle_wait_time Example, based on a 41MHz timebase frequency: 1~47(ns): TB[63] 48~95(ns): TB[62] 96~191(ns): TB[61] 192~383(ns): TB[60] 384~767(ns): TB[59] ... Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v4: Move code from 85xx/common.c to kernel/sysfs.c. Remove has_pw20_altivec_idle function. Change wait entry_bit to wait time. arch/powerpc/kernel/sysfs.c | 291 1 file changed, 291 insertions(+) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_FSL_SOC +#define MAX_BIT 63 + +static u64 pw20_wt; +static u64 altivec_idle_wt; + +static unsigned int get_idle_ticks_bit(u64 ns) +{ + u64 cycle; + + cycle = div_u64(ns, 1000 / tb_ticks_per_usec); When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then this will always be ns, which is not correct, no?
+ if (!cycle) + return 0; + + return ilog2(cycle); +} + +static void do_show_pwrmgtcr0(void *val) +{ + u32 *value = val; + + *value = mfspr(SPRN_PWRMGTCR0); +} + +static ssize_t show_pw20_state(struct device *dev, + struct device_attribute *attr, char *buf) +{ + u32 value; + unsigned int cpu = dev->id; + + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1); + + value &= PWRMGTCR0_PW20_WAIT; + + return sprintf(buf, "%u\n", value ? 1 : 0); +} + +static void do_store_pw20_state(void *val) +{ + u32 *value = val; + u32 pw20_state; + + pw20_state = mfspr(SPRN_PWRMGTCR0); + + if (*value) + pw20_state |= PWRMGTCR0_PW20_WAIT; + else + pw20_state &= ~PWRMGTCR0_PW20_WAIT; + + mtspr(SPRN_PWRMGTCR0, pw20_state); +} + +static ssize_t store_pw20_state(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + u32 value; + unsigned int cpu = dev->id; + + if (kstrtou32(buf, 0, &value)) + return -EINVAL; + + if (value > 1) + return -EINVAL; + + smp_call_function_single(cpu, do_store_pw20_state, &value, 1); + + return count; +} + +static ssize_t show_pw20_wait_time(struct device *dev, + struct device_attribute *attr, char *buf) +{ + u32 value; + u64 tb_cycle; + u64 time; + + unsigned int cpu = dev->id; + + if (!pw20_wt) { + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1); + value = (value & PWRMGTCR0_PW20_ENT) >> + PWRMGTCR0_PW20_ENT_SHIFT; + + tb_cycle = (1 << (MAX_BIT - value)) * 2; + time = tb_cycle * (1000 / tb_ticks_per_usec) - 1; Similar to above comment.
-Bharat + } else { + time = pw20_wt; + } + + return sprintf(buf, "%llu\n", time); +} + +static void set_pw20_wait_entry_bit(void *val) +{ + u32 *value = val; + u32 pw20_idle; + + pw20_idle = mfspr(SPRN_PWRMGTCR0); + + /* Set Automatic PW20 Core Idle Count */ + /* clear count */ + pw20_idle &= ~PWRMGTCR0_PW20_ENT; + + /* set count */ + pw20_idle |= ((MAX_BIT - *value) << PWRMGTCR0_PW20_ENT_SHIFT); + + mtspr(SPRN_PWRMGTCR0, pw20_idle); +} + +static ssize_t store_pw20_wait_time(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + u32 entry_bit; + u64 value; + + unsigned int cpu = dev->id; + + if (kstrtou64(buf, 0, &value)) + return -EINVAL; + + if (!value) + return -EINVAL; + + entry_bit = get_idle_ticks_bit(value); + if (entry_bit > MAX_BIT) + return -EINVAL; + + pw20_wt = value; + smp_call_function_single(cpu, set_pw20_wait_entry_bit, + &entry_bit, 1); + +
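The "similar to above" remark also applies to the read-back path in show_pw20_wait_time(). There, `1000 / tb_ticks_per_usec` truncates to 0 once the timebase runs above 1 GHz. The sketch below does the same conversion multiply-first, in user space. `entry_field_to_ns()` is a hypothetical helper. The field encoding (field = MAX_BIT - entry bit) is taken from set_pw20_wait_entry_bit() in the patch. The sketch assumes the entry bit is small enough that the products fit in 64 bits.

```c
#include <stdint.h>

#define MAX_BIT 63

/*
 * Hypothetical helper: turn the PW20 entry-count field back into the
 * largest wait time (ns) it represents. Multiplying by 1000 before
 * dividing avoids the truncation of "tb_cycle * (1000 / tb_ticks_per_usec)".
 * Assumes MAX_BIT - field < 50 so the products cannot overflow.
 */
static uint64_t entry_field_to_ns(unsigned int field, uint64_t tb_ticks_per_usec)
{
	unsigned int bit = MAX_BIT - field;	/* bit = ilog2(tick count) */
	uint64_t tb_cycle = (1ULL << bit) * 2;	/* top of [2^bit, 2^(bit+1)) */

	return tb_cycle * 1000 / tb_ticks_per_usec - 1;
}
```

With a 41 MHz timebase (41 ticks/us), this returns 47 for field 63, matching the first row of the patch's example table. With 2000 ticks/us it still gives usable values, for example 1023 ns for field 53.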
RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-Original Message- From: Wang Dongsheng-B40534 Sent: Wednesday, September 25, 2013 1:40 PM To: Bhushan Bharat-R65777; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -Original Message- From: Bhushan Bharat-R65777 Sent: Wednesday, September 25, 2013 2:23 PM To: Wang Dongsheng-B40534; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng Wang Sent: Tuesday, September 24, 2013 2:59 PM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle From: Wang Dongsheng dongsheng.w...@freescale.com Add a sysfs interface to enable/disable the pw20 state or altivec idle, and to control the wait entry time. Enable/Disable interface: 0, disable. 1, enable. /sys/devices/system/cpu/cpuX/pw20_state /sys/devices/system/cpu/cpuX/altivec_idle Set wait time interface (nanoseconds): /sys/devices/system/cpu/cpuX/pw20_wait_time /sys/devices/system/cpu/cpuX/altivec_idle_wait_time Example, based on a 41MHz timebase frequency: 1~47(ns): TB[63] 48~95(ns): TB[62] 96~191(ns): TB[61] 192~383(ns): TB[60] 384~767(ns): TB[59] ... Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v4: Move code from 85xx/common.c to kernel/sysfs.c. Remove has_pw20_altivec_idle function. Change wait entry_bit to wait time.
arch/powerpc/kernel/sysfs.c | 291 1 file changed, 291 insertions(+) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_FSL_SOC +#define MAX_BIT 63 + +static u64 pw20_wt; +static u64 altivec_idle_wt; + +static unsigned int get_idle_ticks_bit(u64 ns) { + u64 cycle; + + cycle = div_u64(ns, 1000 / tb_ticks_per_usec); When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then this will always be ns, which is not correct, no? 1000 / tb_ticks_per_usec means nsec_ticks_per_tb (nanoseconds per timebase tick). If the timebase frequency is > 1GHz, this should be tb_ticks_per_usec / 1000 to get tb_ticks_per_nsec, and this should be changed to cycle = ns * tb_ticks_per_nsec; Yes, we would need to change this into two lines. But at present we do not have a platform with a timebase frequency above 1GHz, and I don't think we need to support that situation, because we have no environment to test it. If a platform above 1GHz appears later, we can add this support at that time. I would like to leave it to Scott, but personally I think that if there is something simple to fix then it should be fixed, rather than waiting for some error to happen and then fixing it. -Bharat Thanks. -dongsheng ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Wednesday, September 25, 2013 10:10 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/vfio/pci/vfio_pci_rdwr.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 210db24..8a8156a 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -181,7 +181,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite) { int ret; - loff_t off, pos = *ppos & VFIO_PCI_OFFSET_MASK; + loff_t off; + u64 pos = (u64 )(*ppos & VFIO_PCI_OFFSET_MASK); void __iomem *iomem = NULL; unsigned int rsrc; bool is_ioport; What's the compile error that this fixes? I was getting the error below, and after some googling I found that this is how others have fixed it. /home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2' /home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2' Thanks -Bharat
RE: [PATCH 6/7] vfio: moving some functions in common file
-Original Message- From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org] On Behalf Of Alex Williamson Sent: Wednesday, September 25, 2013 10:33 PM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 6/7] vfio: moving some functions in common file On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: Some functions defined in vfio_iommu_type1.c are common, and we want to use them for the FSL IOMMU (PAMU) and iommu-none drivers, so some of them are moved to vfio_iommu_common.c. I think we can do more of that, but we will take this step by step. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/vfio/Makefile|4 +- drivers/vfio/vfio_iommu_common.c | 235 ++ drivers/vfio/vfio_iommu_common.h | 30 + drivers/vfio/vfio_iommu_type1.c | 206 +- 4 files changed, 268 insertions(+), 207 deletions(-) create mode 100644 drivers/vfio/vfio_iommu_common.c create mode 100644 drivers/vfio/vfio_iommu_common.h diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index 72bfabc..c5792ec 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -1,4 +1,4 @@ obj-$(CONFIG_VFIO) += vfio.o -obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o -obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o +obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o obj-$(CONFIG_VFIO_PCI) += pci/ diff --git a/drivers/vfio/vfio_iommu_common.c b/drivers/vfio/vfio_iommu_common.c new file mode 100644 index 0000000..8bdc0ea --- /dev/null +++ b/drivers/vfio/vfio_iommu_common.c @@ -0,0 +1,235 @@ +/* + * VFIO: Common code for vfio IOMMU support + * + * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson alex.william...@redhat.com + * Author: Bharat Bhushan bharat.bhus...@freescale.com + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * Derived from original vfio: + * Copyright 2010 Cisco Systems, Inc. All rights reserved. + * Author: Tom Lyon, p...@cisco.com + */ + +#include <linux/compat.h> +#include <linux/device.h> +#include <linux/fs.h> +#include <linux/iommu.h> +#include <linux/module.h> +#include <linux/mm.h> +#include <linux/pci.h> /* pci_bus_type */ +#include <linux/rbtree.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/vfio.h> +#include <linux/workqueue.h> Please clean up the includes in both the source and target files. You obviously don't need <linux/pci.h> here, for one. Will do. + +static bool disable_hugepages; +module_param_named(disable_hugepages, + disable_hugepages, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(disable_hugepages, + "Disable VFIO IOMMU support for IOMMU hugepages."); + +struct vwork { + struct mm_struct *mm; + long npage; + struct work_struct work; +}; + +/* delayed decrement/increment for locked_vm */ +void vfio_lock_acct_bg(struct work_struct *work) { + struct vwork *vwork = container_of(work, struct vwork, work); + struct mm_struct *mm; + + mm = vwork->mm; + down_write(&mm->mmap_sem); + mm->locked_vm += vwork->npage; + up_write(&mm->mmap_sem); + mmput(mm); + kfree(vwork); +} + +void vfio_lock_acct(long npage) +{ + struct vwork *vwork; + struct mm_struct *mm; + + if (!current->mm || !npage) + return; /* process exited or nothing to do */ + + if (down_write_trylock(&current->mm->mmap_sem)) { + current->mm->locked_vm += npage; + up_write(&current->mm->mmap_sem); + return; + } + + /* +* Couldn't get mmap_sem lock, so must setup to update +* mm->locked_vm later.
If locked_vm were atomic, we +* wouldn't need this silliness +*/ + vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL); + if (!vwork) + return; + mm = get_task_mm(current); + if (!mm) { + kfree(vwork); + return; + } + INIT_WORK(&vwork->work, vfio_lock_acct_bg); + vwork->mm = mm; + vwork->npage = npage; + schedule_work(&vwork->work); +} + +/* + * Some mappings aren't backed by a struct page, for example an mmap'd + * MMIO range for our own or another device. These use a different + * pfn
RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
-Original Message- From: Wang Dongsheng-B40534 Sent: Thursday, September 26, 2013 8:02 AM To: Wood Scott-B07421 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -Original Message- From: Wood Scott-B07421 Sent: Thursday, September 26, 2013 1:57 AM To: Wang Dongsheng-B40534 Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle On Wed, 2013-09-25 at 03:10 -0500, Wang Dongsheng-B40534 wrote: -Original Message- From: Bhushan Bharat-R65777 Sent: Wednesday, September 25, 2013 2:23 PM To: Wang Dongsheng-B40534; Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng Wang Sent: Tuesday, September 24, 2013 2:59 PM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle From: Wang Dongsheng dongsheng.w...@freescale.com Add a sysfs interface to enable/disable the pw20 state or altivec idle, and to control the wait entry time. Enable/Disable interface: 0, disable. 1, enable. /sys/devices/system/cpu/cpuX/pw20_state /sys/devices/system/cpu/cpuX/altivec_idle Set wait time interface (nanoseconds): /sys/devices/system/cpu/cpuX/pw20_wait_time /sys/devices/system/cpu/cpuX/altivec_idle_wait_time Example, based on a 41MHz timebase frequency: 1~47(ns): TB[63] 48~95(ns): TB[62] 96~191(ns): TB[61] 192~383(ns): TB[60] 384~767(ns): TB[59] ... Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v4: Move code from 85xx/common.c to kernel/sysfs.c. Remove has_pw20_altivec_idle function. Change wait entry_bit to wait time.
arch/powerpc/kernel/sysfs.c | 291 1 file changed, 291 insertions(+) diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644 --- a/arch/powerpc/kernel/sysfs.c +++ b/arch/powerpc/kernel/sysfs.c @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay); #endif /* CONFIG_PPC64 */ +#ifdef CONFIG_FSL_SOC +#define MAX_BIT 63 + +static u64 pw20_wt; +static u64 altivec_idle_wt; + +static unsigned int get_idle_ticks_bit(u64 ns) { + u64 cycle; + + cycle = div_u64(ns, 1000 / tb_ticks_per_usec); When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then this will always be ns, which is not correct, no? Actually it'll be a divide by zero in that case. tb_ticks_per_usec = ppc_tb_freq / 1000000; This means the TB frequency should be more than 1MHz: if ppc_tb_freq is less than 1000000, tb_ticks_per_usec will be zero and this becomes a divide by zero. If that happens, I think the kernel cannot work normally, so I think we can assume the variable is not zero. We do believe it is non-zero but greater than 1000 :) And I think the TB frequency should not be less than 1MHz on any PPC platform, because below 1MHz the time precision becomes very poor and system response time gets slower. I'm not sure how what you are describing relates to the divide by zero we are mentioning. You are talking about tb_ticks_per_usec being ZERO, and we are talking about (1000/tb_ticks_per_usec) being zero. BTW, div_u64() handle the case where divider is zero. 1000 / tb_ticks_per_usec means nsec_ticks_per_tb (nanoseconds per timebase tick). If the timebase frequency is > 1GHz, this should be tb_ticks_per_usec / 1000 to get tb_ticks_per_nsec, and this should be changed to cycle = ns * tb_ticks_per_nsec; But at present we do not have a platform with a timebase frequency above 1GHz, and I don't think we need to support that situation, because we have no environment to test it. You can test it by hacking a wrong timebase frequency in and seeing what the calculation does.
Or do something like this: if (ns >= 10000) ^^^ cycle = ((ns + 500) / 1000) * tb_ticks_per_usec; else cycle = div_u64((u64)ns * tb_ticks_per_usec, 1000); We cannot do this, because if (ns+500) < 1000, we cannot get the entry bit; it'll always be bit zero. There is an if condition of ns >= 10000, so ns+500 cannot be less than 1000. We must use per_nsec_tb_ticks, like the 1000 / tb_ticks_per_usec in my code. ...which can
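The multiply-before-divide approach discussed in this thread can be sketched as follows. It computes ticks as ns * tb_ticks_per_usec / 1000 instead of dividing by 1000 / tb_ticks_per_usec, so the result stays correct for a timebase above 1 GHz and never divides by zero. `idle_ticks_bit()` and `ilog2_u64()` are hypothetical user-space stand-ins for get_idle_ticks_bit() and the kernel's ilog2(). The sketch also assumes a 10000 ns cut-over, above which it rounds to whole microseconds to limit overflow.

```c
#include <stdint.h>

/* floor(log2(v)) for v > 0; a stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Hypothetical variant of get_idle_ticks_bit(): convert ns to timebase
 * ticks by multiplying first, then return the index of the highest set bit.
 */
static unsigned int idle_ticks_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	uint64_t cycle;

	if (ns >= 10000)	/* assumed cut-over: round to whole microseconds */
		cycle = (ns + 500) / 1000 * tb_ticks_per_usec;
	else
		cycle = ns * tb_ticks_per_usec / 1000;

	return cycle ? ilog2_u64(cycle) : 0;
}
```

For a 41 MHz timebase, 100 ns maps to 4 ticks (bit 2) and 20000 ns to 820 ticks (bit 9). With 2000 ticks/us, 100 ns gives 200 ticks (bit 7) rather than a divide by zero.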
RE: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU
-Original Message- From: Alex Williamson [mailto:alex.william...@redhat.com] Sent: Thursday, September 26, 2013 12:37 AM To: Bhushan Bharat-R65777 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; linux- ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote: This patch adds vfio iommu support for Freescale IOMMU (PAMU - Peripheral Access Management Unit). The Freescale PAMU is an aperture-based IOMMU with the following characteristics. Each device has an entry in a table in memory describing the iova->phys mapping. The mapping has: -an overall aperture that is power of 2 sized, and has a start iova that is naturally aligned -has 1 or more windows within the aperture -number of windows must be power of 2, max is 256 -size of each window is determined by aperture size / # of windows -iova of each window is determined by aperture start iova / # of windows -the mapped region in each window can be different than the window size...mapping must be a power of 2 -physical address of the mapping must be naturally aligned with the mapping size Some of the code is derived from the TYPE1 iommu driver (drivers/vfio/vfio_iommu_type1.c).
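Here is a small sketch of how the aperture rules listed above constrain a configuration. It reads "iova of each window" as aperture start + index * (aperture size / # of windows). That is an interpretation, not something the description states outright. `pamu_window_iova()` and `is_pow2()` are hypothetical helpers, not part of the driver:

```c
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Hypothetical check of the PAMU aperture rules: power-of-2 aperture size,
 * naturally aligned start, power-of-2 window count of at most 256.
 * On success, stores the iova of window idx in *iova.
 */
static bool pamu_window_iova(uint64_t aperture_start, uint64_t aperture_size,
			     unsigned int nwindows, unsigned int idx,
			     uint64_t *iova)
{
	uint64_t win_size;

	if (!is_pow2(aperture_size) || (aperture_start & (aperture_size - 1)))
		return false;	/* aperture must be 2^n sized and naturally aligned */
	if (!is_pow2(nwindows) || nwindows > 256 || idx >= nwindows)
		return false;

	win_size = aperture_size / nwindows;	/* windows are equal slices */
	*iova = aperture_start + (uint64_t)idx * win_size;
	return true;
}
```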
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- drivers/vfio/Kconfig | 6 + drivers/vfio/Makefile | 1 + drivers/vfio/vfio_iommu_fsl_pamu.c | 952 include/uapi/linux/vfio.h | 100 4 files changed, 1059 insertions(+), 0 deletions(-) create mode 100644 drivers/vfio/vfio_iommu_fsl_pamu.c diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 26b3d9d..7d1da26 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE depends on VFIO && SPAPR_TCE_IOMMU default n +config VFIO_IOMMU_FSL_PAMU + tristate + depends on VFIO + default n + menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API select VFIO_IOMMU_TYPE1 if X86 select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES) + select VFIO_IOMMU_FSL_PAMU if FSL_PAMU help VFIO provides a framework for secure userspace device drivers. See Documentation/vfio.txt for more details. diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index c5792ec..7461350 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -1,4 +1,5 @@ obj-$(CONFIG_VFIO) += vfio.o obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o +obj-$(CONFIG_VFIO_IOMMU_FSL_PAMU) += vfio_iommu_common.o vfio_iommu_fsl_pamu.o obj-$(CONFIG_VFIO_PCI) += pci/ diff --git a/drivers/vfio/vfio_iommu_fsl_pamu.c b/drivers/vfio/vfio_iommu_fsl_pamu.c new file mode 100644 index 0000000..b29365f --- /dev/null +++ b/drivers/vfio/vfio_iommu_fsl_pamu.c @@ -0,0 +1,952 @@ +/* + * VFIO: IOMMU DMA mapping support for FSL PAMU IOMMU + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation.
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + * + * Copyright (C) 2013 Freescale Semiconductor, Inc. + * + * Author: Bharat Bhushan bharat.bhus...@freescale.com + * + * This file is derived from driver/vfio/vfio_iommu_type1.c + * + * The Freescale PAMU is an aperture-based IOMMU with the following + * characteristics. Each device has an entry in a table in memory + * describing the iova-phys mapping. The mapping has: + * -an overall aperture that is power of 2 sized, and has a start iova that + * is naturally aligned + * -has 1 or more windows within the aperture + * -number of windows must be power of 2, max is 256 + * -size of each window is determined by aperture size / # of windows + * -iova of each window is determined by aperture start iova / # of windows + * -the mapped region in each window can be different than + * the window size...mapping must power
RE: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng Wang Sent: Tuesday, September 24, 2013 2:58 PM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define From: Wang Dongsheng dongsheng.w...@freescale.com E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle patches. Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v3: Add bit definitions for PWRMGTCR0. arch/powerpc/include/asm/reg.h | 2 ++ arch/powerpc/include/asm/reg_booke.h | 9 + 2 files changed, 11 insertions(+) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 64264bf..d4160ca 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1053,6 +1053,8 @@ #define PVR_8560 0x80200000 #define PVR_VER_E500V1 0x8020 #define PVR_VER_E500V2 0x8021 +#define PVR_VER_E6500 0x8040 + /* * For the 8xx processors, all of them report the same PVR family for * the PowerPC core. The various versions of these processors must be diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h index ed8f836..4a6457e 100644 --- a/arch/powerpc/include/asm/reg_booke.h +++ b/arch/powerpc/include/asm/reg_booke.h @@ -170,6 +170,7 @@ #define SPRN_L2CSR1 0x3FA /* L2 Data Cache Control and Status Register 1 */ #define SPRN_DCCR 0x3FA /* Data Cache Cacheability Register */ #define SPRN_ICCR 0x3FB /* Instruction Cache Cacheability Register */ +#define SPRN_PWRMGTCR0 0x3FB /* Power management control register 0 */ Is this generic for booke or e6500 specific? I can't see this register either in ISA and EREF. Also I can see SPRN_ICCR also with same SPRN, how that is possible?
-Bharat #define SPRN_SVR 0x3FF /* System Version Register */ /* @@ -216,6 +217,14 @@ #define CCR1_DPC 0x00000100 /* Disable L1 I-Cache/D-Cache parity checking */ #define CCR1_TCS 0x00000080 /* Timer Clock Select */ +/* Bit definitions for PWRMGTCR0. */ +#define PWRMGTCR0_PW20_WAIT (1 << 14) /* PW20 state enable bit */ +#define PWRMGTCR0_PW20_ENT_SHIFT 8 +#define PWRMGTCR0_PW20_ENT 0x3F00 +#define PWRMGTCR0_AV_IDLE_PD_EN (1 << 22) /* Altivec idle enable */ +#define PWRMGTCR0_AV_IDLE_CNT_SHIFT 16 +#define PWRMGTCR0_AV_IDLE_CNT 0x3F0000 + /* Bit definitions for the MCSR. */ #define MCSR_MCS 0x80000000 /* Machine Check Summary */ #define MCSR_IB 0x40000000 /* Instruction PLB Error */ -- 1.8.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
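Each of the PW20_ENT and AV_IDLE_CNT definitions above is a shift plus a 6-bit mask. The user-space C below sketches how a caller might pack a count into such a field while preserving the other bits. The helpers `pwrmgtcr0_set_pw20_ent()`, `pwrmgtcr0_set_av_idle_cnt()` and `pwrmgtcr0_get_pw20_ent()` are hypothetical. On real hardware the value would be read and written with `mfspr()`/`mtspr()` on SPRN_PWRMGTCR0, which this sketch does not do. The AV_IDLE_CNT mask is taken as 0x3F0000, consistent with its shift of 16.

```c
#include <stdint.h>

#define PWRMGTCR0_PW20_WAIT		(1 << 14)
#define PWRMGTCR0_PW20_ENT_SHIFT	8
#define PWRMGTCR0_PW20_ENT		0x3F00
#define PWRMGTCR0_AV_IDLE_PD_EN		(1 << 22)
#define PWRMGTCR0_AV_IDLE_CNT_SHIFT	16
#define PWRMGTCR0_AV_IDLE_CNT		0x3F0000

/* Replace the PW20 entry-count field and enable PW20, leaving the other
 * bits intact. Counts above 63 do not fit the 6-bit field and are
 * truncated by the mask. */
static uint32_t pwrmgtcr0_set_pw20_ent(uint32_t reg, uint32_t count)
{
	reg &= ~PWRMGTCR0_PW20_ENT;
	reg |= (count << PWRMGTCR0_PW20_ENT_SHIFT) & PWRMGTCR0_PW20_ENT;
	return reg | PWRMGTCR0_PW20_WAIT;
}

/* Same pattern for the AltiVec idle count and its enable bit. */
static uint32_t pwrmgtcr0_set_av_idle_cnt(uint32_t reg, uint32_t count)
{
	reg &= ~PWRMGTCR0_AV_IDLE_CNT;
	reg |= (count << PWRMGTCR0_AV_IDLE_CNT_SHIFT) & PWRMGTCR0_AV_IDLE_CNT;
	return reg | PWRMGTCR0_AV_IDLE_PD_EN;
}

/* Extract the PW20 entry count back out of a register value. */
static uint32_t pwrmgtcr0_get_pw20_ent(uint32_t reg)
{
	return (reg & PWRMGTCR0_PW20_ENT) >> PWRMGTCR0_PW20_ENT_SHIFT;
}
```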
RE: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
-Original Message- From: Kumar Gala [mailto:ga...@kernel.crashing.org] Sent: Tuesday, September 24, 2013 9:19 PM To: Bhushan Bharat-R65777 Cc: Wang Dongsheng-B40534; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define On Sep 24, 2013, at 6:21 AM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Dongsheng Wang Sent: Tuesday, September 24, 2013 2:58 PM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534 Subject: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define From: Wang Dongsheng dongsheng.w...@freescale.com E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle patches. Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com --- *v3: Add bit definitions for PWRMGTCR0. arch/powerpc/include/asm/reg.h | 2 ++ arch/powerpc/include/asm/reg_booke.h | 9 + 2 files changed, 11 insertions(+) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 64264bf..d4160ca 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -1053,6 +1053,8 @@ #define PVR_8560 0x8020 #define PVR_VER_E500V1 0x8020 #define PVR_VER_E500V2 0x8021 +#define PVR_VER_E6500 0x8040 + /* * For the 8xx processors, all of them report the same PVR family for * the PowerPC core. 
The various versions of these processors must be diff --git a/arch/powerpc/include/asm/reg_booke.h b/arch/powerpc/include/asm/reg_booke.h index ed8f836..4a6457e 100644 --- a/arch/powerpc/include/asm/reg_booke.h +++ b/arch/powerpc/include/asm/reg_booke.h @@ -170,6 +170,7 @@ #define SPRN_L2CSR1 0x3FA /* L2 Data Cache Control and Status Register 1 */ #define SPRN_DCCR 0x3FA /* Data Cache Cacheability Register */ #define SPRN_ICCR 0x3FB /* Instruction Cache Cacheability Register */ +#define SPRN_PWRMGTCR0 0x3FB /* Power management control register 0 */ Is this generic for booke or e6500 specific? I can't see this register either in ISA and EREF. Also I can see SPRN_ICCR also with same SPRN, how that is possible? It's possibly because the register may be in the implementation-specific region. I'm guessing ICCR is a 40x-specific register. Kumar, doesn't this create confusion? Although I do not like having so many header files, I still think we can have reg_4xx.h, reg_fsl_booke.h, etc. for implementation-specific definitions. -Bharat - k
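SPR 0x3FB means ICCR on some cores and PWRMGTCR0 on e6500. Code that touches it therefore has to be gated on the processor version. Below is a user-space sketch of that check. It assumes the version is the upper 16 bits of the PVR, which is what the kernel's `PVR_VER()` macro extracts. `spr_3fb_is_pwrmgtcr0()` is a hypothetical helper, and the sample PVR values in the checks are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)	/* version field */
#define PVR_VER_E6500	0x8040

/* SPR 0x3FB is implementation specific: only treat it as PWRMGTCR0
 * when the PVR says this is an e6500 core. */
static bool spr_3fb_is_pwrmgtcr0(uint32_t pvr)
{
	return PVR_VER(pvr) == PVR_VER_E6500;
}
```

Splitting the definitions into per-implementation headers, as suggested in the thread, would not remove the need for a runtime check like this. It only makes the ambiguity visible in the source.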
RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation
-Original Message- From: Wood Scott-B07421 Sent: Friday, September 20, 2013 9:48 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation On Thu, 2013-09-19 at 23:19 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, September 20, 2013 2:38 AM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation This breaks when you have both E500_TLB_BITMAP and E500_TLB_TLB0 set. I do not see any case where we set both E500_TLB_BITMAP and E500_TLB_TLB0. This would happen if you have a guest TLB1 entry that is backed by some 4K pages and some larger pages (e.g. if the guest maps CCSR with one big TLB1 and there are varying I/O passthrough regions mapped). It's not common, but it's possible. Agreed. Also we have not optimized that yet (keeping track of multiple shadow TLB0 entries for one guest TLB1 entry) This is about correctness, not optimization. We uses these bit flags only for TLB1 and if size of stlbe is 4K then we set E500_TLB_TLB0 otherwise we set E500_TLB_BITMAP. Although I think that E500_TLB_BITMAP should be set only if stlbe size is less than gtlbe size. Why? Even if there's only one bit set in the map, we need it to keep track of which entry was used. If there is only one entry, wouldn't it be simpler/faster not to look up the bitmap and the guest->host array? A flag would indicate that it is a 1:1 map and that this is the physical address. -Bharat -Scott
RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation
-Original Message- From: Wood Scott-B07421 Sent: Friday, September 20, 2013 11:38 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation On Fri, 2013-09-20 at 13:04 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Wood Scott-B07421 Sent: Friday, September 20, 2013 9:48 PM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc- d...@lists.ozlabs.org Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation On Thu, 2013-09-19 at 23:19 -0500, Bhushan Bharat-R65777 wrote: We uses these bit flags only for TLB1 and if size of stlbe is 4K then we set E500_TLB_TLB0 otherwise we set E500_TLB_BITMAP. Although I think that E500_TLB_BITMAP should be set only if stlbe size is less than gtlbe size. Why? Even if there's only one bit set in the map, we need it to keep track of which entry was used. If there is one entry then will not this be simple/faster to not lookup bitmap and guest-host array? A flag indicate it is 1:1 map and this is physical address. The difference would be negligible, and you'd have added overhead (both runtime and complexity) of making this a special case. Maybe you are right; I will see if I can give it a try :) BTW, I have already sent v6 of this patch. -Bharat -Scott
RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation
-Original Message- From: Wood Scott-B07421 Sent: Friday, September 20, 2013 2:38 AM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation On Thu, 2013-09-19 at 11:32 +0530, Bharat Bhushan wrote: On booke, struct tlbe_ref contains host tlb mapping information (pfn: for guest-pfn to pfn, flags: attribute associated with this mapping) for a guest tlb entry. So when a guest creates a TLB entry then struct tlbe_ref is set to point to valid pfn and set attributes in flags field of the above said structure. When a guest TLB entry is invalidated then flags field of corresponding struct tlbe_ref is updated to point that this is no more valid, also we selectively clear some other attribute bits, example: if E500_TLB_BITMAP was set then we clear E500_TLB_BITMAP, if E500_TLB_TLB0 is set then we clear this. Ideally we should clear complete flags as this entry is invalid and does not have anything to re-used. The other part of the problem is that when we use the same entry again then also we do not clear (started doing or-ing etc). So far it was working because the selectively clearing mentioned above actually clears flags what was set during TLB mapping. But the problem starts coming when we add more attributes to this then we need to selectively clear them and which is not needed. 
In this patch we do both: - Clear flags when invalidating; - Clear flags when reusing the same entry later Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v3->v5 - New patch (found this issue when doing vfio-pci development) arch/powerpc/kvm/e500_mmu_host.c | 12 +++- 1 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c index 1c6a9d7..60f5a3c 100644 --- a/arch/powerpc/kvm/e500_mmu_host.c +++ b/arch/powerpc/kvm/e500_mmu_host.c @@ -217,7 +217,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500 *vcpu_e500, int tlbsel, } mb(); vcpu_e500->g2h_tlb1_map[esel] = 0; - ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID); + /* Clear flags as TLB is not backed by the host anymore */ + ref->flags = 0; local_irq_restore(flags); } This breaks when you have both E500_TLB_BITMAP and E500_TLB_TLB0 set. I do not see any case where we set both E500_TLB_BITMAP and E500_TLB_TLB0. Also we have not optimized that yet (keeping track of multiple shadow TLB0 entries for one guest TLB1 entry) We uses these bit flags only for TLB1 and if size of stlbe is 4K then we set E500_TLB_TLB0 otherwise we set E500_TLB_BITMAP. Although I think that E500_TLB_BITMAP should be set only if stlbe size is less than gtlbe size. Instead, just convert the final E500_TLB_VALID clearing at the end into ref->flags = 0, and convert the early return a few lines earlier into conditional execution of the tlbil_one(). This looks better; I will send the patch shortly. Thanks -Bharat -Scott
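The user-space model below illustrates the invalidation flow Scott suggests. It handles the E500_TLB_BITMAP and E500_TLB_TLB0 cases independently, so an entry with both flags set is fully torn down. It then clears the whole flags word instead of selected bits. The flag values, the struct names, and the counters standing in for host TLB invalidations are all hypothetical. The model illustrates the control flow only; it is not the actual e500_mmu_host.c code.

```c
#include <stdint.h>

/* Hypothetical flag values for this model only. */
#define E500_TLB_VALID	(1u << 0)
#define E500_TLB_BITMAP	(1u << 1)	/* backed by TLB1 shadow entries */
#define E500_TLB_TLB0	(1u << 2)	/* backed by a TLB0 (4K) shadow entry */

struct tlbe_ref_model {
	uint32_t flags;
};

/* Counters standing in for the host-side invalidation work. */
struct inval_counts {
	int tlb1_bitmap_invals;
	int tlb0_invals;
};

static void inval_gtlbe_model(struct tlbe_ref_model *ref,
			      struct inval_counts *c)
{
	if (!(ref->flags & E500_TLB_VALID))
		return;	/* nothing is backed by the host */

	if (ref->flags & E500_TLB_BITMAP)
		c->tlb1_bitmap_invals++;	/* walk g2h bitmap, invalidate each */

	/* Conditional instead of an early return, so an entry with both
	 * BITMAP and TLB0 set gets both kinds of invalidation. */
	if (ref->flags & E500_TLB_TLB0)
		c->tlb0_invals++;		/* stands in for tlbil_one() */

	/* The guest entry is no longer backed by the host: clear it all. */
	ref->flags = 0;
}
```

The early-return version of the original patch would skip one of the two branches. The model shows why that leaves a stale host mapping in the mixed 4K/large-page case Scott describes.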
RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver
-Original Message- From: Chen Guangyu-B42378 Sent: Monday, August 19, 2013 11:55 AM To: Bhushan Bharat-R65777 Cc: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de; s.ha...@pengutronix.de; mark.rutl...@arm.com; devicet...@vger.kernel.org; alsa- de...@alsa-project.org; swar...@wwwdotorg.org; feste...@gmail.com; ti...@tabi.org; rob.herr...@calxeda.com; tomasz.f...@gmail.com; shawn@linaro.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver Hi Bhushan, I'll revise some of it as you suggest. Just a few replies here. On Mon, Aug 19, 2013 at 12:38:11PM +0800, Bhushan Bharat-R65777 wrote: We here suppose the reset bit would be cleared -- The software reset will last 8 cycles. from RM, so if this happened to be a failure, the whole IP module won't be normally working as well. Also add a comment describing this against why cycle = 1000 is selected. If it is done in 8 cycles, 1000 cycles will surely be a safe value for it. As long as it finishes in 8 cycles, it will quit anyway. Why against? I am not against it; I am asking why it was not 200 or 50 or 20, etc. I am saying: write a comment saying this much is sufficient as per the specification, and so keep 1000 etc. as a conservative value. -Bharat +static bool fsl_spdif_volatile_reg(struct device *dev, unsigned int reg) { + /* Sync all registers after reset */ Where is the sync :) ? The return true would do that. For volatile registers, if there is no return true here, the whole regmap would use the value in the cache, while for some bits we need to track the true value from the physical registers, not from the cache. Where will the device registers be cached? Don't we program them to be non-cacheable in the core? regmap has a regcache for all the mapped registers. Setting the registers as volatile will allow the driver to sync the regcache with physical memory each time when using regmap_read/write/update_bits(). But I think I can try to use the regcache_bypass instead.
Thank you, Nicolin Chen
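The sketch below shows the bounded soft-reset polling loop discussed above. It returns an error when the reset bit never clears, instead of silently continuing. It is plain user-space C: the register read is a callback, and a fake register is included so the loop can be exercised. The `SCR_SOFT_RESET` bit position, the `SPDIF_RESET_POLLS` budget, `poll_soft_reset()` and the `fake_scr` helpers are hypothetical. Per the thread, the RM says the reset completes within 8 cycles, so 1000 polls is a conservative margin, and that is what the comment should say.

```c
#include <errno.h>
#include <stdint.h>

#define SCR_SOFT_RESET		(1u << 21)	/* hypothetical bit position */
#define SPDIF_RESET_POLLS	1000		/* RM: 8 cycles; 1000 is a safe margin */

typedef uint32_t (*reg_read_fn)(void *ctx);

/* Poll until the self-clearing reset bit drops. Returns 0 on success or
 * -ETIMEDOUT if it is still set after SPDIF_RESET_POLLS reads. */
static int poll_soft_reset(reg_read_fn read_scr, void *ctx)
{
	int polls;

	for (polls = 0; polls < SPDIF_RESET_POLLS; polls++) {
		if (!(read_scr(ctx) & SCR_SOFT_RESET))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake register for trying the loop out: reports "reset in progress"
 * for a given number of reads, then reads as clear. */
struct fake_scr {
	int busy_reads;
};

static uint32_t fake_read_scr(void *ctx)
{
	struct fake_scr *f = ctx;

	if (f->busy_reads > 0) {
		f->busy_reads--;
		return SCR_SOFT_RESET;
	}
	return 0;
}
```

Returning the error would let a caller such as the probe path fail cleanly, instead of continuing with a block whose reset state is unknown.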
RE: MPC8315 reboot failure, lockdep splat possibly related?
-Original Message- From: Anthony Foiani [mailto:t...@scrye.com] Sent: Sunday, August 18, 2013 5:37 AM To: Bhushan Bharat-R65777 Cc: linuxppc-dev@lists.ozlabs.org Subject: Re: MPC8315 reboot failure, lockdep splat possibly related? Bhushan Bharat-R65777 r65...@freescale.com writes: You should get rid of this by changing spin_lock/unlock() in fsl_sata_set_irq_coalescing() to spin_lock_irqsave/restore() I can verify that the suggested change removes the lockdep warning. The below patch is against 3.9.7 and has been tested on hardware with that release. It applies with slight fuzz to linux-next; I've compile-tested that version, but I have not booted that build on the hardware. The linux-next patch can be found here: http://scrye.com/~tkil/linux/fsl-sata-lockdep-201308/next-sata-fsl-save-irqs- while-coalescing.patch (or: http://preview.tinyurl.com/mpd4e9h ) Anthony, I would prefer if you can send the patch (In case not then let me know) Thanks -Bharat Unfortunately, the hang on reboot was not easily repeatable; I'll report whether it happens in the next few days or not. Thanks again, Anthony Foiani -- 8 -- From 2abb6df770c95eb4103476c70847a78f816fe5e3 Mon Sep 17 00:00:00 2001 From: Anthony Foiani anthony.foi...@gmail.com Date: Sat, 17 Aug 2013 13:28:17 -0600 Subject: [PATCH] sata: fsl: save irqs while coalescing Before this patch, I was seeing the following lockdep splat on my MPC8315 (PPC32) target: [9.086051] = [9.090393] [ INFO: inconsistent lock state ] [9.094744] 3.9.7-ajf-gc39503d #1 Not tainted [9.099087] - [9.103432] inconsistent {HARDIRQ-ON-W} - {IN-HARDIRQ-W} usage. 
[9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes: [9.114642] ((host-lock)-rlock){?.+...}, at: [c02f4168] sata_fsl_interrupt+0x50/0x250 [9.123137] {HARDIRQ-ON-W} state was registered at: [9.128004] [c006cdb8] lock_acquire+0x90/0xf4 [9.132737] [c043ef04] _raw_spin_lock+0x34/0x4c [9.137645] [c02f3560] fsl_sata_set_irq_coalescing+0x68/0x100 [9.143750] [c02f36a0] sata_fsl_init_controller+0xa8/0xc0 [9.149505] [c02f3f10] sata_fsl_probe+0x17c/0x2e8 [9.154568] [c02acc90] driver_probe_device+0x90/0x248 [9.159987] [c02acf0c] __driver_attach+0xc4/0xc8 [9.164964] [c02aae74] bus_for_each_dev+0x5c/0xa8 [9.170028] [c02ac218] bus_add_driver+0x100/0x26c [9.175091] [c02ad638] driver_register+0x88/0x198 [9.180155] [c0003a24] do_one_initcall+0x58/0x1b4 [9.185226] [c05aeeac] kernel_init_freeable+0x118/0x1c0 [9.190823] [c0004110] kernel_init+0x18/0x108 [9.195542] [c000f6b8] ret_from_kernel_thread+0x64/0x6c [9.201142] irq event stamp: 160 [9.204366] hardirqs last enabled at (159): [c043f778] _raw_spin_unlock_irq+0x30/0x50 [9.212469] hardirqs last disabled at (160): [c000f414] reenable_mmu+0x30/0x88 [9.219867] softirqs last enabled at (144): [c002ae5c] __do_softirq+0x168/0x218 [9.227435] softirqs last disabled at (137): [c002b0d4] irq_exit+0xa8/0xb4 [9.234481] [9.234481] other info that might help us debug this: [9.240995] Possible unsafe locking scenario: [9.240995] [9.246898]CPU0 [9.249337] [9.251776] lock((host-lock)-rlock); [9.255878] Interrupt [9.258492] lock((host-lock)-rlock); [9.262765] [9.262765] *** DEADLOCK *** [9.262765] [9.268684] no locks held by scsi_eh_1/39. 
[9.272767] [9.272767] stack backtrace: [9.277117] Call Trace: [9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable) [9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c [9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658 [9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840 [9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4 [9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c [9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250 [9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254 [9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78 [9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104 [9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28 [9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8 [9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14 [9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50 [9.353983] LR = _raw_spin_unlock_irq+0x30/0x50 [9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188 [9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0 [9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8 [9.382454] [cc713c20
RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver
-Original Message- From: Chen Guangyu-B42378 Sent: Monday, August 19, 2013 8:38 AM To: Bhushan Bharat-R65777 Cc: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de; s.ha...@pengutronix.de; mark.rutl...@arm.com; devicet...@vger.kernel.org; alsa- de...@alsa-project.org; swar...@wwwdotorg.org; feste...@gmail.com; ti...@tabi.org; rob.herr...@calxeda.com; tomasz.f...@gmail.com; shawn@linaro.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver Hi Bhushan, Thank you for the comments :) I'll fix some in v7. Here is my some replies to you. On Sat, Aug 17, 2013 at 02:24:19AM +0800, Bhushan Bharat-R65777 wrote: This patch add S/PDIF controller driver for Freescale SoC. Please give some more description of the driver? I've referred some ASoC drivers, all of them seem to be brief as mine. So I'm not sure what else information I should provide here. It's already kinda okay to me. Other does not have description does not mean we also should not add description here. Please describe in few lines about this driver and devices it handles? +struct spdif_mixer_control { + /* buffer ptrs for writer */ + u32 upos; + u32 qpos; They does not look like pointer? They are more like offsets to get the correspond pointer. But I'll change the confusing comments. +/* U/Q Channel receive register full */ static void +spdif_irq_uqrx_full(struct fsl_spdif_priv *spdif_priv, char name) { + struct spdif_mixer_control *ctrl = spdif_priv-fsl_spdif_control; + struct regmap *regmap = spdif_priv-regmap; + struct platform_device *pdev = spdif_priv-pdev; + u32 *pos, size, val, reg; + + switch (name) { + case 'U': + pos = ctrl-upos; + size = SPDIF_UBITS_SIZE; + reg = REG_SPDIF_SRU; + break; + case 'Q': + pos = ctrl-qpos; + size = SPDIF_QSUB_SIZE; + reg = REG_SPDIF_SRQ; + break; + default: + return; Should return error. IMHO, this should be fine. It's a void type function and being used in the isr(). 
The param 'name' is totally controlled by the driver itself, so basically we don't need to worry about the default path. Silently returning on a potential error is bad. At least add a printk/BUG_ON or something similar which points out that an unexpected parameter was passed. + if (*pos >= size * 2) { + *pos = 0; + } else if (unlikely((*pos % size) + 3 > size)) { + dev_err(&pdev->dev, "User bit receivce buffer overflow\n"); + return; Should return error. Ditto, it's being used in isr(); we don't need to check the return value, just use dev_err() to warn users and let the driver clear the irq. Same as above. +/* U/Q Channel framing error */ +static void spdif_irq_uq_err(struct fsl_spdif_priv *spdif_priv) { + struct spdif_mixer_control *ctrl = &spdif_priv->fsl_spdif_control; + struct regmap *regmap = spdif_priv->regmap; + struct platform_device *pdev = spdif_priv->pdev; + u32 val; + + dev_dbg(&pdev->dev, "isr: U/Q Channel framing error\n"); + + /* read U/Q data and do buffer reset */ + regmap_read(regmap, REG_SPDIF_SRU, &val); + regmap_read(regmap, REG_SPDIF_SRQ, &val); The comment above says "read U/Q data and do buffer reset"; what is the buffer reset? Is it clear-on-read? That's the behavior required by the IP, according to the reference manual: "U Channel receive register full, can't be cleared with reg. IntClear. To clear it, read from U Rx reg." and "Q Channel receive register full, can't be cleared with reg. IntClear. To clear it, read from Q Rx reg." Then please add this behavior in a comment. +static void spdif_softreset(struct fsl_spdif_priv *spdif_priv) { + struct regmap *regmap = spdif_priv->regmap; + u32 val, cycle = 1000; + + regmap_write(regmap, REG_SPDIF_SCR, SCR_SOFT_RESET); + regcache_sync(regmap); + + /* RESET bit would be cleared after finishing its reset procedure */ + do { + regmap_read(regmap, REG_SPDIF_SCR, &val); + } while ((val & SCR_SOFT_RESET) && cycle--); What if the reset is not cleared and a timeout happens? We here suppose the reset bit would be cleared -- The software reset will last 8 cycles.
from RM, so if this happened to fail, the whole IP module wouldn't be working normally either. Also add a comment describing this against why cycle = 1000 is selected. Well, I don't mind adding an extra failure return here to make it clear. +static u8 reverse_bits(u8 input) +{ + u8 tmp = input; + + tmp = ((tmp & 0b10101010) >> 1) | ((tmp << 1) & 0b10101010); + tmp = ((tmp & 0b11001100) >> 2) | ((tmp << 2) & 0b11001100); + tmp = ((tmp & 0b11110000) >> 4) | ((tmp << 4) & 0b11110000); What is this logic? Can the hardcoding be removed, and can some description of the above calculation be added? This was provided by Philipp Zabel in his
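The reverse_bits() logic asked about above is the classic divide-and-conquer bit reversal. It swaps adjacent bits, then adjacent 2-bit pairs, then the two nibbles. Below is a commented user-space version using hex masks; 0xAA, 0xCC and 0xF0 are the same values as 0b10101010, 0b11001100 and 0b11110000. Hex masks also avoid binary literals, which are a GCC extension before C23. Inside the kernel, the existing bitrev8() from <linux/bitrev.h> reverses the bits of a byte and could replace the hand-written masks entirely.

```c
#include <stdint.h>

/* Reverse the bit order of one byte: bit 0 <-> bit 7, bit 1 <-> bit 6, ... */
static uint8_t reverse_bits(uint8_t in)
{
	uint8_t v = in;

	/* Swap each pair of adjacent bits: b7<->b6, b5<->b4, ... */
	v = (uint8_t)(((v & 0xAA) >> 1) | ((v << 1) & 0xAA));
	/* Swap adjacent 2-bit groups within each nibble. */
	v = (uint8_t)(((v & 0xCC) >> 2) | ((v << 2) & 0xCC));
	/* Swap the two nibbles. */
	v = (uint8_t)(((v & 0xF0) >> 4) | ((v << 4) & 0xF0));
	return v;
}
```

Each step touches every bit exactly once, so three steps reverse 8 bits. Applying the function twice returns the original byte.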
RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Nicolin Chen Sent: Friday, August 16, 2013 6:27 PM To: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de; s.ha...@pengutronix.de Cc: mark.rutl...@arm.com; devicet...@vger.kernel.org; alsa-devel@alsa- project.org; swar...@wwwdotorg.org; feste...@gmail.com; ti...@tabi.org; rob.herr...@calxeda.com; tomasz.f...@gmail.com; shawn@linaro.org; linuxppc- d...@lists.ozlabs.org Subject: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver This patch add S/PDIF controller driver for Freescale SoC. Please give some more description of the driver? Signed-off-by: Nicolin Chen b42...@freescale.com --- .../devicetree/bindings/sound/fsl,spdif.txt| 56 + sound/soc/fsl/Kconfig |3 + sound/soc/fsl/Makefile |2 + sound/soc/fsl/fsl_spdif.c | 1272 sound/soc/fsl/fsl_spdif.h | 224 5 files changed, 1557 insertions(+), 0 deletions(-) create mode 100644 Documentation/devicetree/bindings/sound/fsl,spdif.txt create mode 100644 sound/soc/fsl/fsl_spdif.c create mode 100644 sound/soc/fsl/fsl_spdif.h diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.txt b/Documentation/devicetree/bindings/sound/fsl,spdif.txt new file mode 100644 index 000..5549ce3 --- /dev/null +++ b/Documentation/devicetree/bindings/sound/fsl,spdif.txt @@ -0,0 +1,56 @@ +Freescale Sony/Philips Digital Interface Format (S/PDIF) Controller + +The Freescale S/PDIF audio block is a stereo transceiver that allows the +processor to receive and transmit digital audio via an coaxial cable or +a fibre cable. + +Required properties: + + - compatible : Compatible list, contains fsl,chip-spdif. + + - reg : Offset and length of the register set for the device. + + - interrupts : Contains spdif interrupt. + + - dmas : Generic dma devicetree binding as described in + Documentation/devicetree/bindings/dma/dma.txt. + + - dma-names : Two dmas have to be defined, tx and rx. 
+ + - clocks : Contains an entry for each entry in clock-names. + + - clock-names : Includes the following entries: + name comments + core The core clock of spdif controller + rxtx0-7 Clock source list for tx and rx clock. + This clock list should be identical to + the source list connecting to the spdif + clock mux in SPDIF Transceiver Clock + Diagram of SoC reference manual. It + can also be referred to TxClk_Source + bit of register SPDIF_STC. + +Example: + +spdif: spdif@02004000 { + compatible = "fsl,imx6q-spdif", + "fsl,imx35-spdif"; + reg = <0x02004000 0x4000>; + interrupts = <0 52 0x04>; + dmas = <&sdma 14 18 0>, + <&sdma 15 18 0>; + dma-names = "rx", "tx"; + + clocks = <&clks 197>, <&clks 3>, + <&clks 197>, <&clks 107>, + <&clks 0>, <&clks 118>, + <&clks 62>, <&clks 139>, + <&clks 0>; + clock-names = "core", "rxtx0", + "rxtx1", "rxtx2", + "rxtx3", "rxtx4", + "rxtx5", "rxtx6", + "rxtx7"; + + status = "okay"; +}; diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig index e15f771..2c518db 100644 --- a/sound/soc/fsl/Kconfig +++ b/sound/soc/fsl/Kconfig @@ -1,6 +1,9 @@ config SND_SOC_FSL_SSI tristate +config SND_SOC_FSL_SPDIF + tristate + config SND_SOC_FSL_UTILS tristate diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile index d4b4aa8..4b5970e 100644 --- a/sound/soc/fsl/Makefile +++ b/sound/soc/fsl/Makefile @@ -12,9 +12,11 @@ obj-$(CONFIG_SND_SOC_P1022_RDK) += snd-soc-p1022-rdk.o # Freescale PowerPC SSI/DMA Platform Support snd-soc-fsl-ssi-objs := fsl_ssi.o +snd-soc-fsl-spdif-objs := fsl_spdif.o snd-soc-fsl-utils-objs := fsl_utils.o snd-soc-fsl-dma-objs := fsl_dma.o obj-$(CONFIG_SND_SOC_FSL_SSI) += snd-soc-fsl-ssi.o +obj-$(CONFIG_SND_SOC_FSL_SPDIF) += snd-soc-fsl-spdif.o obj-$(CONFIG_SND_SOC_FSL_UTILS) += snd-soc-fsl-utils.o obj-$(CONFIG_SND_SOC_POWERPC_DMA) += snd-soc-fsl-dma.o diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c new file mode 100644 index 0000000..e00125e --- /dev/null +++ b/sound/soc/fsl/fsl_spdif.c @@ -0,0 +1,1272 @@ +/* + * Freescale S/PDIF ALSA SoC Digital Audio Interface (DAI)
driver + * + * Copyright (C) 2013 Freescale Semiconductor, Inc. + * + * Based on stmp3xxx_spdif_dai.c + * Vladimir Barinov vbari...@embeddedalley.com + * Copyright 2008 SigmaTel, Inc + * Copyright 2008 Embedded Alley Solutions, Inc + * + * This file is licensed under the terms of the GNU General Public License + * version 2. This program is licensed as is
RE: MPC8315 reboot failure, lockdep splat possibly related?
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Anthony Foiani Sent: Saturday, August 17, 2013 7:10 AM To: linuxppc-dev@lists.ozlabs.org Subject: MPC8315 reboot failure, lockdep splat possibly related? Greetings. I've been experiencing occasional lockups at reboot for a few weeks, but only once every 10-20 boots. A good reboot looks like this: [47529.721640] lm77 0-0048: shutdown [47529.725160] rtc-m41t80 0-0068: shutdown [47529.729169] i2c i2c-0: shutdown [47529.732534] fsl-ehci fsl-ehci.0: shutdown [47529.736842] sd 1:0:0:0: shutdown [47529.740239] sd 1:0:0:0: [sda] Synchronizing SCSI cache [47529.747091] uio_pci_generic :00:0a.0: shutdown [47529.752079] pci :00:00.0: shutdown [47529.756021] Restarting system. While a bad one fails after the EHCI shutdown: [ 747.578001] lm77 0-0048: shutdown [ 747.581522] rtc-m41t80 0-0068: shutdown [ 747.585538] i2c i2c-0: shutdown [ 747.588909] sd 1:0:0:0: shutdown [ 747.592304] sd 1:0:0:0: [sda] Synchronizing SCSI cache [ 747.597973] fsl-ehci fsl-ehci.0: shutdown I enabled lockdep, and I get this splat on every boot, regardless of whether it locks up at reboot or not. Could it possibly be related? Any other ideas on how to avoid the reboot lockup? [9.086051] = [9.090393] [ INFO: inconsistent lock state ] [9.094744] 3.9.7-ajf-gc39503d #1 Not tainted [9.099087] - [9.103432] inconsistent {HARDIRQ-ON-W} - {IN-HARDIRQ-W} usage. 
[9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes: [9.114642] (&(&host->lock)->rlock){?.+...}, at: [c02f4168] sata_fsl_interrupt+0x50/0x250 [9.123137] {HARDIRQ-ON-W} state was registered at: [9.128004] [c006cdb8] lock_acquire+0x90/0xf4 [9.132737] [c043ef04] _raw_spin_lock+0x34/0x4c [9.137645] [c02f3560] fsl_sata_set_irq_coalescing+0x68/0x100 [9.143750] [c02f36a0] sata_fsl_init_controller+0xa8/0xc0 [9.149505] [c02f3f10] sata_fsl_probe+0x17c/0x2e8 [9.154568] [c02acc90] driver_probe_device+0x90/0x248 [9.159987] [c02acf0c] __driver_attach+0xc4/0xc8 [9.164964] [c02aae74] bus_for_each_dev+0x5c/0xa8 [9.170028] [c02ac218] bus_add_driver+0x100/0x26c [9.175091] [c02ad638] driver_register+0x88/0x198 [9.180155] [c0003a24] do_one_initcall+0x58/0x1b4 [9.185226] [c05aeeac] kernel_init_freeable+0x118/0x1c0 [9.190823] [c0004110] kernel_init+0x18/0x108 [9.195542] [c000f6b8] ret_from_kernel_thread+0x64/0x6c [9.201142] irq event stamp: 160 [9.204366] hardirqs last enabled at (159): [c043f778] _raw_spin_unlock_irq+0x30/0x50 [9.212469] hardirqs last disabled at (160): [c000f414] reenable_mmu+0x30/0x88 [9.219867] softirqs last enabled at (144): [c002ae5c] __do_softirq+0x168/0x218 [9.227435] softirqs last disabled at (137): [c002b0d4] irq_exit+0xa8/0xb4 [9.234481] [9.234481] other info that might help us debug this: [9.240995] Possible unsafe locking scenario: [9.240995] [9.246898] CPU0 [9.249337] [9.251776] lock(&(&host->lock)->rlock); [9.255878] <Interrupt> [9.258492] lock(&(&host->lock)->rlock); [9.262765] [9.262765] *** DEADLOCK *** You should get rid of this by changing spin_lock()/spin_unlock() in fsl_sata_set_irq_coalescing() to spin_lock_irqsave()/spin_unlock_irqrestore(). -Bharat [9.262765] [9.268684] no locks held by scsi_eh_1/39.
[9.272767] [9.272767] stack backtrace: [9.277117] Call Trace: [9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable) [9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c [9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658 [9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840 [9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4 [9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c [9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250 [9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254 [9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78 [9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104 [9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28 [9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8 [9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14 [9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50 [9.353983] LR = _raw_spin_unlock_irq+0x30/0x50 [9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188 [9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0 [9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8 [
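The lockdep report says host->lock is taken with hard IRQs enabled in fsl_sata_set_irq_coalescing() (probe path) and is also taken from the hard-IRQ handler sata_fsl_interrupt(). The following is a minimal single-CPU userspace model, not kernel code, of why the suggested spin_lock_irqsave()/spin_unlock_irqrestore() change removes that hazard. Every model_* name and the hardirqs_enabled flag are hypothetical stand-ins for the real primitives.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model: this flag stands in for the CPU's hard-IRQ enable state. */
static bool hardirqs_enabled = true;

struct model_spinlock {
	bool held;
};

static struct model_spinlock host_lock;	/* stands in for host->lock */

/* Like local_irq_save(): remember the current state, then disable IRQs. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = hardirqs_enabled;

	hardirqs_enabled = false;
	return flags;
}

/* Like local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long flags)
{
	hardirqs_enabled = (flags != 0);
}

static void model_spin_lock(struct model_spinlock *l)
{
	assert(!l->held);	/* on one CPU, re-taking a held lock spins forever */
	l->held = true;
}

static void model_spin_unlock(struct model_spinlock *l)
{
	assert(l->held);
	l->held = false;
}

/* True if an IRQ could arrive now and its handler would want a lock we hold. */
static bool irq_could_deadlock(const struct model_spinlock *l)
{
	return hardirqs_enabled && l->held;
}

/* The flagged pattern: plain spin_lock() with interrupts still enabled. */
static bool coalescing_with_spin_lock(void)
{
	bool hazard;

	model_spin_lock(&host_lock);
	hazard = irq_could_deadlock(&host_lock);
	model_spin_unlock(&host_lock);
	return hazard;
}

/*
 * The suggested pattern. In the driver this would presumably read:
 *	spin_lock_irqsave(&host->lock, flags);
 *	...
 *	spin_unlock_irqrestore(&host->lock, flags);
 */
static bool coalescing_with_irqsave(void)
{
	unsigned long flags = model_irq_save();
	bool hazard;

	model_spin_lock(&host_lock);
	hazard = irq_could_deadlock(&host_lock);
	model_spin_unlock(&host_lock);
	model_irq_restore(flags);
	return hazard;
}
```

The irqsave variant also restores the caller's previous IRQ state rather than unconditionally re-enabling interrupts, which is why it is preferred over spin_lock_irq() in paths that may be entered with IRQs already off.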
RE: [PATCH] KVM: PPC: POWERNV: move iommu_add_device earlier
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey Kardashevskiy Sent: Wednesday, August 14, 2013 2:55 PM To: linuxppc-dev@lists.ozlabs.org Cc: Alexey Kardashevskiy; Paul Mackerras; linux-ker...@vger.kernel.org Subject: [PATCH] KVM: PPC: POWERNV: move iommu_add_device earlier The current implementation of IOMMU on sPAPR does not use iommu_ops and therefore does not call IOMMU API's bus_set_iommu() which 1) sets iommu_ops for a bus 2) registers a bus notifier Instead, PCI devices are added to IOMMU groups from subsys_initcall_sync(tce_iommu_init) which does basically the same thing without using iommu_ops callbacks. However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158) implements iommu_ops and when tce_iommu_init is called, every PCI device is already added to some group so there is a conflict. This patch does 2 things: 1. removes the loop in which PCI devices were added to groups and adds explicit iommu_add_device() calls to add devices as soon as they get the iommu_table pointer assigned to them. 2. moves a bus notifier to powernv code in order to avoid conflict with the notifier from Freescale driver. iommu_add_device() and iommu_del_device() are public now. This works for me (able to boot Linux, as expected) :-) But a question, why not move arch/powerpc/kernel/iommu.c in platform/ ? or use this for book3s or not_book3e only? 
Thanks -Bharat Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- arch/powerpc/include/asm/iommu.h| 2 ++ arch/powerpc/kernel/iommu.c | 41 +++-- arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++--- arch/powerpc/platforms/powernv/pci-p5ioc2.c | 1 + arch/powerpc/platforms/powernv/pci.c| 31 ++ arch/powerpc/platforms/pseries/iommu.c | 7 +++-- 6 files changed, 51 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h index c34656a..ba74329 100644 --- a/arch/powerpc/include/asm/iommu.h +++ b/arch/powerpc/include/asm/iommu.h @@ -103,6 +103,8 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl, int nid); extern void iommu_register_group(struct iommu_table *tbl, int pci_domain_number, unsigned long pe_num); +extern int iommu_add_device(struct device *dev); +extern void iommu_del_device(struct device *dev); extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl, struct scatterlist *sglist, int nelems, diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index b20ff17..15f8ca8 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl) } EXPORT_SYMBOL_GPL(iommu_release_ownership); -static int iommu_add_device(struct device *dev) +int iommu_add_device(struct device *dev) { struct iommu_table *tbl; int ret = 0; @@ -1134,46 +1134,13 @@ static int iommu_add_device(struct device *dev) return ret; } +EXPORT_SYMBOL_GPL(iommu_add_device); -static void iommu_del_device(struct device *dev) +void iommu_del_device(struct device *dev) { iommu_group_remove_device(dev); } - -static int iommu_bus_notifier(struct notifier_block *nb, - unsigned long action, void *data) -{ - struct device *dev = data; - - switch (action) { - case BUS_NOTIFY_ADD_DEVICE: - return iommu_add_device(dev); - case BUS_NOTIFY_DEL_DEVICE: - iommu_del_device(dev); - return 0; - default: - return 0; - } -} - -static struct 
notifier_block tce_iommu_bus_nb = { - .notifier_call = iommu_bus_notifier, -}; - -static int __init tce_iommu_init(void) -{ - struct pci_dev *pdev = NULL; - - BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE); - - for_each_pci_dev(pdev) - iommu_add_device(&pdev->dev); - - bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb); - return 0; -} - -subsys_initcall_sync(tce_iommu_init); +EXPORT_SYMBOL_GPL(iommu_del_device); #else diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c index d8140b1..a9f8fef 100644 --- a/arch/powerpc/platforms/powernv/pci-ioda.c +++ b/arch/powerpc/platforms/powernv/pci-ioda.c @@ -441,6 +441,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev pe = &phb->ioda.pe_array[pdn->pe_number]; set_iommu_table_base(&pdev->dev, &pe->tce32_table); + iommu_add_device(&pdev->dev); } static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus) @@ -449,6 +450,7 @@ static void pnv_ioda_setup_bus_dma(struct
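A minimal userspace sketch of the two ways a device can end up in an IOMMU group in this discussion: through a bus notifier that dispatches on an action code (the shape of the removed iommu_bus_notifier()), or by adding the device right after its table pointer is assigned, as the patch now does in pnv_pci_ioda_dma_dev_setup(). All model_* names, the action values and the single group id are hypothetical. The -EBUSY path loosely mirrors the "already in iommu group" check that fires the WARN_ON when two mechanisms try to add the same device.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical action codes; the kernel's BUS_NOTIFY_* values differ. */
enum model_bus_action { MODEL_ADD_DEVICE = 1, MODEL_DEL_DEVICE = 2, MODEL_OTHER = 99 };

struct model_device {
	const void *iommu_table;	/* set by the platform DMA setup code */
	int iommu_group;		/* -1 while the device is in no group */
};

/* Loosely mirrors iommu_add_device(): refuse devices that are already grouped. */
static int model_add_device(struct model_device *dev)
{
	if (dev->iommu_group >= 0)
		return -EBUSY;		/* another mechanism got here first */
	if (dev->iommu_table == NULL)
		return 0;		/* no table yet: nothing to add */
	dev->iommu_group = 0;		/* hypothetical: one group id for the model */
	return 0;
}

static void model_del_device(struct model_device *dev)
{
	dev->iommu_group = -1;
}

/* Same dispatch shape as the removed notifier callback. */
static int model_bus_notifier(unsigned long action, void *data)
{
	struct model_device *dev = data;

	switch (action) {
	case MODEL_ADD_DEVICE:
		return model_add_device(dev);
	case MODEL_DEL_DEVICE:
		model_del_device(dev);
		return 0;
	default:
		return 0;
	}
}

/* The patch's approach: join a group as soon as the table pointer is set. */
static int model_set_table_and_group(struct model_device *dev, const void *table)
{
	dev->iommu_table = table;
	return model_add_device(dev);
}
```

In this model, once a device has been grouped at table-assignment time, a second add through a notifier is rejected with -EBUSY instead of silently regrouping it, which is the conflict the patch avoids by moving the notifier into powernv code.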
RE: Powerpc: Kernel warn_on when enabling IOMMU_API
-Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Tuesday, August 13, 2013 5:41 AM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/13/2013 02:14 AM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Monday, August 12, 2013 7:44 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote: And this simple fix works for me diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index b20ff17..8869b0d 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -48,6 +48,8 @@ #include <asm/vio.h> #include <asm/tce.h> +#define DEBUG + #define DBG(...) static int novmerge; @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, } } -#ifdef CONFIG_IOMMU_API +#ifdef SPAPR_TCE_IOMMU /* * SPAPR TCE API */ -- And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print? It shows the list of group IDs and their devices: Is it vanilla 3.11-rc1 kernel? Wow. What does lspci show there? It is 3.11-rc1 + (FSL_IOMMU + VFIO-PCI : local changes). root@p5040ds:~# lspci 00:00.0 Class 0604: 1957:0450 01:00.0 Class 0200: 8086:10fb 00:00.0 Class 0604: 1957:0450 01:00.0 Class 0200: 8086:10d3 We use bus_set_iommu() from the generic IOMMU API (drivers/iommu/iommu.c), which creates an iommu_group for each device. It also registers a notifier to support hot-pluggable devices. So when this initcall (in arch/powerpc/kernel/iommu.c) is called, the IOMMU group is already set up for the device(s). I think we do not need this piece of code for powerpc. So what is the best way to stub this out for FSL PowerPC/IOMMU? Will the above #ifdef SPAPR_TCE_IOMMU work?
Other way can be selecting iommu.c and dma-iommu.c in Makefile if SPAPR_TCE_IOMMU defined and not if CONFIG_64BIT. -Bharat -- Alexey ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
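One detail worth checking in the proposed guard: Kconfig options reach C code as macros with a CONFIG_ prefix (through the generated autoconf.h), so `#ifdef SPAPR_TCE_IOMMU` is only true if something defines the bare, unprefixed name. The sketch below fakes the autoconf.h definition by hand (an assumption for illustration) and shows that, even with the option enabled, the unprefixed guard compiles its block out, which might explain why the change appeared to work.

```c
#include <assert.h>

/* Hand-written stand-in for what autoconf.h would provide with the option on. */
#define CONFIG_SPAPR_TCE_IOMMU 1

/* Guard as written in the proposed fix: bare option name. */
static int guarded_without_prefix(void)
{
#ifdef SPAPR_TCE_IOMMU
	return 1;
#else
	return 0;	/* taken: nothing defines the unprefixed name */
#endif
}

/* Guard using the name Kconfig actually exports. */
static int guarded_with_prefix(void)
{
#ifdef CONFIG_SPAPR_TCE_IOMMU
	return 1;
#else
	return 0;
#endif
}
```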
RE: Powerpc: Kernel warn_on when enabling IOMMU_API
-Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Tuesday, August 13, 2013 6:25 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/13/2013 08:44 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Tuesday, August 13, 2013 5:41 AM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/13/2013 02:14 AM, Bhushan Bharat-R65777 wrote: -Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Monday, August 12, 2013 7:44 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote: And this simple fix work for me diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index b20ff17..8869b0d 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -48,6 +48,8 @@ #include asm/vio.h #include asm/tce.h +#define DEBUG + #define DBG(...) static int novmerge; @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, } } -#ifdef CONFIG_IOMMU_API +#ifdef SPAPR_TCE_IOMMU /* * SPAPR TCE API */ -- And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print? It shows the list of group id and respective devices: Is it vanilla 3.11-rc1 kernel? Wow. What does lspci show there? It is 3.11-rc1 + (FSL_IOMMU + VFIO-PCI : local changes). root@p5040ds:~# lspci 00:00.0 Class 0604: 1957:0450 01:00.0 Class 0200: 8086:10fb 00:00.0 Class 0604: 1957:0450 01:00.0 Class 0200: 8086:10d3 Is it one PCI domain or two PCI domains? Hm. We uses the bus_set_iommu(), generic iommu api, which creates a iommu_group for a device (drivers/iommu/iommu.c) using. 
Also this have notifier to support hotplug-able device. So when this initcall (in arch/powerpc/kernel/iommu.c) is called, iommu group is already setup for the device/s. I think we do not need this piece of code for powerpc. So what is the best way to stub this out for FSL PowerPC/IOMMU? So you implemented iommu_ops? Can you share your code somewhere, just to have a look? https://lkml.org/lkml/2013/7/1/158 Will the above #ifdef SPAPR_TCE_IOMMU work? Other way can be selecting iommu.c and dma-iommu.c in Makefile if SPAPR_TCE_IOMMU defined and not if CONFIG_64BIT. If SPAPR_TCE_IOMMU is enabled, the code would compile and the subsys_init would be called anyway, so normal production kernel will fail anyway. We will not enable this on FSL powerpc, -Bharat -- Alexey
Powerpc: Kernel warn_on when enabling IOMMU_API
Hi Alexey/Ben, When I enable IOMMU_API, I hit a WARN_ON in arch/powerpc/kernel/iommu.c (here is the code snippet): static int iommu_add_device(struct device *dev) { struct iommu_table *tbl; int ret = 0; if (WARN_ON(dev->iommu_group)) { <-- this is where the WARN_ON fires pr_warn("iommu_tce: device %s is already in iommu group %d, skipping\n", dev_name(dev), iommu_group_id(dev->iommu_group)); return -EBUSY; } } --- This is the boot log with #define DEBUG in iommu.c --- Using P5040 DS machine description MMU: Supported page sizes 4 KB as direct 4096 KB as direct 16384 KB as direct 65536 KB as direct 262144 KB as direct 1048576 KB as direct MMU: Book3E HW tablewalk not supported Found initrd at 0xc0002b759000:0xc00024ab bootconsole [udbg0] enabled CPU maps initialized for 1 thread per core Starting Linux PPC64 #16 SMP Mon Aug 12 15:22:11 IST 2013 - ppc64_pft_size= 0x0 physicalMemorySize= 0x2 ppc64_caches.dcache_line_size = 0x40 ppc64_caches.icache_line_size = 0x40 - Linux version 3.11.0-rc1-10505-g8d33668-dirty (r65777@perfidc-01) (gcc version 4.5.1 (Sourcery G++ Lite 2010.09-55) ) #16 SMP Mon Aug 12 15:22:11 IST 2013 CF12 Setup Arch [boot]0012 Setup Arch P5040 DS board from Freescale Semiconductor Zone ranges: DMA [mem 0x-0x1] Normal empty Movable zone start for each node Early memory node ranges node 0: [mem 0x-0x1] MMU: Allocated 2112 bytes of context maps for 255 contexts CF15 Setup Done [boot]0015 Setup Done PERCPU: Embedded 10 pages/cpu @cb10 s11200 r0 d29760 u262144 Built 1 zonelists in Zone order, mobility grouping on. Total pages: 2068480 Kernel command line: console=ttyS0,115200 ramdisk_size=1000 root=/dev/ram rw PID hash table entries: 4096 (order: 3, 32768 bytes) Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes) Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes) Sorting __ex_table...
Memory: 8110276K/8388608K available (6276K kernel code, 1104K rwdata, 2212K rodata, 268K init, 325K bss, 278332K reserved) SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 Hierarchical RCU implementation. RCU restricting CPUs from NR_CPUS=24 to nr_cpu_ids=4. NR_IRQS:512 nr_irqs:512 16 mpic: Setting up MPIC OpenPIC version 1.2 at ffe04, max 4 CPUs mpic: ISU size: 512, shift: 9, mask: 1ff mpic: Initializing for 512 sources clocksource: timebase mult[1400] shift[24] registered Console: colour dummy device 80x25 pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 256 mpic: requesting IPIs... Brought up 4 CPUs devtmpfs: initialized NET: Registered protocol family 16 Found FSL PCI host bridge at 0x000ffe20. Firmware bus number: 0-1 PCI host bridge /pcie@ffe20 (primary) ranges: MEM 0x000c..0x000c1fff - 0xe000 IO 0x000ff800..0x000ff800 - 0x /pcie@ffe20: PCICSRBAR @ 0xdf00 /pcie@ffe20: Setup 64-bit PCI DMA window /pcie@ffe20: DMA window size is 0xdf00 Found FSL PCI host bridge at 0x000ffe201000. 
Firmware bus number: 0-1 PCI host bridge /pcie@ffe201000 ranges: MEM 0x000c2000..0x000c3fff - 0xe000 IO 0x000ff801..0x000ff801 - 0x /pcie@ffe201000: PCICSRBAR @ 0xdf00 /pcie@ffe201000: Setup 64-bit PCI DMA window /pcie@ffe201000: DMA window size is 0xdf00 software IO TLB [mem 0x0bdca000-0x0fdca000] (64MB) mapped at [cbdca000-cfdc9fff] PCI: Probing PCI hardware fsl-pci ffe20.pcie: PCI host bridge to bus :00 pci_bus :00: root bus resource [io 0x1-0x1] (bus address [0x-0x]) pci_bus :00: root bus resource [mem 0xc-0xc1fff] (bus address [0xe000-0x]) pci_bus :00: root bus resource [bus 00-01] pci :00:00.0: ignoring class 0x0b2000 (doesn't match header type 01) pci :00:00.0: PCI bridge to [bus 01-ff] fsl-pci ffe201000.pcie: PCI host bridge to bus 0001:00 pci_bus 0001:00: root bus resource [io 0x21000-0x30fff] (bus address [0x-0x]) pci_bus 0001:00: root bus resource [mem 0xc2000-0xc3fff] (bus address [0xe000-0x]) pci_bus 0001:00: root bus resource [bus 00-01] pci 0001:00:00.0: ignoring class 0x0b2000 (doesn't match header type 01) pci 0001:00:00.0: PCI bridge to [bus 01-ff] pci :00:00.0: PCI bridge to [bus 01] pci :00:00.0: bridge window [io 0x1-0x1] pci :00:00.0: bridge window [mem 0xc-0xc1fff] pci 0001:00:00.0: BAR 9: can't assign mem pref (size 0x10)
RE: Powerpc: Kernel warn_on when enabling IOMMU_API
-Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Monday, August 12, 2013 7:44 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote: And this simple fix work for me diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index b20ff17..8869b0d 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -48,6 +48,8 @@ #include asm/vio.h #include asm/tce.h +#define DEBUG + #define DBG(...) static int novmerge; @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, } } -#ifdef CONFIG_IOMMU_API +#ifdef SPAPR_TCE_IOMMU /* * SPAPR TCE API */ -- And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print? It shows the list of group id and respective devices: root@p5040ds:~# ls -laR /sys/kernel/iommu_groups/ /sys/kernel/iommu_groups/: total 0 drwxr-xr-x 15 root root 0 Sep 6 01:42 . drwxr-xr-x 6 root root 0 Jan 1 1970 .. drwxr-xr-x 3 root root 0 Sep 6 01:43 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 1 drwxr-xr-x 3 root root 0 Sep 6 01:43 10 drwxr-xr-x 3 root root 0 Sep 6 01:43 11 drwxr-xr-x 3 root root 0 Sep 6 01:43 12 drwxr-xr-x 3 root root 0 Sep 6 01:43 2 drwxr-xr-x 3 root root 0 Sep 6 01:43 3 drwxr-xr-x 3 root root 0 Sep 6 01:43 4 drwxr-xr-x 3 root root 0 Sep 6 01:43 5 drwxr-xr-x 3 root root 0 Sep 6 01:43 6 drwxr-xr-x 3 root root 0 Sep 6 01:43 7 drwxr-xr-x 3 root root 0 Sep 6 01:43 8 drwxr-xr-x 3 root root 0 Sep 6 01:43 9 /sys/kernel/iommu_groups/0: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/0/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. 
lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe100300.dma - ../../../../devices/ffe00.soc/ffe100300.dma /sys/kernel/iommu_groups/1: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/1/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe101300.dma - ../../../../devices/ffe00.soc/ffe101300.dma /sys/kernel/iommu_groups/10: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/10/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe302000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe302000.jr /sys/kernel/iommu_groups/11: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/11/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe303000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe303000.jr /sys/kernel/iommu_groups/12: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/12/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe304000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe304000.jr /sys/kernel/iommu_groups/2: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/2/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. 
lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe114000.sdhc - ../../../../devices/ffe00.soc/ffe114000.sdhc /sys/kernel/iommu_groups/3: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/3/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe21.usb - ../../../../devices/ffe00.soc/ffe21.usb /sys/kernel/iommu_groups/4: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/4/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe211000.usb - ../../../../devices/ffe00.soc/ffe211000.usb /sys/kernel/iommu_groups/5: total 0 drwxr-xr-x 3 root
RE: Powerpc: Kernel warn_on when enabling IOMMU_API
-Original Message- From: Bhushan Bharat-R65777 Sent: Monday, August 12, 2013 9:45 PM To: 'Alexey Kardashevskiy' Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: RE: Powerpc: Kernel warn_on when enabling IOMMU_API -Original Message- From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru] Sent: Monday, August 12, 2013 7:44 PM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote: And this simple fix work for me diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index b20ff17..8869b0d 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c @@ -48,6 +48,8 @@ #include asm/vio.h #include asm/tce.h +#define DEBUG + #define DBG(...) static int novmerge; @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t size, } } -#ifdef CONFIG_IOMMU_API +#ifdef SPAPR_TCE_IOMMU /* * SPAPR TCE API */ -- And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print? It shows the list of group id and respective devices: We uses the common iommu code to add a device with iommu_group (drivers/iommu/iommu.c) using bus_set_iommu(). Also this have notifier to support hotplug-able device. So when this initcall (in arch/powerpc/kernel/iommu.c) is called, iommu group is already setup. So we do not this piece of code for powerpc. BTW why we need this with Power/TCE, does not the code in driver/iommu/iommu.c serve the purpose? -Bharat root@p5040ds:~# ls -laR /sys/kernel/iommu_groups/ /sys/kernel/iommu_groups/: total 0 drwxr-xr-x 15 root root 0 Sep 6 01:42 . drwxr-xr-x 6 root root 0 Jan 1 1970 .. 
drwxr-xr-x 3 root root 0 Sep 6 01:43 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 1 drwxr-xr-x 3 root root 0 Sep 6 01:43 10 drwxr-xr-x 3 root root 0 Sep 6 01:43 11 drwxr-xr-x 3 root root 0 Sep 6 01:43 12 drwxr-xr-x 3 root root 0 Sep 6 01:43 2 drwxr-xr-x 3 root root 0 Sep 6 01:43 3 drwxr-xr-x 3 root root 0 Sep 6 01:43 4 drwxr-xr-x 3 root root 0 Sep 6 01:43 5 drwxr-xr-x 3 root root 0 Sep 6 01:43 6 drwxr-xr-x 3 root root 0 Sep 6 01:43 7 drwxr-xr-x 3 root root 0 Sep 6 01:43 8 drwxr-xr-x 3 root root 0 Sep 6 01:43 9 /sys/kernel/iommu_groups/0: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/0/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe100300.dma - ../../../../devices/ffe00.soc/ffe100300.dma /sys/kernel/iommu_groups/1: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/1/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe101300.dma - ../../../../devices/ffe00.soc/ffe101300.dma /sys/kernel/iommu_groups/10: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/10/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe302000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe302000.jr /sys/kernel/iommu_groups/11: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/11/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. 
lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe303000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe303000.jr /sys/kernel/iommu_groups/12: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/12/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe304000.jr - ../../../../devices/ffe00.soc/ffe30.crypto/ffe304000.jr /sys/kernel/iommu_groups/2: total 0 drwxr-xr-x 3 root root 0 Sep 6 01:43 . drwxr-xr-x 15 root root 0 Sep 6 01:42 .. drwxr-xr-x 2 root root 0 Sep 6 01:43 devices /sys/kernel/iommu_groups/2/devices: total 0 drwxr-xr-x 2 root root 0 Sep 6 01:43 . drwxr-xr-x 3 root root 0 Sep 6 01:43 .. lrwxrwxrwx 1 root root 0 Sep 6 01:43 ffe114000.sdhc - ../../../../devices/ffe00.soc/ffe114000.sdhc /sys/kernel/iommu_groups/3: total 0
RE: [PATCH 6/6 v3] kvm: powerpc: use caching attributes as per linux pte
-Original Message- From: Wood Scott-B07421 Sent: Saturday, August 10, 2013 6:35 AM To: Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 6/6 v3] kvm: powerpc: use caching attributes as per linux pte On Tue, 2013-08-06 at 17:01 +0530, Bharat Bhushan wrote: @@ -449,7 +446,16 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1); } - kvmppc_e500_ref_setup(ref, gtlbe, pfn); + pgdir = vcpu_e500->vcpu.arch.pgdir; + ptep = lookup_linux_pte(pgdir, hva, &tsize_pages); + if (pte_present(*ptep)) { + wimg = (pte_val(*ptep) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK; + } else { + printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n", + (long)gfn, pfn); + return -EINVAL; Don't let the guest spam the host kernel console by repeatedly accessing bad mappings (even if it requires host userspace to assist by pointing a memslot at a bad hva). This should at most be printk_ratelimited(), and probably just pr_debug(). It should also have __func__ context. Very good point, I will make this printk_ratelimited() in this patch, and convert this and the other error prints to pr_debug() once we send a machine check on error in this flow. Also, I don't see the return value getting checked (the immediate callers check it and propagate the error, but kvmppc_mmu_map() doesn't). We want to send a machine check to the guest if this happens (or possibly exit to userspace since it indicates a bad memslot, not just a guest bug). We don't want to just silently retry over and over. I completely agree with you, but this was already missing (the error return from this function is not new in this patch), so I would like to handle that separately. Otherwise, this series looks good to me.
:) -Bharat -Scott ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
-Original Message- From: Bhushan Bharat-R65777 Sent: Tuesday, August 06, 2013 6:42 AM To: Wood Scott-B07421 Cc: Benjamin Herrenschmidt; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s -Original Message- From: Wood Scott-B07421 Sent: Tuesday, August 06, 2013 12:49 AM To: Bhushan Bharat-R65777 Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm- p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote: -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Saturday, August 03, 2013 9:54 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote: One of the problem I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other friend function (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends functions. Ok I move wove this in asm/pgtable*.h, initially I fought with myself to take this code in pgtable* but finally end up doing here (got biased by book3s :)). Is there a reason why these routines can not be completely generic in pgtable.h ? 
How about the generic function: diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h index d257d98..21daf28 100644 --- a/arch/powerpc/include/asm/pgtable-ppc64.h +++ b/arch/powerpc/include/asm/pgtable-ppc64.h @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm, return old; } +static inline unsigned long pte_read(pte_t *p) { #ifdef +PTE_ATOMIC_UPDATES + pte_t pte; + pte_t tmp; + __asm__ __volatile__ ( + 1: ldarx %0,0,%3\n + andi. %1,%0,%4\n + bne-1b\n + ori %1,%0,%4\n + stdcx. %1,0,%3\n + bne-1b + : =r (pte), =r (tmp), =m (*p) + : r (p), i (_PAGE_BUSY) + : cc); + + return pte; +#else + return pte_val(*p); +#endif +#endif +} static inline int __ptep_test_and_clear_young(struct mm_struct *mm, unsigned long addr, pte_t *ptep) Please leave a blank line between functions. { diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 690c8c2..dad712c 100644 --- a/arch/powerpc/include/asm/pgtable.h +++ b/arch/powerpc/include/asm/pgtable.h @@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea, } #endif /* !CONFIG_HUGETLB_PAGE */ +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva, +int writing, unsigned long +*pte_sizep) The name implies that it just reads the PTE. Setting accessed/dirty shouldn't be an undocumented side-effect. Ok, will rename and document. Why can't the caller do that (or a different function that the caller calls afterward if desired)? The current implementation in book3s is; 1) find a pte/hugepte 2) return null if pte not present 3) take _PAGE_BUSY lock 4) set accessed/dirty 5) clear _PAGE_BUSY. What I tried was 1) find a pte/hugepte 2) return null if pte not present 3) return pte (not take lock by not setting _PAGE_BUSY) 4) then user calls __ptep_set_access_flags() to atomic update the dirty/accessed flags in pte. 
- but the benchmark results were not good. - Also, can there be a race, since we do not take the lock in step 3 and do the update in step 4? Though even then you have the undocumented side effect of locking the PTE on certain targets. +{ + pte_t *ptep; + pte_t pte; + unsigned long ps = *pte_sizep; + unsigned int shift; + + ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift); + if (!ptep) + return __pte(0); + if (shift) + *pte_sizep = 1ul << shift; + else + *pte_sizep = PAGE_SIZE; + + if (ps > *pte_sizep) + return __pte(0); + + if (!pte_present(*ptep)) + return __pte(0); + +#ifdef CONFIG_PPC64 + /* Lock
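The five-step book3s sequence described above (find the PTE, return if not present, take _PAGE_BUSY, set accessed/dirty, clear _PAGE_BUSY) can be sketched for steps 2 to 5 with C11 atomics, where a compare-and-swap loop plays the role of the ldarx/stdcx. reservation pair. This is an illustration under assumptions: the bit values and model_* names are hypothetical, and it uses a 64-bit PTE word, so it does not answer the open question about 32-bit platforms that need atomic PTE updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bit values for this model only; real _PAGE_* bits differ per MMU. */
#define M_PAGE_PRESENT	0x1u
#define M_PAGE_BUSY	0x2u
#define M_PAGE_ACCESSED	0x4u
#define M_PAGE_DIRTY	0x8u

typedef _Atomic uint64_t model_pte_t;

/*
 * Return 0 if not present; otherwise take BUSY with a CAS loop, set
 * accessed (and dirty when writing), and release BUSY with one store.
 * Returns the updated PTE value.
 */
static uint64_t model_read_update_pte(model_pte_t *ptep, int writing)
{
	uint64_t old = atomic_load(ptep);
	uint64_t upd;

	for (;;) {
		if (!(old & M_PAGE_PRESENT))
			return 0;
		if (old & M_PAGE_BUSY) {	/* another updater holds it */
			old = atomic_load(ptep);
			continue;
		}
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ptep, &old, old | M_PAGE_BUSY))
			break;
	}

	upd = old | M_PAGE_ACCESSED;
	if (writing)
		upd |= M_PAGE_DIRTY;
	atomic_store(ptep, upd);	/* old had BUSY clear, so this also unlocks */
	return upd;
}
```

Because the present check is repeated on every CAS retry, a PTE that is invalidated between the first load and the lock attempt is reported as not present rather than updated.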
RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
-----Original Message-----
From: Wood Scott-B07421
Sent: Tuesday, August 06, 2013 12:49 AM
To: Bhushan Bharat-R65777
Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:

-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 9:54 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:

One of the problems I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other helper functions (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself about putting this code in pgtable*, but finally ended up doing it here (got biased by book3s :)).

Is there a reason why these routines can not be completely generic in pgtable.h?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm,
 	return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+	pte_t pte;
+	pte_t tmp;
+	__asm__ __volatile__ (
+	"1:	ldarx	%0,0,%3\n"
+	"	andi.	%1,%0,%4\n"
+	"	bne-	1b\n"
+	"	ori	%1,%0,%4\n"
+	"	stdcx.	%1,0,%3\n"
+	"	bne-	1b"
+	: "=&r" (pte), "=&r" (tmp), "=m" (*p)
+	: "r" (p), "i" (_PAGE_BUSY)
+	: "cc");
+
+	return pte;
+#else
+	return pte_val(*p);
+#endif
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 					      unsigned long addr, pte_t *ptep)

Please leave a blank line between functions.

 {
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+				     int writing, unsigned long *pte_sizep)

The name implies that it just reads the PTE. Setting accessed/dirty shouldn't be an undocumented side-effect.

Why can't the caller do that (or a different function that the caller calls afterward if desired)?

Scott, I sent the next version of the patch based on the above idea. Now I think we do not need to update the pte flags on booke, so we do not need to solve the kvmppc_read_update_linux_pte() stuff of book3s.

-Bharat

Though even then you have the undocumented side effect of locking the PTE on certain targets.

+{
+	pte_t *ptep;
+	pte_t pte;
+	unsigned long ps = *pte_sizep;
+	unsigned int shift;
+
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+	if (!ptep)
+		return __pte(0);
+	if (shift)
+		*pte_sizep = 1ul << shift;
+	else
+		*pte_sizep = PAGE_SIZE;
+
+	if (ps > *pte_sizep)
+		return __pte(0);
+
+	if (!pte_present(*ptep))
+		return __pte(0);
+
+#ifdef CONFIG_PPC64
+	/* Lock PTE (set _PAGE_BUSY) and read */
+	pte = pte_read(ptep);
+#else
+	pte = pte_val(*ptep);
+#endif

What about 32-bit platforms that need atomic PTEs?

-Scott
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 9:54 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:

One of the problems I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other helper functions (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself about putting this code in pgtable*, but finally ended up doing it here (got biased by book3s :)).

Is there a reason why these routines can not be completely generic in pgtable.h?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm,
 	return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+	pte_t pte;
+	pte_t tmp;
+	__asm__ __volatile__ (
+	"1:	ldarx	%0,0,%3\n"
+	"	andi.	%1,%0,%4\n"
+	"	bne-	1b\n"
+	"	ori	%1,%0,%4\n"
+	"	stdcx.	%1,0,%3\n"
+	"	bne-	1b"
+	: "=&r" (pte), "=&r" (tmp), "=m" (*p)
+	: "r" (p), "i" (_PAGE_BUSY)
+	: "cc");
+
+	return pte;
+#else
+	return pte_val(*p);
+#endif
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 					      unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+				     int writing, unsigned long *pte_sizep)
+{
+	pte_t *ptep;
+	pte_t pte;
+	unsigned long ps = *pte_sizep;
+	unsigned int shift;
+
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+	if (!ptep)
+		return __pte(0);
+	if (shift)
+		*pte_sizep = 1ul << shift;
+	else
+		*pte_sizep = PAGE_SIZE;
+
+	if (ps > *pte_sizep)
+		return __pte(0);
+
+	if (!pte_present(*ptep))
+		return __pte(0);
+
+#ifdef CONFIG_PPC64
+	/* Lock PTE (set _PAGE_BUSY) and read */
+	pte = pte_read(ptep);
+#else
+	pte = pte_val(*ptep);
+#endif
+	if (pte_present(pte)) {
+		pte = pte_mkyoung(pte);
+		if (writing && pte_write(pte))
+			pte = pte_mkdirty(pte);
+	}
+
+	*ptep = __pte(pte); /* 64bit: Also unlock pte (clear _PAGE_BUSY) */
+
+	return pte;
+}
+
 #endif /* __ASSEMBLY__ */

 #endif /* __KERNEL__ */
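The size check in lookup_linux_pte() depends on operators that are easy to lose, so here is the same logic pulled out into a small C function. The 4 KiB base page and the helper name pte_size_fits() are assumptions made for this sketch.

```c
#include <stdbool.h>

/* Assumption for the sketch: 4 KiB base pages (PAGE_SHIFT == 12). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ul << SKETCH_PAGE_SHIFT)

/* "shift" is what find_linux_pte_or_hugepte() reported (0 for a normal
 * page).  On entry *pte_sizep is the size the caller wants to map; on
 * return it is the size of the backing Linux page.  Returns false when the
 * backing page is smaller than the request, the case where the patch
 * returns __pte(0). */
static bool pte_size_fits(unsigned int shift, unsigned long *pte_sizep)
{
	unsigned long ps = *pte_sizep;

	*pte_sizep = shift ? 1ul << shift : SKETCH_PAGE_SIZE;
	return ps <= *pte_sizep;
}
```

A request for a 16 MiB mapping backed by an ordinary 4 KiB page is therefore refused, while the same request backed by a 16 MiB huge page (shift 24) is accepted.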
RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
-----Original Message-----
From: Wood Scott-B07421
Sent: Tuesday, August 06, 2013 12:49 AM
To: Bhushan Bharat-R65777
Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:

-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 9:54 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:

One of the problems I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other helper functions (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself about putting this code in pgtable*, but finally ended up doing it here (got biased by book3s :)).

Is there a reason why these routines can not be completely generic in pgtable.h?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm,
 	return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+	pte_t pte;
+	pte_t tmp;
+	__asm__ __volatile__ (
+	"1:	ldarx	%0,0,%3\n"
+	"	andi.	%1,%0,%4\n"
+	"	bne-	1b\n"
+	"	ori	%1,%0,%4\n"
+	"	stdcx.	%1,0,%3\n"
+	"	bne-	1b"
+	: "=&r" (pte), "=&r" (tmp), "=m" (*p)
+	: "r" (p), "i" (_PAGE_BUSY)
+	: "cc");
+
+	return pte;
+#else
+	return pte_val(*p);
+#endif
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 					      unsigned long addr, pte_t *ptep)

Please leave a blank line between functions.

 {
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+				     int writing, unsigned long *pte_sizep)

The name implies that it just reads the PTE. Setting accessed/dirty shouldn't be an undocumented side-effect.

Ok, will rename and document.

Why can't the caller do that (or a different function that the caller calls afterward if desired)?

The current implementation in book3s is:
 1) find a pte/hugepte
 2) return null if pte not present
 3) take _PAGE_BUSY lock
 4) set accessed/dirty
 5) clear _PAGE_BUSY.

What I tried was:
 1) find a pte/hugepte
 2) return null if pte not present
 3) return pte (not taking the lock, i.e. not setting _PAGE_BUSY)
 4) then the user calls __ptep_set_access_flags() to atomically update the dirty/accessed flags in the pte.
 - but the benchmark results were not good
 - Also, can there be a race, as we do not take the lock in step 3 and update in step 4?

Though even then you have the undocumented side effect of locking the PTE on certain targets.
+{
+	pte_t *ptep;
+	pte_t pte;
+	unsigned long ps = *pte_sizep;
+	unsigned int shift;
+
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+	if (!ptep)
+		return __pte(0);
+	if (shift)
+		*pte_sizep = 1ul << shift;
+	else
+		*pte_sizep = PAGE_SIZE;
+
+	if (ps > *pte_sizep)
+		return __pte(0);
+
+	if (!pte_present(*ptep))
+		return __pte(0);
+
+#ifdef CONFIG_PPC64
+	/* Lock PTE (set _PAGE_BUSY) and read */
+	pte = pte_read(ptep);
+#else
+	pte = pte_val(*ptep);
+#endif

What about 32-bit platforms that need atomic PTEs?

I called __ptep_set_access_flags() for both 32/64-bit (for 64-bit I was not calling pte_read()), which handles atomic updates. Somehow the benchmark results were not good; I will try again.

Thanks
-Bharat

-Scott
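On the race raised for steps 3 and 4: one way to make an unlocked read safe, sketched here as an alternative technique and not as a description of __ptep_set_access_flags(), is to apply the flag update with a compare-and-swap against the exact value that was read, so a PTE that changed in between (for example, was unmapped) is left untouched. The SK_* bit values and the function name are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define SK_PRESENT  0x001u
#define SK_RW       0x002u
#define SK_DIRTY    0x080u
#define SK_ACCESSED 0x100u

/* "snapshot" is the value read earlier without taking _PAGE_BUSY.
 * The flags are written only if the PTE still holds exactly that value;
 * otherwise false is returned and the caller should redo the lookup. */
static bool set_access_flags_if_unchanged(_Atomic uint64_t *p,
					  uint64_t snapshot, int writing)
{
	uint64_t want = snapshot | SK_ACCESSED;

	if (!(snapshot & SK_PRESENT))
		return false;
	if (writing && (snapshot & SK_RW))
		want |= SK_DIRTY;
	return atomic_compare_exchange_strong(p, &snapshot, want);
}
```

The cost of this approach is a retry path in the caller whenever another CPU modified the PTE between the read and the update.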
RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 4:47 AM
To: Wood Scott-B07421
Cc: Bhushan Bharat-R65777; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:

What about 64-bit PTEs on 32-bit kernels?

In any case, this code does not belong in KVM. It should be in the main PPC mm code, even if KVM is the only user.

Also, don't we do similar things in BookS KVM? At the very least that stuff should become common. And yes, I agree, it should probably also move to pgtable*.

One of the problems I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other helper functions (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself about putting this code in pgtable*, but finally ended up doing it here (got biased by book3s :)).

Thanks
-Bharat

Cheers,
Ben.
RE: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
-----Original Message-----
From: Wood Scott-B07421
Sent: Saturday, August 03, 2013 5:05 AM
To: Bhushan Bharat-R65777
Cc: b...@kernel.crashing.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
Subject: Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte

On Thu, Aug 01, 2013 at 04:42:38PM +0530, Bharat Bhushan wrote:

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..eb2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif

 	kvmppc_fix_ee_before_entry();
-
+	vcpu->arch.pgdir = current->mm->pgd;
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

kvmppc_fix_ee_before_entry() is supposed to be the last thing that happens before __kvmppc_vcpu_run().

@@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	unsigned long hva;
 	int pfnmap = 0;
 	int tsize = BOOK3E_PAGESZ_4K;
+	pte_t pte;
+	int wimg = 0;

 	/*
 	 * Translate guest physical to true physical, acquiring
@@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,

 	if (likely(!pfnmap)) {
 		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
+		pgd_t *pgdir;
+
 		pfn = gfn_to_pfn_memslot(slot, gfn);
 		if (is_error_noslot_pfn(pfn)) {
 			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
@@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 		/* Align guest and physical address to page map boundaries */
 		pfn &= ~(tsize_pages - 1);
 		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
+		pgdir = vcpu_e500->vcpu.arch.pgdir;
+		pte = lookup_linux_pte(pgdir, hva, 1, &tsize_pages);
+		if (pte_present(pte)) {
+			wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
+		} else {
+			printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
+				(long)gfn, pfn);
+			return -EINVAL;
+		}
 	}

How does wimg get set in the pfnmap case?
Pfnmap pages are not kernel-managed pages, right? So should we set I+G there?

Could you explain why we need to set dirty/referenced on the PTE, when we didn't need to do that before? All we're getting from the PTE is wimg. We have MMU notifiers to take care of the page being unmapped, and we've already marked the page itself as dirty if the TLB entry is writeable.

I pulled this code from book3s.

Ben, can you describe why we need this on book3s?

Thanks
-Bharat

-Scott
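The tsize alignment and the WIMGE extraction in the kvmppc_e500_shadow_map() hunk can be checked with a few lines of C. PAGE_SHIFT = 12 and PTE_WIMGE_SHIFT = 6 are assumed values for this sketch; the real PTE_WIMGE_SHIFT depends on the platform's pte-*.h header, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the patch. */
#define SK_PAGE_SHIFT      12
#define SK_PTE_WIMGE_SHIFT 6
#define SK_MAS2_WIMGE_MASK 0x1f

/* Host pages covered by a TLB entry of 2^tsize KiB, as in
 * "1 << (tsize + 10 - PAGE_SHIFT)". */
static unsigned long tsize_to_pages(int tsize)
{
	return 1ul << (tsize + 10 - SK_PAGE_SHIFT);
}

/* Round pfn and guest EA down to the TLB entry boundary. */
static void align_to_tsize(unsigned long *pfn, unsigned long *gvaddr, int tsize)
{
	unsigned long pages = tsize_to_pages(tsize);

	*pfn &= ~(pages - 1);
	*gvaddr &= ~((pages << SK_PAGE_SHIFT) - 1);
}

/* WIMGE attribute bits taken out of a Linux PTE value. */
static unsigned int pte_to_wimge(uint64_t pte)
{
	return (pte >> SK_PTE_WIMGE_SHIFT) & SK_MAS2_WIMGE_MASK;
}
```

For example, with tsize 4 (16 KiB) a pfn of 0x12345 is rounded down to 0x12344 and a guest address of 0x10123456 to 0x10120000.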
RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, July 27, 2013 3:57 AM
To: Bhushan Bharat-R65777
Cc: Alexander Graf; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-d...@lists.ozlabs.org; Wood Scott-B07421
Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

On Fri, 2013-07-26 at 15:03 +, Bhushan Bharat-R65777 wrote:

Isn't searching the Linux PTE overkill?

That's the best approach. Also we are searching it already to resolve the page fault. That does mean we search twice but on the other hand that also means it's hot in the cache.

Below is an early git diff (not a proper cleanup patch), to make sure this is what we want on PowerPC and to get early feedback. I also ran some benchmarks to understand the overhead, if any.

Using kvm_is_mmio_pfn() (what the current patch does):
Real: 0m46.616s + 0m49.517s + 0m49.510s + 0m46.936s + 0m46.889s + 0m46.684s = Avg: 47.692s
User: 0m31.636s + 0m31.816s + 0m31.456s + 0m31.752s + 0m32.028s + 0m31.848s = Avg: 31.756s
Sys:  0m11.596s + 0m11.868s + 0m12.244s + 0m11.672s + 0m11.356s + 0m11.432s = Avg: 11.695s

Using kernel page table search (below changes):
Real: 0m46.431s + 0m50.269s + 0m46.724s + 0m46.645s + 0m46.670s + 0m50.259s = Avg: 47.833s
User: 0m31.568s + 0m31.816s + 0m31.444s + 0m31.808s + 0m31.312s + 0m31.740s = Avg: 31.614s
Sys:  0m11.516s + 0m12.060s + 0m11.872s + 0m11.476s + 0m12.000s + 0m12.152s = Avg: 11.846s

--
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..d6d0dac 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -532,6 +532,7 @@ struct kvm_vcpu_arch {
 	u32 epr;
 	u32 crit_save;
 	struct kvmppc_booke_debug_reg dbg_reg;
+	pgd_t *pgdir;
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..eb2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif

 	kvmppc_fix_ee_before_entry();
-
+	vcpu->arch.pgdir = current->mm->pgd;
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

 	/* No need for kvm_guest_exit. It's done in handle_exit.
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..fc4b2f6 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
 #define E500_TLB_NUM   2

 /* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID		(1 << 0)
+#define E500_TLB_VALID		(1 << 31)
 /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP		(1 << 1)
+#define E500_TLB_BITMAP		(1 << 30)
 /* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0		(1 << 2)
+#define E500_TLB_TLB0		(1 << 29)
+/* Lower 5 bits have WIMGE value */
+#define E500_TLB_WIMGE_MASK	(0x1f)

 struct tlbe_ref {
 	pfn_t pfn;		/* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 5cbdc8f..a48c13f 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -40,6 +40,84 @@ static struct kvmppc_e500_tlb_params host_tlb_params[E500_TLB_NUM];

+/*
+ * find_linux_pte returns the address of a linux pte for a given
+ * effective address and directory.  If not found, it returns zero.
+ */
+static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
+{
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	pte_t *pt = NULL;
+
+	pg = pgdir + pgd_index(ea);
+	if (!pgd_none(*pg)) {
+		pu = pud_offset(pg, ea);
+		if (!pud_none(*pu)) {
+			pm = pmd_offset(pu, ea);
+			if (pmd_present(*pm))
+				pt = pte_offset_kernel(pm, ea);
+		}
+	}
+	return pt;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+				 unsigned *shift);
+#else
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+					       unsigned *shift)
+{
+	if (shift)
+		*shift = 0;
+	return find_linux_pte(pgdir, ea);
+}
+#endif /* !CONFIG_HUGETLB_PAGE */
+
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+	pte_t pte = pte_val(*p);
+
+	if (pte_present(pte)) {
+		pte = pte_mkyoung
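The walk in find_linux_pte() gives up as soon as any level is empty and otherwise indexes each level with a slice of the address. A toy version with two pointer levels above the PTE page shows the same shape; the TOY_* layout is invented for this sketch and is not the powerpc page-table geometry.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented geometry: 16 entries per level, 4 KiB pages. */
#define TOY_PAGE_SHIFT 12
#define TOY_BITS       4
#define TOY_ENTRIES    (1u << TOY_BITS)
#define TOY_PMD_SHIFT  (TOY_PAGE_SHIFT + TOY_BITS)
#define TOY_PGD_SHIFT  (TOY_PMD_SHIFT + TOY_BITS)

typedef uint64_t toy_pte_t;
typedef struct { toy_pte_t *pte[TOY_ENTRIES]; } toy_pmd_t;  /* -> PTE pages */
typedef struct { toy_pmd_t *pmd[TOY_ENTRIES]; } toy_pgd_t;  /* -> PMD pages */

/* Index into one level: the TOY_BITS-wide slice of ea above "shift". */
static unsigned toy_index(uint64_t ea, unsigned shift)
{
	return (ea >> shift) & (TOY_ENTRIES - 1);
}

/* Return a pointer to the PTE for ea, or NULL if a level is missing. */
static toy_pte_t *toy_find_pte(toy_pgd_t *pgd, uint64_t ea)
{
	toy_pmd_t *pmd = pgd->pmd[toy_index(ea, TOY_PGD_SHIFT)];
	toy_pte_t *ptes;

	if (!pmd)
		return NULL;
	ptes = pmd->pte[toy_index(ea, TOY_PMD_SHIFT)];
	if (!ptes)
		return NULL;
	return &ptes[toy_index(ea, TOY_PAGE_SHIFT)];
}
```

Like find_linux_pte(), this returns a pointer into the table rather than a copy of the entry, so the caller can later read or update the PTE in place.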
RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
-----Original Message-----
From: Wood Scott-B07421
Sent: Wednesday, July 31, 2013 12:19 AM
To: Bhushan Bharat-R65777
Cc: Benjamin Herrenschmidt; Alexander Graf; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421
Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

On 07/30/2013 11:22:54 AM, Bhushan Bharat-R65777 wrote:

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 5cbdc8f..a48c13f 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -40,6 +40,84 @@ static struct kvmppc_e500_tlb_params host_tlb_params[E500_TLB_NUM];

+/*
+ * find_linux_pte returns the address of a linux pte for a given
+ * effective address and directory.  If not found, it returns zero.
+ */
+static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
+{
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	pte_t *pt = NULL;
+
+	pg = pgdir + pgd_index(ea);
+	if (!pgd_none(*pg)) {
+		pu = pud_offset(pg, ea);
+		if (!pud_none(*pu)) {
+			pm = pmd_offset(pu, ea);
+			if (pmd_present(*pm))
+				pt = pte_offset_kernel(pm, ea);
+		}
+	}
+	return pt;
+}

How is this specific to KVM or e500?

+#ifdef CONFIG_HUGETLB_PAGE
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+				 unsigned *shift);
+#else
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+					       unsigned *shift)
+{
+	if (shift)
+		*shift = 0;
+	return find_linux_pte(pgdir, ea);
+}
+#endif /* !CONFIG_HUGETLB_PAGE */

This is already declared in asm/pgtable.h. If we need a non-hugepage alternative, that should also go in asm/pgtable.h.

+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+	pte_t pte = pte_val(*p);
+
+	if (pte_present(pte)) {
+		pte = pte_mkyoung(pte);
+		if (writing && pte_write(pte))
+			pte = pte_mkdirty(pte);
+	}
+
+	*p = pte;
+
+	return pte;
+}
+
+static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+			      int writing, unsigned long *pte_sizep)
+{
+	pte_t *ptep;
+	unsigned long ps = *pte_sizep;
+	unsigned int shift;
+
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+	if (!ptep)
+		return __pte(0);
+	if (shift)
+		*pte_sizep = 1ul << shift;
+	else
+		*pte_sizep = PAGE_SIZE;
+
+	if (ps > *pte_sizep)
+		return __pte(0);
+	if (!pte_present(*ptep))
+		return __pte(0);
+
+	return kvmppc_read_update_linux_pte(ptep, writing);
+}
+

None of this belongs in this file either.

@@ -326,8 +405,8 @@ static void kvmppc_e500_setup_stlbe(

 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
-	stlbe->mas2 = (gvaddr & MAS2_EPN) |
-		      e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
+	stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_WIMGE_MASK);
+//		      e500_shadow_mas2_attrib(gtlbe->mas2, pfn);

MAS2_E and MAS2_G should be safe to come from the guest.

This is handled when setting WIMGE in ref->flags.

How does this work for TLB1? One ref corresponds to one guest entry, which may correspond to multiple host entries, potentially each with different WIM settings.

Yes, one ref corresponds to one guest entry. Here is how this works when one guest TLB1 entry maps to many host TLB0/1 entries: on a guest tlbwe, KVM sets up one guest TLB entry and then pre-maps one host TLB entry (out of many), and the ref (ref->pfn etc.) points to this pre-mapped entry for that guest entry. When a later guest TLB miss falls on the same guest TLB entry but demands another host TLB entry, that path changes/overwrites the ref (ref->pfn etc.) to point to the new host mapping for the same guest mapping.
 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
 			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
@@ -346,6 +425,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	unsigned long hva;
 	int pfnmap = 0;
 	int tsize = BOOK3E_PAGESZ_4K;
+	pte_t pte;
+	int wimg = 0;

 	/*
 	 * Translate guest physical to true physical, acquiring
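With the e500.h change, ref->flags keeps its state bits in bits 31-29 and the WIMGE value in the low five bits, and kvmppc_e500_setup_stlbe() ORs those low bits into MAS2. A small sketch of that packing follows, written with unsigned constants; the helper names and the 4 KiB SK_MAS2_EPN mask are assumptions.

```c
#include <stdint.h>

/* Bit layout from the e500.h hunk (unsigned here to avoid 1 << 31 on int). */
#define E500_TLB_VALID      (1u << 31)
#define E500_TLB_BITMAP     (1u << 30)
#define E500_TLB_TLB0       (1u << 29)
#define E500_TLB_WIMGE_MASK 0x1fu

/* Assumed EPN mask for a 4 KiB page. */
#define SK_MAS2_EPN (~0xfffu)

/* Hypothetical helper: replace the WIMGE field, keep the state bits. */
static uint32_t ref_set_wimge(uint32_t flags, uint32_t wimge)
{
	return (flags & ~E500_TLB_WIMGE_MASK) | (wimge & E500_TLB_WIMGE_MASK);
}

/* Hypothetical helper: MAS2 as built in kvmppc_e500_setup_stlbe(). */
static uint32_t shadow_mas2(uint32_t gvaddr, uint32_t flags)
{
	return (gvaddr & SK_MAS2_EPN) | (flags & E500_TLB_WIMGE_MASK);
}
```

Because there is a single WIMGE field per ref, every host entry derived from that ref gets the same attributes, which is the point of Scott's TLB1 question.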
RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Friday, July 26, 2013 1:57 PM
To: Bhushan Bharat-R65777
Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; ag...@suse.de; Wood Scott-B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:

If the page is RAM then map this as cacheable and coherent (set M bit), otherwise this page is treated as I/O and mapped as cache-inhibited and guarded (set I + G).

This helps set up proper MMU mappings for directly assigned devices.

NOTE: There can be devices that require a cacheable mapping, which is not yet supported.

Why don't you do like server instead and enforce the use of the same I and M bits as the corresponding qemu PTE?

Ben/Alex, I will look into the code. Can you please describe how this is handled on server?

Thanks
-Bharat

Cheers,
Ben.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/kvm/e500_mmu_host.c | 24 +++-
 1 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..5cbdc8f 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,13 +64,27 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
 	return mas3;
 }

-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
+static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
 {
+	u32 mas2_attr;
+
+	mas2_attr = mas2 & MAS2_ATTRIB_MASK;
+
+	if (kvm_is_mmio_pfn(pfn)) {
+		/*
+		 * If page is not RAM then it is treated as I/O page.
+		 * Map it with cache inhibited and guarded (set I + G).
+		 */
+		mas2_attr |= MAS2_I | MAS2_G;
+		return mas2_attr;
+	}
+
+	/* Map RAM pages as cacheable (Not setting I in MAS2) */
 #ifdef CONFIG_SMP
-	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-	return mas2 & MAS2_ATTRIB_MASK;
+	/* Also map as coherent (set M) in SMP */
+	mas2_attr |= MAS2_M;
 #endif
+	return mas2_attr;
 }

 /*
@@ -313,7 +327,7 @@ static void kvmppc_e500_setup_stlbe(
 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
 	stlbe->mas2 = (gvaddr & MAS2_EPN) |
-		      e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+		      e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
 			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
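The attribute choice made by the patched e500_shadow_mas2_attrib() reduces to a small pure function. The MAS2_* bit values below are assumed for the sketch (check asm/mmu-book3e.h), and the two booleans stand in for kvm_is_mmio_pfn(pfn) and CONFIG_SMP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MAS2 attribute bits (W, I, M, G, E) and attribute mask. */
#define SK_MAS2_W 0x10u
#define SK_MAS2_I 0x08u
#define SK_MAS2_M 0x04u
#define SK_MAS2_G 0x02u
#define SK_MAS2_E 0x01u
#define SK_MAS2_ATTRIB_MASK 0x7fu

static uint32_t shadow_mas2_attrib(uint32_t mas2, bool is_mmio, bool smp)
{
	uint32_t attr = mas2 & SK_MAS2_ATTRIB_MASK;

	if (is_mmio)
		return attr | SK_MAS2_I | SK_MAS2_G;	/* I/O: cache-inhibited, guarded */
	if (smp)
		attr |= SK_MAS2_M;			/* RAM on SMP: coherent */
	return attr;					/* RAM: I stays clear (cacheable) */
}
```

The guest's own W/E bits pass through unchanged in every case; only I, G and M are forced by the host-side decision.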
RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
-----Original Message-----
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
Sent: Friday, July 26, 2013 2:20 PM
To: Benjamin Herrenschmidt
Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

On 26.07.2013, at 10:26, Benjamin Herrenschmidt wrote:

On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:

If the page is RAM then map this as cacheable and coherent (set M bit), otherwise this page is treated as I/O and mapped as cache-inhibited and guarded (set I + G).

This helps set up proper MMU mappings for directly assigned devices.

NOTE: There can be devices that require a cacheable mapping, which is not yet supported.

Why don't you do like server instead and enforce the use of the same I and M bits as the corresponding qemu PTE?

Specifically, Ben is talking about this code:

		/* Translate to host virtual address */
		hva = __gfn_to_hva_memslot(memslot, gfn);

		/* Look up the Linux PTE for the backing page */
		pte_size = psize;
		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
		if (pte_present(pte)) {
			if (writing && !pte_write(pte))
				/* make the actual HPTE be read-only */
				ptel = hpte_make_readonly(ptel);
			is_io = hpte_cache_bits(pte_val(pte));
			pa = pte_pfn(pte) << PAGE_SHIFT;
		}

Ok, thanks
-Bharat

Alex
RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
-----Original Message-----
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf
Sent: Friday, July 26, 2013 2:20 PM
To: Benjamin Herrenschmidt
Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

On 26.07.2013, at 10:26, Benjamin Herrenschmidt wrote:

On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:

If the page is RAM then map this as cacheable and coherent (set M bit), otherwise this page is treated as I/O and mapped as cache-inhibited and guarded (set I + G).

This helps set up proper MMU mappings for directly assigned devices.

NOTE: There can be devices that require a cacheable mapping, which is not yet supported.

Why don't you do like server instead and enforce the use of the same I and M bits as the corresponding qemu PTE?

Specifically, Ben is talking about this code:

		/* Translate to host virtual address */
		hva = __gfn_to_hva_memslot(memslot, gfn);

		/* Look up the Linux PTE for the backing page */
		pte_size = psize;
		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
		if (pte_present(pte)) {
			if (writing && !pte_write(pte))
				/* make the actual HPTE be read-only */
				ptel = hpte_make_readonly(ptel);
			is_io = hpte_cache_bits(pte_val(pte));
			pa = pte_pfn(pte) << PAGE_SHIFT;
		}

Isn't searching the Linux PTE overkill?

-Bharat
RE: [v3][PATCH 1/8] powerpc/book3e: rename interrupt_end_book3e with __end_interrupts
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun Chen
Sent: Tuesday, July 09, 2013 1:33 PM
To: b...@kernel.crashing.org
Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
Subject: [v3][PATCH 1/8] powerpc/book3e: rename interrupt_end_book3e with __end_interrupts

We can rename 'interrupt_end_book3e' to '__end_interrupts' so that book3s and book3e can share this unique label, which makes it convenient to use.

I think we can be consistent with the start and end names, no?

-Bharat

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
 arch/powerpc/kernel/exceptions-64e.S | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index 645170a..a518e48 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -309,8 +309,8 @@ interrupt_base_book3e:					/* fake trap */
 	EXCEPTION_STUB(0x300, hypercall)
 	EXCEPTION_STUB(0x320, ehpriv)

-	.globl interrupt_end_book3e
-interrupt_end_book3e:
+	.globl __end_interrupts
+__end_interrupts:

 /* Critical Input Interrupt */
 	START_EXCEPTION(critical_input);
@@ -493,7 +493,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 	beq+	1f

 	LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-	LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
+	LOAD_REG_IMMEDIATE(r15,__end_interrupts)
 	cmpld	cr0,r10,r14
 	cmpld	cr1,r10,r15
 	blt+	cr0,1f
@@ -559,7 +559,7 @@ kernel_dbg_exc:
 	beq+	1f

 	LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
-	LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
+	LOAD_REG_IMMEDIATE(r15,__end_interrupts)
 	cmpld	cr0,r10,r14
 	cmpld	cr1,r10,r15
 	blt+	cr0,1f
--
1.7.9.5
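The two cmpld instructions compare the address in r10 against interrupt_base_book3e (r14) and the end label (r15), which is why both labels must bracket the vectors. The diff context stops at blt+ cr0,1f, so the sketch below assumes the check is the half-open range [base, end); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr lies inside the exception vectors, assuming the
 * half-open range [base, end) that the two cmpld results describe. */
static bool in_exception_vectors(uint64_t addr, uint64_t base, uint64_t end)
{
	return addr >= base && addr < end;
}
```

Renaming the end label does not change this test; it only lets book3s and book3e code refer to the same symbol for "end".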
RE: [v3][PATCH 7/8] book3e/kexec/kdump: redefine VIRT_PHYS_OFFSET
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun Chen
Sent: Tuesday, July 09, 2013 1:33 PM
To: b...@kernel.crashing.org
Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
Subject: [v3][PATCH 7/8] book3e/kexec/kdump: redefine VIRT_PHYS_OFFSET

Book3e always creates its TLB mapping 1GB-aligned, so we should use (KERNELBASE - MEMORY_START) as VIRT_PHYS_OFFSET to get __pa/__va right while booting the kdump kernel.

Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
---
 arch/powerpc/include/asm/page.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index 988c812..5b00081 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -112,6 +112,8 @@ extern long long virt_phys_offset;
 /* See Description below for VIRT_PHYS_OFFSET */
 #ifdef CONFIG_RELOCATABLE_PPC32
 #define VIRT_PHYS_OFFSET virt_phys_offset
+#elif defined(CONFIG_PPC_BOOK3E_64)
+#define VIRT_PHYS_OFFSET (KERNELBASE - MEMORY_START)

Can you please explain this code a bit more? I am not understanding this part :)

-Bharat

 #else
 #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
 #endif
--
1.7.9.5
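A quick arithmetic check of what the two definitions mean for __pa()/__va(). KERNELBASE = 0xc000000000000000, MEMORY_START = 0 and a kdump load address (PHYSICAL_START) of 32 MiB are hypothetical values chosen for illustration, and the function names are made up.

```c
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define SK_KERNELBASE     0xc000000000000000ull
#define SK_MEMORY_START   0x0ull
#define SK_PHYSICAL_START 0x2000000ull	/* e.g. a kdump kernel loaded at 32 MiB */

/* New book3e-64 definition: KERNELBASE corresponds to MEMORY_START. */
static uint64_t pa_new(uint64_t va) { return va - (SK_KERNELBASE - SK_MEMORY_START); }
static uint64_t va_new(uint64_t pa) { return pa + (SK_KERNELBASE - SK_MEMORY_START); }

/* Existing #else definition: KERNELBASE corresponds to the load address. */
static uint64_t pa_old(uint64_t va) { return va - (SK_KERNELBASE - SK_PHYSICAL_START); }
```

With these numbers, the virtual address 0xc000000001000000 translates to physical 0x1000000 under the new definition but to 0x3000000 under the existing one, so the two differ by exactly the kdump load offset.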
RE: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun Chen Sent: Thursday, June 20, 2013 1:23 PM To: b...@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE book3e is different with book3s since 3s includes the exception vectors code in head_64.S as it relies on absolute addressing which is only possible within this compilation unit. So we have to get that label address with got. And when boot a relocated kernel, we should reset ipvr properly again after .relocate. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/include/asm/exception-64e.h |8 arch/powerpc/kernel/exceptions-64e.S | 15 ++- arch/powerpc/kernel/head_64.S| 22 ++ arch/powerpc/lib/feature-fixups.c|7 +++ 4 files changed, 51 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h index 51fa43e..89e940d 100644 --- a/arch/powerpc/include/asm/exception-64e.h +++ b/arch/powerpc/include/asm/exception-64e.h @@ -214,10 +214,18 @@ exc_##label##_book3e: #define TLB_MISS_STATS_SAVE_INFO_BOLTED #endif +#ifndef CONFIG_RELOCATABLE #define SET_IVOR(vector_number, vector_offset) \ li r3,vector_offset@l; \ ori r3,r3,interrupt_base_book3e@l; \ mtspr SPRN_IVOR##vector_number,r3; +#else +#define SET_IVOR(vector_number, vector_offset) \ + LOAD_REG_ADDR(r3,interrupt_base_book3e);\ + rlwinm r3,r3,0,15,0; \ + ori r3,r3,vector_offset@l; \ + mtspr SPRN_IVOR##vector_number,r3; +#endif #endif /* _ASM_POWERPC_EXCEPTION_64E_H */ diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 645170a..4b23119 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -1097,7 +1097,15 @@ skpinv:addir6,r6,1 /* Increment */ * r4 = MAS0 w/TLBSEL ESEL for the temp mapping */ /* Now we branch the new virtual 
address mapped by this entry */ +#ifdef CONFIG_RELOCATABLE + /* We have to find out address from lr. */ + bl 1f /* Find our address */ +1: mflrr6 + addir6,r6,(2f - 1b) + tovirt(r6,r6) +#else LOAD_REG_IMMEDIATE(r6,2f) +#endif lis r7,MSR_KERNEL@h ori r7,r7,MSR_KERNEL@l mtspr SPRN_SRR0,r6 @@ -1348,9 +1356,14 @@ _GLOBAL(book3e_secondary_thread_init) mflrr28 b 3b -_STATIC(init_core_book3e) +_GLOBAL(init_core_book3e) /* Establish the interrupt vector base */ +#ifdef CONFIG_RELOCATABLE + tovirt(r2,r2) + LOAD_REG_ADDR(r3, interrupt_base_book3e) +#else LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e) +#endif mtspr SPRN_IVPR,r3 sync blr diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index b61363d..0942f3a 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -414,12 +414,22 @@ _STATIC(__after_prom_start) /* process relocations for the final address of the kernel */ lis r25,PAGE_OFFSET@highest /* compute virtual base of kernel */ sldir25,r25,32 +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */ +#endif lwz r7,__run_at_load-_stext(r26) +#if defined(CONFIG_PPC_BOOK3E) + tophys(r26,r26) /* Restore for the remains. */ +#endif cmplwi cr0,r7,1/* flagged to stay where we are ? */ bne 1f add r25,r25,r26 1: mr r3,r25 bl .relocate +#if defined(CONFIG_PPC_BOOK3E) + /* We should set ivpr again after .relocate. */ + bl .init_core_book3e +#endif #endif /* @@ -447,12 +457,24 @@ _STATIC(__after_prom_start) * variable __run_at_load, if it is set the kernel is treated as relocatable * kernel, otherwise it will be moved to PHYSICAL_START */ +#if defined(CONFIG_PPC_BOOK3E) + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */ +#endif lwz r7,__run_at_load-_stext(r26) +#if defined(CONFIG_PPC_BOOK3E) + tophys(r26,r26) /* Restore for the remains. 
*/ +#endif cmplwi cr0,r7,1 bne 3f +#ifdef CONFIG_PPC_BOOK3E + LOAD_REG_ADDR(r5, interrupt_end_book3e) + LOAD_REG_ADDR(r11, _stext) + sub r5,r5,r11 +#else /* just copy interrupts */ LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext) +#endif b 5f 3: #endif diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c
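The CONFIG_RELOCATABLE paths in this patch avoid link-time absolute addresses in two ways: `bl 1f; 1: mflr r6; addi r6,r6,(2f - 1b)` takes the run-time address of label 1 from the link register and adds the link-time distance to label 2, and the Book3E copy length is computed as the difference of two run-time addresses loaded via the TOC. A small C model (made-up addresses, ignoring the `tovirt()` step; all names hypothetical) shows why both stay correct after relocation:

```c
#include <stdint.h>

/* Run-time address of a symbol linked at link_addr, relocated by reloc. */
static uint64_t runtime_addr(uint64_t link_addr, uint64_t reloc)
{
    return link_addr + reloc;
}

/*
 * bl 1f ; 1: mflr r6 ; addi r6,r6,(2f - 1b)
 * mflr yields the run-time address of label 1; (2f - 1b) is a constant
 * fixed at link time, so the sum is the run-time address of label 2.
 */
static uint64_t label2_addr(uint64_t runtime_label1, uint64_t link_label1,
                            uint64_t link_label2)
{
    return runtime_label1 + (link_label2 - link_label1);
}

/*
 * LOAD_REG_ADDR(r5, end) ; LOAD_REG_ADDR(r11, _stext) ; sub r5,r5,r11
 * Both addresses carry the same relocation offset, which cancels out.
 */
static uint64_t copy_len(uint64_t runtime_end, uint64_t runtime_stext)
{
    return runtime_end - runtime_stext;
}
```

By contrast, `LOAD_REG_IMMEDIATE(r6,2f)` would bake in the link-time address, which is wrong whenever the relocation offset is non-zero.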
RE: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun Chen Sent: Thursday, June 20, 2013 1:23 PM To: b...@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel We need to active KEXEC for book3e and bypass or convert non-book3e stuff in kexec coverage. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/Kconfig |2 +- arch/powerpc/kernel/machine_kexec_64.c |6 ++ arch/powerpc/kernel/misc_64.S |6 ++ 3 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index c33e3ad..6ecf3c9 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -364,7 +364,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE config KEXEC bool kexec system call - depends on (PPC_BOOK3S || FSL_BOOKE || (44x !SMP)) + depends on (PPC_BOOK3S || FSL_BOOKE || (44x !SMP)) || PPC_BOOK3E help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c index 611acdf..ef39271 100644 --- a/arch/powerpc/kernel/machine_kexec_64.c +++ b/arch/powerpc/kernel/machine_kexec_64.c @@ -33,6 +33,7 @@ int default_machine_kexec_prepare(struct kimage *image) { int i; +#ifndef CONFIG_PPC_BOOK3E unsigned long begin, end; /* limits of segment */ unsigned long low, high;/* limits of blocked memory range */ struct device_node *node; @@ -41,6 +42,7 @@ int default_machine_kexec_prepare(struct kimage *image) if (!ppc_md.hpte_clear_all) return -ENOENT; +#endif Do we really need this function for book3e? can we have a separate function rather than multiple confusing ifdef? 
-Bharat /* * Since we use the kernel fault handlers and paging code to @@ -51,6 +53,7 @@ int default_machine_kexec_prepare(struct kimage *image) if (image-segment[i].mem __pa(_end)) return -ETXTBSY; +#ifndef CONFIG_PPC_BOOK3E /* * For non-LPAR, we absolutely can not overwrite the mmu hash * table, since we are still using the bolted entries in it to @@ -92,6 +95,7 @@ int default_machine_kexec_prepare(struct kimage *image) return -ETXTBSY; } } +#endif return 0; } @@ -367,6 +371,7 @@ void default_machine_kexec(struct kimage *image) /* NOTREACHED */ } +#ifndef CONFIG_PPC_BOOK3E /* Values we need to export to the second kernel via the device tree. */ static unsigned long htab_base; @@ -411,3 +416,4 @@ static int __init export_htab_values(void) return 0; } late_initcall(export_htab_values); +#endif diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 6820e45..f1a7ce7 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -543,9 +543,13 @@ _GLOBAL(kexec_sequence) lhz r25,PACAHWCPUID(r13)/* get our phys cpu from paca */ /* disable interrupts, we are overwriting kernel data next */ +#ifndef CONFIG_PPC_BOOK3E mfmsr r3 rlwinm r3,r3,0,17,15 mtmsrd r3,1 +#else + wrteei 0 +#endif /* copy dest pages, flush whole dest image */ mr r3,r29 @@ -567,10 +571,12 @@ _GLOBAL(kexec_sequence) li r6,1 stw r6,kexec_flag-1b(5) +#ifndef CONFIG_PPC_BOOK3E /* clear out hardware hash page table and tlb */ ld r5,0(r27) /* deref function descriptor */ mtctr r5 bctrl /* ppc_md.hpte_clear_all(void); */ +#endif /* * kexec image calling is: -- 1.7.9.5 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
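Bharat's question above, whether Book3E should get its own function instead of several `#ifndef CONFIG_PPC_BOOK3E` blocks, can be pictured with a simplified model. The structs and function names below are hypothetical stand-ins, not kernel API; only the segment-overlap test (`image->segment[i].mem < __pa(_end)` returning `-ETXTBSY`) and the `-ENOENT` return for a missing `ppc_md.hpte_clear_all` mirror the quoted code.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for struct kimage and its segments. */
struct kexec_segment_model {
    unsigned long mem;          /* physical destination of the segment */
};

struct kimage_model {
    size_t nr_segments;
    struct kexec_segment_model segment[4];
};

/* Check needed on every platform: no segment may overlap the running kernel. */
static int kexec_check_kernel_overlap(const struct kimage_model *image,
                                      unsigned long kernel_end_pa)
{
    for (size_t i = 0; i < image->nr_segments; i++)
        if (image->segment[i].mem < kernel_end_pa)
            return -ETXTBSY;
    return 0;
}

/* Book3E: no hash page table, so only the common check applies. */
static int book3e_kexec_prepare(const struct kimage_model *image,
                                unsigned long kernel_end_pa)
{
    return kexec_check_kernel_overlap(image, kernel_end_pa);
}

/* Hash-MMU path: also requires hpte_clear_all, modelled here as a flag. */
static int hash_kexec_prepare(const struct kimage_model *image,
                              unsigned long kernel_end_pa,
                              int have_hpte_clear_all)
{
    if (!have_hpte_clear_all)
        return -ENOENT;
    /* ...the hash-table overlap checks from the original function go here... */
    return kexec_check_kernel_overlap(image, kernel_end_pa);
}

/* The only #ifdef: pick one implementation at build time. */
static int machine_kexec_prepare_model(const struct kimage_model *image,
                                       unsigned long kernel_end_pa)
{
#ifdef CONFIG_PPC_BOOK3E
    return book3e_kexec_prepare(image, kernel_end_pa);
#else
    return hash_kexec_prepare(image, kernel_end_pa, 1);
#endif
}
```

The trade-off is a little duplication of the shared check (factored into one helper here) in exchange for each platform's function reading straight through without conditional compilation.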
RE: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun Chen Sent: Thursday, June 20, 2013 1:23 PM To: b...@kernel.crashing.org Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org Subject: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag We need to introduce a flag to indicate we're already running a kexec kernel then we can go proper path. For example, We shouldn't access spin_table from the bootloader to up any secondary cpu for kexec kernel, and kexec kernel already know how to jump to generic_secondary_smp_init. Signed-off-by: Tiejun Chen tiejun.c...@windriver.com --- arch/powerpc/include/asm/smp.h|3 +++ arch/powerpc/kernel/head_64.S | 12 arch/powerpc/kernel/misc_64.S |6 ++ arch/powerpc/platforms/85xx/smp.c | 14 ++ 4 files changed, 35 insertions(+) diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h index ffbaabe..fbc3d9b 100644 --- a/arch/powerpc/include/asm/smp.h +++ b/arch/powerpc/include/asm/smp.h @@ -200,6 +200,9 @@ extern void generic_secondary_thread_init(void); extern unsigned long __secondary_hold_spinloop; extern unsigned long __secondary_hold_acknowledge; extern char __secondary_hold; +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) +extern unsigned long __run_at_kexec; +#endif extern void __early_start(void); #endif /* __ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 3e19ba2..ffa4b18 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -89,6 +89,12 @@ __secondary_hold_spinloop: __secondary_hold_acknowledge: .llong 0x0 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) + .globl __run_at_kexec +__run_at_kexec: + .llong 0x0 /* Flag for the secondary kernel from kexec. */ +#endif + #ifdef CONFIG_RELOCATABLE /* This flag is set to 1 by a loader if the kernel should run * at the loaded address instead of the linked address. 
This @@ -417,6 +423,12 @@ _STATIC(__after_prom_start) #if defined(CONFIG_PPC_BOOK3E) tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */ #endif +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) + /* If relocated we need to restore this flag on that relocated address. */ + ld r7,__run_at_kexec-_stext(r26) + std r7,__run_at_kexec-_stext(r26) +#endif + lwz r7,__run_at_load-_stext(r26) #if defined(CONFIG_PPC_BOOK3E) tophys(r26,r26) /* Restore for the remains. */ diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 20cbb98..c89aead 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -619,6 +619,12 @@ _GLOBAL(kexec_sequence) bl .copy_and_flush /* (dest, src, copy limit, start offset) */ 1: /* assume normal blr return */ + /* notify we're going into kexec kernel for SMP. */ + LOAD_REG_ADDR(r3,__run_at_kexec) + li r4,1 + std r4,0(r3) + sync + /* release other cpus to the new kernel secondary start at 0x60 */ mflrr5 li r6,1 diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index 6a17599..b308373 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -150,6 +150,9 @@ static int __cpuinit smp_85xx_kick_cpu(int nr) int hw_cpu = get_hard_smp_processor_id(nr); int ioremappable; int ret = 0; +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) + unsigned long *ptr; +#endif What about if we can remove the ifdef around *ptr ... WARN_ON(nr 0 || nr = NR_CPUS); WARN_ON(hw_cpu 0 || hw_cpu = NR_CPUS); @@ -238,11 +241,22 @@ out: #else smp_generic_kick_cpu(nr); +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) + ptr = (unsigned long *)((unsigned long)__run_at_kexec); ... #endif here ... + /* We shouldn't access spin_table from the bootloader to up any + * secondary cpu for kexec kernel, and kexec kernel already + * know how to jump to generic_secondary_smp_init. + */ + if (!*ptr) { +#endif ... remove #endif ... 
flush_spin_table(spin_table); out_be32(&spin_table->pir, hw_cpu); out_be64((u64 *)(&spin_table->addr_h), __pa((u64)*((unsigned long long *)generic_secondary_smp_init))); flush_spin_table(spin_table); +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP) + } +#endif <--- remove the above 3 lines -Bharat #endif local_irq_restore(flags); -- 1.7.9.5
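Bharat's inline comments aim to shrink the `#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)` blocks in `smp_85xx_kick_cpu()`. One common kernel idiom for that (not taken from this thread) is to confine the conditional compilation to a small helper, so the call site keeps a single unconditional `if`. The flag and function names below are hypothetical; the comments reference the quoted code only for orientation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the __run_at_kexec flag introduced by the patch. */
static unsigned long run_at_kexec;

/*
 * The single #if lives here: without kexec/kdump support the helper
 * always reports "not running as a kexec kernel".
 */
static bool running_kexec_kernel(void)
{
#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
    return run_at_kexec != 0;
#else
    return false;
#endif
}

/*
 * Model of the tail of smp_85xx_kick_cpu(): returns true when the
 * bootloader spin table would be written for the secondary CPU.
 */
static bool kick_cpu_uses_spin_table(void)
{
    if (!running_kexec_kernel()) {
        /* flush_spin_table(); out_be32(&spin_table->pir, hw_cpu); ... */
        return true;
    }
    /* A kexec kernel's secondaries already jump to generic_secondary_smp_init. */
    return false;
}
```

With the helper returning a constant `false` when the options are off, the compiler can drop the dead branch, so the non-kexec build should behave as before.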
RE: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction
-----Original Message----- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Wednesday, June 26, 2013 12:25 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; mi...@neuling.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction On 06/26/2013 01:42 PM, Bharat Bhushan wrote: The ehpriv instruction is used by user space for setting software breakpoints. This patch adds support for exiting to user space with run->debug holding the relevant information. As this is the first place where run->debug is used, the patch also defines the run->debug structure. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/disassemble.h | 4 arch/powerpc/include/uapi/asm/kvm.h | 21 arch/powerpc/kvm/e500_emulate.c | 27 3 files changed, 48 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h index 9b198d1..856f8de 100644 --- a/arch/powerpc/include/asm/disassemble.h +++ b/arch/powerpc/include/asm/disassemble.h @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst) return inst & 0xffff; } +static inline unsigned int get_oc(u32 inst) +{ + return (inst >> 11) & 0x7fff; +} #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..ded0607 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -269,7 +269,24 @@ struct kvm_fpu { __u64 fpr[32]; }; +/* + * Defines for h/w breakpoint, watchpoint (read, write or both) and + * software breakpoint. + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status + * for KVM_DEBUG_EXIT.
+ */ +#define KVMPPC_DEBUG_NONE 0x0 +#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1) +#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2) +#define KVMPPC_DEBUG_WATCH_READ (1UL << 3) struct kvm_debug_exit_arch { + __u64 address; + /* + * exiting to userspace because of h/w breakpoint, watchpoint + * (read, write or both) and software breakpoint. + */ + __u32 status; + __u32 reserved; }; /* for KVM_SET_GUEST_DEBUG */ @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch { * Type denotes h/w breakpoint, read watchpoint, write * watchpoint or watchpoint (both read and write). */ -#define KVMPPC_DEBUG_NONE 0x0 -#define KVMPPC_DEBUG_BREAKPOINT (1UL << 1) -#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2) -#define KVMPPC_DEBUG_WATCH_READ (1UL << 3) __u32 type; __u32 reserved; } bp[16]; diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index b10a012..dab9d07 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,8 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 +#define EHPRIV_OC_DEBUG 0 As I see it, the case OC = 0 is a bit special since, IIRC, if the OC operand is omitted it is 0 by default. So I think we should start this OC value from 1 or some other magic number. The ehpriv instruction is defined to be used as: ehpriv OC // where OC can be 0, 1, ... n and in its extended form it can be used as: ehpriv // with no OC, in which case OC = 0 is assumed. So OC = 0 is not special; ehpriv is simply the same as ehpriv 0. I cannot think of any particular reason to reserve ehpriv and ehpriv 0. Thanks -Bharat And if possible, we'd better add some comments describing this to make the OC definition readable.
Tiejun #ifdef CONFIG_KVM_E500MC static int dbell2prio(ulong param) @@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu, int rb) } #endif +static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu *vcpu, + unsigned int inst, int *advance) +{ + int emulated = EMULATE_DONE; + + switch (get_oc(inst)) { + case EHPRIV_OC_DEBUG: + run->exit_reason = KVM_EXIT_DEBUG; + run->debug.arch.address = vcpu->arch.pc; + run->debug.arch.status = 0; + kvmppc_account_exit(vcpu, DEBUG_EXITS); + emulated = EMULATE_EXIT_USER; + *advance = 0; + break; + default: + emulated = EMULATE_FAIL; + } + return emulated; +} + int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance) { @@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct
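The OC discussion is easier to follow with the encoding in view. `ehpriv` is an X-form instruction with primary opcode 31 and extended opcode 270 (the patch's `XOP_EHPRIV`); its 15-bit OC field sits at bits 11-25 counting from the least-significant bit, which is why `get_oc()` shifts by 11 and masks with `0x7fff`. The sketch below assembles an `ehpriv` word with a hypothetical `make_ehpriv()` helper and decodes it again; it also shows Bharat's point that a bare `ehpriv` produces exactly the same word as `ehpriv 0`, so the hypervisor cannot tell the two apart.

```c
#include <stdint.h>

#define OP_31      31   /* primary opcode of X-form instructions such as ehpriv */
#define XOP_EHPRIV 270  /* extended opcode of ehpriv, as in the patch */

/* Field extractors in the style of asm/disassemble.h. */
static inline unsigned int get_op(uint32_t inst)  { return inst >> 26; }
static inline unsigned int get_xop(uint32_t inst) { return (inst >> 1) & 0x3ff; }
static inline unsigned int get_oc(uint32_t inst)  { return (inst >> 11) & 0x7fff; }

/* Hypothetical helper: assemble "ehpriv OC" (OC is a 15-bit field). */
static uint32_t make_ehpriv(unsigned int oc)
{
    return ((uint32_t)OP_31 << 26) |
           ((uint32_t)(oc & 0x7fff) << 11) |
           ((uint32_t)XOP_EHPRIV << 1);
}
```

For example, `make_ehpriv(0)` is `0x7c00021c` and `make_ehpriv(1)` is `0x7c000a1c`; the handler's `switch (get_oc(inst))` then sees 0 and 1 respectively.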
RE: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction
-Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Wednesday, June 26, 2013 2:47 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org; linux- ker...@vger.kernel.org; mi...@neuling.org Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction On 06/26/2013 04:44 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Wednesday, June 26, 2013 12:25 PM To: Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott- B07421; b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org; linux- ker...@vger.kernel.org; mi...@neuling.org; Bhushan Bharat-R65777 Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction On 06/26/2013 01:42 PM, Bharat Bhushan wrote: ehpriv instruction is used for setting software breakpoints by user space. This patch adds support to exit to user space with run-debug have relevant information. As this is the first point we are using run-debug, also defined the run-debug structure. 
Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/disassemble.h |4 arch/powerpc/include/uapi/asm/kvm.h| 21 + arch/powerpc/kvm/e500_emulate.c| 27 +++ 3 files changed, 48 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h index 9b198d1..856f8de 100644 --- a/arch/powerpc/include/asm/disassemble.h +++ b/arch/powerpc/include/asm/disassemble.h @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst) return inst 0x; } +static inline unsigned int get_oc(u32 inst) { + return (inst 11) 0x7fff; +} #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..ded0607 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -269,7 +269,24 @@ struct kvm_fpu { __u64 fpr[32]; }; +/* + * Defines for h/w breakpoint, watchpoint (read, write or both) and + * software breakpoint. + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status + * for KVM_DEBUG_EXIT. + */ +#define KVMPPC_DEBUG_NONE0x0 +#define KVMPPC_DEBUG_BREAKPOINT (1UL 1) +#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) +#define KVMPPC_DEBUG_WATCH_READ (1UL 3) struct kvm_debug_exit_arch { + __u64 address; + /* + * exiting to userspace because of h/w breakpoint, watchpoint + * (read, write or both) and software breakpoint. + */ + __u32 status; + __u32 reserved; }; /* for KVM_SET_GUEST_DEBUG */ @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch { * Type denotes h/w breakpoint, read watchpoint, write * watchpoint or watchpoint (both read and write). 
*/ -#define KVMPPC_DEBUG_NONE0x0 -#define KVMPPC_DEBUG_BREAKPOINT (1UL 1) -#define KVMPPC_DEBUG_WATCH_WRITE (1UL 2) -#define KVMPPC_DEBUG_WATCH_READ (1UL 3) __u32 type; __u32 reserved; } bp[16]; diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index b10a012..dab9d07 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -26,6 +26,8 @@ #define XOP_TLBRE 946 #define XOP_TLBWE 978 #define XOP_TLBILX 18 +#define XOP_EHPRIV 270 +#define EHPRIV_OC_DEBUG 0 As I think the case, OC = 0, is a bit specific since IIRC, if the OC operand is omitted, its equal 0 by default. So I think we should start this OC value from 1 or other magic number. ehpriv instruction is defined to be used as: ehpriv OC // where OC can be 0,1, ... n and in extended for it can be used as ehpriv // With no OC, and here it assumes OC = 0 So OC = 0 is not specific but ehpriv is same as ehpriv 0. Yes, this is just what I mean. I do not think of any special reason to reserve ehpriv and ehpriv 0. So I still prefer we can reserve the 'ehpriv' without OC operand as one simple approach to test or develop something for KVM quickly because its really convenient to trap into the hypervisor only with one 'ehpriv' instruction easily. But I have no further objection if you guys are fine to this ;-) I can see the using ehpriv can be a default choice. But all ehvpriv trap is handled at one place (in a single function) so the accidently overlap with debug should not be an issue. I too do not have any strong opinion to keep either way, so want
RE: [PATCH 0/2 v3] powerpc: Make ptrace work reliably
Hi Ben, Ping. Please review this patchset. Thanks -Bharat -----Original Message----- From: Bhushan Bharat-R65777 Sent: Wednesday, May 22, 2013 9:51 AM To: ga...@kernel.crashing.org; b...@kernel.crashing.org; linuxppc-d...@lists.ozlabs.org; Wood Scott-B07421; Yoder Stuart-B08248; Yang James-RA8135 Cc: Bhushan Bharat-R65777 Subject: [PATCH 0/2 v3] powerpc: Make ptrace work reliably From: Bharat Bhushan bharat.bhus...@freescale.com v2->v3 - Load PACACURRENT immediately after _MSR(r1), and load DBCR0 just after beq resume_kernel - Added lat_syscall results before and after the patch v1->v2 - Subject line was missing 0/2, 1/2, 2/2 Bharat Bhushan (2): powerpc: debug control and status registers are 32bit => This patch makes the debug control and status registers 32-bit, as they are in hardware. This does not fix anything. powerpc: restore dbcr0 on user space exit => This patch fixes the ptrace reliability issue. The description in the patch describes one of the cases where it does not work reliably. arch/powerpc/include/asm/processor.h | 8 arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kernel/entry_64.S | 28 3 files changed, 29 insertions(+), 8 deletions(-)
RE: SATA FSL and upstreaming
-----Original Message----- From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Benjamin Herrenschmidt Sent: Thursday, May 16, 2013 11:16 AM To: Liu Qiang-B32616 Cc: linuxppc-dev@lists.ozlabs.org; Fleming Andy-AFLEMING; Xie Shaohui-B21989 Subject: Re: SATA FSL and upstreaming On Thu, 2013-05-16 at 14:47 +1000, Benjamin Herrenschmidt wrote: Hi folks ! So I was trying to use my 5020ds to test some stuff today. Since I hadn't used it in a while, I decided to upgrade it to the latest NOR etc... On another note, I can't seem to get any PCIe card recognized in any slot... Can you give me an example config of the DIP switches that is known to work with some slots ? Is there some EEPROM config needed ? If yes, any pointers ? (I can't quite make sense of either u-boot or the doc there). Can you send the RCW dump? Or you can try the attached RCW. Thanks -Bharat Thanks, Ben. rcw_15g_2000mhz.rcw Description: rcw_15g_2000mhz.rcw
RE: SATA FSL and upstreaming
Try: From bank 0 tftp 0x100 rcw_2sgmii_1500mhz.bin protect off 0xec00 +$filesize; erase 0xec00 +$filesize; cp.b 0x100 0xec00 $filesize Thanks -Bharat -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Thursday, May 16, 2013 11:54 AM To: Zang Roy-R61911 Cc: Bhushan Bharat-R65777; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc- d...@lists.ozlabs.org; Xie Shaohui-B21989 Subject: Re: SATA FSL and upstreaming On Thu, 2013-05-16 at 06:17 +, Zang Roy-R61911 wrote: Do you try slot7? PCIe1 connects to slot7 directly. I tried all slots. None of them sees any card. The card also doesn't seem to be powered up (none of the LEDs blink, it's an e1000 since I don't have networking with upstream). I also tried a different card and uboot is pretty adamant at saying no link :- ) I'll try to update the RCW when I know how to do it :-) Cheers, Ben. Roy -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Thursday, May 16, 2013 2:09 PM To: Zang Roy-R61911 Cc: Bhushan Bharat-R65777; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989 Subject: Re: SATA FSL and upstreaming On Thu, 2013-05-16 at 06:05 +, Zang Roy-R61911 wrote: I do not suggest changing the RCW. If the RCW is broken on Ben's side, it is not easy to recover for him. Let's check the U-boot output first. 
U-Boot 2013.01-9-g7bcd7f4 (Mar 14 2013 - 14:23:16) CPU0: P5020E, Version: 1.0, (0x82280010) Core: E5500, Version: 1.0, (0x80240010) Clock Configuration: CPU0:2000 MHz, CPU1:2000 MHz, CCB:800 MHz, DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous), LBC:100 MHz FMAN1: 600 MHz QMAN: 400 MHz PME: 400 MHz L1:D-cache 32 kB enabled I-cache 32 kB enabled Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank: 0 Reset Configuration Word (RCW): : 0c54 1e12 0010: d8984a01 03002000 de80 4100 0020: 1007 0030: SERDES Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz I2C: ready SPI: ready DRAM: Initializingusing SPD Detected UDIMM i-DIMM Detected UDIMM i-DIMM 2 GiB left unmapped 4 GiB (DDR3, 64-bit, CL=9, ECC on) DDR Controller Interleaving Mode: cache line DDR Chip-Select Interleaving Mode: CS0+CS1 Testing 0x - 0x7fff Testing 0x8000 - 0x Remap DDR 2 GiB left unmapped POST memory PASSED Flash: 128 MiB L2:512 KB enabled Corenet Platform Cache: 2048 KB enabled SRIO1: disabled SRIO2: disabled NAND: 1024 MiB MMC: FSL_SDHC: 0 EEPROM: NXID v1 PCIe1: Root Complex, no link, regs @ 0xfe20 PCIe1: Bus 00 - 00 PCIe2: disabled PCIe3: Root Complex, no link, regs @ 0xfe202000 PCIe3: Bus 01 - 01 PCIe4: disabled In:serial Out: serial Err: serial Net: Initializing Fman Fman1: Uploading microcode version 106.1.7 PHY reset timed out PHY reset timed out PHY reset timed out PHY reset timed out FM1@DTSEC1, FM1@DTSEC2, FM1@DTSEC3, FM1@DTSEC4, FM1@DTSEC5, FM1@TGEC1 Hit any key to stop autoboot: 0 = Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: SATA FSL and upstreaming
-----Original Message----- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 16, 2013 12:13 PM To: Benjamin Herrenschmidt Cc: Zang Roy-R61911; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-d...@lists.ozlabs.org; Xie Shaohui-B21989; Bhushan Bharat-R65777 Subject: Re: SATA FSL and upstreaming On 05/16/2013 02:40 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-05-16 at 14:35 +0800, tiejun.chen wrote: On 05/16/2013 02:21 PM, Benjamin Herrenschmidt wrote: On Thu, 2013-05-16 at 14:17 +0800, tiejun.chen wrote: I think you can use Bharat's RCW, which seems to be RR_HXAPNSP_0x36, then please take a look at this: Ok, how do I update my RCW to use Bharat's ? First, please check which flash bank is in use, since we have to know where the RCW should be updated. What is SW7[1:4]? Or there is a simpler way from the u-boot prompt: => md.b ffdf002c ffdf002c: 4f 00 fe 00 39 00 00 00 00 00 00 00 00 00 00 00 O...9... ... ffdf002c: 0f 00 fe 00 00 00 00 00 00 00 00 00 00 00 00 00 This means we're on bank4. I assume that means bank0 ? Yes, the RCW should be burned to 0xec000000. In the u-boot prompt: => loady ## Ready for binary (ymodem) download to 0x01000000 at 115200 bps... C Then send that RCW with ymodem from your terminal client. 1) Load the RCW to some address in DDR, as Tiejun described. 2) Burn the RCW at 0xec000000: protect off 0xec000000 +$filesize; erase 0xec000000 +$filesize; cp.b 0x1000000 0xec000000 $filesize 3) run pix altbak command 4) check you are on bank4 5) If you are lucky, networking will work for you. Thanks -Bharat Tiejun
RE: SATA FSL and upstreaming
Ben, Which SDK you are using? -Bharat -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Thursday, May 16, 2013 12:36 PM To: Zang Roy-R61911 Cc: Bhushan Bharat-R65777; tiejun.chen; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989 Subject: Re: SATA FSL and upstreaming On Thu, 2013-05-16 at 07:01 +, Zang Roy-R61911 wrote: I just tried your RCW. one e1000 card works in slot7. we may need to check others ... Tried 4 and 7 ... Note that this *used* to work. Last year I had this machine up with 2 cards doing things. Not sure what changed, it's possible that the DIP got inadvertently changed. Or somebody stole a jumper from it in the lab :-) U-Boot 2013.01-00078-g2741c99 (May 03 2013 - 00:20:41) CPU0: P5020E, Version: 2.0, (0x82280020) Core: E5500, Version: 1.2, (0x80240012) Clock Configuration: CPU0:2000 MHz, CPU1:2000 MHz, CCB:800 MHz, DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous), LBC:100 MHz FMAN1: 600 MHz QMAN: 400 MHz PME: 400 MHz L1:D-cache 32 kB enabled I-cache 32 kB enabled Reset Configuration Word (RCW): : 0c54 1e12 0010: d8984a01 03002000 de80 4100 0020: 1007 0030: My RCW is identical Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x02, FPGA Ver: 0x04, vBank: 4 Mine is: Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank: 4 SERDES Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz Same. 
I2C: ready SPI: ready DRAM: Initializingusing SPD Detected UDIMM i-DIMM Detected UDIMM i-DIMM 2 GiB left unmapped 4 GiB (DDR3, 64-bit, CL=9, ECC on) DDR Controller Interleaving Mode: cache line DDR Chip-Select Interleaving Mode: CS0+CS1 Testing 0x - 0x7fff Testing 0x8000 - 0x Remap DDR 2 GiB left unmapped POST memory PASSED Flash: 128 MiB L2:512 KB enabled Corenet Platform Cache: 2048 KB enabled SRIO1: disabled SRIO2: disabled NAND: 1024 MiB MMC: FSL_SDHC: 0 EEPROM: Invalid ID (ff ff ff ff) PCIe1: Root Complex, x2, regs @ 0xfe20 01:00.0 - 8086:105e - Network controller 01:00.1 - 8086:105e - Network controller PCIe1: Bus 00 - 01 PCIe2: disabled PCIe3: Root Complex, no link, regs @ 0xfe202000 PCIe3: Bus 02 - 02 PCIe4: disabled And I never see anything here anymore... In:serial Out: serial Err: serial Net: Initializing Fman Fman1: Uploading microcode version 106.1.6 PHY reset timed out PHY reset timed out PHY reset timed out PHY reset timed out e1000: 00:15:17:16:ce:b8 e1000: 00:15:17:16:ce:b9 FM1@DTSEC1, FM1@DTSEC2, FM1@DTSEC3, FM1@DTSEC4 [PRIME], FM1@DTSEC5, FM1@TGEC1, e1000#0 Warning: e1000#0 MAC addresses don't match: Address in SROM is 00:15:17:16:ce:b8 Address in environment is 00:1b:21:68:5e:d4 , e1000#1 Warning: e1000#1 using MAC address from net device = ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
RE: SATA FSL and upstreaming
Ben, If you are using SDK1.3 and later then the support for p5020ds rev 1.0 support is removed. So use earlier sdk for rev 1.0 or wait for rev2.0 :) Thanks -Bharat -Original Message- From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org] Sent: Thursday, May 16, 2013 12:36 PM To: Zang Roy-R61911 Cc: Bhushan Bharat-R65777; tiejun.chen; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989 Subject: Re: SATA FSL and upstreaming On Thu, 2013-05-16 at 07:01 +, Zang Roy-R61911 wrote: I just tried your RCW. one e1000 card works in slot7. we may need to check others ... Tried 4 and 7 ... Note that this *used* to work. Last year I had this machine up with 2 cards doing things. Not sure what changed, it's possible that the DIP got inadvertently changed. Or somebody stole a jumper from it in the lab :-) U-Boot 2013.01-00078-g2741c99 (May 03 2013 - 00:20:41) CPU0: P5020E, Version: 2.0, (0x82280020) Core: E5500, Version: 1.2, (0x80240012) Clock Configuration: CPU0:2000 MHz, CPU1:2000 MHz, CCB:800 MHz, DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous), LBC:100 MHz FMAN1: 600 MHz QMAN: 400 MHz PME: 400 MHz L1:D-cache 32 kB enabled I-cache 32 kB enabled Reset Configuration Word (RCW): : 0c54 1e12 0010: d8984a01 03002000 de80 4100 0020: 1007 0030: My RCW is identical Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x02, FPGA Ver: 0x04, vBank: 4 Mine is: Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank: 4 SERDES Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz Same. 
RE: [PATCH 2/2 v2] powerpc: restore dbcr0 on user space exit
-Original Message- From: Wood Scott-B07421 Sent: Thursday, May 16, 2013 10:24 PM To: Bhushan Bharat-R65777 Cc: ga...@kernel.crashing.org; b...@kernel.crashing.org; linuxppc- d...@lists.ozlabs.org; Yoder Stuart-B08248; Yang James-RA8135; Bhushan Bharat- R65777 Subject: Re: [PATCH 2/2 v2] powerpc: restore dbcr0 on user space exit On 05/16/2013 12:34:32 AM, Bharat Bhushan wrote: On BookE, (Branch Taken + Single Step) is the same as Branch Taken on BookS, and in Linux we simulate the BookS behavior on BookE as well. When doing so, in Branch Taken handling we want to set DBCR0_IC, but we update current->thread.dbcr0 and not DBCR0. Now on 64-bit, current->thread.dbcr0 (and the other debug registers) is synchronized ONLY in the context switch flow. So if, after handling Branch Taken in the debug exception, we return to user space without a context switch, the single-stepping change (DBCR0_ICMP) does not get written to the h/w DBCR0 and the Instruction Complete exception does not happen. This makes ptrace work reliably on BookE PowerPC. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v1->v2 - Subject line was missing 2/2 arch/powerpc/kernel/asm-offsets.c |    1 + arch/powerpc/kernel/entry_64.S    |   24 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index b51a97c..1e2f450 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -103,6 +103,7 @@ int main(void) #endif /* CONFIG_VSX */ #ifdef CONFIG_PPC64 DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid)); + DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0)); #else /* CONFIG_PPC64 */ DEFINE(PGDIR, offsetof(struct thread_struct, pgdir)); #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 794889b..561630d 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -614,7 +614,9 @@
_GLOBAL(ret_from_except_lite) * from the interrupt. */ #ifdef CONFIG_PPC_BOOK3E + ld r3,PACACURRENT(r13) wrteei 0 + lwz r10,(THREAD+THREAD_DBCR0)(r3) I know I asked you to move these earlier, but this is probably too early... wrteei has synchronization, so it will probably have to wait until the ld completes, defeating the purpose of moving it earlier. Ideal would probably be to load PACACURRENT immediately after _MSR(r1), and load DBCR0 just after beq resume_kernel. OK. Or, move DBCR0 to thread_info as I suggested internally. If no one has any objection to moving dbcr0 to thread_info, then I am happy to do that. Regardless of what you do, could you run a basic syscall benchmark (e.g. from lmbench) before and after the patch? Sure. -Bharat -Scott
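The failure described in the commit message can be reproduced with a small user-space model. This is a sketch under stated assumptions, not kernel code. `struct thread_model`, `hw_dbcr0`, `return_to_user()` and the `MODEL_DBCR0_*` bit values are hypothetical stand-ins for `current->thread.dbcr0`, the DBCR0 register and the `ret_from_except_lite` exit path. The model shows that a bit set only in the thread's copy during the debug exception never reaches the hardware. That changes only if the exit path, and not just the context-switch path, writes the copy back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model only; not the real DBCR0 layout. */
#define MODEL_DBCR0_IDM 0x1u /* internal debug mode */
#define MODEL_DBCR0_IC  0x2u /* instruction-complete (single-step) event */
#define MODEL_DBCR0_BT  0x4u /* branch-taken event */

/* Stand-in for the saved per-thread copy, current->thread.dbcr0. */
struct thread_model {
    uint32_t dbcr0;
};

/* Stand-in for the hardware DBCR0 special-purpose register. */
static uint32_t hw_dbcr0;

/* Context switch: before the patch, the only point where 64-bit code
 * loaded the thread's copy into the hardware register. */
static void context_switch_to(const struct thread_model *next)
{
    hw_dbcr0 = next->dbcr0;
}

/* Branch-taken debug exception: emulate the BookS behaviour by also
 * requesting an instruction-complete event, but only in the copy. */
static void handle_branch_taken(struct thread_model *t)
{
    t->dbcr0 |= MODEL_DBCR0_IC;
}

/* Return to user space without a context switch. With the fix, the
 * exit path reloads DBCR0 from the thread copy when they differ. */
static void return_to_user(const struct thread_model *t, bool with_fix)
{
    if (with_fix && hw_dbcr0 != t->dbcr0)
        hw_dbcr0 = t->dbcr0;
}
```

In the patch itself this step is done in assembly. The hunk above loads the saved value with `lwz r10,(THREAD+THREAD_DBCR0)(r3)`, using the `THREAD_DBCR0` offset that asm-offsets.c generates from `offsetof(struct thread_struct, dbcr0)`. Scott's review concerns two things: where in that path the loads should be scheduled, and whether the field should live in thread_info instead.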
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disables soft interrupts for kvmppc_restart_interrupt(), which restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that the host has flags and replays only EE/DEC/DBELL interrupts. There is a special macro, masked_interrupt_book3e, in those exception handlers that sets paca->irq_happened. The list of replayed interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). The Embedded Perfmon interrupt is also asynchronous; why is it not in the list of masked interrupts? -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike
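Mike's rule for which host interrupts are lazily masked can be written as a small predicate. This is an illustrative sketch only. The `model_irq` enum and `model_is_lazily_masked()` are hypothetical, and the kernel makes this decision per exception vector in assembly (via `masked_interrupt_book3e`), not through a function like this. The perfmon case Bharat asks about is deliberately left out, because later messages in the thread settle it separately.

```c
#include <stdbool.h>

/* A few BookE interrupt kinds named in this thread (model enum only). */
enum model_irq {
    MODEL_IRQ_EXTERNAL,
    MODEL_IRQ_DECREMENTER,
    MODEL_IRQ_DOORBELL,
    MODEL_IRQ_DATA_TLB_MISS,
};

/* True if the host, while soft-disabled, only records the interrupt in
 * irq_happened for later replay. Per Mike's description, that set is the
 * asynchronous, noncritical interrupts masked by MSR[EE]. */
static bool model_is_lazily_masked(enum model_irq irq)
{
    switch (irq) {
    case MODEL_IRQ_EXTERNAL:
    case MODEL_IRQ_DECREMENTER:
    case MODEL_IRQ_DOORBELL:
        return true;
    default:
        /* Synchronous exceptions (e.g. a TLB miss) cannot be deferred. */
        return false;
    }
}
```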
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:18 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Embedded Perfmon interrupt is also asynchronous, Why that is not in the list of masked interruts. Are you saying perfmon? 
If so, it's also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Where is it recorded in paca->irq_happened to be replayed later? Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: Kevin Hao [mailto:haoke...@gmail.com] Sent: Thursday, May 09, 2013 1:38 PM To: Bhushan Bharat-R65777 Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott- B07421; ag...@suse.de; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:18 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf bounces+Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? 
I think Tiejun was saying that the host has flags and replays only EE/DEC/DBELL interrupts. There is a special macro, masked_interrupt_book3e, in those exception handlers that sets paca->irq_happened. The list of replayed interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). The Embedded Perfmon interrupt is also asynchronous; why is it not in the list of masked interrupts? Are you saying perfmon? If so, it's also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Where is it recorded in paca->irq_happened to be replayed later? Actually, we don't want to replay the perfmon interrupt later. We run it even when soft IRQs are disabled and just treat it as an NMI. Please see the following function quoted from arch/powerpc/perf/core-fsl-emb.c: /* * If interrupts were soft-disabled when a PMU interrupt occurs, treat * it as an NMI. */ static inline int perf_intr_is_nmi(struct pt_regs *regs) { #ifdef __powerpc64__ return !regs->softe; #else return 0; #endif } Is it because we cannot afford to lose perfmon interrupts if the captured data is to be accurate? -Bharat Thanks, Kevin Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike
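Kevin's point is that a PMU interrupt is never deferred. If it arrives while soft-disabled, it is handled immediately on an NMI-like path. The sketch below reuses the decision from the quoted `perf_intr_is_nmi()` on a hypothetical `struct pt_regs_model`. The `pmu_stats` counters and `model_perfmon_interrupt()` are invented for illustration and are not part of core-fsl-emb.c.

```c
/* Minimal stand-in for struct pt_regs: only the saved soft-enable state
 * (softe is 1 if interrupts were soft-enabled when the PMU fired). */
struct pt_regs_model {
    unsigned long softe;
};

/* Same decision as the 64-bit branch of perf_intr_is_nmi() quoted above. */
static int model_perf_intr_is_nmi(const struct pt_regs_model *regs)
{
    return !regs->softe;
}

/* Hypothetical counters: every sample is taken immediately; nmi_samples
 * counts those that arrived in soft-disabled regions. */
struct pmu_stats {
    unsigned samples;
    unsigned nmi_samples;
};

static void model_perfmon_interrupt(struct pmu_stats *s,
                                    const struct pt_regs_model *regs)
{
    s->samples++;
    if (model_perf_intr_is_nmi(regs))
        s->nmi_samples++; /* an NMI-context path must avoid unsafe work */
}
```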
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disables soft interrupts for kvmppc_restart_interrupt(), which restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that the host has flags and replays only EE/DEC/DBELL interrupts. There is a special macro, masked_interrupt_book3e, in those exception handlers that sets paca->irq_happened. The list of replayed interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. Another question. Consider these cases: Case 1) - local_irq_disable() sets soft_enabled = 0. - Now an external interrupt happens; there we set PACA_IRQ_EE in irq_happened, also clear EE in SRR1, and rfi. So interrupts are hard-disabled, and no other interrupt gated by MSR.EE can happen.
It looks like the idea here is to stop a device from repeatedly raising the interrupt until the interrupt condition on the device is cleared, right? - local_irq_enable() checks that irq_happened is set, and replays. Now case 2) - local_irq_disable() sets soft_enabled = 0. - Now a DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, but do not clear EE in SRR1 before the rfi. So interrupts are not hard-disabled. - Now, say, an EE interrupt happens; there we set PACA_IRQ_EE in irq_happened, also clear EE in SRR1, and rfi. So interrupts are hard-disabled. - local_irq_enable() checks that irq_happened is set. IIUC, it replays only one interrupt, doesn't it? -Bharat
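The two cases above can be expressed as a small state model. This is a simplified user-space sketch under stated assumptions. `struct paca_model`, the `M_IRQ_*` bit values and both functions are hypothetical. The real logic lives in the `masked_interrupt_book3e` assembly, and doorbells and other lazy-state bits are not modelled. In the model, an external interrupt that arrives while soft-disabled is recorded and returns with MSR[EE] cleared. A decrementer is recorded but leaves MSR[EE] set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values for this model only; the kernel's PACA_IRQ_* values differ. */
#define M_IRQ_EE  0x1u
#define M_IRQ_DEC 0x2u

struct paca_model {
    bool soft_enabled;    /* cleared by local_irq_disable() */
    uint8_t irq_happened; /* interrupts seen while soft-disabled */
    bool msr_ee;          /* hardware MSR[EE] after the rfi */
};

static void model_local_irq_disable(struct paca_model *p)
{
    p->soft_enabled = false; /* MSR[EE] stays set: a "lazy" disable */
}

/* An EE-gated interrupt arrives (so msr_ee must be true). Returns true if
 * it is handled now, false if it is only recorded for later replay. */
static bool model_masked_interrupt(struct paca_model *p, uint8_t irq)
{
    if (p->soft_enabled)
        return true;
    p->irq_happened |= irq;
    if (irq == M_IRQ_EE)
        p->msr_ee = false; /* EE cleared in SRR1, so the rfi hard-disables */
    return false;          /* DEC: return with MSR[EE] still set */
}
```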
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:48 PM To: Bhushan Bharat-R65777 Cc: Kevin Hao; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott- B07421; ag...@suse.de; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 04:12 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Kevin Hao [mailto:haoke...@gmail.com] Sent: Thursday, May 09, 2013 1:38 PM To: Bhushan Bharat-R65777 Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott- B07421; ag...@suse.de; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 1:18 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf bounces+Of Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. 
We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that the host has flags and replays only EE/DEC/DBELL interrupts. There is a special macro, masked_interrupt_book3e, in those exception handlers that sets paca->irq_happened. The list of replayed interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). The Embedded Perfmon interrupt is also asynchronous; why is it not in the list of masked interrupts? Are you saying perfmon? If so, it's also in that list: START_EXCEPTION(perfmon); NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR, PROLOG_ADDITION_NONE) EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE) Where is it recorded in paca->irq_happened to be replayed later? Actually, we don't want to replay the perfmon interrupt later. We run it even when soft IRQs are disabled and just treat it as an NMI. Please see the following function quoted from arch/powerpc/perf/core-fsl-emb.c: /* * If interrupts were soft-disabled when a PMU interrupt occurs, treat * it as an NMI. */ static inline int perf_intr_is_nmi(struct pt_regs *regs) { #ifdef __powerpc64__ return !regs->softe; #else return 0; #endif } Is it because we cannot afford to lose perfmon interrupts if the captured data is to be accurate? powerpc/perf: e500 support This implements perf_event support for the Freescale embedded performance monitor, based on the existing perf_event.c that supports server/classic chips. Some limitations: - Performance monitor interrupts are regular EE interrupts, and thus you can't profile places with interrupts disabled.
We may want to implement soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs. Ahh, that gives the answer, and it is what I expected :) -Bharat Tiejun -Bharat Thanks, Kevin Tiejun -Bharat Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. -Mike
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 3:15 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). 
Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. Another question. Consider these cases: Actually, in the case GS=1, EXT/DEC/DBELL still occur even if EE=0, as I recall. Case 1) - local_irq_disable() sets soft_enabled = 0. - Now an external interrupt happens; there we set PACA_IRQ_EE in irq_happened, also clear EE in SRR1, and rfi. So interrupts are hard-disabled, and no other interrupt gated by MSR.EE can happen. It looks like the idea here is to stop a device from repeatedly raising the interrupt until the interrupt condition on the device is cleared, right? I don't understand what you mean by "the interrupt condition on the device is cleared" here. I think that regardless of whether you clear the device interrupt status, the system still receives a pending interrupt once EE or GS = 1. Once, yes, but I think we disable MSR.EE when soft-disabled to avoid a flood of device interrupts. - local_irq_enable() checks that irq_happened is set, and replays. ret_from_except also checks for interrupts to replay. Now case 2) - local_irq_disable() sets soft_enabled = 0. - Now a DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, but do not clear EE in SRR1 before the rfi. So interrupts are not hard-disabled. - Now, say, an EE interrupt happens; there we set PACA_IRQ_EE in irq_happened, also clear EE in SRR1, and rfi. So interrupts are hard-disabled. - local_irq_enable() checks that irq_happened is set. IIUC, it replays only one interrupt, doesn't it? After any one of them is replayed in arch_local_irq_restore(), we set soft/hard enable there: set_soft_enabled(1); __hard_irq_enable(); Then any pending interrupt can be handled. Do you mean that the interrupt should fire again? Additionally, ret_from_except probably checks to replay all of them. local_irq_enable() will not take us to ret_from_except.
-Bharat Tiejun
RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts
-Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 3:48 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 06:00 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: tiejun.chen [mailto:tiejun.c...@windriver.com] Sent: Thursday, May 09, 2013 3:15 PM To: Bhushan Bharat-R65777 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc- d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote: -Original Message- From: Linuxppc-dev [mailto:linuxppc-dev- bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of bounces+Caraman Mihai Claudiu-B02008 Sent: Wednesday, May 08, 2013 6:44 PM To: Wood Scott-B07421; tiejun.chen Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts This only disable soft interrupt for kvmppc_restart_interrupt() that restarts interrupts if they were meant for the host: a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL | BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL Those aren't the only exceptions that can end up going to the host. We could get a TLB miss that results in a heavyweight MMIO exit, etc. And shouldn't we handle kvmppc_restart_interrupt() like the original HOST flow? #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack) \ START_EXCEPTION(label); \ NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE)\ EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \ ... Could you elaborate on what you mean? 
I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL interrupts. There is special macro masked_interrupt_book3e in those exception handlers that sets paca- irq_happened. The list of replied interrupts is limited to asynchronous noncritical interrupts which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM book3e we don't want to put them in the irq_happened lazy state but rather to execute them directly, so there is no reason for exception handling symmetry between host and guest. Another Question: The case is: Actually in the case GS=1 even if EE=0, EXT/DEC/DBELL still occur as I recall. Case 1) - Local_irq_disable() will set soft_enabled = 0 - Now Externel interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. No more other interrupt gated by MSR.EE can happen. Looks like the idea here is to not let a device keep on inserting interrupt till the interrupt condition on device is cleared, right? I don't understand the interrupt condition on device is cleared here. I think regardless if you clear the device interrupt status, the system still receive a pending interrupt once EE or GS = 1. Once yes, but I think to avoid flood of device interrupt we disable MSR.EE when soft-disabled. But we neither ACK nor send EOI to that irq in the interrupt controller, so that should be in pending state. - local_irq_enable() - This checks that irq_happened is set, and replays ret_from_except also check to replay. Now the case 2) Case 2) - Local_irq_disable() will set soft_enabled = 0 - Now DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened, But do not clear EE in SRR1 and rfi. So interrupts are not hard disabled. - Now say EE interrupt happens, there we set PACA_IRQ_EE in irq_happened, Also clears EE in SRR1 and rfi. So interrupts are hard disabled. - local_irq_enable() - This checks that irq_happened is set. IIUC, it replays only one interrupt? is not it? 
After any one of them is replayed in arch_local_irq_restore(), we set soft/hard enable there: set_soft_enabled(1); __hard_irq_enable(); Then any pending interrupt can be handled. Do you mean that the interrupt should fire again? I mean that the pending exceptions, including the external interrupt, the decrementer exception and the doorbell exception, can trap the CPU once EE=1 is set by __hard_irq_enable() here. Then the kernel can handle those exceptions, since soft enable is also 1 now. Additionally, ret_from_except probably checks to replay all of them. local_irq_enable() will not take us to ret_from_except. Yes. I am just saying that ret_from_except provides a way to replay all of them.
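Tiejun's answer relies on the external interrupt staying pending in the interrupt controller, because it was neither ACKed nor EOI'd. Replaying just one recorded interrupt is then enough: once `__hard_irq_enable()` sets EE again, anything still asserted traps on its own. The sketch below models that under stated assumptions. `struct cpu_model` and its helpers are hypothetical. Replaying the decrementer first, and treating the recorded decrementer as no longer signalled by the hardware, are choices of this model. They do not describe `arch_local_irq_restore()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values for this model only. */
#define R_IRQ_EE  0x1u
#define R_IRQ_DEC 0x2u

struct cpu_model {
    bool soft_enabled;
    bool msr_ee;          /* hardware MSR[EE] */
    uint8_t irq_happened; /* interrupts recorded while soft-disabled */
    uint8_t hw_asserted;  /* sources the hardware is still signalling */
    uint8_t handled;      /* handlers that have run */
};

static void run_handler(struct cpu_model *c, uint8_t irq)
{
    c->handled |= irq;
    c->hw_asserted &= (uint8_t)~irq;  /* handler acks the source */
    c->irq_happened &= (uint8_t)~irq; /* nothing left to replay for it */
}

/* Once soft- and hard-enabled, every still-asserted source traps. */
static void deliver_asserted(struct cpu_model *c)
{
    static const uint8_t order[] = { R_IRQ_DEC, R_IRQ_EE };
    for (unsigned i = 0; i < sizeof order / sizeof order[0]; i++)
        if (c->soft_enabled && c->msr_ee && (c->hw_asserted & order[i]))
            run_handler(c, order[i]);
}

/* Simplified local_irq_enable(): replay ONE recorded interrupt, then set
 * soft- and hard-enable so any source still pending can fire on its own. */
static void model_local_irq_enable(struct cpu_model *c)
{
    if (c->irq_happened & R_IRQ_DEC)
        run_handler(c, R_IRQ_DEC);
    else if (c->irq_happened & R_IRQ_EE)
        run_handler(c, R_IRQ_EE);
    c->soft_enabled = true;
    c->msr_ee = true;
    deliver_asserted(c);
}
```

Per the last exchange, this covers only `local_irq_enable()`. The exception-return path (`ret_from_except`) has its own way of replaying recorded interrupts, and the model does not include it.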
RE: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Scott Wood Sent: Friday, May 10, 2013 8:40 AM To: Alexander Graf; Benjamin Herrenschmidt Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421 Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect that this function now has a role on 32-bit as well. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_ppc.h | 11 --- arch/powerpc/kvm/book3s_pr.c |4 ++-- arch/powerpc/kvm/booke.c |4 ++-- arch/powerpc/kvm/powerpc.c |2 -- 4 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index a5287fe..6885846 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn) } } -/* Please call after prepare_to_enter. This function puts the lazy ee state - back to normal mode, without actually enabling interrupts. */ -static inline void kvmppc_lazy_ee_enable(void) +/* + * Please call after prepare_to_enter. This function puts the lazy ee and irq + * disabled tracking state back to normal mode, without actually enabling + * interrupts. 
+ */ +static inline void kvmppc_fix_ee_before_entry(void) { + trace_hardirqs_on(); + #ifdef CONFIG_PPC64 /* Only need to enable IRQs by hard enabling them after this */ local_paca->irq_happened = 0; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index bdc40b8..0b97ce4 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -890,7 +890,7 @@ program_interrupt: local_irq_enable(); r = s; } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) if (vcpu->arch.shared->msr & MSR_FP) kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP); - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); ret = __kvmppc_vcpu_run(kvm_run, vcpu); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 705fc5c..eb89b83 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) ret = s; goto out; } - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); local_irq_disable() is called before kvmppc_prepare_to_enter(). Now we put irq_happened and soft_enabled back to their previous state without checking whether any interrupt happened in between. If an interrupt happens in between, won't it be lost?
-Bharat kvm_guest_enter(); @@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, local_irq_enable(); r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV); } else { - kvmppc_lazy_ee_enable(); + kvmppc_fix_ee_before_entry(); } } diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 6316ee3..4e05f8c 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) kvm_guest_exit(); continue; } - - trace_hardirqs_on(); #endif kvm_guest_enter(); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
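Bharat's concern can be made concrete. The reset clears `irq_happened` without replaying it, so it is only safe if nothing could have been recorded. Patch 4/4 later states this invariant in a comment: the caller must have gone directly from interrupts fully enabled to hard-disabled. The sketch below is a hypothetical model, not `kvmppc_fix_ee_before_entry()` itself. The quoted hunk only shows `irq_happened` being cleared, so restoring `soft_enabled` is an assumption based on the comment's "back to normal mode". Whether the hardware would re-signal a discarded interrupt is outside the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values for this model only. */
#define F_IRQ_HARD_DIS 0x1u /* marker: EE was cleared by software */
#define F_IRQ_DEC      0x2u /* a decrementer interrupt was recorded */

struct paca_f {
    bool soft_enabled;
    uint8_t irq_happened;
};

/* Fully enabled -> hard-disabled in one step: no window in which an
 * interrupt could be recorded instead of delivered. */
static void model_hard_disable_from_enabled(struct paca_f *p)
{
    p->soft_enabled = false;
    p->irq_happened |= F_IRQ_HARD_DIS;
}

/* Mimics the reset done before guest entry. Returns the recorded
 * interrupt bits whose record the reset discards without replay. */
static uint8_t model_fix_ee_before_entry(struct paca_f *p)
{
    uint8_t dropped = p->irq_happened & (uint8_t)~F_IRQ_HARD_DIS;
    p->irq_happened = 0;
    p->soft_enabled = true; /* assumption: "back to normal mode" */
    return dropped;
}
```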
RE: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
-----Original Message----- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Scott Wood Sent: Friday, May 10, 2013 8:40 AM To: Alexander Graf; Benjamin Herrenschmidt Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421 Subject: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit() EE is hard-disabled on entry to kvmppc_handle_exit(), so call hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled is unset. Without this, we get warnings such as arch/powerpc/kernel/time.c:300, and sometimes host kernel hangs. Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/kvm/booke.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 1020119..705fc5c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -833,6 +833,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, int r = RESUME_HOST; int s; +#ifdef CONFIG_PPC64 + WARN_ON(local_paca->irq_happened != 0); +#endif + hard_irq_disable(); This is not actually meant to hard-disable, since EE is already clear, but to make the state look hard-disabled to the host, right? If so, should we add a comment here explaining why we are doing this? -Bharat + /* update before a new last_exit_type is rewritten */ kvmppc_update_timing_stats(vcpu); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
-----Original Message----- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Scott Wood Sent: Friday, May 10, 2013 8:40 AM To: Alexander Graf; Benjamin Herrenschmidt Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421 Subject: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup Simplify the handling of lazy EE by going directly from fully-enabled to hard-disabled. This replaces the lazy_irq_pending() check (including its misplaced kvm_guest_exit() call). As suggested by Tiejun Chen, move the interrupt disabling into kvmppc_prepare_to_enter() rather than have each caller do it. Also move the IRQ enabling on heavyweight exit into kvmppc_prepare_to_enter(). Don't move kvmppc_fix_ee_before_entry() into kvmppc_prepare_to_enter(), so that the caller can avoid marking interrupts enabled earlier than necessary (e.g. book3s_pr waits until after FP save/restore is done). Signed-off-by: Scott Wood scottw...@freescale.com --- arch/powerpc/include/asm/kvm_ppc.h | 6 ++ arch/powerpc/kvm/book3s_pr.c | 12 +++- arch/powerpc/kvm/booke.c | 9 ++--- arch/powerpc/kvm/powerpc.c | 21 - 4 files changed, 19 insertions(+), 29 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 6885846..e4474f8 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void) trace_hardirqs_on(); #ifdef CONFIG_PPC64 + /* + * To avoid races, the caller must have gone directly from having + * interrupts fully-enabled to hard-disabled.
+ */ + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS); + /* Only need to enable IRQs by hard enabling them after this */ local_paca->irq_happened = 0; local_paca->soft_enabled = 1; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index 0b97ce4..e61e39e 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -884,14 +884,11 @@ program_interrupt: * and if we really did time things so badly, then we just exit * again due to a host external interrupt. */ - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); - if (s <= 0) { - local_irq_enable(); + if (s <= 0) r = s; - } else { + else kvmppc_fix_ee_before_entry(); - } } trace_kvm_book3s_reenter(r, vcpu); @@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) * really did time things so badly, then we just exit again due to * a host external interrupt. */ - local_irq_disable(); ret = kvmppc_prepare_to_enter(vcpu); - if (ret <= 0) { - local_irq_enable(); + if (ret <= 0) goto out; - } /* Save FPU state in stack */ if (current->thread.regs->msr & MSR_FP) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index eb89b83..f7c0111 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) return -EINVAL; } - local_irq_disable(); s = kvmppc_prepare_to_enter(vcpu); if (s <= 0) { - local_irq_enable(); ret = s; goto out; } @@ -1148,14 +1146,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, * aren't already exiting to userspace for some other reason. */ if (!(r & RESUME_HOST)) { - local_irq_disable(); OK, now we do not soft-disable before kvmppc_prepare_to_enter().
s = kvmppc_prepare_to_enter(vcpu); - if (s <= 0) { - local_irq_enable(); + if (s <= 0) r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV); - } else { + else kvmppc_fix_ee_before_entry(); - } } return r; diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 4e05f8c..f8659aa 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu) { int r = 1; - WARN_ON_ONCE(!irqs_disabled()); + WARN_ON(irqs_disabled()); + hard_irq_disable(); Here we hard-disable in kvmppc_prepare_to_enter(), so my comment on the other patch about interrupt loss is no longer valid. So here: MSR.EE = 0, local_paca->soft_enabled = 0, local_paca->irq_happened |= PACA_IRQ_HARD_DIS; + while (true)
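The reworked sequence (enter with interrupts fully enabled, hard-disable, then check that nothing but the hard-disable was recorded) can be sketched with the same kind of user-space model. Again, struct paca_model and the model_* helpers are hypothetical, the flag values are illustrative, and the WARN_ON checks are reduced to boolean return values. Setting MSR[EE] again on guest entry (via rfi) is not modelled.

```c
#include <stdbool.h>

/* Illustrative values, assumed to mirror PACA_IRQ_* in asm/hw_irq.h. */
#define PACA_IRQ_HARD_DIS 0x01u
#define PACA_IRQ_EE       0x04u

struct paca_model {
	bool msr_ee;               /* hardware MSR[EE] */
	bool soft_enabled;         /* local_paca->soft_enabled */
	unsigned int irq_happened; /* local_paca->irq_happened */
};

/* hard_irq_disable(): clear MSR[EE] and record that in the lazy state. */
static void model_hard_irq_disable(struct paca_model *p)
{
	p->msr_ee = false;
	p->soft_enabled = false;
	p->irq_happened |= PACA_IRQ_HARD_DIS;
}

/* With EE hard-off, an external interrupt cannot be taken or latched. */
static void model_external_irq(struct paca_model *p)
{
	if (!p->msr_ee)
		return;            /* stays pending in hardware */
	if (!p->soft_enabled) {
		p->irq_happened |= PACA_IRQ_EE | PACA_IRQ_HARD_DIS;
		p->msr_ee = false;
	}
}

/* Start of the reworked kvmppc_prepare_to_enter(): returns true if the
 * WARN_ON(irqs_disabled()) check would fire, then hard-disables. */
static bool model_prepare_to_enter(struct paca_model *p)
{
	bool warned = !p->soft_enabled;

	model_hard_irq_disable(p);
	return warned;
}

/* Reworked kvmppc_fix_ee_before_entry(): returns true if the
 * WARN_ON(irq_happened != PACA_IRQ_HARD_DIS) check would fire. */
static bool model_fix_ee_before_entry(struct paca_model *p)
{
	bool warned = p->irq_happened != PACA_IRQ_HARD_DIS;

	p->irq_happened = 0;
	p->soft_enabled = true;
	return warned;
}
```

Because EE is cleared before anything else can be latched, the only bit that the reset discards in this model is PACA_IRQ_HARD_DIS itself.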
RE: [PATCH] bookehv: Handle debug exception on guest exit
Hi Kumar/Benh, After looking further into the code, I think that if we correct the vector range below in the DebugDebug handler, then we do not need the change I provided in this patch. Here is the snippet for 32-bit (head_booke.h; the same will be true for 64-bit): #define DEBUG_DEBUG_EXCEPTION \ START_EXCEPTION(DebugDebug); \ DEBUG_EXCEPTION_PROLOG; \ \ /*\ * If there is a single step or branch-taken exception in an \ * exception entry sequence, it was probably meant to apply to\ * the code where the exception occurred (since exception entry \ * doesn't turn off DE automatically). We simulate the effect\ * of turning off DE on entry to an exception handler by turning \ * off DE in the DSRR1 value and clearing the debug status. \ */ \ mfspr r10,SPRN_DBSR; /* check single-step/branch taken */ \ andis. r10,r10,(DBSR_IC|DBSR_BT)@h; \ beq+ 2f; \ \ lis r10,KERNELBASE@h; /* check if exception in vectors */ \ ori r10,r10,KERNELBASE@l; \ cmplw r12,r10; \ blt+ 2f; /* addr below exception vectors */\ \ lis r10,DebugDebug@h;\ ori r10,r10,DebugDebug@l; \ Here we assume that all exception vectors end at DebugDebug, which is not correct. We should probably get the proper end by using some start_vector and end_vector labels, or at least use Ehvpriv as the end (it is the last vector defined in head_fsl_booke.S for PowerPC). Is that correct?
cmplw r12,r10; \ bgt+ 2f; /* addr above exception vectors */\ Thanks -Bharat -----Original Message----- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Bhushan Bharat-R65777 Sent: Thursday, April 04, 2013 8:29 PM To: Alexander Graf Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; Wood Scott-B07421 Subject: RE: [PATCH] bookehv: Handle debug exception on guest exit -----Original Message----- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, April 04, 2013 6:55 PM To: Bhushan Bharat-R65777 Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; Wood Scott-B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH] bookehv: Handle debug exception on guest exit On 20.03.2013, at 18:45, Bharat Bhushan wrote: EPCR.DUVD controls whether the debug events can come in hypervisor mode or not. When KVM guest is using the debug resource then we do not want debug events to be captured in guest entry/exit path. So we set EPCR.DUVD when entering and clears EPCR.DUVD when exiting from guest. Debug instruction complete is a post-completion debug exception but debug event gets posted on the basis of MSR before the instruction is executed. Now if the instruction switches the context from guest mode (MSR.GS = 1) to hypervisor mode (MSR.GS = 0) then the xSRR0 points to first instruction of KVM handler and xSRR1 points that MSR.GS is clear (hypervisor context). Now as xSRR1.GS is used to decide whether KVM handler will be invoked to handle the exception or host kernel debug handler will be invoked to handle the exception. This leads to host kernel debug handler handling the exception which should either be handled by KVM. This is tested on e500mc in 32 bit mode Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v0: - Do not apply this change for debug_crit as we do not know those chips have issue or not.
- corrected 64bit case branching arch/powerpc/kernel/exceptions-64e.S | 29 - arch/powerpc/kernel/head_booke.h | 26 ++ 2 files changed, 54 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 4684e33..8b26294 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel
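The check in DEBUG_DEBUG_EXCEPTION can be restated in C to show why the upper bound matters. This is a hypothetical sketch: the function names are not kernel symbols, the DBSR bit values are assumptions, and the addresses in the usage below are made up. The lower bound plays the role of KERNELBASE and the upper bound the role of DebugDebug (or, per the proposal above, a later label such as Ehvpriv). Because the asm uses blt and bgt, both bounds are inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed BookE DBSR bit values, for illustration only. */
#define DBSR_IC 0x08000000u
#define DBSR_BT 0x04000000u

/* Mirrors "cmplw; blt 2f" and "cmplw; bgt 2f": the address is treated
 * as inside the exception vectors when start <= addr <= end. */
static bool in_exception_vectors(uint32_t addr, uint32_t start, uint32_t end)
{
	if (addr < start)   /* addr below exception vectors */
		return false;
	if (addr > end)     /* addr above exception vectors */
		return false;
	return true;
}

/* True when the handler would clear DE in DSRR1 and drop the event. */
static bool suppress_debug_event(uint32_t dbsr, uint32_t addr,
				 uint32_t start, uint32_t end)
{
	if (!(dbsr & (DBSR_IC | DBSR_BT)))
		return false;   /* not single-step/branch-taken: beq+ 2f */
	return in_exception_vectors(addr, start, end);
}
```

With a made-up layout where the vectors start at 0xc0000000, DebugDebug sits at 0xc0000400 and a later vector at 0xc0000800, a single-step hit in that later vector is only suppressed when the upper bound is moved past it.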
RE: [PATCH] bookehv: Handle debug exception on guest exit
-----Original Message----- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, April 04, 2013 6:55 PM To: Bhushan Bharat-R65777 Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; Wood Scott-B07421; Bhushan Bharat-R65777 Subject: Re: [PATCH] bookehv: Handle debug exception on guest exit On 20.03.2013, at 18:45, Bharat Bhushan wrote: EPCR.DUVD controls whether the debug events can come in hypervisor mode or not. When KVM guest is using the debug resource then we do not want debug events to be captured in guest entry/exit path. So we set EPCR.DUVD when entering and clears EPCR.DUVD when exiting from guest. Debug instruction complete is a post-completion debug exception but debug event gets posted on the basis of MSR before the instruction is executed. Now if the instruction switches the context from guest mode (MSR.GS = 1) to hypervisor mode (MSR.GS = 0) then the xSRR0 points to first instruction of KVM handler and xSRR1 points that MSR.GS is clear (hypervisor context). Now as xSRR1.GS is used to decide whether KVM handler will be invoked to handle the exception or host kernel debug handler will be invoked to handle the exception. This leads to host kernel debug handler handling the exception which should either be handled by KVM. This is tested on e500mc in 32 bit mode Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- v0: - Do not apply this change for debug_crit as we do not know those chips have issue or not. - corrected 64bit case branching arch/powerpc/kernel/exceptions-64e.S | 29 - arch/powerpc/kernel/head_booke.h | 26 ++ 2 files changed, 54 insertions(+), 1 deletions(-) diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S index 4684e33..8b26294 100644 --- a/arch/powerpc/kernel/exceptions-64e.S +++ b/arch/powerpc/kernel/exceptions-64e.S @@ -516,6 +516,33 @@ kernel_dbg_exc: andis.
r15,r14,DBSR_IC@h beq+1f +#ifdef CONFIG_KVM_BOOKE_HV + /* +* EPCR.DUVD controls whether the debug events can come in +* hypervisor mode or not. When KVM guest is using the debug +* resource then we do not want debug events to be captured +* in guest entry/exit path. So we set EPCR.DUVD when entering +* and clears EPCR.DUVD when exiting from guest. +* Debug instruction complete is a post-completion debug +* exception but debug event gets posted on the basis of MSR +* before the instruction is executed. Now if the instruction +* switches the context from guest mode (MSR.GS = 1) to hypervisor +* mode (MSR.GS = 0) then the xSRR0 points to first instruction of Can't we just execute that code path with MSR.DE=0? Single stepping uses DBCR0.IC (instruction complete). Can you describe how MSR.DE = 0 will work? Alex +* KVM handler and xSRR1 points that MSR.GS is clear +* (hypervisor context). Now as xSRR1.GS is used to decide whether +* KVM handler will be invoked to handle the exception or host +* host kernel debug handler will be invoked to handle the exception. +* This leads to host kernel debug handler handling the exception +* which should either be handled by KVM. +*/ + mfspr r10, SPRN_EPCR + andis. r10,r10,SPRN_EPCR_DUVD@h + beq+2f + + andis. r10,r9,MSR_GS@h + beq+3f +2: +#endif LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e) cmpld cr0,r10,r14 @@ -523,7 +550,7 @@ kernel_dbg_exc: blt+cr0,1f bge+cr1,1f - /* here it looks like we got an inappropriate debug exception. */ +3: /* here it looks like we got an inappropriate debug exception. 
*/ lis r14,DBSR_IC@h /* clear the IC event */ rlwinm r11,r11,0,~MSR_DE /* clear DE in the DSRR1 value */ mtspr SPRN_DBSR,r14 diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 5f051ee..edc6a3b 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -285,7 +285,33 @@ label: mfspr r10,SPRN_DBSR; /* check single-step/branch taken */ \ andis. r10,r10,(DBSR_IC|DBSR_BT)@h; \ beq+2f; \ +#ifdef CONFIG_KVM_BOOKE_HV \ + /*\ +* EPCR.DUVD controls whether the debug events can come in\ +* hypervisor mode or not. When KVM guest is using the debug \ +* resource then we do not want debug events to be captured \ +* in guest entry/exit path. So we set EPCR.DUVD when entering\ +* and clears
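The branch structure that the exceptions-64e.S hunk adds to kernel_dbg_exc can be restated in C. This is a hypothetical sketch: discard_debug_event() and its parameters are not kernel names; the two flags stand for EPCR[DUVD] and the GS bit of the saved MSR, and the range check follows the original interrupt_base_book3e/interrupt_end_book3e comparison (blt below the start, bge at or above the end, so the range is half-open). It restates the posted logic as written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the handler would take the "3:" path, i.e. clear
 * DBSR[IC], clear MSR[DE] in the saved DSRR1 and return. */
static bool discard_debug_event(bool epcr_duvd, bool saved_msr_gs,
				uint64_t addr, uint64_t vec_start,
				uint64_t vec_end)
{
	/* New check: DUVD is set (guest owns the debug resources) and the
	 * saved MSR shows hypervisor context, i.e. the event was posted on
	 * the guest exit path. */
	if (epcr_duvd && !saved_msr_gs)
		return true;

	/* Original "2:" check: inside [vec_start, vec_end) means the event
	 * hit the exception entry code and is dropped. */
	return addr >= vec_start && addr < vec_end;
}
```

Note that when DUVD is clear, the new code falls straight through to the unchanged range check, so hosts that are not running a debugged guest keep the old behaviour in this model.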
Clearing DBSR and DBCR0 in host handler.
Hi All, The kernel exception handling code for 32-bit (transfer_to_handler in entry_32.S) clears DBSR and loads DBCR0 with 0 (the global_dbcr0 variable, which is zero) if user space used debug (DBCR0.IDM set). But I do not see the same (clearing of DBCR0 and DBSR) in the 64-bit exception handler. Is this an issue, or am I missing something? Thanks -Bharat
RE: Clearing DBSR and DBCR0 in host handler.
-----Original Message----- From: Kumar Gala [mailto:ga...@kernel.crashing.org] Sent: Wednesday, April 03, 2013 9:41 PM To: Bhushan Bharat-R65777 Cc: linuxppc-dev@lists.ozlabs.org; Benjamin Herrenschmidt; Alexander Graf; Wood Scott-B07421 Subject: Re: Clearing DBSR and DBCR0 in host handler. On Apr 3, 2013, at 10:24 AM, Bhushan Bharat-R65777 wrote: Hi All, The kernel exception handling code for 32-bit (transfer_to_handler in entry_32.S) clears DBSR and loads DBCR0 with 0 (the global_dbcr0 variable, which is zero) if user space used debug (DBCR0.IDM set). But I do not see the same (clearing of DBCR0 and DBSR) in the 64-bit exception handler. Is this an issue, or am I missing something? Thanks -Bharat Are you having a problem with debug w/the 64-bit kernel? No, not any issue. I was looking into the code that saves/restores the debug registers and observed the inconsistency described above between 32-bit and 64-bit. The 32-bit kernel supports several kernel level debug features that the 64-bit doesn't support. I am talking about user process debugging: - A user process is being debugged using gdb, so the h/w debug registers hold the thread context. - An interrupt/exception happens in the user process. - Now on 32-bit we clear DBSR (pending events) and DBCR0 (so no new events get captured), but on 64-bit we do not do the same. Why is that? Are we doing something extra on 32-bit, or is something missing on 64-bit? Can it happen on 64-bit that more debug events get captured and debug interrupts fire if MSR.DE is set, which is undesired? Or am I missing something here? Thanks -Bharat So if you are having an issue, that might be more helpful to convey than just asking about the exception code path. - k
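The 32-bit behaviour described in this thread can be sketched as a small user-space model of what transfer_to_handler does when entering the kernel from a task that uses hardware debug. It is a sketch under assumptions: struct dbg_model, mtspr_dbsr() and on_kernel_entry_32() are hypothetical, the DBCR0_IDM value is an assumed BookE bit position, and DBSR is modelled as write-1-to-clear (which is why the real code writes -1 to it).

```c
#include <stdint.h>

#define DBCR0_IDM 0x40000000u   /* assumed BookE bit value */

/* Hypothetical register/thread state for the model. */
struct dbg_model {
	uint32_t dbsr;          /* SPRN_DBSR, write-1-to-clear */
	uint32_t dbcr0;         /* SPRN_DBCR0 */
	uint32_t thread_dbcr0;  /* the task's saved DBCR0 (user debug setup) */
};

/* Stands in for global_dbcr0, which is zero in the case discussed. */
static const uint32_t global_dbcr0_model = 0;

/* Write-1-to-clear: every bit set in val is cleared in DBSR. */
static void mtspr_dbsr(struct dbg_model *m, uint32_t val)
{
	m->dbsr &= ~val;
}

/* Kernel entry from user space on 32-bit, as described above. */
static void on_kernel_entry_32(struct dbg_model *m)
{
	if (!(m->thread_dbcr0 & DBCR0_IDM))
		return;                     /* task not using h/w debug */
	mtspr_dbsr(m, 0xffffffffu);         /* drop all pending debug events */
	m->dbcr0 = global_dbcr0_model;      /* stop capturing new events */
}
```

The question raised for 64-bit corresponds to this function being a no-op there: pending DBSR bits and the user's DBCR0 would still be live while the kernel runs.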
using request_irq_percpu()
Hi All, request_percpu_irq() is defined in kernel/irq/manage.c. It takes a per-CPU pointer, so the dev_id the handler receives is unique to the CPU on which it executes. So it looks like we can use this to have multiple bottom-half interrupt handlers executing at the same time on different CPUs, each handling its work independently. The flow would be like: -- An interrupt occurs on CPU 1; the handler saves some context for the bottom half, clears the interrupt condition, and returns (in between, the interrupt affinity is moved to the next CPU in round-robin fashion). -- CPU 1 executes its bottom half. -- The interrupt occurs again, this time on CPU 2. -- The CPU 2 handler does the same as on CPU 1, and so on. This way multiple instances of the same bottom half can run at the same time on different CPUs. Thanks -Bharat
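The proposed flow can be modelled in user space to show the per-CPU bookkeeping it relies on. This is a toy sketch, not driver code: struct rr_irq, rr_top_half(), rr_bottom_half() and NR_CPUS_MODEL are hypothetical, the ctx array stands in for the __percpu dev_id that request_percpu_irq() hands to the handler, and moving the affinity is reduced to advancing an index.

```c
#define NR_CPUS_MODEL 4   /* hypothetical CPU count */

/* Per-CPU context: what the top half saves for that CPU's bottom half. */
struct irq_ctx {
	int pending;
	unsigned int data;
};

struct rr_irq {
	struct irq_ctx ctx[NR_CPUS_MODEL];  /* one slot per CPU */
	unsigned int target_cpu;            /* current interrupt affinity */
};

/* Top half: runs on target_cpu, saves state into that CPU's slot,
 * then moves the affinity to the next CPU in round-robin order.
 * Returns the CPU that took the interrupt. */
static unsigned int rr_top_half(struct rr_irq *irq, unsigned int data)
{
	unsigned int cpu = irq->target_cpu;
	struct irq_ctx *ctx = &irq->ctx[cpu];

	ctx->data = data;
	ctx->pending = 1;
	irq->target_cpu = (cpu + 1) % NR_CPUS_MODEL;
	return cpu;
}

/* Bottom half for one CPU: consumes only that CPU's slot, so bottom
 * halves on different CPUs never touch the same context. */
static int rr_bottom_half(struct rr_irq *irq, unsigned int cpu,
			  unsigned int *out)
{
	struct irq_ctx *ctx = &irq->ctx[cpu];

	if (!ctx->pending)
		return 0;
	*out = ctx->data;
	ctx->pending = 0;
	return 1;
}
```

One design point the model makes visible: if the affinity wraps around to a CPU whose bottom half has not yet drained its slot, the single-slot context is overwritten, so a real design would need a queue or back-pressure per CPU.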
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message----- From: Michael Neuling [mailto:mi...@neuling.org] Sent: Tuesday, February 12, 2013 9:46 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? Bhushan Bharat-R65777 r65...@freescale.com wrote: -----Original Message----- From: Michael Neuling [mailto:mi...@neuling.org] Sent: Tuesday, February 12, 2013 9:16 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? Bhushan Bharat-R65777 r65...@freescale.com wrote: -----Original Message----- From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael Neuling Sent: Tuesday, February 12, 2013 8:59 AM To: Wood Scott-B07421 Cc: linuxppc-dev@lists.ozlabs.org Subject: BOOKE KVM calling load_up_fpu from C? Scott, I was looking at changing how load_up_fpu works and I found this in arch/powerpc/kvm/booke.h: static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) { #ifdef CONFIG_PPC_FPU if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) { load_up_fpu(); current->thread.regs->msr |= MSR_FP; } #endif } I'm wondering how this is supposed to work, since load_up_fpu is supposed to have the MSR in R12? Doesn't load_up_fpu() do an mfmsr? _GLOBAL(load_up_fpu) mfmsr r5 ori r5,r5,MSR_FP #ifdef CONFIG_VSX BEGIN_FTR_SECTION oris r5,r5,MSR_VSX@h END_FTR_SECTION_IFSET(CPU_FTR_VSX) #endif SYNC MTMSRD(r5) /* enable use of fpu now */ isync snip Look further down... #ifdef CONFIG_PPC32 mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */ lwz r4,THREAD_FPEXC_MODE(r5) ori r9,r9,MSR_FP /* enable FP for current */ or r9,r9,r4 #else ld r4,PACACURRENT(r13) addi r5,r4,THREAD /* Get THREAD */ lwz r4,THREAD_FPEXC_MODE(r5) ori r12,r12,MSR_FP or r12,r12,r4 std r12,_MSR(r1) #endif R12 is loaded with SRR1 in the exception prolog before load_up_fpu is called. Yes, it is SRR1, not MSR.
Yes, SRR1 == the MSR of the user process, not the current MSR. Also, on 32-bit it looks like R9 is assumed to hold SRR1. Yep, that too. So any idea how it's supposed to work, or is it broken? To me this looks wrong. It seems to work only because current->thread.regs->msr is not actually used to write SRR1 (and eventually the thread MSR) when doing the rfi to enter the guest. In fact, the guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR (including FP set). But yes, Scott is the right person to comment, so let us wait for his comment. Thanks -Bharat
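The distinction drawn above (live MSR versus saved SRR1) can be shown with a tiny model of the two updates load_up_fpu performs on 64-bit. This is a hypothetical sketch: struct fpu_model and load_up_fpu_model() are not kernel names, and the MSR_FP value is an assumption (bit 13, as MSR_FP_LG is believed to be in asm/reg.h). The first update enables FP for the code running now; the second only edits the saved MSR image, which matters solely when that image is later restored by an exception return.

```c
#include <stdint.h>

#define MSR_FP (1ull << 13)   /* assumed: MSR_FP_LG == 13 */

/* Hypothetical state: the MSR currently in force, and the value the
 * exception prolog left in r12 (the interrupted context's SRR1). */
struct fpu_model {
	uint64_t live_msr;
	uint64_t saved_srr1;
};

static void load_up_fpu_model(struct fpu_model *m, uint64_t fpexc_mode)
{
	/* mfmsr r5; ori r5,r5,MSR_FP; MTMSRD(r5): FP usable right now */
	m->live_msr |= MSR_FP;

	/* ori r12,r12,MSR_FP; or r12,r12,r4: FP enabled in the saved
	 * image of the interrupted context, plus its FP exception mode */
	m->saved_srr1 |= MSR_FP | fpexc_mode;
}
```

When the routine is called from C instead of from the exception prolog, r12 holds no saved SRR1 at all, so the second update in the real code operates on an arbitrary value.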
RE: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree
-----Original Message----- From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Stefan Roese Sent: Tuesday, February 12, 2013 2:38 PM To: net...@vger.kernel.org Cc: linuxppc-...@ozlabs.org; Anatolij Gustschin Subject: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree Until now, the MPC5200 FEC ethernet driver relied upon the bootloader (U-Boot) to write the MAC address into the ethernet controller registers. The Linux driver should not rely on such a thing. So lets read the MAC address from the DT as it should be done here. The following priority is now used to read the MAC address: 1) First, try OF node MAC address, if not present or invalid, then: 2) Read from MAC address registers, if invalid, then: Why do we read from the MAC registers if Linux should not rely on the bootloader? -Bharat 3) Log a warning message, and choose a random MAC address. This fixes a problem with a MPC5200 board that uses the SPL U-Boot version without FEC initialization before Linux booting for boot speedup. Additionally a status line is now be printed upon successful driver probing, also displaying this MAC address.
Signed-off-by: Stefan Roese s...@denx.de Cc: Anatolij Gustschin ag...@denx.de --- v2: - Remove module parameter mpc52xx_fec_mac_addr - Priority for MAC address probing now is DT, controller regs If the resulting MAC address is invalid, a random address will be generated and used with a warning message - Use np variable to simplify the code drivers/net/ethernet/freescale/fec_mpc52xx.c | 61 +--- 1 file changed, 37 insertions(+), 24 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c index 817d081..8b725f4 100644 --- a/drivers/net/ethernet/freescale/fec_mpc52xx.c +++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c @@ -76,10 +76,6 @@ static void mpc52xx_fec_stop(struct net_device *dev); static void mpc52xx_fec_start(struct net_device *dev); static void mpc52xx_fec_reset(struct net_device *dev); -static u8 mpc52xx_fec_mac_addr[6]; -module_param_array_named(mac, mpc52xx_fec_mac_addr, byte, NULL, 0); -MODULE_PARM_DESC(mac, "six hex digits, ie. 0x1,0x2,0xc0,0x01,0xba,0xbe");
- #define MPC52xx_MESSAGES_DEFAULT ( NETIF_MSG_DRV | NETIF_MSG_PROBE | \ NETIF_MSG_LINK | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP) static int debug = -1; /* the above default */ @@ -110,15 +106,6 @@ static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac) out_be32(&fec->paddr2, (*(u16 *)(&mac[4]) << 16) | FEC_PADDR2_TYPE); } -static void mpc52xx_fec_get_paddr(struct net_device *dev, u8 *mac) -{ - struct mpc52xx_fec_priv *priv = netdev_priv(dev); - struct mpc52xx_fec __iomem *fec = priv->fec; - - *(u32 *)(&mac[0]) = in_be32(&fec->paddr1); - *(u16 *)(&mac[4]) = in_be32(&fec->paddr2) >> 16; -} - static int mpc52xx_fec_set_mac_address(struct net_device *dev, void *addr) { struct sockaddr *sock = addr; @@ -853,6 +840,8 @@ static int mpc52xx_fec_probe(struct platform_device *op) struct resource mem; const u32 *prop; int prop_size; + struct device_node *np = op->dev.of_node; + const void *p; phys_addr_t rx_fifo; phys_addr_t tx_fifo; @@ -866,7 +855,7 @@ static int mpc52xx_fec_probe(struct platform_device *op) priv->ndev = ndev; /* Reserve FEC control zone */ - rv = of_address_to_resource(op->dev.of_node, 0, &mem); + rv = of_address_to_resource(np, 0, &mem); if (rv) { printk(KERN_ERR DRIVER_NAME ": Error while parsing device node resource\n" ); @@ -919,7 +908,7 @@ static int mpc52xx_fec_probe(struct platform_device *op) /* Get the IRQ we need one by one */ /* Control */ - ndev->irq = irq_of_parse_and_map(op->dev.of_node, 0); + ndev->irq = irq_of_parse_and_map(np, 0); /* RX */ priv->r_irq = bcom_get_task_irq(priv->rx_dmatsk); @@ -927,11 +916,33 @@ static int mpc52xx_fec_probe(struct platform_device *op) /* TX */ priv->t_irq = bcom_get_task_irq(priv->tx_dmatsk); - /* MAC address init */ - if (!is_zero_ether_addr(mpc52xx_fec_mac_addr)) - memcpy(ndev->dev_addr, mpc52xx_fec_mac_addr, 6); - else - mpc52xx_fec_get_paddr(ndev, ndev->dev_addr); + /* + * MAC address init: + * + * First try to read MAC address from DT + */ + p = of_get_property(np, "local-mac-address", NULL); + if (p
!= NULL) { + memcpy(ndev->dev_addr, p, 6); + } else { + struct mpc52xx_fec __iomem *fec = priv->fec; + + /* + * If the MAC addresse is not provided via DT then read + * it back from the controller regs + */ + *(u32 *)(&ndev->dev_addr[0])
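The three-step priority in the patch description (DT, then controller registers, then a random address with a warning) can be sketched as plain C. This is a user-space sketch, not the driver: pick_mac(), mac_is_valid() and enum mac_source are hypothetical names, and the random address is assumed to be generated elsewhere and passed in. mac_is_valid() applies the same rule as the kernel's is_valid_ether_addr() (not multicast, not all zeros). The sketch follows the description's "if not present or invalid" for the DT step; the visible part of the posted hunk only tests for presence.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum mac_source { MAC_FROM_DT, MAC_FROM_REGS, MAC_RANDOM };

/* Valid unicast address: multicast bit (bit 0 of the first octet)
 * clear, and not all zeros. */
static bool mac_is_valid(const uint8_t mac[6])
{
	static const uint8_t zero[6];

	return !(mac[0] & 0x01) && memcmp(mac, zero, 6) != 0;
}

/* dt is NULL when the "local-mac-address" property is absent. */
static enum mac_source pick_mac(const uint8_t *dt, const uint8_t regs[6],
				const uint8_t random_mac[6], uint8_t out[6])
{
	if (dt && mac_is_valid(dt)) {
		memcpy(out, dt, 6);
		return MAC_FROM_DT;
	}
	if (mac_is_valid(regs)) {       /* left there by the bootloader */
		memcpy(out, regs, 6);
		return MAC_FROM_REGS;
	}
	memcpy(out, random_mac, 6);     /* the driver also logs a warning */
	return MAC_RANDOM;
}
```

The register fallback in the second branch is exactly the backward-compatibility path questioned in the follow-up: it only produces a usable address if a bootloader programmed the controller first.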
RE: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree
-----Original Message----- From: Stefan Roese [mailto:s...@denx.de] Sent: Tuesday, February 12, 2013 4:34 PM To: Bhushan Bharat-R65777 Cc: net...@vger.kernel.org; linuxppc-...@ozlabs.org; Anatolij Gustschin; David S. Miller Subject: Re: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree On 12.02.2013 11:56, Bhushan Bharat-R65777 wrote: Until now, the MPC5200 FEC ethernet driver relied upon the bootloader (U-Boot) to write the MAC address into the ethernet controller registers. The Linux driver should not rely on such a thing. So lets read the MAC address from the DT as it should be done here. The following priority is now used to read the MAC address: 1) First, try OF node MAC address, if not present or invalid, then: 2) Read from MAC address registers, if invalid, then: Why do we read from the MAC registers if Linux should not rely on the bootloader? It was suggested by David. Backwards compatibility. Here is David's comment on my original patch, which removed this register reading completely: I don't think this is a conservative enough change. You have to keep the MAC register reading code around, as a backup code path in case the OF device node lacks a MAC address. OK, but is this really backward compatibility, or is it hiding a bug? My thought is that if the DT does not have a valid MAC address, then that is a bug and should be fixed, isn't it? -Bharat Thanks, Stefan
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, February 13, 2013 12:03 AM To: Bhushan Bharat-R65777 Cc: Michael Neuling; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote: -----Original Message----- From: Michael Neuling [mailto:mi...@neuling.org] Sent: Tuesday, February 12, 2013 9:46 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? Bhushan Bharat-R65777 r65...@freescale.com wrote: -----Original Message----- From: Michael Neuling [mailto:mi...@neuling.org] Sent: Tuesday, February 12, 2013 9:16 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? Look further down... #ifdef CONFIG_PPC32 mfspr r5,SPRN_SPRG_THREAD /* current task's THREAD (phys) */ lwz r4,THREAD_FPEXC_MODE(r5) ori r9,r9,MSR_FP /* enable FP for current */ or r9,r9,r4 #else ld r4,PACACURRENT(r13) addi r5,r4,THREAD /* Get THREAD */ lwz r4,THREAD_FPEXC_MODE(r5) ori r12,r12,MSR_FP or r12,r12,r4 std r12,_MSR(r1) #endif R12 is loaded with SRR1 in the exception prolog before load_up_fpu is called. Yes, it is SRR1, not MSR. Yes, SRR1 == the MSR of the user process, not the current MSR. Also, on 32-bit it looks like R9 is assumed to hold SRR1. Yep, that too. So any idea how it's supposed to work, or is it broken? To me this looks wrong. It seems to work only because current->thread.regs->msr is not actually used to write SRR1 (and eventually the thread MSR) when doing the rfi to enter the guest. In fact, the guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR (including FP set). But yes, Scott is the right person to comment, so let us wait for his comment. I don't think it's actually a problem on 32-bit, since r9 is modified but never actually used for anything.
Doesn't the epilog load SRR1 into r9, then load_up_fpu() changes r9, and then r9 is written back to SRR1? On 64-bit, though, there's a store to the caller's stack frame (yuck) which the kvm/booke.h caller is not prepared for. So if the caller is using r12, it can lead to some corruption, right? Indeed, book3s's kvmppc_load_up_fpu creates an interrupt-like stack frame, but does not load r9 or r12. It would be really nice if assumptions like these were put in a code comment above load_up_fpu... and if we didn't have so many random differences between 32-bit and 64-bit. :-P :) Thanks -Bharat -Scott
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, February 13, 2013 6:53 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Michael Neuling; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? On 02/12/2013 07:18:14 PM, Bhushan Bharat-R65777 wrote: -----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, February 13, 2013 12:03 AM To: Bhushan Bharat-R65777 Cc: Michael Neuling; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote: To me this looks wrong. It seems to work only because current->thread.regs->msr is not actually used to write SRR1 (and eventually the thread MSR) when doing the rfi to enter the guest. In fact, the guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR (including FP set). But yes, Scott is the right person to comment, so let us wait for his comment. I don't think it's actually a problem on 32-bit, since r9 is modified but never actually used for anything. Doesn't the epilog load SRR1 into r9, then load_up_fpu() changes r9, and then r9 is written back to SRR1? What epilog? We're talking about the case where it's called from C code. When it's called from an exception handler, then r9 is used, but in that case it's also initialized before calling load_up_fpu, by the prolog. Agreed. I was just confirming the exception handler case. On 64-bit, though, there's a store to the caller's stack frame (yuck) which the kvm/booke.h caller is not prepared for. So if the caller is using r12, it can lead to some corruption, right? No, r12 is a volatile register in the ABI, as is r9. The issue is that the stack can be corrupted. OK, thanks -Bharat -Scott
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, February 13, 2013 6:53 AM To: Bhushan Bharat-R65777 Cc: Wood Scott-B07421; Michael Neuling; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? On 02/12/2013 07:18:14 PM, Bhushan Bharat-R65777 wrote: -----Original Message----- From: Wood Scott-B07421 Sent: Wednesday, February 13, 2013 12:03 AM To: Bhushan Bharat-R65777 Cc: Michael Neuling; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org Subject: Re: BOOKE KVM calling load_up_fpu from C? On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote: To me this looks wrong. It seems to work only because current->thread.regs->msr is not actually used to write SRR1 (and eventually the thread MSR) when doing the rfi to enter the guest. In fact, the guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR (including FP set). But yes, Scott is the right person to comment, so let us wait for his comment. I don't think it's actually a problem on 32-bit, since r9 is modified but never actually used for anything. Doesn't the epilog load SRR1 into r9, then load_up_fpu() changes r9, and then r9 is written back to SRR1? What epilog? We're talking about the case where it's called from C code. When it's called from an exception handler, then r9 is used, but in that case it's also initialized before calling load_up_fpu, by the prolog. On 64-bit, though, there's a store to the caller's stack frame (yuck) which the kvm/booke.h caller is not prepared for. So if the caller is using r12, it can lead to some corruption, right? No, r12 is a volatile register in the ABI, as is r9. The issue is that the stack can be corrupted. What do you mean by the stack being corrupted? My understanding is that when an assembly function is called from a C function, no new stack frame is pushed and the assembly function uses the caller's stack frame.
For example, function1() calls function2(), which calls assembly_routine():

function1()
|------------------|
| Stack Frame 1    |
| function1 caller |
| registers etc    |
|------------------|

Calls function2()
|---------------------|
| Stack Frame 2       |
| function1 registers |
| etc                 |
|---------------------|
| Stack Frame 1       |
| function1 caller    |
| registers etc       |
|---------------------|

Calls assembly_routine(); no new stack frame is pushed.

The assembly_routine() then changes register values saved on the stack, so when the stack is unwound, the registers of function1() get corrupted, right?

Thanks
-Bharat
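To make the 64-bit concern concrete, here is a minimal user-space simulation, not kernel code. It assumes a simplified stack made of 64-bit slots and a hypothetical slot index (MSR_SLOT) standing in for the _MSR(r1) offset; cpu_model, load_up_fpu_model() and the two callers are illustrative names. Only the final "ori r12,r12,MSR_FP; std r12,_MSR(r1)" step is modelled; the FPEXC_MODE OR is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: the stack is an array of 64-bit slots and r1 is an
 * index into it. MSR_SLOT is a hypothetical stand-in for _MSR(r1). */
#define STACK_SLOTS 64
#define MSR_SLOT    12
#define MSR_FP      0x2000UL    /* MSR[FP], bit 13 */

struct cpu_model {
    uint64_t stack[STACK_SLOTS];
    unsigned int r1;            /* stack pointer (slot index) */
    uint64_t r12;               /* holds SRR1 only if an exception prolog ran */
};

/* Models "ori r12,r12,MSR_FP; std r12,_MSR(r1)": no frame is pushed,
 * so the store lands in whatever frame r1 currently points at. */
static void load_up_fpu_model(struct cpu_model *cpu)
{
    assert(cpu->r1 + MSR_SLOT < STACK_SLOTS);
    cpu->r12 |= MSR_FP;
    cpu->stack[cpu->r1 + MSR_SLOT] = cpu->r12;
}

/* Exception path: r1 points at the exception frame and r12 = SRR1.
 * Returns the MSR that will be restored on return from the exception. */
static uint64_t call_from_exception(struct cpu_model *cpu, uint64_t srr1)
{
    cpu->r1 = 0;
    cpu->r12 = srr1;
    load_up_fpu_model(cpu);
    return cpu->stack[cpu->r1 + MSR_SLOT];
}

/* C path: r1 is the C caller's own frame and r12 is a leftover volatile
 * value. Returns what the caller finds in its slot after the call. */
static uint64_t call_from_c(struct cpu_model *cpu, uint64_t caller_value,
                            uint64_t stale_r12)
{
    cpu->r1 = 16;
    cpu->stack[cpu->r1 + MSR_SLOT] = caller_value;
    cpu->r12 = stale_r12;
    load_up_fpu_model(cpu);
    return cpu->stack[cpu->r1 + MSR_SLOT];
}
```

Entered from an exception, the store updates the MSR slot of the exception frame as intended. Called from C, it overwrites whatever the caller kept at that offset, which is why the problem is the stack contents rather than r9 or r12 themselves.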
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael Neuling
Sent: Tuesday, February 12, 2013 8:59 AM
To: Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: BOOKE KVM calling load_up_fpu from C?

Scott,

I was looking at changing how load_up_fpu works and I found this in arch/powerpc/kvm/booke.h:

static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_PPC_FPU
	if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
		load_up_fpu();
		current->thread.regs->msr |= MSR_FP;
	}
#endif
}

I'm wondering how this is supposed to work, since load_up_fpu is supposed to have the MSR in R12?

Doesn't load_up_fpu() do an mfmsr?

_GLOBAL(load_up_fpu)
	mfmsr	r5
	ori	r5,r5,MSR_FP
#ifdef CONFIG_VSX
BEGIN_FTR_SECTION
	oris	r5,r5,MSR_VSX@h
END_FTR_SECTION_IFSET(CPU_FTR_VSX)
#endif
	SYNC
	MTMSRD(r5)	/* enable use of fpu now */
	isync
<snip>

-Bharat

Mikey
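The C-level guard in kvmppc_load_guest_fp() can be exercised in user space with stubs. This is a minimal sketch under stated assumptions: vcpu_model and regs_model are hypothetical stand-ins for struct kvm_vcpu and current->thread.regs, and the stubbed load_up_fpu only counts calls. It shows the intended "load once, then mark MSR[FP]" behaviour, not the register-level question raised above.

```c
#include <assert.h>
#include <stdbool.h>

#define MSR_FP 0x2000UL   /* MSR[FP], bit 13 */

/* Hypothetical stand-ins for struct kvm_vcpu and current->thread.regs. */
struct vcpu_model { bool fpu_active; };
struct regs_model { unsigned long msr; };

static int fpu_loads;     /* counts calls to the stubbed load_up_fpu() */

static void load_up_fpu_stub(void)
{
    fpu_loads++;          /* the real routine enables MSR[FP] and restores FP state */
}

/* Same guard as kvmppc_load_guest_fp(): load FP state only if the vcpu
 * uses the FPU and the thread's saved MSR does not already have FP set. */
static void load_guest_fp_model(const struct vcpu_model *vcpu,
                                struct regs_model *regs)
{
    if (vcpu->fpu_active && !(regs->msr & MSR_FP)) {
        load_up_fpu_stub();
        regs->msr |= MSR_FP;
    }
}
```

Calling load_guest_fp_model() twice for an FPU-active vcpu reloads the state only once, and a vcpu with fpu_active false never triggers a load.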
RE: BOOKE KVM calling load_up_fpu from C?
-----Original Message-----
From: Michael Neuling [mailto:mi...@neuling.org]
Sent: Tuesday, February 12, 2013 9:16 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
Subject: Re: BOOKE KVM calling load_up_fpu from C?

Bhushan Bharat-R65777 r65...@freescale.com wrote:
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael Neuling
Sent: Tuesday, February 12, 2013 8:59 AM
To: Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: BOOKE KVM calling load_up_fpu from C?

Scott,

I was looking at changing how load_up_fpu works and I found this in arch/powerpc/kvm/booke.h:

static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_PPC_FPU
	if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
		load_up_fpu();
		current->thread.regs->msr |= MSR_FP;
	}
#endif
}

I'm wondering how this is supposed to work, since load_up_fpu is supposed to have the MSR in R12?

Doesn't load_up_fpu() do an mfmsr?

_GLOBAL(load_up_fpu)
	mfmsr	r5
	ori	r5,r5,MSR_FP
#ifdef CONFIG_VSX
BEGIN_FTR_SECTION
	oris	r5,r5,MSR_VSX@h
END_FTR_SECTION_IFSET(CPU_FTR_VSX)
#endif
	SYNC
	MTMSRD(r5)	/* enable use of fpu now */
	isync
<snip>

Look further down...

#ifdef CONFIG_PPC32
	mfspr	r5,SPRN_SPRG_THREAD	/* current task's THREAD (phys) */
	lwz	r4,THREAD_FPEXC_MODE(r5)
	ori	r9,r9,MSR_FP	/* enable FP for current */
	or	r9,r9,r4
#else
	ld	r4,PACACURRENT(r13)
	addi	r5,r4,THREAD	/* Get THREAD */
	lwz	r4,THREAD_FPEXC_MODE(r5)
	ori	r12,r12,MSR_FP
	or	r12,r12,r4
	std	r12,_MSR(r1)
#endif

R12 is loaded with SRR1 in the exception prolog before load_up_fpu is called.

Yes, it is SRR1, not MSR. Also, on 32-bit it looks like r9 is assumed to hold SRR1.
-Bharat

It's the MSR of the user process, not the current MSR.

Mikey
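The assembly updates two different MSR values, which is the distinction Mikey draws: the MSR in effect right now (mfmsr, ori, MTMSRD) and the saved MSR of the interrupted user process (SRR1 in r12 or r9), which is restored later. A minimal C model of the bit arithmetic, assuming hypothetical names (fpu_msr_result, load_up_fpu_msrs) and treating fpexc_mode as the value loaded from THREAD_FPEXC_MODE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_FP  0x2000UL     /* bit 13 */
#define MSR_VSX 0x800000UL   /* bit 23, set via "oris r5,r5,MSR_VSX@h" */

struct fpu_msr_result {
    uint64_t cpu_msr;        /* MSR in effect now (mfmsr ... MTMSRD path) */
    uint64_t saved_msr;      /* MSR restored on return to the interrupted context */
};

/* srr1 is the interrupted context's MSR, which the exception prolog
 * leaves in r12 (64-bit) or r9 (32-bit). */
static struct fpu_msr_result load_up_fpu_msrs(uint64_t cur_msr, uint64_t srr1,
                                              uint64_t fpexc_mode, bool has_vsx)
{
    struct fpu_msr_result r;

    r.cpu_msr = cur_msr | MSR_FP;              /* "ori r5,r5,MSR_FP" */
    if (has_vsx)
        r.cpu_msr |= MSR_VSX;                  /* only under CPU_FTR_VSX */

    r.saved_msr = srr1 | MSR_FP | fpexc_mode;  /* "ori r12,r12,MSR_FP; or r12,r12,r4" */
    return r;
}
```

When load_up_fpu is entered from C, only the first value is meaningful; the second is computed from whatever r12 or r9 happens to hold.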
RE: [PATCH] using get/put_user64 apis on 64bit machine
-----Original Message-----
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Monday, September 10, 2012 10:12 AM
To: Bhushan Bharat-R65777
Cc: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org; ag...@suse.de; Wood Scott-B07421; Bhushan Bharat-R65777
Subject: Re: [PATCH] using get/put_user64 apis on 64bit machine

On Mon, 2012-07-23 at 15:46 +0530, Bharat Bhushan wrote:
On a powerpc64 machine get/put_user64() is the same as get/put_user(), while on a powerpc32 machine get_user64 is different. With this patch we can use get_user64() and put_user64() on both 32-bit and 64-bit machines.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---

There appear to be no users of any of these APIs in the tree. There's also no get_user64 -> __get_user64() macro either. Why not just remove the whole lot?

When I sent the patch I did not search for users. I agree that we can remove __get_user64 and __put_user64 altogether.

Thanks
-Bharat

Cheers,
Ben.

 arch/powerpc/include/asm/uaccess.h |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 17bb40c..01743aa 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -114,10 +114,17 @@ struct exception_table_entry {
 #define __put_user(x, ptr) \
 	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
+/*
+ * On powerpc64 machine get/put_user64() is same as get/put_user() while
+ * on powerpc32 machine get_user64 is different.
+ */
 #ifndef __powerpc64__
 #define __get_user64(x, ptr) \
 	__get_user64_nocheck((x), (ptr), sizeof(*(ptr)))
 #define __put_user64(x, ptr) __put_user(x, ptr)
+#else
+#define __get_user64(x, ptr) __get_user(x, ptr)
+#define __put_user64(x, ptr) __put_user(x, ptr)
 #endif
 
 #define __get_user_inatomic(x, ptr) \
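The reason the 32-bit path needs its own __get_user64_nocheck() is that a 32-bit CPU cannot fetch a 64-bit value with one load, whereas on ppc64 __get_user64 can simply alias __get_user. A minimal user-space model of the two-load approach, assuming big-endian word order (high word at offset 0, low word at offset 4) and treating a byte array as user memory; load_be32 and get_user64_model are hypothetical names, and fault handling through the exception table is not modelled:

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit big-endian load, like a single lwz. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* A 64-bit fetch built from two 32-bit loads: high word, then low word. */
static uint64_t get_user64_model(const unsigned char *uaddr)
{
    uint32_t hi = load_be32(uaddr);
    uint32_t lo = load_be32(uaddr + 4);
    return ((uint64_t)hi << 32) | lo;
}
```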
RE: [PATCH] powerpc/mm: add ZONE_NORMAL zone for 64 bit kernel
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Benjamin Herrenschmidt
Sent: Tuesday, July 24, 2012 10:16 AM
To: Tabi Timur-B04825
Cc: Wood Scott-B07421; Hu Mingkai-B21284; linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989; Chen Yuanquan-B41889
Subject: Re: [PATCH] powerpc/mm: add ZONE_NORMAL zone for 64 bit kernel

On Tue, 2012-07-24 at 04:04 +, Tabi Timur-B04825 wrote:
Benjamin Herrenschmidt wrote:
Sure, but I don't want to create the zones in the first place (and thus introduce the added pressure on the memory management) on machines that don't need it.

One thing that does confuse me -- by default, we don't create a ZONE_NORMAL. We only create a ZONE_DMA. Why is that? Shouldn't it be the other way around?

Because ZONE_NORMAL allocations can be serviced from ZONE_DMA, while the other way around isn't possible.

Say we have defined only one zone (ZONE_DMA), to which we give all memory (> 4G), and a device sets its DMA_MASK to 4G or less. dma_alloc_coherent() will set the GFP_DMA flag, but that is of no use, because the memory allocator has only one zone, which contains all memory (and assumes all of it is DMA-able). It can therefore return memory at an address above 4G, which will crash!

I think we need at least one zone whose memory is DMA-able by all devices (the memory limit should be set by the platform, because different platforms have different devices with different limits), and one or more other zones to cover the rest of memory.

Thanks
-Bharat

Especially in the old days, there were quite a few cases of drivers and/or subsystems that were a bit heavy-handed in using ZONE_DMA, so not having one would essentially make them not work at all.

Cheers,
Ben.
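Bharat's single-zone scenario can be modelled in a few lines. This is a minimal sketch, assuming a trivial bump allocator per zone and a dma_capable()-style mask check; zone_model, zone_alloc and fits_mask are hypothetical names, not kernel APIs, and real zone boundaries are chosen by the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A zone covers the half-open physical range [start, end). */
struct zone_model {
    uint64_t start, end;
    uint64_t next_free;      /* trivial bump allocator */
};

/* dma_capable()-style check: the whole buffer must lie under the mask. */
static bool fits_mask(uint64_t addr, uint64_t size, uint64_t dma_mask)
{
    assert(size > 0);
    return addr + size - 1 <= dma_mask;
}

/* Returns UINT64_MAX when the zone is exhausted. */
static uint64_t zone_alloc(struct zone_model *z, uint64_t size)
{
    if (z->next_free + size > z->end)
        return UINT64_MAX;
    uint64_t addr = z->next_free;
    z->next_free += size;
    return addr;
}
```

With a single 8 GiB zone whose low memory is already in use, a "DMA" allocation can come back above 4 GiB and fail the 32-bit mask check; a separate ZONE_DMA limited to the first 4 GiB cannot hand out such an address.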
RE: [PATCH][v2] powerpc/watchdog: move booke watchdog param related code to setup-common.c
ACK:

-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Shaohui Xie
Sent: Wednesday, July 11, 2012 3:26 PM
To: linux-watch...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Cc: Xie Shaohui-B21989
Subject: [PATCH][v2] powerpc/watchdog: move booke watchdog param related code to setup-common.c

Currently, the BOOKE watchdog code for checking wdt and wdt_period is in setup_32.c, so it cannot be used on 64-bit. Move it to a common place, setup-common.c, which is shared by 32-bit and 64-bit. Also, replace simple_strtoul with kstrtol.

Signed-off-by: Shaohui Xie shaohui@freescale.com
---
changes for v2: use setup-common.c instead of prom.c

 arch/powerpc/kernel/setup-common.c |   27 +++++++++++++++++++++++++++
 arch/powerpc/kernel/setup_32.c     |   24 ------------------------
 2 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index afd4f05..bdc499c 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -720,6 +720,33 @@ static int powerpc_debugfs_init(void)
 arch_initcall(powerpc_debugfs_init);
 #endif
 
+#ifdef CONFIG_BOOKE_WDT
+extern u32 booke_wdt_enabled;
+extern u32 booke_wdt_period;
+
+/* Checks wdt=x and wdt_period=xx command-line option */
+notrace int __init early_parse_wdt(char *p)
+{
+	if (p && strncmp(p, "0", 1) != 0)
+		booke_wdt_enabled = 1;
+
+	return 0;
+}
+early_param("wdt", early_parse_wdt);
+
+int __init early_parse_wdt_period(char *p)
+{
+	unsigned long ret;
+	if (p) {
+		if (!kstrtol(p, 0, &ret))
+			booke_wdt_period = ret;
+	}
+
+	return 0;
+}
+early_param("wdt_period", early_parse_wdt_period);
+#endif /* CONFIG_BOOKE_WDT */
+
 void ppc_printk_progress(char *s, unsigned short hex)
 {
 	pr_info("%s\n", s);
diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
index ec8a53f..a8f54ec 100644
--- a/arch/powerpc/kernel/setup_32.c
+++ b/arch/powerpc/kernel/setup_32.c
@@ -149,30 +149,6 @@ notrace void __init machine_init(u64 dt_ptr)
 	ppc_md.progress("id mach(): done", 0x200);
 }
 
-#ifdef CONFIG_BOOKE_WDT
-extern u32 booke_wdt_enabled;
-extern u32 booke_wdt_period;
-
-/* Checks wdt=x and wdt_period=xx command-line option */
-notrace int __init early_parse_wdt(char *p)
-{
-	if (p && strncmp(p, "0", 1) != 0)
-		booke_wdt_enabled = 1;
-
-	return 0;
-}
-early_param("wdt", early_parse_wdt);
-
-int __init early_parse_wdt_period (char *p)
-{
-	if (p)
-		booke_wdt_period = simple_strtoul(p, NULL, 0);
-
-	return 0;
-}
-early_param("wdt_period", early_parse_wdt_period);
-#endif /* CONFIG_BOOKE_WDT */
-
 /* Checks l2cr= command-line option */
 int __init ppc_setup_l2cr(char *str)
 {
--
1.6.4
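The behaviour of the two early_param handlers in the patch can be checked in user space. This is a rough sketch, assuming strtol() with an end-pointer check as a stand-in for kstrtol() (kstrtol() additionally rejects leading whitespace, which strtol() skips); wdt_enabled, wdt_period, parse_wdt and parse_wdt_period are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static unsigned int wdt_enabled;     /* stands in for booke_wdt_enabled */
static unsigned int wdt_period;      /* stands in for booke_wdt_period */

/* "wdt=x": any value not starting with '0' enables the watchdog. */
static int parse_wdt(const char *p)
{
    if (p && strncmp(p, "0", 1) != 0)
        wdt_enabled = 1;
    return 0;
}

/* "wdt_period=xx": base 0 accepts decimal, 0x-hex and 0-octal. Empty
 * input, trailing garbage (other than one newline) and overflow are
 * rejected, leaving the previous value untouched. */
static int parse_wdt_period(const char *p)
{
    char *end;
    long val;

    if (!p)
        return 0;
    errno = 0;
    val = strtol(p, &end, 0);
    if (end == p || errno == ERANGE)
        return 0;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return 0;
    wdt_period = (unsigned int)val;
    return 0;
}
```

The kstrtol() switch matters for the second handler: simple_strtoul() would silently accept "12abc" as 12, whereas a strict parser keeps the old period.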
RE: [PATCH 0/6] Description for PCI patches using platform driver
Hello Ben, Kumar, others,

Please provide your comments/thoughts on this.

Thanks
-Bharat

-----Original Message-----
From: Jia Hongtao-B38951
Sent: Friday, June 08, 2012 3:12 PM
To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
Subject: [PATCH 0/6] Description for PCI patches using platform driver

This series of patches unifies the PCI initialization code and adds PM support for all 85xx/86xx powerpc boards. However, this mechanism introduces two side effects, listed below:
1. of_platform_bus_probe() will be called twice, and in some cases a duplication warning occurs. We fix this in [PATCH 5/6].
2. The EDAC driver fails to register PCI nodes as platform devices. We fix this in [PATCH 6/6].

With these patches, won't SWIOTLB be left uninitialized even if PCI/PCIe demands it?

Thanks
-Bharat

These patches still have the swiotlb init problem if ppc_swiotlb_enable is only demanded by PCI/PCIe. One of the purposes of sending out these patches is to start a discussion of this problem upstream.

OK, I did not find any mention of that, so I thought you had resolved the issue in some way in these patches that I did not catch.

So these patches introduce the issue that SWIOTLB will not be initialized if requested by PCI/PCIe. The request is raised by setting the flag ppc_swiotlb_enable, and swiotlb_init() is called in mem_init() if ppc_swiotlb_enable is set. With these patches, the request is raised after mem_init() is called, so the request is not handled :).

The following are the solutions we have thought of during our internal discussions (if I did not miss any):

1. These patches move the code from platform init to device init (arch_initcall()). Rather than moving the whole code, let us divide it in two: first, the part needed to raise the swiotlb init request, and second, the rest. Define the first part as a function in arch/powerpc/sysdev/fsl_pci.c and call it from the platform init code of the SoCs.

2. On all known devices, the lowest PCIe outbound range starts at 0x8000, but there's nothing above 0xc000. So an inbound window of size 0x8000_ is always available on all devices. Hardcode the check in platform code to compare memblock_end_of_DRAM() against 0x8000. Something like this:

diff --git a/arch/powerpc/platforms/85xx/corenet_ds.c b/arch/powerpc/platforms/85xx/corenet_ds.c
index 1f7028e..ef4e215 100644
--- a/arch/powerpc/platforms/85xx/corenet_ds.c
+++ b/arch/powerpc/platforms/85xx/corenet_ds.c
@@ -79,7 +79,7 @@ void __init corenet_ds_setup_arch(void)
 #endif
 
 #ifdef CONFIG_SWIOTLB
-	if (memblock_end_of_DRAM() > 0x)
+	if (memblock_end_of_DRAM() > 0xff00)
 		ppc_swiotlb_enable = 1;
 #endif
 	pr_info("%s board from Freescale Semiconductor\n", ppc_md.name);

3. Always do swiotlb_init() in mem_init(), and later, after PCI init, free it (swiotlb_free()) if it is not needed.

4. Etc. Please suggest a better way.

Thanks
-Bharat

Thanks. In my view the second solution is better, because it does not treat PCI/PCIe as a special kind of device compared to the others.
-Jia Hongtao.
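The ordering problem, and why an early platform-level check (as in solutions 1 and 2) avoids it, can be reduced to two flags. This is a minimal sketch, assuming a hypothetical 2 GiB inbound window purely for illustration, since the exact threshold in the thread is truncated. The *_model functions stand in for setup_arch, mem_init() and the arch_initcall() PCI probe and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inbound window size, for illustration only. */
#define INBOUND_WINDOW_SIZE (1ULL << 31)   /* 2 GiB */

static bool swiotlb_requested;   /* stands in for ppc_swiotlb_enable */
static bool swiotlb_ready;       /* set if the mem_init() step honoured it */

static void reset_model(void)
{
    swiotlb_requested = false;
    swiotlb_ready = false;
}

/* Early platform hook: decide from the DRAM size alone. dram_end is
 * exclusive, like memblock_end_of_DRAM(). */
static void platform_setup_arch_model(uint64_t dram_end)
{
    if (dram_end > INBOUND_WINDOW_SIZE)
        swiotlb_requested = true;
}

/* The only point at which the request is honoured. */
static void mem_init_model(void)
{
    if (swiotlb_requested)
        swiotlb_ready = true;
}

/* The same request raised from a late PCI probe: if mem_init_model()
 * has already run, nothing initialises the bounce buffers. */
static void pci_arch_initcall_model(uint64_t dram_end)
{
    if (dram_end > INBOUND_WINDOW_SIZE)
        swiotlb_requested = true;
}
```

Calling pci_arch_initcall_model() after mem_init_model() leaves the flag set but swiotlb_ready false, which is the unhandled request described in the thread. Raising it from platform_setup_arch_model() before mem_init_model() gets it honoured.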
RE: [PATCH 0/6] Description for PCI patches using platform driver
-----Original Message-----
From: Jia Hongtao-B38951
Sent: Monday, June 11, 2012 8:03 AM
To: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421
Subject: RE: [PATCH 0/6] Description for PCI patches using platform driver

-----Original Message-----
From: Bhushan Bharat-R65777
Sent: Friday, June 08, 2012 6:47 PM
To: Jia Hongtao-B38951; linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421
Subject: RE: [PATCH 0/6] Description for PCI patches using platform driver

-----Original Message-----
From: Jia Hongtao-B38951
Sent: Friday, June 08, 2012 3:12 PM
To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
Subject: [PATCH 0/6] Description for PCI patches using platform driver

This series of patches unifies the PCI initialization code and adds PM support for all 85xx/86xx powerpc boards. However, this mechanism introduces two side effects, listed below:
1. of_platform_bus_probe() will be called twice, and in some cases a duplication warning occurs. We fix this in [PATCH 5/6].
2. The EDAC driver fails to register PCI nodes as platform devices. We fix this in [PATCH 6/6].

With these patches, won't SWIOTLB be left uninitialized even if PCI/PCIe demands it?

Thanks
-Bharat

These patches still have the swiotlb init problem if ppc_swiotlb_enable is only demanded by PCI/PCIe. One of the purposes of sending out these patches is to start a discussion of this problem upstream.

OK, I did not find any mention of that, so I thought you had resolved the issue in some way in these patches that I did not catch.

So these patches introduce the issue that SWIOTLB will not be initialized if requested by PCI/PCIe. The request is raised by setting the flag ppc_swiotlb_enable, and swiotlb_init() is called in mem_init() if ppc_swiotlb_enable is set. With these patches, the request is raised after mem_init() is called, so the request is not handled :).

The following are the solutions we have thought of during our internal discussions (if I did not miss any):

1. These patches move the code from platform init to device init (arch_initcall()). Rather than moving the whole code, let us divide it in two: first, the part needed to raise the swiotlb init request, and second, the rest. Define the first part as a function in arch/powerpc/sysdev/fsl_pci.c and call it from the platform init code of the SoCs.

2. On all known devices, the lowest PCIe outbound range starts at 0x8000, but there's nothing above 0xc000. So an inbound window of size 0x8000_ is always available on all devices. Hardcode the check in platform code to compare memblock_end_of_DRAM() against 0x8000. Something like this:

diff --git a/arch/powerpc/platforms/85xx/corenet_ds.c b/arch/powerpc/platforms/85xx/corenet_ds.c
index 1f7028e..ef4e215 100644
--- a/arch/powerpc/platforms/85xx/corenet_ds.c
+++ b/arch/powerpc/platforms/85xx/corenet_ds.c
@@ -79,7 +79,7 @@ void __init corenet_ds_setup_arch(void)
 #endif
 
 #ifdef CONFIG_SWIOTLB
-	if (memblock_end_of_DRAM() > 0x)
+	if (memblock_end_of_DRAM() > 0xff00)
 		ppc_swiotlb_enable = 1;
 #endif
 	pr_info("%s board from Freescale Semiconductor\n", ppc_md.name);

3. Always do swiotlb_init() in mem_init(), and later, after PCI init, free it (swiotlb_free()) if it is not needed.

4. Etc. Please suggest a better way.

Thanks
-Bharat
RE: [PATCH 0/6] Description for PCI patches using platform driver
-----Original Message-----
From: Jia Hongtao-B38951
Sent: Friday, June 08, 2012 3:12 PM
To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
Subject: [PATCH 0/6] Description for PCI patches using platform driver

This series of patches unifies the PCI initialization code and adds PM support for all 85xx/86xx powerpc boards. However, this mechanism introduces two side effects, listed below:
1. of_platform_bus_probe() will be called twice, and in some cases a duplication warning occurs. We fix this in [PATCH 5/6].
2. The EDAC driver fails to register PCI nodes as platform devices. We fix this in [PATCH 6/6].

With these patches, won't SWIOTLB be left uninitialized even if PCI/PCIe demands it?

Thanks
-Bharat

These patches are against the 'next' branch of:
http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git
RE: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address
-----Original Message-----
From: David Miller [mailto:da...@davemloft.net]
Sent: Wednesday, June 06, 2012 3:51 AM
To: b...@kernel.crashing.org
Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org; ga...@kernel.crashing.org; Bhushan Bharat-R65777
Subject: Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address

From: Benjamin Herrenschmidt b...@kernel.crashing.org
Date: Wed, 06 Jun 2012 08:17:39 +1000

On Tue, 2012-06-05 at 19:25 +0530, Bharat Bhushan wrote:
memblock_end_of_DRAM() returns end_address + 1, not the end address, while some code assumes that it returns the end address.

Shouldn't we instead fix it the other way around? I.e., make memblock_end_of_DRAM() do what the name implies, which is to return the last byte of DRAM, and fix the -other- callers not to make bad assumptions?

That was my impression too when I saw this patch.

Initially I also intended to do so. I started a thread on linux-mm@ with the subject "memblock_end_of_DRAM() return end address + 1", and the only response I received, from Andrea, was:

It's normal that "end" means the first byte offset out of the range. End = not ok. end = start+size. This is true for vm_end too. So it's better to keep it that way. My suggestion is to just fix point 1 below and audit the rest :)

Thanks
-Bharat
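Andrea's convention (end = start + size, the first byte past the range) is easy to get off by one when a caller treats it as the last byte. This is a minimal sketch of half-open range helpers, assuming a single contiguous DRAM range; range, range_from_size, range_last_byte, range_contains and fits_in_32bit are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [start, end): end = start + size is the first byte
 * past the range, the convention memblock_end_of_DRAM() follows. */
struct range { uint64_t start, end; };

static struct range range_from_size(uint64_t start, uint64_t size)
{
    struct range r = { start, start + size };
    return r;
}

static uint64_t range_last_byte(struct range r)
{
    assert(r.end > r.start);   /* empty ranges have no last byte */
    return r.end - 1;
}

static bool range_contains(struct range r, uint64_t addr)
{
    return addr >= r.start && addr < r.end;   /* "<", not "<=" */
}

/* DRAM fits below the 32-bit limit iff its last byte is <= 0xffffffff,
 * i.e. end <= 0x100000000. Comparing end itself against 0xffffffff
 * would be off by one. */
static bool fits_in_32bit(struct range r)
{
    return range_last_byte(r) <= 0xffffffffULL;
}
```

With exactly 4 GiB of DRAM starting at 0, end is 0x100000000 while the last byte is 0xffffffff, so the memory still fits; code that mistakes end for the last byte would wrongly conclude that it does not.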
RE: [PATCH 2/2] powerpc/watchdog: replace CONFIG_FSL_BOOKE with CONFIG_FSL_SOC_BOOKE
-----Original Message-----
From: linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Shaohui Xie
Sent: Tuesday, May 08, 2012 11:38 AM
To: linux-watch...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Cc: Xie Shaohui-B21989
Subject: [PATCH 2/2] powerpc/watchdog: replace CONFIG_FSL_BOOKE with CONFIG_FSL_SOC_BOOKE

CONFIG_FSL_SOC_BOOKE looks like an SoC config option, whereas the watchdog is a CPU feature. Shouldn't we use PPC_FSL_BOOK3E?

Thanks
-Bharat