
RE: [PATCH v9 01/13] KVM: PPC: POWERNV: move iommu_add_device earlier

2013-10-29 Thread Bhushan Bharat-R65777

Hi Alex,

It looks like nobody has picked up this patch yet. Are you going to pick it up?
My vfio/iommu patches depend on this patch (I have already tested it).

Thanks
-Bharat

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
 Kardashevskiy
 Sent: Wednesday, August 28, 2013 2:08 PM
 To: linuxppc-dev@lists.ozlabs.org
 Cc: k...@vger.kernel.org; Gleb Natapov; Alexey Kardashevskiy; Alexander Graf;
 kvm-...@vger.kernel.org; linux-ker...@vger.kernel.org; linux...@kvack.org; 
 Paul
 Mackerras; Paolo Bonzini; David Gibson
 Subject: [PATCH v9 01/13] KVM: PPC: POWERNV: move iommu_add_device earlier
 
 The current implementation of IOMMU on sPAPR does not use iommu_ops and
 therefore does not call IOMMU API's bus_set_iommu() which
 1) sets iommu_ops for a bus
 2) registers a bus notifier
 Instead, PCI devices are added to IOMMU groups from
 subsys_initcall_sync(tce_iommu_init) which does basically the same thing
 without using iommu_ops callbacks.
 
 However, the Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
 implements iommu_ops, and by the time tce_iommu_init is called, every PCI
 device has already been added to some group, so there is a conflict.
 
 This patch does 2 things:
 1. removes the loop in which PCI devices were added to groups and adds
 explicit iommu_add_device() calls to add devices as soon as they get the
 iommu_table pointer assigned to them.
 2. moves a bus notifier to powernv code in order to avoid conflict with the
 notifier from Freescale driver.
 
 iommu_add_device() and iommu_del_device() are public now.
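
The ordering described above can be sketched in userspace as follows. This is only a model, not the kernel code: the `model_*` structures and functions are hypothetical stand-ins for `set_iommu_table_base()`, `iommu_add_device()` and `iommu_del_device()`, and the error value is made up. The only behaviour taken from this mail is that a device joins its group as soon as its table pointer is assigned, and that removal is skipped when no group is attached (the v8 changelog note).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct model_group { int ndevs; };
struct model_table { struct model_group *group; };
struct model_device {
	struct model_table *table;  /* models the per-device iommu_table pointer */
	struct model_group *group;  /* models dev->iommu_group */
};

/* Models iommu_add_device(): join the group that owns the device's table. */
static int model_add_device(struct model_device *dev)
{
	if (dev->group)
		return -1;          /* already in a group (made-up error value) */
	if (!dev->table || !dev->table->group)
		return 0;           /* no table or group yet, nothing to join */
	dev->group = dev->table->group;
	dev->group->ndevs++;
	return 0;
}

/* Models iommu_del_device() with the group != NULL check added in v8. */
static void model_del_device(struct model_device *dev)
{
	if (!dev->group)
		return;             /* never added, so nothing to remove */
	dev->group->ndevs--;
	dev->group = NULL;
}

/* Models set_iommu_table_base_and_group(): assign the table, then add. */
static void model_set_table_base_and_group(struct model_device *dev,
					   struct model_table *tbl)
{
	assert(dev != NULL);
	dev->table = tbl;
	model_add_device(dev);
}
```

With this shape no later pass over all PCI devices is needed, which is the point of dropping the loop in tce_iommu_init.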
 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
 Changes:
 v8:
 * added the check for iommu_group!=NULL before removing device from a group as
 suggested by Wei Yang weiy...@linux.vnet.ibm.com
 
 v2:
 * added a helper - set_iommu_table_base_and_group - which does
 set_iommu_table_base() and iommu_add_device()
 ---
  arch/powerpc/include/asm/iommu.h            |  9 +++
  arch/powerpc/kernel/iommu.c                 | 41 +++--
  arch/powerpc/platforms/powernv/pci-ioda.c   |  8 +++---
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  2 +-
  arch/powerpc/platforms/powernv/pci.c        | 33 ++-
  arch/powerpc/platforms/pseries/iommu.c      |  8 +++---
  6 files changed, 55 insertions(+), 46 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
 index c34656a..19ad77f 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -103,6 +103,15 @@ extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
   int nid);
  extern void iommu_register_group(struct iommu_table *tbl,
   int pci_domain_number, unsigned long pe_num);
 +extern int iommu_add_device(struct device *dev);
 +extern void iommu_del_device(struct device *dev);
 +
 +static inline void set_iommu_table_base_and_group(struct device *dev,
 +   void *base)
 +{
 + set_iommu_table_base(dev, base);
 + iommu_add_device(dev);
 +}
 
  extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
   struct scatterlist *sglist, int nelems,
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index b20ff17..15f8ca8 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 -static int iommu_add_device(struct device *dev)
 +int iommu_add_device(struct device *dev)
  {
   struct iommu_table *tbl;
   int ret = 0;
 @@ -1134,46 +1134,13 @@ static int iommu_add_device(struct device *dev)
 
   return ret;
  }
 +EXPORT_SYMBOL_GPL(iommu_add_device);
 
 -static void iommu_del_device(struct device *dev)
 +void iommu_del_device(struct device *dev)
  {
   iommu_group_remove_device(dev);
  }
 -
 -static int iommu_bus_notifier(struct notifier_block *nb,
 -   unsigned long action, void *data)
 -{
 - struct device *dev = data;
 -
 - switch (action) {
 - case BUS_NOTIFY_ADD_DEVICE:
 - return iommu_add_device(dev);
 - case BUS_NOTIFY_DEL_DEVICE:
 - iommu_del_device(dev);
 - return 0;
 - default:
 - return 0;
 - }
 -}
 -
 -static struct notifier_block tce_iommu_bus_nb = {
 - .notifier_call = iommu_bus_notifier,
 -};
 -
 -static int __init tce_iommu_init(void)
 -{
 - struct pci_dev *pdev = NULL;
 -
 - BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
 -
 - for_each_pci_dev(pdev)
 - iommu_add_device(&pdev->dev);
 -
 - bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
 - return 0;
 -}
 -
 -subsys_initcall_sync(tce_iommu_init);
 +EXPORT_SYMBOL_GPL(iommu_del_device);

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-18 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Friday, October 18, 2013 8:07 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle

  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, October 18, 2013 12:52 AM
  To: Wang Dongsheng-B40534
  Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-
  d...@lists.ozlabs.org
  Subject: Re: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle

  On Thu, 2013-10-17 at 00:51 -0500, Wang Dongsheng-B40534 wrote:

-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 11:20 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
and altivec idle

 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Thursday, October 17, 2013 8:16 AM
 To: Bhushan Bharat-R65777; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
 state and altivec idle

  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, October 17, 2013 1:01 AM
  To: Wang Dongsheng-B40534; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org
  Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
  state and altivec idle

   -Original Message-
   From: Wang Dongsheng-B40534
   Sent: Tuesday, October 15, 2013 2:51 PM
   To: Wood Scott-B07421
   Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org;
   Wang
  Dongsheng-B40534
   Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
   state and
  altivec idle

   +static ssize_t show_pw20_wait_time(struct device *dev,
   + struct device_attribute *attr, char *buf)
   +{
   + u32 value;
   + u64 tb_cycle;
   + s64 time;
   +
   + unsigned int cpu = dev->id;
   +
   + if (!pw20_wt) {
   + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
   + value = (value & PWRMGTCR0_PW20_ENT) >>
   + PWRMGTCR0_PW20_ENT_SHIFT;
   +
   + tb_cycle = (1 << (MAX_BIT - value)) * 2;

  Is value = 0 and value = 1 legal? These will make tb_cycle = 0,

   + time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;

  And time = -1;

 Please look at the end of the function, :)

 return sprintf(buf, "%llu\n", time > 0 ? time : 0);

I know you return 0 if value = 0/1. My question was: is this correct as per
the specification?

Ahh, also for values up to 7 you will return 0, no?

   If value = 0, MAX_BIT - value = 63
   tb_cycle = 0x_,

  Actually, tb_cycle will be undefined because you shifted a 32-bit value
  (1) by more than 31 bits.  s/1/1ULL/

What Scott is saying is that a left shift of 1 by more than 31 bits is
undefined.
Scott, this will be sign-extended, right?

-Bharat
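
A minimal userspace sketch of the s/1/1ULL/ suggestion follows. The function name is made up for illustration, and fixed-width C types stand in for the kernel's u32/u64. With a 64-bit constant the shift count 0..63 is always valid; what remains is unsigned wrap-around, which is well defined but still yields 0 cycles for value = 0.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BIT 63

/*
 * Timebase cycles for a given entry-count field, following the
 * expression from the patch but with a 64-bit constant (1ULL), so the
 * shift itself is defined for every count from 0 to 63.
 * The final "* 2" uses unsigned arithmetic and wraps to 0 for value == 0.
 */
static uint64_t pw20_tb_cycles(uint32_t value)
{
	assert(value <= MAX_BIT);   /* keeps the shift count in 0..63 */
	return (1ULL << (MAX_BIT - value)) * 2;
}
```

For example, pw20_tb_cycles(32) is 2^32, a result that a 32-bit constant could not represent. The later tb_cycle * 1000 multiplication can still overflow for small field values, so it would need its own range check.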

 Actually, we have been discussing a situation that cannot happen.
 See the !pw20_wt branch: this branch reads the default wait bit.
 The default wait bit is 50, which corresponds to a time of about 1ms.
 The default wait bit cannot be less than 50, which means the wait entry time
 cannot be greater than 1ms.
 We have already begun benchmark tests and have preliminary results.
 Bits 55, 56 and 57 look good, but we need more benchmarks to settle on the
 default bit.

   if (!pw20_wt) {
   smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
   value = (value & PWRMGTCR0_PW20_ENT) >>
   PWRMGTCR0_PW20_ENT_SHIFT;

   tb_cycle = (1 << (MAX_BIT - value)) * 2;
   time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;
   } else {
   time = pw20_wt;
   }

 If this causes confusion, we can add a comment, as I discussed with Bharat.

   tb_cycle * 1000 would overflow, but this situation is not possible,
   because value = 0 means this feature is disabled.
   The default wait bit is now 50 (MAX_BIT - value, value = 13), so the
   PW20/Altivec Idle wait entry time is about 1ms. That is a very long
   idle wait time, and it cannot be increased (meaning (MAX_BIT - value)
   cannot be greater than 50).

  Why can it not be increased?

 see above, :)

 -dongsheng
  -Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-18 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, October 19, 2013 12:52 AM
 To: Wang Dongsheng-B40534
 Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle
 
 On Thu, 2013-10-17 at 22:02 -0500, Wang Dongsheng-B40534 wrote:
 
   -Original Message-
   From: Bhushan Bharat-R65777
   Sent: Thursday, October 17, 2013 2:46 PM
   To: Wang Dongsheng-B40534; Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org
   Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
   and altivec idle
  
  
  
  -Original Message-
  From: Wang Dongsheng-B40534
  Sent: Thursday, October 17, 2013 11:22 AM
  To: Bhushan Bharat-R65777; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org
  Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
  state and altivec idle
 
 
 
   -Original Message-
   From: Bhushan Bharat-R65777
   Sent: Thursday, October 17, 2013 11:20 AM
   To: Wang Dongsheng-B40534; Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org
   Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
   state and altivec idle
  
  
  
-Original Message-
From: Wang Dongsheng-B40534
Sent: Thursday, October 17, 2013 8:16 AM
To: Bhushan Bharat-R65777; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for
pw20 state and altivec idle
   
   
   
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, October 17, 2013 1:01 AM
 To: Wang Dongsheng-B40534; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for
 pw20 state and altivec idle



  -Original Message-
  From: Wang Dongsheng-B40534
  Sent: Tuesday, October 15, 2013 2:51 PM
  To: Wood Scott-B07421
  Cc: Bhushan Bharat-R65777;
  linuxppc-dev@lists.ozlabs.org; Wang
 Dongsheng-B40534
  Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for
  pw20 state and
 altivec idle
 
  From: Wang Dongsheng dongsheng.w...@freescale.com
 
  Add a sys interface to enable/disable pw20 state or altivec idle, and
  control the wait entry time.
 
  Enable/Disable interface:
  0, disable. 1, enable.
  /sys/devices/system/cpu/cpuX/pw20_state
  /sys/devices/system/cpu/cpuX/altivec_idle
 
  Set wait time interface:(Nanosecond)
  /sys/devices/system/cpu/cpuX/pw20_wait_time
  /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
  Example: Base on TBfreq is 41MHZ.
  1~48(ns): TB[63]
  49~97(ns): TB[62]
  98~195(ns): TB[61]
  196~390(ns): TB[60]
  391~780(ns): TB[59]
  781~1560(ns): TB[58]
  ...
 
  Signed-off-by: Wang Dongsheng
  dongsheng.w...@freescale.com
  ---
  *v5:
  Change get_idle_ticks_bit function implementation.
 
  *v4:
  Move code from 85xx/common.c to kernel/sysfs.c.
 
  Remove has_pw20_altivec_idle function.
 
  Change wait entry_bit to wait time.
 
  diff --git a/arch/powerpc/kernel/sysfs.c
  b/arch/powerpc/kernel/sysfs.c
 index
  27a90b9..10d1128 100644
  --- a/arch/powerpc/kernel/sysfs.c
  +++ b/arch/powerpc/kernel/sysfs.c
  @@ -85,6 +85,284 @@ __setup(smt-snooze-delay=,
 setup_smt_snooze_delay);
 
   #endif /* CONFIG_PPC64 */
 
  +#ifdef CONFIG_FSL_SOC
  +#define MAX_BIT63
  +
  +static u64 pw20_wt;
  +static u64 altivec_idle_wt;
  +
  +static unsigned int get_idle_ticks_bit(u64 ns) {
  +   u64 cycle;
  +
  +   if (ns = 1)
  +   cycle = div_u64(ns + 500, 1000) *
   tb_ticks_per_usec;
  +   else
  +   cycle = div_u64(ns * tb_ticks_per_usec, 1000);
  +
  +   if (!cycle)
  +   return 0;
  +
  +   return ilog2(cycle); }
  +
  +static void do_show_pwrmgtcr0(void *val) {
  +   u32 *value = val;
  +
  +   *value = mfspr(SPRN_PWRMGTCR0); }
  +
  +static ssize_t show_pw20_state(struct device *dev,
  +   struct device_attribute *attr, 
  char
   *buf) {
  +   u32 value;
  +   unsigned int cpu = dev-id;
  +
  +   smp_call_function_single(cpu, do_show_pwrmgtcr0

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-17 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Thursday, October 17, 2013 11:22 AM
 To: Bhushan Bharat-R65777; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle
 
 
 
  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, October 17, 2013 11:20 AM
  To: Wang Dongsheng-B40534; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org
  Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle
 
 
 
   -Original Message-
   From: Wang Dongsheng-B40534
   Sent: Thursday, October 17, 2013 8:16 AM
   To: Bhushan Bharat-R65777; Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org
   Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
   and altivec idle
  
  
  
-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 1:01 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
and altivec idle
   
   
   
 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Tuesday, October 15, 2013 2:51 PM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang
Dongsheng-B40534
 Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
 and
altivec idle

 From: Wang Dongsheng dongsheng.w...@freescale.com

 Add a sys interface to enable/disable pw20 state or altivec idle, and
 control the wait entry time.

 Enable/Disable interface:
 0, disable. 1, enable.
 /sys/devices/system/cpu/cpuX/pw20_state
 /sys/devices/system/cpu/cpuX/altivec_idle

 Set wait time interface:(Nanosecond)
 /sys/devices/system/cpu/cpuX/pw20_wait_time
 /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
 Example: Base on TBfreq is 41MHZ.
 1~48(ns): TB[63]
 49~97(ns): TB[62]
 98~195(ns): TB[61]
 196~390(ns): TB[60]
 391~780(ns): TB[59]
 781~1560(ns): TB[58]
 ...

 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
 ---
 *v5:
 Change get_idle_ticks_bit function implementation.

 *v4:
 Move code from 85xx/common.c to kernel/sysfs.c.

 Remove has_pw20_altivec_idle function.

 Change wait entry_bit to wait time.

 diff --git a/arch/powerpc/kernel/sysfs.c
 b/arch/powerpc/kernel/sysfs.c
index
 27a90b9..10d1128 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -85,6 +85,284 @@ __setup(smt-snooze-delay=,
setup_smt_snooze_delay);

  #endif /* CONFIG_PPC64 */

 +#ifdef CONFIG_FSL_SOC
 +#define MAX_BIT  63
 +
 +static u64 pw20_wt;
 +static u64 altivec_idle_wt;
 +
 +static unsigned int get_idle_ticks_bit(u64 ns) {
 + u64 cycle;
 +
 + if (ns = 1)
 + cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
 + else
 + cycle = div_u64(ns * tb_ticks_per_usec, 1000);
 +
 + if (!cycle)
 + return 0;
 +
 + return ilog2(cycle);
 +}
 +
 +static void do_show_pwrmgtcr0(void *val) {
 + u32 *value = val;
 +
 + *value = mfspr(SPRN_PWRMGTCR0); }
 +
 +static ssize_t show_pw20_state(struct device *dev,
 + struct device_attribute *attr, char 
 *buf) {
 + u32 value;
 + unsigned int cpu = dev-id;
 +
 + smp_call_function_single(cpu, do_show_pwrmgtcr0, value, 1);
 +
 + value = PWRMGTCR0_PW20_WAIT;
 +
 + return sprintf(buf, %u\n, value ? 1 : 0); }
 +
 +static void do_store_pw20_state(void *val) {
 + u32 *value = val;
 + u32 pw20_state;
 +
 + pw20_state = mfspr(SPRN_PWRMGTCR0);
 +
 + if (*value)
 + pw20_state |= PWRMGTCR0_PW20_WAIT;
 + else
 + pw20_state = ~PWRMGTCR0_PW20_WAIT;
 +
 + mtspr(SPRN_PWRMGTCR0, pw20_state); }
 +
 +static ssize_t store_pw20_state(struct device *dev,
 + struct device_attribute *attr,
 + const char *buf, size_t count) {
 + u32 value;
 + unsigned int cpu = dev-id;
 +
 + if (kstrtou32(buf, 0, value))
 + return -EINVAL;
 +
 + if (value  1)
 + return -EINVAL;
 +
 + smp_call_function_single(cpu, do_store_pw20_state, value, 1);
 +
 + return count;
 +}
 +
 +static ssize_t show_pw20_wait_time(struct device *dev,
 + struct device_attribute *attr, char 
 *buf) {
 + u32 value;
 + u64 tb_cycle;
 + s64 time;
 +
 + unsigned int

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-17 Thread Bhushan Bharat-R65777



   -Original Message-
   From: Wang Dongsheng-B40534
   Sent: Thursday, October 17, 2013 11:22 AM
   To: Bhushan Bharat-R65777; Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org
   Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
   and altivec idle
  
  
  
-Original Message-
From: Bhushan Bharat-R65777
Sent: Thursday, October 17, 2013 11:20 AM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org
Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state
and altivec idle
   
   
   
 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Thursday, October 17, 2013 8:16 AM
 To: Bhushan Bharat-R65777; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
 state and altivec idle



  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, October 17, 2013 1:01 AM
  To: Wang Dongsheng-B40534; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org
  Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
  state and altivec idle
 
 
 
   -Original Message-
   From: Wang Dongsheng-B40534
   Sent: Tuesday, October 15, 2013 2:51 PM
   To: Wood Scott-B07421
   Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org;
   Wang
  Dongsheng-B40534
   Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20
   state and
  altivec idle
  
   From: Wang Dongsheng dongsheng.w...@freescale.com
  
   Add a sys interface to enable/disable pw20 state or altivec idle, and
   control the wait entry time.
  
   Enable/Disable interface:
   0, disable. 1, enable.
   /sys/devices/system/cpu/cpuX/pw20_state
   /sys/devices/system/cpu/cpuX/altivec_idle
  
   Set wait time interface:(Nanosecond)
   /sys/devices/system/cpu/cpuX/pw20_wait_time
   /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
   Example: Base on TBfreq is 41MHZ.
   1~48(ns): TB[63]
   49~97(ns): TB[62]
   98~195(ns): TB[61]
   196~390(ns): TB[60]
   391~780(ns): TB[59]
   781~1560(ns): TB[58]
   ...
  
   Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
   ---
   *v5:
   Change get_idle_ticks_bit function implementation.
  
   *v4:
   Move code from 85xx/common.c to kernel/sysfs.c.
  
   Remove has_pw20_altivec_idle function.
  
   Change wait entry_bit to wait time.
  
   diff --git a/arch/powerpc/kernel/sysfs.c
   b/arch/powerpc/kernel/sysfs.c
  index
   27a90b9..10d1128 100644
   --- a/arch/powerpc/kernel/sysfs.c
   +++ b/arch/powerpc/kernel/sysfs.c
   @@ -85,6 +85,284 @@ __setup(smt-snooze-delay=,
  setup_smt_snooze_delay);
  
#endif /* CONFIG_PPC64 */
  
   +#ifdef CONFIG_FSL_SOC
   +#define MAX_BIT  63
   +
   +static u64 pw20_wt;
   +static u64 altivec_idle_wt;
   +
   +static unsigned int get_idle_ticks_bit(u64 ns) {
   + u64 cycle;
   +
   + if (ns = 1)
   + cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
   + else
   + cycle = div_u64(ns * tb_ticks_per_usec, 1000);
   +
   + if (!cycle)
   + return 0;
   +
   + return ilog2(cycle);
   +}
   +
   +static void do_show_pwrmgtcr0(void *val) {
   + u32 *value = val;
   +
   + *value = mfspr(SPRN_PWRMGTCR0); }
   +
   +static ssize_t show_pw20_state(struct device *dev,
   + struct device_attribute *attr, char 
   *buf) {
   + u32 value;
   + unsigned int cpu = dev-id;
   +
   + smp_call_function_single(cpu, do_show_pwrmgtcr0, value,
   +1);
   +
   + value = PWRMGTCR0_PW20_WAIT;
   +
   + return sprintf(buf, %u\n, value ? 1 : 0); }
   +
   +static void do_store_pw20_state(void *val) {
   + u32 *value = val;
   + u32 pw20_state;
   +
   + pw20_state = mfspr(SPRN_PWRMGTCR0);
   +
   + if (*value)
   + pw20_state |= PWRMGTCR0_PW20_WAIT;
   + else
   + pw20_state = ~PWRMGTCR0_PW20_WAIT;
   +
   + mtspr(SPRN_PWRMGTCR0, pw20_state); }
   +
   +static ssize_t store_pw20_state(struct device *dev,
   + struct device_attribute *attr,
   + const char *buf, size_t count) {
   + u32 value;
   + unsigned int cpu = dev-id;
   +
   + if (kstrtou32(buf, 0, value))
   + return -EINVAL;
   +
   + if (value  1)
   + return -EINVAL;
   +
   + smp_call_function_single(cpu, do_store_pw20_state, value,
   +1);
   +
   + return count;
   +}
   +
   +static ssize_t

RE: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices

2013-10-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Sethi Varun-B16395
 Sent: Wednesday, October 16, 2013 4:53 PM
 To: j...@8bytes.org; io...@lists.linux-foundation.org; linuxppc-
 d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Yoder Stuart-B08248; Wood
 Scott-B07421; alex.william...@redhat.com; Bhushan Bharat-R65777
 Cc: Sethi Varun-B16395
 Subject: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices
 
 Once the PCIe device assigned to a guest VM (via VFIO) gets detached from the
 iommu domain (when guest terminates), its PAMU table entry is disabled. So,
 this would prevent the device from being used once it's assigned back to the
 host.
 
 This patch allows for creation of a default DMA window corresponding to the
 device and subsequently enabling the PAMU table entry. Before we enable the
 entry, we ensure that the device's bus master capability is disabled (device
 quiesced).
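
 As a worked example of the default window size: the patch below programs
 WSE = 35 and documents the window size as 2^(WSE+1) bytes. The helper here
 is a hypothetical userspace function written only to evaluate that
 formula; it is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Window size in bytes for a given WSE value: 2^(WSE+1). */
static uint64_t pamu_window_size_bytes(unsigned int wse)
{
	assert(wse < 63);           /* 2^(wse+1) must fit in 64 bits */
	return 1ULL << (wse + 1);
}
```

 For WSE = 35 this gives 2^36 bytes, i.e. a 64 GiB default window.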
 
 Signed-off-by: Varun Sethi varun.se...@freescale.com
 ---
  drivers/iommu/fsl_pamu.c        |   43
  drivers/iommu/fsl_pamu.h        |    1 +
  drivers/iommu/fsl_pamu_domain.c |   46 ---
  3 files changed, 78 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
 index cba0498..fb4a031 100644
 --- a/drivers/iommu/fsl_pamu.c
 +++ b/drivers/iommu/fsl_pamu.c
 @@ -225,6 +225,21 @@ static struct paace *pamu_get_spaace(struct paace *paace, u32 wnum)
   return spaace;
  }
 
 +/*
 + * Default PPAACE settings for an LIODN.
 + */
 +static void setup_default_ppaace(struct paace *ppaace)
 +{
 + pamu_init_ppaace(ppaace);
 + /* window size is 2^(WSE+1) bytes */
 + set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
 + ppaace->wbah = 0;
 + set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
 + set_bf(ppaace->impl_attr, PAACE_IA_ATM,
 + PAACE_ATM_NO_XLATE);
 + set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
 + PAACE_AP_PERMS_ALL);
 +}
  /**
   * pamu_get_fspi_and_allocate() - Allocates fspi index and reserves subwindows
   *required for primary PAACE in the secondary
 @@ -253,6 +268,24 @@ static unsigned long pamu_get_fspi_and_allocate(u32 subwin_cnt)
   return (spaace_addr - (unsigned long)spaact) / (sizeof(struct paace));
  }
 
 +/* Reset the PAACE entry to the default state */
 +void enable_default_dma_window(int liodn)
 +{
 + struct paace *ppaace;
 +
 + ppaace = pamu_get_ppaace(liodn);
 + if (!ppaace) {
 + pr_debug("Invalid liodn entry\n");
 + return;
 + }
 +
 + memset(ppaace, 0, sizeof(struct paace));
 +
 + setup_default_ppaace(ppaace);
 + mb();
 + pamu_enable_liodn(liodn);
 +}
 +
  /* Release the subwindows reserved for a particular LIODN */
  void pamu_free_subwins(int liodn)
  {
 @@ -752,15 +785,7 @@ static void __init setup_liodns(void)
   continue;
   }
   ppaace = pamu_get_ppaace(liodn);
 - pamu_init_ppaace(ppaace);
 - /* window size is 2^(WSE+1) bytes */
 - set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
 - ppaace->wbah = 0;
 - set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
 - set_bf(ppaace->impl_attr, PAACE_IA_ATM,
 - PAACE_ATM_NO_XLATE);
 - set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
 - PAACE_AP_PERMS_ALL);
 + setup_default_ppaace(ppaace);
   if (of_device_is_compatible(node, "fsl,qman-portal"))
   setup_qbman_paace(ppaace, QMAN_PORTAL_PAACE);
   if (of_device_is_compatible(node, "fsl,qman"))
 diff --git a/drivers/iommu/fsl_pamu.h b/drivers/iommu/fsl_pamu.h
 index 8fc1a12..0edc 100644
 --- a/drivers/iommu/fsl_pamu.h
 +++ b/drivers/iommu/fsl_pamu.h
 @@ -406,5 +406,6 @@ void get_ome_index(u32 *omi_index, struct device *dev);
  int pamu_update_paace_stash(int liodn, u32 subwin, u32 value);
  int pamu_disable_spaace(int liodn, u32 subwin);
  u32 pamu_get_max_subwin_cnt(void);
 +void enable_default_dma_window(int liodn);
 
  #endif  /* __FSL_PAMU_H */
 diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
 index 966ae70..dd6cafc 100644
 --- a/drivers/iommu/fsl_pamu_domain.c
 +++ b/drivers/iommu/fsl_pamu_domain.c
 @@ -340,17 +340,57 @@ static inline struct device_domain_info *find_domain(struct device *dev)
   return dev->archdata.iommu_domain;
  }
 
 +/* Disable device DMA capability and enable default DMA window */
 +static void disable_device_dma(struct device_domain_info *info,
 + int enable_dma_window)
 +{
 +#ifdef CONFIG_PCI
 + if (info->dev->bus == &pci_bus_type) {
 + struct pci_dev *pdev = NULL;
 + pdev = to_pci_dev(info->dev

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Tuesday, October 15, 2013 2:51 PM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
 Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle
 
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 Add a sys interface to enable/disable pw20 state or altivec idle, and control
 the wait entry time.
 
 Enable/Disable interface:
 0, disable. 1, enable.
 /sys/devices/system/cpu/cpuX/pw20_state
 /sys/devices/system/cpu/cpuX/altivec_idle
 
 Set wait time interface (in nanoseconds):
 /sys/devices/system/cpu/cpuX/pw20_wait_time
 /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
 Example: based on a TBfreq of 41MHz.
 1~48(ns): TB[63]
 49~97(ns): TB[62]
 98~195(ns): TB[61]
 196~390(ns): TB[60]
 391~780(ns): TB[59]
 781~1560(ns): TB[58]
 ...
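
 The table above can be reproduced with a small userspace sketch. The
 helper names are hypothetical and fixed-width C types replace the kernel's
 u64; only the plain ns * ticks / 1000 conversion is modelled, with
 tb_ticks_per_usec assumed to be 41 for a 41MHz timebase.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BIT 63

/* Integer log2 of a non-zero value (userspace stand-in for ilog2()). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int bit = 0;

	assert(v != 0);
	while ((v >>= 1) != 0)
		bit++;
	return bit;
}

/*
 * Timebase bit selected for a wait time given in nanoseconds.
 * A time shorter than one timebase tick maps to TB[63].
 */
static unsigned int tb_bit_for_ns(uint64_t ns, uint32_t tb_ticks_per_usec)
{
	uint64_t cycle = ns * tb_ticks_per_usec / 1000;

	if (!cycle)
		return MAX_BIT;
	return MAX_BIT - ilog2_u64(cycle);
}
```

 For instance, 49 ns at 41 ticks per microsecond is 2 ticks, ilog2(2) is 1,
 and 63 - 1 selects TB[62], matching the second row of the table.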
 
 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
 ---
 *v5:
 Change get_idle_ticks_bit function implementation.
 
 *v4:
 Move code from 85xx/common.c to kernel/sysfs.c.
 
 Remove has_pw20_altivec_idle function.
 
 Change wait entry_bit to wait time.
 
 diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
 index 27a90b9..10d1128 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -85,6 +85,284 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay);
 
  #endif /* CONFIG_PPC64 */
 
 +#ifdef CONFIG_FSL_SOC
 +#define MAX_BIT  63
 +
 +static u64 pw20_wt;
 +static u64 altivec_idle_wt;
 +
 +static unsigned int get_idle_ticks_bit(u64 ns) {
 + u64 cycle;
 +
 + if (ns = 1)
 + cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
 + else
 + cycle = div_u64(ns * tb_ticks_per_usec, 1000);
 +
 + if (!cycle)
 + return 0;
 +
 + return ilog2(cycle);
 +}
 +
 +static void do_show_pwrmgtcr0(void *val) {
 + u32 *value = val;
 +
 + *value = mfspr(SPRN_PWRMGTCR0);
 +}
 +
 +static ssize_t show_pw20_state(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 + u32 value;
 + unsigned int cpu = dev->id;
 +
 + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
 +
 + value &= PWRMGTCR0_PW20_WAIT;
 +
 + return sprintf(buf, "%u\n", value ? 1 : 0);
 +}
 +
 +static void do_store_pw20_state(void *val) {
 + u32 *value = val;
 + u32 pw20_state;
 +
 + pw20_state = mfspr(SPRN_PWRMGTCR0);
 +
 + if (*value)
 + pw20_state |= PWRMGTCR0_PW20_WAIT;
 + else
 + pw20_state &= ~PWRMGTCR0_PW20_WAIT;
 +
 + mtspr(SPRN_PWRMGTCR0, pw20_state);
 +}
 +
 +static ssize_t store_pw20_state(struct device *dev,
 + struct device_attribute *attr,
 + const char *buf, size_t count)
 +{
 + u32 value;
 + unsigned int cpu = dev->id;
 +
 + if (kstrtou32(buf, 0, &value))
 + return -EINVAL;
 +
 + if (value > 1)
 + return -EINVAL;
 +
 + smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
 +
 + return count;
 +}
 +
 +static ssize_t show_pw20_wait_time(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 + u32 value;
 + u64 tb_cycle;
 + s64 time;
 +
 + unsigned int cpu = dev->id;
 +
 + if (!pw20_wt) {
 + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
 + value = (value & PWRMGTCR0_PW20_ENT) >>
 + PWRMGTCR0_PW20_ENT_SHIFT;
 +
 + tb_cycle = (1 << (MAX_BIT - value)) * 2;

Is value = 0 and value = 1 legal? These will make tb_cycle = 0,

 + time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;

And time = -1;


 + } else {
 + time = pw20_wt;
 + }
 +
 + return sprintf(buf, "%llu\n", time > 0 ? time : 0);
 +}
 +
 +static void set_pw20_wait_entry_bit(void *val) {
 + u32 *value = val;
 + u32 pw20_idle;
 +
 + pw20_idle = mfspr(SPRN_PWRMGTCR0);
 +
 + /* Set Automatic PW20 Core Idle Count */
 + /* clear count */
 + pw20_idle &= ~PWRMGTCR0_PW20_ENT;
 +
 + /* set count */
 + pw20_idle |= ((MAX_BIT - *value) << PWRMGTCR0_PW20_ENT_SHIFT);
 +
 + mtspr(SPRN_PWRMGTCR0, pw20_idle);
 +}
 +
 +static ssize_t store_pw20_wait_time(struct device *dev,
 + struct device_attribute *attr,
 + const char *buf, size_t count)
 +{
 + u32 entry_bit;
 + u64 value;
 +
 + unsigned int cpu = dev->id;
 +
 + if (kstrtou64(buf, 0, &value))
 + return -EINVAL;
 +
 + if (!value)
 + return -EINVAL;
 +
 + entry_bit = get_idle_ticks_bit(value);
 + if (entry_bit > MAX_BIT)
 + return -EINVAL;
 +
 + pw20_wt = value;
 + smp_call_function_single(cpu, set_pw20_wait_entry_bit,
 + &entry_bit, 1);
 +
 + return count;
 +}
 +
 +static ssize_t

RE: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for PCIe devices

2013-10-16 Thread Bhushan Bharat-R65777



 
 
   -Original Message-
   From: Sethi Varun-B16395
   Sent: Wednesday, October 16, 2013 4:53 PM
   To: j...@8bytes.org; io...@lists.linux-foundation.org; linuxppc-
   d...@lists.ozlabs.org; linux-ker...@vger.kernel.org; Yoder
   Stuart-B08248; Wood Scott-B07421; alex.william...@redhat.com;
   Bhushan
   Bharat-R65777
   Cc: Sethi Varun-B16395
   Subject: [PATCH 2/3 v2] iommu/fsl: Enable default DMA window for
   PCIe devices
  
   Once the PCIe device assigned to a guest VM (via VFIO) gets detached
   from the iommu domain (when guest terminates), its PAMU table entry
   is disabled. So, this would prevent the device from being used once
   it's assigned back to the host.
  
   This patch allows for creation of a default DMA window corresponding
   to the device and subsequently enabling the PAMU table entry. Before
   we enable the entry, we ensure that the device's bus master
   capability is disabled (device quiesced).
  
   Signed-off-by: Varun Sethi varun.se...@freescale.com
   ---
drivers/iommu/fsl_pamu.c|   43 ---
  -
drivers/iommu/fsl_pamu.h|1 +
drivers/iommu/fsl_pamu_domain.c |   46
  ---
3 files changed, 78 insertions(+), 12 deletions(-)
  
   diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
   index
   cba0498..fb4a031 100644
   --- a/drivers/iommu/fsl_pamu.c
   +++ b/drivers/iommu/fsl_pamu.c
   @@ -225,6 +225,21 @@ static struct paace *pamu_get_spaace(struct
   paace *paace,
   u32 wnum)
 return spaace;
}
  
   +/*
   + * Default PPAACE settings for an LIODN.
   + */
   +static void setup_default_ppaace(struct paace *ppaace)
   +{
   + pamu_init_ppaace(ppaace);
   + /* window size is 2^(WSE+1) bytes */
   + set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
   + ppaace->wbah = 0;
   + set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
   + set_bf(ppaace->impl_attr, PAACE_IA_ATM,
   + PAACE_ATM_NO_XLATE);
   + set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
   + PAACE_AP_PERMS_ALL);
   +}
/**
 * pamu_get_fspi_and_allocate() - Allocates fspi index and reserves
  subwindows
 *required for primary PAACE in the
  secondary
   @@ -253,6 +268,24 @@ static unsigned long
   pamu_get_fspi_and_allocate(u32
   subwin_cnt)
 return (spaace_addr - (unsigned long)spaact) / (sizeof(struct
   paace));  }
  
   +/* Reset the PAACE entry to the default state */
   +void enable_default_dma_window(int liodn)
   +{
   + struct paace *ppaace;
   +
   + ppaace = pamu_get_ppaace(liodn);
   + if (!ppaace) {
   + pr_debug("Invalid liodn entry\n");
   + return;
   + }
   +
   + memset(ppaace, 0, sizeof(struct paace));
   +
   + setup_default_ppaace(ppaace);
   + mb();
   + pamu_enable_liodn(liodn);
   +}
   +
/* Release the subwindows reserved for a particular LIODN */  void
   pamu_free_subwins(int liodn)  { @@ -752,15 +785,7 @@ static void
   __init
   setup_liodns(void)
 continue;
 }
 ppaace = pamu_get_ppaace(liodn);
   - pamu_init_ppaace(ppaace);
   - /* window size is 2^(WSE+1) bytes */
   - set_bf(ppaace->addr_bitfields, PPAACE_AF_WSE, 35);
   - ppaace->wbah = 0;
   - set_bf(ppaace->addr_bitfields, PPAACE_AF_WBAL, 0);
   - set_bf(ppaace->impl_attr, PAACE_IA_ATM,
   - PAACE_ATM_NO_XLATE);
   - set_bf(ppaace->addr_bitfields, PAACE_AF_AP,
   - PAACE_AP_PERMS_ALL);
   + setup_default_ppaace(ppaace);
 if (of_device_is_compatible(node, "fsl,qman-portal"))
 setup_qbman_paace(ppaace, QMAN_PORTAL_PAACE);
 if (of_device_is_compatible(node, "fsl,qman"))
   diff --git a/drivers/iommu/fsl_pamu.h b/drivers/iommu/fsl_pamu.h
   index 8fc1a12..0edc 100644
   --- a/drivers/iommu/fsl_pamu.h
   +++ b/drivers/iommu/fsl_pamu.h
   @@ -406,5 +406,6 @@ void get_ome_index(u32 *omi_index, struct device
   *dev);  int pamu_update_paace_stash(int liodn, u32 subwin, u32
   value); int pamu_disable_spaace(int liodn, u32 subwin);
u32 pamu_get_max_subwin_cnt(void);
   +void enable_default_dma_window(int liodn);
  
#endif  /* __FSL_PAMU_H */
   diff --git a/drivers/iommu/fsl_pamu_domain.c
   b/drivers/iommu/fsl_pamu_domain.c index 966ae70..dd6cafc 100644
   --- a/drivers/iommu/fsl_pamu_domain.c
   +++ b/drivers/iommu/fsl_pamu_domain.c
   @@ -340,17 +340,57 @@ static inline struct device_domain_info
   *find_domain(struct device *dev)
 return dev->archdata.iommu_domain;
 }
  
   +/* Disable device DMA capability and enable default DMA window */
   +static void disable_device_dma(struct device_domain_info *info,
   + int enable_dma_window)
   +{
   +#ifdef CONFIG_PCI
   + if (info->dev->bus == &pci_bus_type

RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-10-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Thursday, October 17, 2013 8:16 AM
 To: Bhushan Bharat-R65777; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle
 
 
 
  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, October 17, 2013 1:01 AM
  To: Wang Dongsheng-B40534; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org
  Subject: RE: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle
 
 
 
   -Original Message-
   From: Wang Dongsheng-B40534
   Sent: Tuesday, October 15, 2013 2:51 PM
   To: Wood Scott-B07421
   Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; Wang
  Dongsheng-B40534
   Subject: [PATCH v5 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle
  
   From: Wang Dongsheng dongsheng.w...@freescale.com
  
   Add a sysfs interface to enable/disable the pw20 state or altivec idle,
   and to control the wait entry time.
  
   Enable/Disable interface:
   0, disable. 1, enable.
   /sys/devices/system/cpu/cpuX/pw20_state
   /sys/devices/system/cpu/cpuX/altivec_idle
  
   Set wait time interface:(Nanosecond)
   /sys/devices/system/cpu/cpuX/pw20_wait_time
   /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
   Example: based on a TB frequency of 41 MHz.
   1~48(ns): TB[63]
   49~97(ns): TB[62]
   98~195(ns): TB[61]
   196~390(ns): TB[60]
   391~780(ns): TB[59]
   781~1560(ns): TB[58]
   ...
  
   Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
   ---
   *v5:
   Change get_idle_ticks_bit function implementation.
  
   *v4:
   Move code from 85xx/common.c to kernel/sysfs.c.
  
   Remove has_pw20_altivec_idle function.
  
   Change wait entry_bit to wait time.
  
   diff --git a/arch/powerpc/kernel/sysfs.c
   b/arch/powerpc/kernel/sysfs.c
  index
   27a90b9..10d1128 100644
   --- a/arch/powerpc/kernel/sysfs.c
   +++ b/arch/powerpc/kernel/sysfs.c
   @@ -85,6 +85,284 @@ __setup("smt-snooze-delay=",
  setup_smt_snooze_delay);
  
#endif /* CONFIG_PPC64 */
  
   +#ifdef CONFIG_FSL_SOC
   +#define MAX_BIT  63
   +
   +static u64 pw20_wt;
   +static u64 altivec_idle_wt;
   +
   +static unsigned int get_idle_ticks_bit(u64 ns) {
   + u64 cycle;
   +
   + if (ns = 1)
   + cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
   + else
   + cycle = div_u64(ns * tb_ticks_per_usec, 1000);
   +
   + if (!cycle)
   + return 0;
   +
   + return ilog2(cycle);
   +}
   +
   +static void do_show_pwrmgtcr0(void *val) {
   + u32 *value = val;
   +
   + *value = mfspr(SPRN_PWRMGTCR0);
   +}
   +
   +static ssize_t show_pw20_state(struct device *dev,
   + struct device_attribute *attr, char *buf)
   +{
   + u32 value;
   + unsigned int cpu = dev->id;
   +
   + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
   +
   + value &= PWRMGTCR0_PW20_WAIT;
   +
   + return sprintf(buf, "%u\n", value ? 1 : 0);
   +}
   +
   +static void do_store_pw20_state(void *val)
   +{
   + u32 *value = val;
   + u32 pw20_state;
   +
   + pw20_state = mfspr(SPRN_PWRMGTCR0);
   +
   + if (*value)
   + pw20_state |= PWRMGTCR0_PW20_WAIT;
   + else
   + pw20_state &= ~PWRMGTCR0_PW20_WAIT;
   +
   + mtspr(SPRN_PWRMGTCR0, pw20_state);
   +}
   +
   +static ssize_t store_pw20_state(struct device *dev,
   + struct device_attribute *attr,
   + const char *buf, size_t count)
   +{
   + u32 value;
   + unsigned int cpu = dev->id;
   +
   + if (kstrtou32(buf, 0, &value))
   + return -EINVAL;
   +
   + if (value > 1)
   + return -EINVAL;
   +
   + smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
   +
   + return count;
   +}
   +
   +static ssize_t show_pw20_wait_time(struct device *dev,
   + struct device_attribute *attr, char *buf)
   +{
   + u32 value;
   + u64 tb_cycle;
   + s64 time;
   +
   + unsigned int cpu = dev->id;
   +
   + if (!pw20_wt) {
   + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
   + value = (value & PWRMGTCR0_PW20_ENT) >>
   + PWRMGTCR0_PW20_ENT_SHIFT;
   +
   + tb_cycle = (1 << (MAX_BIT - value)) * 2;
 
  Is value = 0 and value = 1 legal? These will make tb_cycle = 0,
 
   + time = div_u64(tb_cycle * 1000, tb_ticks_per_usec) - 1;
 
  And time = -1;
 
 Please look at the end of the function, :)
 
 return sprintf(buf, "%llu\n", time > 0 ? time : 0);

I know you return 0 if value is 0 or 1; my question was whether this is
correct as per the specification.

Also, for values up to 7 you will return 0, no?

-Bharat
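
To make the boundary cases in this exchange concrete, here is a minimal user-space sketch of the entry-bit to nanoseconds conversion. It is not the patch's code: entry_bit_to_ns() is a hypothetical helper, the 64-bit types stand in for the kernel's u64, and returning 0 whenever the result cannot be represented is this sketch's own choice. Whether 0 is the right answer per the hardware specification is exactly the open question above.

```c
#include <stdint.h>

#define MAX_BIT 63

/*
 * Hypothetical helper: convert a PW20 entry-bit field value into a wait
 * time in nanoseconds, following (1 << (MAX_BIT - value)) * 2 from the
 * quoted patch. Returns 0 when the intermediate math cannot be
 * represented in 64 bits (an assumption of this sketch).
 */
static uint64_t entry_bit_to_ns(unsigned int value, uint64_t tb_ticks_per_usec)
{
    unsigned int shift;
    uint64_t tb_cycle, ns;

    if (value > MAX_BIT || tb_ticks_per_usec == 0)
        return 0;

    /* (1 << (MAX_BIT - value)) * 2 is 2^(MAX_BIT - value + 1) ticks */
    shift = MAX_BIT - value + 1;
    if (shift > 63)
        return 0;               /* value == 0 would need 2^64 ticks */

    tb_cycle = (uint64_t)1 << shift;
    if (tb_cycle > UINT64_MAX / 1000)
        return 0;               /* tb_cycle * 1000 would wrap around */

    ns = tb_cycle * 1000 / tb_ticks_per_usec;
    return ns ? ns - 1 : 0;
}
```

With the 41 ticks per microsecond used in the changelog example, entry_bit_to_ns(63, 41) gives 47 and entry_bit_to_ns(62, 41) gives 96, while small field values such as 0 and 1 hit the overflow guards and report 0.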

 
 -dongsheng
 
 
   + } else {
   + time = pw20_wt;
   + }
   +
   + return sprintf(buf, "%llu\n", time > 0 ? time : 0);
   }
   +


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bhushan Bharat-R65777

 -Original Message-
 From: j...@8bytes.org [mailto:j...@8bytes.org]
 Sent: Tuesday, October 08, 2013 10:32 PM
 To: Bjorn Helgaas
 Cc: Bhushan Bharat-R65777; alex.william...@redhat.com; 
 b...@kernel.crashing.org;
 ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de; Wood Scott-
 B07421; io...@lists.linux-foundation.org
 Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information

 On Tue, Oct 08, 2013 at 10:47:49AM -0600, Bjorn Helgaas wrote:
  I still have no idea what an aperture type IOMMU is, other than that
  it is different.

 An aperture based IOMMU is basically any GART-like IOMMU which can only remap
 a small window (the aperture) of the DMA address space. DMA outside of that
 window is either blocked completely or passed through untranslated.

It is completely blocked for the Freescale PAMU.
So for this type of IOMMU we have to create an MSI mapping just after the
guest physical address range. For example, if the guest has 512M of memory,
we create a window of 1G (because of the power-of-2 requirement) and then fit
the MSI mapping just after the guest's 512M.
For that we need:
1) To know the physical address of the MSIs in the interrupt controller
(which is what this patch was all about).

2) When the guest enables MSI interrupts, to write the MSI address and
MSI data in the device. The discussion with Alex Williamson is about that
interface.
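
The sizing rule above can be sketched as follows. This is only an illustration under stated assumptions: plan_window(), roundup_pow2() and struct msi_layout are hypothetical names, the MSI region size is passed in by the caller, and no PAMU register programming is shown.

```c
#include <stdint.h>

/* Hypothetical result type: where the MSI mapping goes and how big the
 * aperture window has to be. */
struct msi_layout {
    uint64_t msi_iova;      /* IOVA at which the MSI region is mapped */
    uint64_t window_size;   /* power-of-2 window covering memory + MSI */
};

/* Round x up to the next power of two; returns 0 if that would not fit. */
static uint64_t roundup_pow2(uint64_t x)
{
    uint64_t p = 1;

    if (x > ((uint64_t)1 << 63))
        return 0;
    while (p < x)
        p <<= 1;
    return p;
}

/* Place the MSI region directly after guest memory and size the window.
 * Overflow of guest_mem + msi_size is not handled in this sketch. */
static struct msi_layout plan_window(uint64_t guest_mem, uint64_t msi_size)
{
    struct msi_layout l;

    l.msi_iova = guest_mem;
    l.window_size = roundup_pow2(guest_mem + msi_size);
    return l;
}
```

For a 512M guest and a 4K MSI region this yields an MSI IOVA of 512M inside a 1G window, which matches the example in the mail.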

Thanks
-Bharat

   Joerg


RE: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-08 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, October 09, 2013 4:27 AM
 To: Bhushan Bharat-R65777
 Cc: alex.william...@redhat.com; j...@8bytes.org; b...@kernel.crashing.org;
 ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de;
 io...@lists.linux-foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information
 
 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  @@ -376,6 +405,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
  int len;
  u32 offset;
  static const u32 all_avail[] = { 0, NR_MSI_IRQS };
  +   static int bank_index;
 
  match = of_match_device(fsl_of_msi_ids, &dev->dev);
  if (!match)
  @@ -419,8 +449,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
  dev->dev.of_node->full_name);
  goto error_out;
  }
  -   msi->msiir_offset =
  -   features->msiir_offset + (res.start & 0xf);
  +   msi->msiir = res.start + features->msiir_offset;
  +   printk("msi->msiir = %llx\n", msi->msiir);
 
 dev_dbg or remove

Oops, sorry, it was a leftover from debugging :(

 
  }
 
  msi->feature = features->fsl_pic_ip;
  @@ -470,6 +500,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
  }
  }
 
  +   msi->bank_index = bank_index++;
 
 What if multiple MSIs are being probed in parallel?

Oh, I had not thought that it could be called in parallel.

  bank_index is not atomic.

Will declare bank_index as atomic_t and use atomic_inc_return(&bank_index).
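
A minimal user-space sketch of the atomic counter idea, using C11 <stdatomic.h> rather than the kernel's atomic_t; next_bank_index and alloc_bank_index() are hypothetical names. One detail worth keeping in mind: the kernel's atomic_inc_return() returns the value after the increment, so a zero-initialised counter would hand out 1 first. The sketch uses a fetch-and-add, which returns the value before the increment, so that indices start at 0.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; static storage makes it start at 0. */
static atomic_int next_bank_index;

/* Hand out a unique bank index even if several probes run in parallel. */
static int alloc_bank_index(void)
{
    /* atomic_fetch_add returns the value held before the addition */
    return atomic_fetch_add(&next_bank_index, 1);
}
```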

 
  diff --git a/arch/powerpc/sysdev/fsl_msi.h
  b/arch/powerpc/sysdev/fsl_msi.h index 8225f86..6bd5cfc 100644
  --- a/arch/powerpc/sysdev/fsl_msi.h
  +++ b/arch/powerpc/sysdev/fsl_msi.h
  @@ -29,12 +29,19 @@ struct fsl_msi {
  struct irq_domain *irqhost;
 
  unsigned long cascade_irq;
  -
  -   u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
  +   dma_addr_t msiir; /* MSIIR Address in CCSR */
 
 Are you sure dma_addr_t is right here, versus phys_addr_t?  It implies that
 it's the output of the DMA API, but I don't think the DMA API is used in the
 MSI driver.  Perhaps it should be, but we still want the raw physical address
 to pass on to VFIO.

Looking through the conversation I will make this phys_addr_t

 
  void __iomem *msi_regs;
  u32 feature;
  int msi_virqs[NR_MSI_REG];
 
  +   /*
  +* During probe each bank is assigned a index number.
  +* index number ranges from 0 to 2^32.
  +* Example  MSI bank 1 = 0
  +* MSI bank 2 = 1, and so on.
  +*/
  +   int bank_index;
 
 2^32 doesn't fit in int (nor does 2^32 - 1).

Right :(

 
 Just say that indices start at 0.

Will correct this

Thanks
-Bharat

 
 -Scott
 


RE: [PATCH 1/4] powerpc: Added __cmpdi2 for signed 64bit comparision

2013-10-08 Thread Bhushan Bharat-R65777

Oops, it came out as 1/4.
I am sorry, please ignore this.

Thanks
-Bharat

 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Wednesday, October 09, 2013 10:39 AM
 To: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org; b...@kernel.crashing.org
 Cc: Bhushan Bharat-R65777; Bhushan Bharat-R65777
 Subject: [PATCH 1/4] powerpc: Added __cmpdi2 for signed 64bit comparision
 
 This was missing on powerpc and I am getting compilation error
 drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2'
 drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2'
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
  arch/powerpc/kernel/misc_32.S   |   14 ++
  arch/powerpc/kernel/ppc_ksyms.c |2 ++
  2 files changed, 16 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S 
 index
 777d999..7c0eec2 100644
 --- a/arch/powerpc/kernel/misc_32.S
 +++ b/arch/powerpc/kernel/misc_32.S
 @@ -644,6 +644,20 @@ _GLOBAL(__lshrdi3)
   blr
 
  /*
 + * 64-bit comparison: __cmpdi2(s64 a, s64 b)
 + * Returns 0 if a < b, 1 if a == b, 2 if a > b.
 + */
 +_GLOBAL(__cmpdi2)
 + cmpw    r3,r5
 + li      r3,1
 + bne     1f
 + cmplw   r4,r6
 + beqlr
 +1:      li      r3,0
 + bltlr
 + li      r3,2
 + blr
 +/*
   * 64-bit comparison: __ucmpdi2(u64 a, u64 b)
   * Returns 0 if a < b, 1 if a == b, 2 if a > b.
   */
 diff --git a/arch/powerpc/kernel/ppc_ksyms.c b/arch/powerpc/kernel/ppc_ksyms.c
 index 21646db..5674c00 100644
 --- a/arch/powerpc/kernel/ppc_ksyms.c
 +++ b/arch/powerpc/kernel/ppc_ksyms.c
 @@ -143,6 +143,8 @@ EXPORT_SYMBOL(__ashldi3);
  EXPORT_SYMBOL(__lshrdi3);
  int __ucmpdi2(unsigned long long, unsigned long long);
  EXPORT_SYMBOL(__ucmpdi2);
 +int __cmpdi2(long long, long long);
 +EXPORT_SYMBOL(__cmpdi2);
  #endif
  long long __bswapdi2(long long);
  EXPORT_SYMBOL(__bswapdi2);
 --
 1.7.0.4
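
For readers who do not read PowerPC assembly, the comparison in the patch can be mirrored in C roughly as below. This is an illustrative sketch, not part of the patch: cmpdi2_words() is a hypothetical name, and it assumes the usual two's-complement conversion when narrowing to int32_t. Like the assembly, it compares the high words as signed values and, only if they are equal, the low words as unsigned values.

```c
#include <stdint.h>

/* Hypothetical C mirror of __cmpdi2: 0 if a < b, 1 if a == b, 2 if a > b. */
static int cmpdi2_words(int64_t a, int64_t b)
{
    /* Split each operand into a signed high word and an unsigned low word */
    int32_t a_hi = (int32_t)(uint32_t)((uint64_t)a >> 32);
    int32_t b_hi = (int32_t)(uint32_t)((uint64_t)b >> 32);
    uint32_t a_lo = (uint32_t)a;
    uint32_t b_lo = (uint32_t)b;

    if (a_hi != b_hi)                   /* cmpw: signed compare */
        return a_hi < b_hi ? 0 : 2;
    if (a_lo != b_lo)                   /* cmplw: unsigned compare */
        return a_lo < b_lo ? 0 : 2;
    return 1;
}
```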



RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-07 Thread Bhushan Bharat-R65777

   Do you really want module dependencies between vfio and your core
   kernel MSI setup?  Look at the vfio external user interface that we've
 already defined.
   That allows other components of the kernel to get a proper reference
   to a vfio group.  From there you can work out how to get what you
   want.  Another alternative is that vfio could register an MSI to
   IOVA mapping with architecture code when the mapping is created.
   The MSI setup path could then do a lookup in architecture code for
   the mapping.  You could even store the MSI to IOVA mapping in VFIO
   and create an interface where SET_IRQ passes that mapping into setup code.
 
  Ok, What I want is to get IOVA associated with a physical address
  (physical address of MSI-bank).
  And currently I do not see a way to know IOVA of a physical address
  and doing all this domain get and then search through all of
  iommu-windows of that domain.
 
  What if we add an iommu-API which can return the IOVA mapping of a
  physical address. Current use case is setting up MSI's for aperture
  type of IOMMU also getting a phys_to_iova() mapping is independent of
  VFIO, your thought?
 
 A physical address can be mapped to multiple IOVAs, so the interface seems
 flawed by design.  It also has the same problem as above, it's a backdoor that
 can be called asynchronous to the owner of the domain, so what reason is there
 to believe the result?  It just replaces an iommu_domain pointer with an IOVA.
 VFIO knows this mapping, so why are we trying to go behind its back and ask
 the IOMMU?

The IOMMU is the final place where the mapping is created, so maybe today it is
called on behalf of VFIO, and tomorrow it could be for normal Linux or some
other interface. But I am fine with talking directly to VFIO and will not try
to solve a problem which does not exist today.

The MSI subsystem knows the pdev (PCI device) and the physical address; what
interface will it use to get the IOVA from VFIO?
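
The objection that one physical address can be mapped at several IOVAs can be illustrated with a small sketch. Everything here is hypothetical: struct iova_map, phys_to_iovas() and the addresses in the usage note are made up for illustration and are not an IOMMU API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one IOVA -> physical mapping. */
struct iova_map {
    uint64_t iova;
    uint64_t phys;
    uint64_t size;
};

/*
 * Collect every IOVA that translates to phys. Returns the number of
 * matches found; at most out_cap of them are written to out.
 */
static size_t phys_to_iovas(const struct iova_map *maps, size_t n,
                            uint64_t phys, uint64_t *out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        if (phys >= maps[i].phys && phys - maps[i].phys < maps[i].size) {
            if (found < out_cap)
                out[found] = maps[i].iova + (phys - maps[i].phys);
            found++;
        }
    }
    return found;
}
```

If the same MSI page is mapped into two windows, the lookup returns two candidates, so a reverse lookup that returns a single IOVA would have to pick one arbitrarily.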

Thanks
-Bharat

  Thanks,
 
 Alex
 


RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-06 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Friday, October 04, 2013 11:42 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device

 On Fri, 2013-10-04 at 17:23 +, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Friday, October 04, 2013 10:43 PM
   To: Bhushan Bharat-R65777
   Cc: j...@8bytes.org; b...@kernel.crashing.org;
   ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
   linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
   ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org
   Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
   device

   On Fri, 2013-10-04 at 16:47 +, Bhushan Bharat-R65777 wrote:

 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Friday, October 04, 2013 9:15 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org;
 ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
 linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
 ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
 device

 On Fri, 2013-10-04 at 09:54 +, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: linux-pci-ow...@vger.kernel.org
   [mailto:linux-pci-ow...@vger.kernel.org]
   On Behalf Of Alex Williamson
   Sent: Wednesday, September 25, 2013 10:16 PM
   To: Bhushan Bharat-R65777
   Cc: j...@8bytes.org; b...@kernel.crashing.org;
   ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
   linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
   ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
   foundation.org; Bhushan Bharat-R65777
   Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain
   of a device

   On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
This api return the iommu domain to which the device is 
attached.
The iommu_domain is required for making API calls related to
 iommu.
Follow up patches which use this API to know iommu maping.

Signed-off-by: Bharat Bhushan
bharat.bhus...@freescale.com
---
 drivers/iommu/iommu.c |   10 ++
 include/linux/iommu.h |7 +++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index
fbe9ca7..6ac5f50 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -696,6 +696,16 @@ void iommu_detach_device(struct
iommu_domain *domain, struct device *dev)  }
EXPORT_SYMBOL_GPL(iommu_detach_device);

+struct iommu_domain *iommu_get_dev_domain(struct device *dev)
+{
+   struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
+   return NULL;
+
+   return ops->get_dev_iommu_domain(dev);
+}
+EXPORT_SYMBOL_GPL(iommu_get_dev_domain);

   What prevents this from racing iommu_domain_free()?  There's
   no references acquired, so there's no reason for the caller
   to assume the
 pointer is valid.

  Sorry for the late query; somehow this email went into a folder
  and escaped my notice.

  Just to be sure: there is no lock in the generic struct
  iommu_domain, but the IP-specific structure (like the FSL domain)
  linked in iommu_domain->priv has a lock, so we need to handle this
  race in the FSL iommu code (say drivers/iommu/fsl_pamu_domain.c), right?

 No, it's not sufficient to make sure that your use of the
 interface is race free.  The interface itself needs to be
 designed so that it's difficult to use incorrectly.

So we can define iommu_get_dev_domain()/iommu_put_dev_domain();
iommu_get_dev_domain() will return the domain with the lock held, and
iommu_put_dev_domain() will release the lock? And
iommu_get_dev_domain() must always be followed by
iommu_put_dev_domain().

   What lock?  get/put are generally used for reference counting, not
   locking in the kernel.

 That's not the case here.  This is a backdoor to get the iommu
 domain from the iommu driver regardless of who is using it or how.
 The iommu domain is created and managed by vfio, so shouldn't we
 be looking at how to do this through vfio?

Let me

RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Alex Williamson
 Sent: Wednesday, September 25, 2013 10:16 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device
 
 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  This api return the iommu domain to which the device is attached.
  The iommu_domain is required for making API calls related to iommu.
  Follow up patches which use this API to know iommu maping.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   drivers/iommu/iommu.c |   10 ++
   include/linux/iommu.h |7 +++
   2 files changed, 17 insertions(+), 0 deletions(-)
 
  diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
  fbe9ca7..6ac5f50 100644
  --- a/drivers/iommu/iommu.c
  +++ b/drivers/iommu/iommu.c
  @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain
  *domain, struct device *dev)  }
  EXPORT_SYMBOL_GPL(iommu_detach_device);
 
  +struct iommu_domain *iommu_get_dev_domain(struct device *dev)
  +{
  +   struct iommu_ops *ops = dev->bus->iommu_ops;
  +
  +   if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
  +   return NULL;
  +
  +   return ops->get_dev_iommu_domain(dev);
  +}
  +EXPORT_SYMBOL_GPL(iommu_get_dev_domain);
 
 What prevents this from racing iommu_domain_free()?  There's no references
 acquired, so there's no reason for the caller to assume the pointer is valid.

Sorry for the late query; somehow this email went into a folder and escaped
my notice.

Just to be sure: there is no lock in the generic struct iommu_domain, but the
IP-specific structure (like the FSL domain) linked in iommu_domain->priv has a
lock, so we need to handle this race in the FSL iommu code (say
drivers/iommu/fsl_pamu_domain.c), right?

Thanks
-Bharat

 
   /*
* IOMMU groups are really the natrual working unit of the IOMMU, but
* the IOMMU API works on domains and devices.  Bridge that gap by
  diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
  7ea319e..fa046bd 100644
  --- a/include/linux/iommu.h
  +++ b/include/linux/iommu.h
  @@ -127,6 +127,7 @@ struct iommu_ops {
  int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
  /* Get the numer of window per domain */
  u32 (*domain_get_windows)(struct iommu_domain *domain);
  +   struct iommu_domain *(*get_dev_iommu_domain)(struct device *dev);
 
  unsigned long pgsize_bitmap;
   };
  @@ -190,6 +191,7 @@ extern int iommu_domain_window_enable(struct 
  iommu_domain
 *domain, u32 wnd_nr,
phys_addr_t offset, u64 size,
int prot);
   extern void iommu_domain_window_disable(struct iommu_domain *domain,
  u32 wnd_nr);
  +extern struct iommu_domain *iommu_get_dev_domain(struct device *dev);
   /**
* report_iommu_fault() - report about an IOMMU fault to the IOMMU 
  framework
* @domain: the iommu domain where the fault has happened @@ -284,6
  +286,11 @@ static inline void iommu_domain_window_disable(struct
  iommu_domain *domain,  {  }
 
  +static inline struct iommu_domain *iommu_get_dev_domain(struct device
  +*dev) {
  +   return NULL;
  +}
  +
   static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain
  *domain, dma_addr_t iova)  {
  return 0;
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in the 
 body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html


RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Friday, October 04, 2013 3:24 PM
 To: 'Alex Williamson'
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org
 Subject: RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
 
 
 
  -Original Message-
  From: linux-pci-ow...@vger.kernel.org
  [mailto:linux-pci-ow...@vger.kernel.org]
  On Behalf Of Alex Williamson
  Sent: Wednesday, September 25, 2013 10:16 PM
  To: Bhushan Bharat-R65777
  Cc: j...@8bytes.org; b...@kernel.crashing.org;
  ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
  linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
  ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org;
  Bhushan Bharat-R65777
  Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
  device
 
  On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
   This api return the iommu domain to which the device is attached.
   The iommu_domain is required for making API calls related to iommu.
   Follow up patches which use this API to know iommu maping.
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
drivers/iommu/iommu.c |   10 ++
include/linux/iommu.h |7 +++
2 files changed, 17 insertions(+), 0 deletions(-)
  
   diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
   fbe9ca7..6ac5f50 100644
   --- a/drivers/iommu/iommu.c
   +++ b/drivers/iommu/iommu.c
   @@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain
   *domain, struct device *dev)  }
   EXPORT_SYMBOL_GPL(iommu_detach_device);
  
   +struct iommu_domain *iommu_get_dev_domain(struct device *dev)
   +{
   + struct iommu_ops *ops = dev->bus->iommu_ops;
   +
   + if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
   + return NULL;
   +
   + return ops->get_dev_iommu_domain(dev);
   +}
   +EXPORT_SYMBOL_GPL(iommu_get_dev_domain);
 
  What prevents this from racing iommu_domain_free()?  There's no
  references acquired, so there's no reason for the caller to assume the 
  pointer
 is valid.
 
 Sorry for the late query; somehow this email went into a folder and escaped
 my notice.
 
 Just to be sure: there is no lock in the generic struct iommu_domain, but the
 IP-specific structure (like the FSL domain) linked in iommu_domain->priv has
 a lock, so we need to handle this race in the FSL iommu code (say
 drivers/iommu/fsl_pamu_domain.c), right?

Thinking about this further, there are more problems here:
 - The MSI subsystem will call iommu_get_dev_domain(), which will take a lock,
find the domain pointer, release the lock, and return the domain.
 - Now suppose the domain is freed.
 - Meanwhile the MSI subsystem tries to do work on the domain (like
get_attribute/set_attribute etc.) ???

So can we make iommu_get_dev_domain() return the domain with the lock held,
and have iommu_put_dev_domain() release the lock? iommu_get_dev_domain()
must then always be followed by iommu_put_dev_domain().

Thanks
-Bharat

 
 Thanks
 -Bharat
 
 
/*
 * IOMMU groups are really the natrual working unit of the IOMMU, but
 * the IOMMU API works on domains and devices.  Bridge that gap by
   diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
   7ea319e..fa046bd 100644
   --- a/include/linux/iommu.h
   +++ b/include/linux/iommu.h
   @@ -127,6 +127,7 @@ struct iommu_ops {
 int (*domain_set_windows)(struct iommu_domain *domain, u32 w_count);
 /* Get the numer of window per domain */
 u32 (*domain_get_windows)(struct iommu_domain *domain);
   + struct iommu_domain *(*get_dev_iommu_domain)(struct device *dev);
  
 unsigned long pgsize_bitmap;
};
   @@ -190,6 +191,7 @@ extern int iommu_domain_window_enable(struct
   iommu_domain
  *domain, u32 wnd_nr,
   phys_addr_t offset, u64 size,
   int prot);
extern void iommu_domain_window_disable(struct iommu_domain
   *domain,
   u32 wnd_nr);
   +extern struct iommu_domain *iommu_get_dev_domain(struct device
   +*dev);
/**
 * report_iommu_fault() - report about an IOMMU fault to the IOMMU
 framework
 * @domain: the iommu domain where the fault has happened @@ -284,6
   +286,11 @@ static inline void iommu_domain_window_disable(struct
   iommu_domain *domain,  {  }
  
   +static inline struct iommu_domain *iommu_get_dev_domain(struct
   +device
   +*dev) {
   + return NULL;
   +}
   +
static inline phys_addr_t iommu_iova_to_phys(struct iommu_domain
   *domain, dma_addr_t iova)  {
 return 0;
 
 
 
  --
  To unsubscribe from this list: send the line unsubscribe linux-pci
  in the body of a message to majord...@vger.kernel.org More majordomo
  info at http://vger.kernel.org/majordomo-info.html

___
Linuxppc-dev mailing list
Linuxppc-dev

RE: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in lookup_linux_pte

2013-10-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Friday, October 04, 2013 6:57 PM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; pau...@samba.org; k...@vger.kernel.org; kvm-
 p...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; 
 Bhushan
 Bharat-R65777
 Subject: Re: [PATCH 4/6 v5] kvm: powerpc: keep only pte search logic in
 lookup_linux_pte
 
 
 On 19.09.2013, at 08:02, Bharat Bhushan wrote:
 
  lookup_linux_pte() was searching for a pte and also setting the access
  flags if writable. This function now only searches for the pte, while
  access flag setting is done explicitly.

  This pte lookup is not kvm specific, so it is moved to common code
  (asm/pgtable.h). My follow-up patch will use this on booke.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v4-v5
  - No change
 
  arch/powerpc/include/asm/pgtable.h  |   24 +++
  arch/powerpc/kvm/book3s_hv_rm_mmu.c |   36 
  +++---
  2 files changed, 36 insertions(+), 24 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/pgtable.h
  b/arch/powerpc/include/asm/pgtable.h
  index 7d6eacf..3a5de5c 100644
  --- a/arch/powerpc/include/asm/pgtable.h
  +++ b/arch/powerpc/include/asm/pgtable.h
  @@ -223,6 +223,30 @@ extern int gup_hugepte(pte_t *ptep, unsigned long
  sz, unsigned long addr, #endif pte_t *find_linux_pte_or_hugepte(pgd_t
  *pgdir, unsigned long ea,
   unsigned *shift);
  +
  +static inline pte_t *lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  +unsigned long *pte_sizep)
  +{
  +   pte_t *ptep;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
  +   if (!ptep)
  +   return __pte(0);
 
 This returns a struct pte_t, but your return value of the function is a struct
 pte_t *. So this code will fail compiling with STRICT_MM_TYPECHECKS set. Any
 reason you don't just return NULL here?

I want to return the ptep (pte pointer), so yes, this should be NULL.
Will correct this.

Thanks
-Bharat

 
 That way callers could simply check on if (ptep) ... or you leave the return
 value as struct pte_t.
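
The convention suggested here can be sketched in isolation. This is a
user-space model, not the kernel code: pte_t, fake_table, FAKE_PRESENT,
lookup_pte() and read_pte_or_zero() are hypothetical stand-ins. Only the
shape mirrors the discussion: a lookup that returns a pointer, returns NULL
on every failure path, and lets callers test "if (ptep)".

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's strict pte_t wrapper type. */
typedef struct { unsigned long pte; } pte_t;

#define FAKE_PRESENT 0x1UL	/* assumed "present" bit, for this model only */
#define FAKE_ENTRIES 4

static pte_t fake_table[FAKE_ENTRIES] = {
	{ 0x1001UL }, { 0x0UL }, { 0x3001UL }, { 0x4000UL },
};

/*
 * Return a pointer to the entry for idx, or NULL if there is no entry
 * or the entry is not present.  Every failure path returns NULL, so the
 * declared return type (pte_t *) is honoured on all paths.
 */
static pte_t *lookup_pte(size_t idx)
{
	pte_t *ptep;

	if (idx >= FAKE_ENTRIES)
		return NULL;

	ptep = &fake_table[idx];
	if (!(ptep->pte & FAKE_PRESENT))
		return NULL;

	return ptep;
}

/* Caller pattern: test the pointer before dereferencing it. */
static unsigned long read_pte_or_zero(size_t idx)
{
	pte_t *ptep = lookup_pte(idx);

	if (ptep)
		return ptep->pte;
	return 0;
}
```

With a single failure value, the caller needs one pointer check before any
use of the entry.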
 
 
 Alex
 
  +   if (shift)
  +   *pte_sizep = 1ul << shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps > *pte_sizep)
  +   return __pte(0);
  +
  +   if (!pte_present(*ptep))
  +   return __pte(0);
 
  +
  +   return ptep;
  +}
  #endif /* __ASSEMBLY__ */
 
  #endif /* __KERNEL__ */
  diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
  b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
  index 45e30d6..74fa7f8 100644
  --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
  +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
  @@ -134,25 +134,6 @@ static void remove_revmap_chain(struct kvm *kvm, long
 pte_index,
  unlock_rmap(rmap);
  }
 
  -static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  - int writing, unsigned long *pte_sizep)
  -{
  -   pte_t *ptep;
  -   unsigned long ps = *pte_sizep;
  -   unsigned int hugepage_shift;
  -
  -   ptep = find_linux_pte_or_hugepte(pgdir, hva, &hugepage_shift);
  -   if (!ptep)
  -   return __pte(0);
  -   if (hugepage_shift)
  -   *pte_sizep = 1ul << hugepage_shift;
  -   else
  -   *pte_sizep = PAGE_SIZE;
  -   if (ps > *pte_sizep)
  -   return __pte(0);
  -   return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
  -}
  -
  static inline void unlock_hpte(unsigned long *hpte, unsigned long
  hpte_v) {
  asm volatile(PPC_RELEASE_BARRIER "" : : : "memory"); @@ -173,6 +154,7
  @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
  unsigned long is_io;
  unsigned long *rmap;
  pte_t pte;
  +   pte_t *ptep;
  unsigned int writing;
  unsigned long mmu_seq;
  unsigned long rcbits;
  @@ -231,8 +213,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned
  long flags,
 
  /* Look up the Linux PTE for the backing page */
  pte_size = psize;
  -   pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
  -   if (pte_present(pte)) {
  +   ptep = lookup_linux_pte(pgdir, hva, &pte_size);
  +   if (pte_present(pte_val(*ptep))) {
  +   pte = kvmppc_read_update_linux_pte(ptep, writing);
  if (writing && !pte_write(pte))
  /* make the actual HPTE be read-only */
  ptel = hpte_make_readonly(ptel);
  @@ -661,15 +644,20 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned
 long flags,
  struct kvm_memory_slot *memslot;
  pgd_t *pgdir = vcpu->arch.pgdir;
  pte_t pte;
  +   pte_t *ptep;
 
  psize = hpte_page_size(v, r);
  gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
  memslot

RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-04 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Friday, October 04, 2013 9:15 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device

 On Fri, 2013-10-04 at 09:54 +, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: linux-pci-ow...@vger.kernel.org
   [mailto:linux-pci-ow...@vger.kernel.org]
   On Behalf Of Alex Williamson
   Sent: Wednesday, September 25, 2013 10:16 PM
   To: Bhushan Bharat-R65777
   Cc: j...@8bytes.org; b...@kernel.crashing.org;
   ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
   linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
   ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org;
   Bhushan Bharat-R65777
   Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
   device

   On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
This API returns the iommu domain to which the device is attached.
The iommu_domain is required for making API calls related to iommu.
Follow-up patches use this API to learn the iommu mapping.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 drivers/iommu/iommu.c |   10 ++
 include/linux/iommu.h |7 +++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
fbe9ca7..6ac5f50 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -696,6 +696,16 @@ void iommu_detach_device(struct iommu_domain
*domain, struct device *dev)  }
EXPORT_SYMBOL_GPL(iommu_detach_device);

+struct iommu_domain *iommu_get_dev_domain(struct device *dev)
+{
+   struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
+   return NULL;
+
+   return ops->get_dev_iommu_domain(dev);
+}
+EXPORT_SYMBOL_GPL(iommu_get_dev_domain);

   What prevents this from racing iommu_domain_free()?  There's no
   references acquired, so there's no reason for the caller to assume the
 pointer is valid.

  Sorry for late query, somehow this email went into a folder and
  escaped;

  Just to be sure: there is no lock in the generic struct iommu_domain, but the
  IP-specific structure (like the FSL domain) linked via iommu_domain->priv has
  a lock, so we need to handle this race in the FSL iommu code (say
  drivers/iommu/fsl_pamu_domain.c), right?

 No, it's not sufficient to make sure that your use of the interface is race
 free.  The interface itself needs to be designed so that it's difficult to use
 incorrectly.

So can we define iommu_get_dev_domain()/iommu_put_dev_domain(), where
iommu_get_dev_domain() returns the domain with the lock held and
iommu_put_dev_domain() releases the lock? iommu_get_dev_domain() must then
always be followed by iommu_put_dev_domain().

 That's not the case here.  This is a backdoor to get the iommu
 domain from the iommu driver regardless of who is using it or how.  The iommu
 domain is created and managed by vfio, so shouldn't we be looking at how to do
 this through vfio?

Let me first describe what we are doing here.
During initialization:
 - vfio talks to the MSI system to learn the MSI page and its size.
 - vfio then interacts with the iommu to map the MSI page in the iommu (the IOVA
   is decided by userspace and the physical address is the MSI page).
 - So the IOVA subwindow mapping is created in the iommu, and yes, VFIO knows
   about this mapping.

Now on a SET_IRQ(MSI/MSIX) ioctl:
 - It calls pci_enable_msix()/pci_enable_msi_block(), which is supposed to set
   the MSI address/data in the device.
 - So in the current implementation (this patchset) the msi-subsystem gets the
   IOVA from the iommu via this defined interface.
 - Are you saying that rather than getting this from the iommu, we should get
   it from vfio? What difference does this make?

Thanks
-Bharat

 It seems like you'd want to use your device to get a vfio
 group reference, from which you could do something with the vfio external user
 interface and get the iommu domain reference.  Thanks,

 Alex

 /*
  * IOMMU groups are really the natrual working unit of the IOMMU, but
  * the IOMMU API works on domains and devices.  Bridge that gap
by diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7ea319e..fa046bd 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -127,6 +127,7 @@ struct iommu_ops {
int (*domain_set_windows)(struct iommu_domain *domain, u32
 w_count);
/* Get the numer of window per domain */
u32 (*domain_get_windows)(struct iommu_domain *domain);
+   struct iommu_domain *(*get_dev_iommu_domain

RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device

2013-10-04 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Friday, October 04, 2013 10:43 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device

 On Fri, 2013-10-04 at 16:47 +, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Alex Williamson [mailto:alex.william...@redhat.com]
   Sent: Friday, October 04, 2013 9:15 PM
   To: Bhushan Bharat-R65777
   Cc: j...@8bytes.org; b...@kernel.crashing.org;
   ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
   linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
   ag...@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org
   Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
   device

   On Fri, 2013-10-04 at 09:54 +, Bhushan Bharat-R65777 wrote:

 -Original Message-
 From: linux-pci-ow...@vger.kernel.org
 [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Alex Williamson
 Sent: Wednesday, September 25, 2013 10:16 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org;
 ga...@kernel.crashing.org; linux- ker...@vger.kernel.org;
 linuxppc-dev@lists.ozlabs.org; linux- p...@vger.kernel.org;
 ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
 device

 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  This API returns the iommu domain to which the device is attached.
  The iommu_domain is required for making API calls related to iommu.
  Follow-up patches use this API to learn the iommu mapping.

  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   drivers/iommu/iommu.c |   10 ++
   include/linux/iommu.h |7 +++
   2 files changed, 17 insertions(+), 0 deletions(-)

  diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
  index
  fbe9ca7..6ac5f50 100644
  --- a/drivers/iommu/iommu.c
  +++ b/drivers/iommu/iommu.c
  @@ -696,6 +696,16 @@ void iommu_detach_device(struct
  iommu_domain *domain, struct device *dev)  }
  EXPORT_SYMBOL_GPL(iommu_detach_device);

  +struct iommu_domain *iommu_get_dev_domain(struct device *dev)
  +{
  +   struct iommu_ops *ops = dev->bus->iommu_ops;
  +
  +   if (unlikely(ops == NULL || ops->get_dev_iommu_domain == NULL))
  +   return NULL;
  +
  +   return ops->get_dev_iommu_domain(dev);
  +}
  +EXPORT_SYMBOL_GPL(iommu_get_dev_domain);

 What prevents this from racing iommu_domain_free()?  There's no
 references acquired, so there's no reason for the caller to
 assume the
   pointer is valid.

Sorry for late query, somehow this email went into a folder and
escaped;

Just to be sure: there is no lock in the generic struct iommu_domain,
but the IP-specific structure (like the FSL domain) linked via
iommu_domain->priv has a lock, so we need to handle this race in the
FSL iommu code (say drivers/iommu/fsl_pamu_domain.c), right?

   No, it's not sufficient to make sure that your use of the interface
   is race free.  The interface itself needs to be designed so that
   it's difficult to use incorrectly.

  So can we define iommu_get_dev_domain()/iommu_put_dev_domain(), where
  iommu_get_dev_domain() returns the domain with the lock held and
  iommu_put_dev_domain() releases the lock? iommu_get_dev_domain() must
  then always be followed by iommu_put_dev_domain().

 What lock?  get/put are generally used for reference counting, not locking in
 the kernel.
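
As an illustration of get/put as reference counting, here is a minimal
user-space sketch using C11 atomics. It is not the iommu API and not a
proposal from this thread: struct demo_domain, demo_domain_alloc(),
demo_domain_get() and demo_domain_put() are hypothetical names. In the
kernel this pattern is usually built on struct kref with kref_get() and
kref_put().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical domain object; not the kernel's struct iommu_domain. */
struct demo_domain {
	atomic_int refcount;
	bool *freed_flag;	/* set to true when the object is released */
};

static struct demo_domain *demo_domain_alloc(bool *freed_flag)
{
	struct demo_domain *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcount, 1);	/* creator holds the first reference */
	d->freed_flag = freed_flag;
	*freed_flag = false;
	return d;
}

/* Take an extra reference; the caller must already hold a valid one. */
static struct demo_domain *demo_domain_get(struct demo_domain *d)
{
	atomic_fetch_add(&d->refcount, 1);
	return d;
}

/* Drop a reference; the last put frees the object. */
static void demo_domain_put(struct demo_domain *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1) {
		*d->freed_flag = true;
		free(d);
	}
}
```

The object stays alive until the last holder calls put, so a user that took
a reference is not affected when the owner drops its own. The sketch does
not show how the initial lookup is made safe against a concurrent free;
that still needs its own protection.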

   That's not the case here.  This is a backdoor to get the iommu
   domain from the iommu driver regardless of who is using it or how.
   The iommu domain is created and managed by vfio, so shouldn't we be
   looking at how to do this through vfio?

  Let me first describe what we are doing here.
  During initialization:
   - vfio talks to the MSI system to learn the MSI page and its size.
   - vfio then interacts with the iommu to map the MSI page in the iommu (the
     IOVA is decided by userspace and the physical address is the MSI page).
   - So the IOVA subwindow mapping is created in the iommu, and yes, VFIO knows
     about this mapping.

  Now on a SET_IRQ(MSI/MSIX) ioctl:
   - It calls pci_enable_msix()/pci_enable_msi_block(), which is supposed to
     set the MSI address/data in the device.
   - So in the current implementation (this patchset) the msi-subsystem gets
     the IOVA from the iommu via this defined interface.
   - Are you saying that rather than getting this from the iommu, we should
     get it from vfio? What difference does this make?

 Yes, you just said above

RE: [PATCH 1/7] powerpc: Add interface to get msi region information

2013-10-03 Thread Bhushan Bharat-R65777



 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Bjorn Helgaas
 Sent: Wednesday, September 25, 2013 5:28 AM
 To: Bhushan Bharat-R65777
 Cc: alex.william...@redhat.com; j...@8bytes.org; b...@kernel.crashing.org;
 ga...@kernel.crashing.org; linux-ker...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; linux-...@vger.kernel.org; ag...@suse.de; Wood Scott-
 B07421; io...@lists.linux-foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region information
 
 On Thu, Sep 19, 2013 at 12:59:17PM +0530, Bharat Bhushan wrote:
  This patch adds interface to get following information
- Number of MSI regions (which is number of MSI banks for powerpc).
- Get the region address range: Physical page which have the
   address/addresses used for generating MSI interrupt
   and size of the page.
 
  These are required to create IOMMU (Freescale PAMU) mapping for
  devices which are directly assigned using VFIO.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/include/asm/machdep.h |8 +++
   arch/powerpc/include/asm/pci.h |2 +
   arch/powerpc/kernel/msi.c  |   18 
   arch/powerpc/sysdev/fsl_msi.c  |   39 
  +--
   arch/powerpc/sysdev/fsl_msi.h  |   11 -
   drivers/pci/msi.c  |   26 
   include/linux/msi.h|8 +++
   include/linux/pci.h|   13 
   8 files changed, 120 insertions(+), 5 deletions(-)
 
  ...
 
  diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c index
  aca7578..6d85c15 100644
  --- a/drivers/pci/msi.c
  +++ b/drivers/pci/msi.c
  @@ -30,6 +30,20 @@ static int pci_msi_enable = 1;
 
   /* Arch hooks */
 
  +#ifndef arch_msi_get_region_count
  +int arch_msi_get_region_count(void)
  +{
  +   return 0;
  +}
  +#endif
  +
  +#ifndef arch_msi_get_region
  +int arch_msi_get_region(int region_num, struct msi_region *region) {
  +   return 0;
  +}
  +#endif
 
 This #define strategy is gone; see 4287d824 (PCI: use weak functions for MSI
 arch-specific functions).  Please use the weak function strategy for your new
 MSI region functions.

ok
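
For readers unfamiliar with the weak function strategy mentioned above, a
minimal sketch follows. It is a user-space approximation built with the
GCC/Clang weak attribute, which the kernel spells __weak. The struct is a
stand-in for the msi_region proposed in the patch, with unsigned long used
in place of dma_addr_t, and the field comments are inferred from the patch
description rather than taken from posted code.

```c
#include <stddef.h>

/* Stand-in for the struct proposed in the patch (assumed field meanings). */
struct msi_region {
	int region_num;		/* index of the region (MSI bank on powerpc) */
	unsigned long addr;	/* address of the page with the MSI address(es) */
	size_t size;		/* size of that page */
};

/*
 * Default (weak) implementations.  An architecture that has MSI regions
 * provides ordinary definitions with the same names, and the linker
 * picks those instead of these defaults.
 */
__attribute__((weak)) int arch_msi_get_region_count(void)
{
	return 0;
}

__attribute__((weak)) int arch_msi_get_region(int region_num,
					      struct msi_region *region)
{
	(void)region_num;
	(void)region;
	return 0;
}

/* Generic wrappers, as in the patch, simply call the arch hook. */
int msi_get_region_count(void)
{
	return arch_msi_get_region_count();
}

int msi_get_region(int region_num, struct msi_region *region)
{
	return arch_msi_get_region(region_num, region);
}
```

Compared with the #ifndef approach, no header has to define a macro to
signal that an override exists; linking in a strong definition is enough.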

 
  +
   #ifndef arch_msi_check_device
   int arch_msi_check_device(struct pci_dev *dev, int nvec, int type)  {
  @@ -903,6 +917,18 @@ void pci_disable_msi(struct pci_dev *dev)  }
  EXPORT_SYMBOL(pci_disable_msi);
 
  +int msi_get_region_count(void)
  +{
  +   return arch_msi_get_region_count();
  +}
  +EXPORT_SYMBOL(msi_get_region_count);
  +
  +int msi_get_region(int region_num, struct msi_region *region) {
  +   return arch_msi_get_region(region_num, region); }
  +EXPORT_SYMBOL(msi_get_region);
 
 Please split these interface additions, i.e., the drivers/pci/msi.c,
 include/linux/msi.h, and include/linux/pci.h changes, into a separate patch.

ok

 
 I don't know enough about VFIO to understand why these new interfaces are
 needed.  Is this the first VFIO IOMMU driver?  I see vfio_iommu_spapr_tce.c
 and vfio_iommu_type1.c but I don't know if they're comparable to the
 Freescale PAMU.  Do other VFIO IOMMU implementations support MSI?  If so, do
 they handle the problem of mapping the MSI regions in a different way?

PAMU is an aperture type of IOMMU while the others are paging type, so they are
completely different from PAMU and handle this differently.

 
   /**
* pci_msix_table_size - return the number of device's MSI-X table entries
* @dev: pointer to the pci_dev data structure of MSI-X device
  function diff --git a/include/linux/msi.h b/include/linux/msi.h index
  ee66f3a..ae32601 100644
  --- a/include/linux/msi.h
  +++ b/include/linux/msi.h
  @@ -50,6 +50,12 @@ struct msi_desc {
  struct kobject kobj;
   };
 
  +struct msi_region {
  +   int region_num;
  +   dma_addr_t addr;
  +   size_t size;
  +};
 
 This needs some sort of explanatory comment.

Ok

-Bharat

 
   /*
* The arch hook for setup up msi irqs
*/
  @@ -58,5 +64,7 @@ void arch_teardown_msi_irq(unsigned int irq);  int
  arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);  void
  arch_teardown_msi_irqs(struct pci_dev *dev);  int
  arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
  +int arch_msi_get_region_count(void);
  +int arch_msi_get_region(int region_num, struct msi_region *region);
 
   #endif /* LINUX_MSI_H */
  diff --git a/include/linux/pci.h b/include/linux/pci.h index
  186540d..2b26a59 100644
  --- a/include/linux/pci.h
  +++ b/include/linux/pci.h
  @@ -1126,6 +1126,7 @@ struct msix_entry {
  u16 entry;  /* driver uses to specify entry, OS writes */
   };
 
  +struct msi_region;
 
   #ifndef CONFIG_PCI_MSI
   static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned
  int nvec) @@ -1168,6 +1169,16 @@ static inline int
  pci_msi_enabled(void)  {
  return 0;
   }
  +
  +static inline int

RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-09-25 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng
 Wang
 Sent: Tuesday, September 24, 2013 2:59 PM
 To: Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
 Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec 
 idle
 
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 Add a sys interface to enable/disable pw20 state or altivec idle, and
 control the wait entry time.
 
 Enable/Disable interface:
 0, disable. 1, enable.
 /sys/devices/system/cpu/cpuX/pw20_state
 /sys/devices/system/cpu/cpuX/altivec_idle
 
 Set wait time interface:(Nanosecond)
 /sys/devices/system/cpu/cpuX/pw20_wait_time
 /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
 Example: based on a TBfreq of 41MHz.
 1~47(ns): TB[63]
 48~95(ns): TB[62]
 96~191(ns): TB[61]
 192~383(ns): TB[60]
 384~767(ns): TB[59]
 ...
 
 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
 ---
 *v4:
 Move code from 85xx/common.c to kernel/sysfs.c.
 
 Remove has_pw20_altivec_idle function.
 
 Change wait entry_bit to wait time.
 
  arch/powerpc/kernel/sysfs.c | 291 
 
  1 file changed, 291 insertions(+)
 
 diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
 index 27a90b9..23fece6 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=", setup_smt_snooze_delay);
 
  #endif /* CONFIG_PPC64 */
 
 +#ifdef CONFIG_FSL_SOC
 +#define MAX_BIT  63
 +
 +static u64 pw20_wt;
 +static u64 altivec_idle_wt;
 +
 +static unsigned int get_idle_ticks_bit(u64 ns)
 +{
 + u64 cycle;
 +
 + cycle = div_u64(ns, 1000 / tb_ticks_per_usec);

When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then this will
always be ns, which is not correct, no?
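
To make the integer-division concern concrete, here is a user-space model
of the quoted get_idle_ticks_bit(). It is a sketch under assumptions:
tb_ticks_per_usec is passed as a nonzero parameter instead of being the
kernel global, demo_ilog2() stands in for ilog2(), and the -1 return for a
zero divisor is added for this model only.

```c
#include <stdint.h>

/* Integer log2 for v > 0 (stand-in for the kernel's ilog2()). */
static unsigned int demo_ilog2(uint64_t v)
{
	unsigned int bit = 0;

	while (v >>= 1)
		bit++;
	return bit;
}

/*
 * Model of get_idle_ticks_bit() from the patch.  The caller must pass a
 * nonzero tb_ticks_per_usec.  Returns -1 where the patch's divisor,
 * 1000 / tb_ticks_per_usec, would truncate to zero.
 */
static int demo_idle_ticks_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	uint64_t ns_per_tick = 1000 / tb_ticks_per_usec; /* integer division */
	uint64_t cycle;

	if (!ns_per_tick)
		return -1;	/* more than 1000 ticks/us: divisor is 0 */

	cycle = ns / ns_per_tick;
	if (!cycle)
		return 0;
	return (int)demo_ilog2(cycle);
}
```

At 41 ticks per microsecond the divisor is 24, so 192 ns gives bit 3. In
this model the divisor truncates to 1 for any timebase from 501 to 1000
ticks per microsecond, which makes cycle equal to ns, and it truncates to 0
above that. The patch writes (MAX_BIT - bit) to the register field, so bit
3 corresponds to TB[60].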

 + if (!cycle)
 + return 0;
 +
 + return ilog2(cycle);
 +}
 +
 +static void do_show_pwrmgtcr0(void *val)
 +{
 + u32 *value = val;
 +
 + *value = mfspr(SPRN_PWRMGTCR0);
 +}
 +
 +static ssize_t show_pw20_state(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 + u32 value;
 + unsigned int cpu = dev->id;
 +
 + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
 +
 + value &= PWRMGTCR0_PW20_WAIT;
 +
 + return sprintf(buf, "%u\n", value ? 1 : 0);
 +}
 +
 +static void do_store_pw20_state(void *val)
 +{
 + u32 *value = val;
 + u32 pw20_state;
 +
 + pw20_state = mfspr(SPRN_PWRMGTCR0);
 +
 + if (*value)
 + pw20_state |= PWRMGTCR0_PW20_WAIT;
 + else
 + pw20_state &= ~PWRMGTCR0_PW20_WAIT;
 +
 + mtspr(SPRN_PWRMGTCR0, pw20_state);
 +}
 +
 +static ssize_t store_pw20_state(struct device *dev,
 + struct device_attribute *attr,
 + const char *buf, size_t count)
 +{
 + u32 value;
 + unsigned int cpu = dev->id;
 +
 + if (kstrtou32(buf, 0, &value))
 + return -EINVAL;
 +
 + if (value > 1)
 + return -EINVAL;
 +
 + smp_call_function_single(cpu, do_store_pw20_state, &value, 1);
 +
 + return count;
 +}
 +
 +static ssize_t show_pw20_wait_time(struct device *dev,
 + struct device_attribute *attr, char *buf)
 +{
 + u32 value;
 + u64 tb_cycle;
 + u64 time;
 +
 + unsigned int cpu = dev->id;
 +
 + if (!pw20_wt) {
 + smp_call_function_single(cpu, do_show_pwrmgtcr0, &value, 1);
 + value = (value & PWRMGTCR0_PW20_ENT) >>
 + PWRMGTCR0_PW20_ENT_SHIFT;
 +
 + tb_cycle = (1 << (MAX_BIT - value)) * 2;
 + time = tb_cycle * (1000 / tb_ticks_per_usec) - 1;

Similar to above comment.

-Bharat

 + } else {
 + time = pw20_wt;
 + }
 +
 + return sprintf(buf, "%llu\n", time);
 +}
 +
 +static void set_pw20_wait_entry_bit(void *val)
 +{
 + u32 *value = val;
 + u32 pw20_idle;
 +
 + pw20_idle = mfspr(SPRN_PWRMGTCR0);
 +
 + /* Set Automatic PW20 Core Idle Count */
 + /* clear count */
 + pw20_idle &= ~PWRMGTCR0_PW20_ENT;
 +
 + /* set count */
 + pw20_idle |= ((MAX_BIT - *value) << PWRMGTCR0_PW20_ENT_SHIFT);
 +
 + mtspr(SPRN_PWRMGTCR0, pw20_idle);
 +}
 +
 +static ssize_t store_pw20_wait_time(struct device *dev,
 + struct device_attribute *attr,
 + const char *buf, size_t count)
 +{
 + u32 entry_bit;
 + u64 value;
 +
 + unsigned int cpu = dev->id;
 +
 + if (kstrtou64(buf, 0, &value))
 + return -EINVAL;
 +
 + if (!value)
 + return -EINVAL;
 +
 + entry_bit = get_idle_ticks_bit(value);
 + if (entry_bit > MAX_BIT)
 + return -EINVAL;
 +
 + pw20_wt = value;
 + smp_call_function_single(cpu, set_pw20_wait_entry_bit,
 + &entry_bit, 1);
 +
 +

RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-09-25 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Wednesday, September 25, 2013 1:40 PM
 To: Bhushan Bharat-R65777; Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle

  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Wednesday, September 25, 2013 2:23 PM
  To: Wang Dongsheng-B40534; Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
  Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle

   -Original Message-
   From: Linuxppc-dev [mailto:linuxppc-dev-
   bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of
   bounces+Dongsheng
   Wang
   Sent: Tuesday, September 24, 2013 2:59 PM
   To: Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
   Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
   altivec idle

   From: Wang Dongsheng dongsheng.w...@freescale.com

   Add a sys interface to enable/disable pw20 state or altivec idle, and
   control the wait entry time.

   Enable/Disable interface:
   0, disable. 1, enable.
   /sys/devices/system/cpu/cpuX/pw20_state
   /sys/devices/system/cpu/cpuX/altivec_idle

   Set wait time interface:(Nanosecond)
   /sys/devices/system/cpu/cpuX/pw20_wait_time
   /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
   Example: based on a TBfreq of 41MHz.
   1~47(ns): TB[63]
   48~95(ns): TB[62]
   96~191(ns): TB[61]
   192~383(ns): TB[60]
   384~767(ns): TB[59]
   ...

   Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
   ---
   *v4:
   Move code from 85xx/common.c to kernel/sysfs.c.

   Remove has_pw20_altivec_idle function.

   Change wait entry_bit to wait time.

arch/powerpc/kernel/sysfs.c | 291

1 file changed, 291 insertions(+)

   diff --git a/arch/powerpc/kernel/sysfs.c
   b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644
   --- a/arch/powerpc/kernel/sysfs.c
   +++ b/arch/powerpc/kernel/sysfs.c
   @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=",
   setup_smt_snooze_delay);

#endif /* CONFIG_PPC64 */

   +#ifdef CONFIG_FSL_SOC
   +#define MAX_BIT  63
   +
   +static u64 pw20_wt;
   +static u64 altivec_idle_wt;
   +
   +static unsigned int get_idle_ticks_bit(u64 ns) {
   + u64 cycle;
   +
   + cycle = div_u64(ns, 1000 / tb_ticks_per_usec);

  When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then this
  will always be ns, which is not correct, no?

 1000 / tb_ticks_per_usec means nsec_ticks_per_tb

 If the timebase frequency is > 1GHz, this should be tb_ticks_per_usec / 1000,
 to get tb_ticks_per_nsec.
 This should then be changed to cycle = ns * tb_ticks_per_nsec;

Yes, we need to change this to two lines.

 But at present we do not have a platform whose timebase frequency is more than
 1GHz, and I think there is no need to support such a situation, because we have
 no environment to test it.

 If a platform faster than 1GHz appears later, this support can be added then.

Would like to leave it to Scott, but personally I think that if there is 
something simple to fix then it must be fixed rather than waiting for some 
error to happen and then fixing.
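
One simple way to avoid the truncated intermediate quotient is to multiply
first and divide last. This is only a sketch of that idea, not code that
was posted or agreed in this thread. demo_ns_to_tb_ticks() is a
hypothetical name, and the model assumes ns * tb_ticks_per_usec does not
overflow 64 bits. In the kernel the final division would go through
div_u64(), as the patch already does.

```c
#include <stdint.h>

/*
 * Convert nanoseconds to timebase ticks by multiplying first and
 * dividing last, so no precision is lost to an intermediate quotient.
 * Assumes ns * tb_ticks_per_usec fits in 64 bits.
 */
static uint64_t demo_ns_to_tb_ticks(uint64_t ns, uint64_t tb_ticks_per_usec)
{
	return (ns * tb_ticks_per_usec) / 1000;
}
```

The same expression works on either side of 1000 ticks per microsecond, so
the two cases need no separate branches.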

-Bharat

 Thanks.

 -dongsheng

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc

2013-09-25 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Wednesday, September 25, 2013 10:10 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 5/7] iommu: supress loff_t compilation error on powerpc
 
 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   drivers/vfio/pci/vfio_pci_rdwr.c |3 ++-
   1 files changed, 2 insertions(+), 1 deletions(-)
 
  diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c
  b/drivers/vfio/pci/vfio_pci_rdwr.c
  index 210db24..8a8156a 100644
  --- a/drivers/vfio/pci/vfio_pci_rdwr.c
  +++ b/drivers/vfio/pci/vfio_pci_rdwr.c
  @@ -181,7 +181,8 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, 
  char
 __user *buf,
 size_t count, loff_t *ppos, bool iswrite)  {
  int ret;
  -   loff_t off, pos = *ppos & VFIO_PCI_OFFSET_MASK;
  +   loff_t off;
  +   u64 pos = (u64 )(*ppos & VFIO_PCI_OFFSET_MASK);
  void __iomem *iomem = NULL;
  unsigned int rsrc;
  bool is_ioport;
 
 What's the compile error that this fixes?

I was getting the error below, and after some googling I learned that this is
how others have fixed it.

/home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2'
/home/r65777/linux-vfio/drivers/vfio/pci/vfio_pci_rdwr.c:193: undefined reference to `__cmpdi2'

Thanks
-Bharat
 


RE: [PATCH 6/7] vfio: moving some functions in common file

2013-09-25 Thread Bhushan Bharat-R65777



 -Original Message-
 From: linux-pci-ow...@vger.kernel.org [mailto:linux-pci-ow...@vger.kernel.org]
 On Behalf Of Alex Williamson
 Sent: Wednesday, September 25, 2013 10:33 PM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/7] vfio: moving some functions in common file
 
 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  Some functions defined in vfio_iommu_type1.c are common, and we want to
  use them for the FSL IOMMU (PAMU) and iommu-none drivers.
  So some of them are moved to vfio_iommu_common.c.
 
  I think we can do more of that but we will take this step by step.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   drivers/vfio/Makefile            |    4 +-
   drivers/vfio/vfio_iommu_common.c |  235 ++
   drivers/vfio/vfio_iommu_common.h |   30 +
   drivers/vfio/vfio_iommu_type1.c  |  206 +-
   4 files changed, 268 insertions(+), 207 deletions(-)
   create mode 100644 drivers/vfio/vfio_iommu_common.c
   create mode 100644 drivers/vfio/vfio_iommu_common.h
 
  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
  index 72bfabc..c5792ec 100644
  --- a/drivers/vfio/Makefile
  +++ b/drivers/vfio/Makefile
  @@ -1,4 +1,4 @@
   obj-$(CONFIG_VFIO) += vfio.o
  -obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
  -obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
  +obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o
  +obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o
   obj-$(CONFIG_VFIO_PCI) += pci/
  diff --git a/drivers/vfio/vfio_iommu_common.c
  b/drivers/vfio/vfio_iommu_common.c
  new file mode 100644
  index 000..8bdc0ea
  --- /dev/null
  +++ b/drivers/vfio/vfio_iommu_common.c
  @@ -0,0 +1,235 @@
  +/*
  + * VFIO: Common code for vfio IOMMU support
  + *
  + * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
  + * Author: Alex Williamson alex.william...@redhat.com
  + * Author: Bharat Bhushan bharat.bhus...@freescale.com
  + *
  + * This program is free software; you can redistribute it and/or
  +modify
  + * it under the terms of the GNU General Public License version 2 as
  + * published by the Free Software Foundation.
  + *
  + * Derived from original vfio:
  + * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
  + * Author: Tom Lyon, p...@cisco.com
  + */
  +
  +#include <linux/compat.h>
  +#include <linux/device.h>
  +#include <linux/fs.h>
  +#include <linux/iommu.h>
  +#include <linux/module.h>
  +#include <linux/mm.h>
  +#include <linux/pci.h> /* pci_bus_type */
  +#include <linux/rbtree.h>
  +#include <linux/sched.h>
  +#include <linux/slab.h>
  +#include <linux/uaccess.h>
  +#include <linux/vfio.h>
  +#include <linux/workqueue.h>
 
 Please cleanup includes on both the source and target files.  You obviously
 don't need linux/pci.h here for one.

Will do.

 
  +
  +static bool disable_hugepages;
  +module_param_named(disable_hugepages,
  +  disable_hugepages, bool, S_IRUGO | S_IWUSR);
  +MODULE_PARM_DESC(disable_hugepages,
  +"Disable VFIO IOMMU support for IOMMU hugepages.");
  +
  +struct vwork {
  +   struct mm_struct    *mm;
  +   long                npage;
  +   struct work_struct  work;
  +};
  +
  +/* delayed decrement/increment for locked_vm */
  +void vfio_lock_acct_bg(struct work_struct *work)
  +{
  +   struct vwork *vwork = container_of(work, struct vwork, work);
  +   struct mm_struct *mm;
  +
  +   mm = vwork->mm;
  +   down_write(&mm->mmap_sem);
  +   mm->locked_vm += vwork->npage;
  +   up_write(&mm->mmap_sem);
  +   mmput(mm);
  +   kfree(vwork);
  +}
  +
  +void vfio_lock_acct(long npage)
  +{
  +   struct vwork *vwork;
  +   struct mm_struct *mm;
  +
  +   if (!current->mm || !npage)
  +       return; /* process exited or nothing to do */
  +
  +   if (down_write_trylock(&current->mm->mmap_sem)) {
  +       current->mm->locked_vm += npage;
  +       up_write(&current->mm->mmap_sem);
  +   return;
  +   }
  +
  +   /*
  +* Couldn't get mmap_sem lock, so must setup to update
  +* mm->locked_vm later. If locked_vm were atomic, we
  +* wouldn't need this silliness
  +*/
  +   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
  +   if (!vwork)
  +   return;
  +   mm = get_task_mm(current);
  +   if (!mm) {
  +   kfree(vwork);
  +   return;
  +   }
  +   INIT_WORK(&vwork->work, vfio_lock_acct_bg);
  +   vwork->mm = mm;
  +   vwork->npage = npage;
  +   schedule_work(&vwork->work);
  +}
  +
  +/*
  + * Some mappings aren't backed by a struct page, for example an
  +mmap'd
  + * MMIO range for our own or another device.  These use a different
  + * pfn

RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

2013-09-25 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wang Dongsheng-B40534
 Sent: Thursday, September 26, 2013 8:02 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec
 idle

  -Original Message-
  From: Wood Scott-B07421
  Sent: Thursday, September 26, 2013 1:57 AM
  To: Wang Dongsheng-B40534
  Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-
  d...@lists.ozlabs.org
  Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
  altivec idle

  On Wed, 2013-09-25 at 03:10 -0500, Wang Dongsheng-B40534 wrote:

-Original Message-
From: Bhushan Bharat-R65777
Sent: Wednesday, September 25, 2013 2:23 PM
To: Wang Dongsheng-B40534; Wood Scott-B07421
Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state
and altivec idle

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf
 bounces+Of Dongsheng
 Wang
 Sent: Tuesday, September 24, 2013 2:59 PM
 To: Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
 Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state
 and altivec idle

 From: Wang Dongsheng dongsheng.w...@freescale.com

 Add a sys interface to enable/disable pw20 state or altivec idle,
 and control the wait entry time.

 Enable/Disable interface:
 0, disable. 1, enable.
 /sys/devices/system/cpu/cpuX/pw20_state
 /sys/devices/system/cpu/cpuX/altivec_idle

 Set wait time interface:(Nanosecond)
 /sys/devices/system/cpu/cpuX/pw20_wait_time
 /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
 Example: Based on a TB frequency of 41 MHz.
 1~47(ns): TB[63]
 48~95(ns): TB[62]
 96~191(ns): TB[61]
 192~383(ns): TB[60]
 384~767(ns): TB[59]
 ...
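
The mapping from wait time to timebase bit in this example can be sketched as follows. This is a userspace illustration, assuming the v4 formula quoted later in the thread (an integer nanoseconds-per-tick divisor) and assuming the selected bit is MAX_BIT minus floor(log2(cycles)); `ilog2_u64` and `wait_ns_to_tb_bit` are hypothetical names, not functions from the patch.

```c
#include <stdint.h>

#define MAX_BIT 63

/* floor(log2(v)) for v > 0 */
static unsigned int ilog2_u64(uint64_t v)
{
    unsigned int bit = 0;

    while (v >>= 1)
        bit++;
    return bit;
}

/*
 * Timebase bit selected for a wait time in nanoseconds, or -1 when the
 * wait is shorter than one (truncated) tick. tb_ticks_per_usec is the
 * timebase frequency in MHz and must be in the range 1..1000 here,
 * otherwise the divisor below would be zero.
 */
static int wait_ns_to_tb_bit(uint64_t ns, uint64_t tb_ticks_per_usec)
{
    uint64_t ns_per_tick = 1000 / tb_ticks_per_usec; /* 24 at 41 MHz */
    uint64_t cycles = ns / ns_per_tick;

    if (cycles == 0)
        return -1;
    return MAX_BIT - (int)ilog2_u64(cycles);
}
```

Each doubling of the wait time moves the selected bit one position toward the more significant end of the timebase, which is the pattern the ranges above follow.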

 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
 ---
 *v4:
 Move code from 85xx/common.c to kernel/sysfs.c.

 Remove has_pw20_altivec_idle function.

 Change wait entry_bit to wait time.

  arch/powerpc/kernel/sysfs.c | 291

  1 file changed, 291 insertions(+)

 diff --git a/arch/powerpc/kernel/sysfs.c
 b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644
 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -85,6 +85,279 @@ __setup(smt-snooze-delay=,
 setup_smt_snooze_delay);

  #endif /* CONFIG_PPC64 */

 +#ifdef CONFIG_FSL_SOC
 +#define MAX_BIT  63
 +
 +static u64 pw20_wt;
 +static u64 altivec_idle_wt;
 +
 +static unsigned int get_idle_ticks_bit(u64 ns) {
 + u64 cycle;
 +
 + cycle = div_u64(ns, 1000 / tb_ticks_per_usec);

When tb_ticks_per_usec > 1000 (timebase frequency > 1GHz) then
this will always be ns, which is not correct, no?

  Actually it'll be a divide by zero in that case.

 tb_ticks_per_usec = ppc_tb_freq / 1000000; this means the TB frequency should be
 more than 1 MHz.

 If ppc_tb_freq is less than 1000000, tb_ticks_per_usec will be zero and this
 becomes a divide by zero.
 If that were the case, I think the kernel could not work normally.

 So I think we can trust that the variable is not zero.

We do believe it is non-zero; the concern is the case where it is greater than 1000 :)

 And I think the TB frequency
 should not be less than 1 MHz on a PPC platform, because below 1 MHz
 the timer precision becomes very poor and system response time gets
 slower.

I am not sure how what you describe here relates to the divide by zero we are 
mentioning.
You are talking about tb_ticks_per_usec being zero, and we are talking about 
(1000/tb_ticks_per_usec) being zero.

BTW, div_u64() handle the case where divider is zero.

   1000 / tb_ticks_per_usec means nsec_ticks_per_tb

   If the timebase frequency is > 1GHz, this should be tb_ticks_per_usec / 1000,
   to get tb_ticks_per_nsec.
   This should then be changed to cycle = ns * tb_ticks_per_nsec;

   But at present we do not have a platform whose timebase
   frequency is more than 1GHz. And I think there is no need to support such a
   situation,
   because we have no environment to test it.

  You can test it by hacking a wrong timebase frequency in and seeing
  what the calculation does.

  Or do something like this:

  if (ns = 1)
^^^

  cycle = ((ns + 500) / 1000) * tb_ticks_per_usec;
  else
  cycle = div_u64((u64)ns * tb_ticks_per_usec, 1000);

 We cannot do this, because if (ns+500) < 1000, we cannot get the entry bit;
 it will always be the zero bit.

There is a if condition of ns = 1, so ns+500 can not be less than 1000.
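
The two points in this exchange can be sketched in userspace C: the v4 divisor `1000 / tb_ticks_per_usec` truncates to zero once the timebase exceeds 1 GHz, and a two-branch conversion avoids that. The threshold in the quoted suggestion is garbled in this archive, so `NS_ROUNDING_THRESHOLD` below is an assumed illustrative value; the function names are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed threshold; the value in the quoted mail is not recoverable here. */
#define NS_ROUNDING_THRESHOLD 10000ULL

/* True when the v4 divisor (1000 / tb_ticks_per_usec) truncates to zero. */
static int v4_divisor_is_zero(uint64_t tb_ticks_per_usec)
{
    assert(tb_ticks_per_usec != 0);
    return (1000 / tb_ticks_per_usec) == 0;
}

/*
 * Two-branch conversion from nanoseconds to timebase ticks.
 * Large values are rounded to microseconds first so the multiply stays
 * small; small values multiply first so that precision is not lost.
 */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t tb_ticks_per_usec)
{
    if (ns >= NS_ROUNDING_THRESHOLD)
        return ((ns + 500) / 1000) * tb_ticks_per_usec;
    return (ns * tb_ticks_per_usec) / 1000;
}
```

Neither branch divides by a value derived from the timebase frequency, so a timebase above 1 GHz no longer produces a zero divisor.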

 We must to use per_nsec_tb_ticks, like my code 1000 / tb_ticks_per_usec.

  ...which can

RE: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU

2013-09-25 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alex Williamson [mailto:alex.william...@redhat.com]
 Sent: Thursday, September 26, 2013 12:37 AM
 To: Bhushan Bharat-R65777
 Cc: j...@8bytes.org; b...@kernel.crashing.org; ga...@kernel.crashing.org; 
 linux-
 ker...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; linux-
 p...@vger.kernel.org; ag...@suse.de; Wood Scott-B07421; iommu@lists.linux-
 foundation.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 7/7] vfio pci: Add vfio iommu implementation for FSL_PAMU
 
 On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
  This patch adds vfio iommu support for Freescale IOMMU (PAMU -
  Peripheral Access Management Unit).
 
  The Freescale PAMU is an aperture-based IOMMU with the following
  characteristics.  Each device has an entry in a table in memory
  describing the iova-phys mapping. The mapping has:
-an overall aperture that is power of 2 sized, and has a start iova that
 is naturally aligned
-has 1 or more windows within the aperture
-number of windows must be power of 2, max is 256
-size of each window is determined by aperture size / # of windows
-iova of each window is determined by aperture start iova / # of windows
-the mapped region in each window can be different than
 the window size, but the mapping size must be a power of 2
-physical address of the mapping must be naturally aligned
 with the mapping size
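
The constraints listed above can be sketched as a few checks in userspace C. This is an illustration only: the struct and function names are hypothetical, and two details are assumptions rather than statements from the patch, namely that window i starts at the aperture start plus i times the window size, and that a mapping may not be larger than its window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an aperture split into equal windows. */
struct pamu_geometry {
    uint64_t aperture_start;
    uint64_t aperture_size;
    uint32_t nr_windows;
};

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Aperture is power-of-2 sized and naturally aligned; 1..256 windows. */
static bool geometry_valid(const struct pamu_geometry *g)
{
    return is_pow2(g->aperture_size) &&
           (g->aperture_start & (g->aperture_size - 1)) == 0 &&
           is_pow2(g->nr_windows) && g->nr_windows <= 256;
}

static uint64_t window_size(const struct pamu_geometry *g)
{
    return g->aperture_size / g->nr_windows;
}

/* Assumption: windows are laid out back to back from the aperture start. */
static uint64_t window_iova(const struct pamu_geometry *g, uint32_t i)
{
    return g->aperture_start + (uint64_t)i * window_size(g);
}

/* Mapping size is a power of 2 and the physical address is aligned to it. */
static bool mapping_valid(const struct pamu_geometry *g,
                          uint64_t phys, uint64_t size)
{
    return is_pow2(size) && size <= window_size(g) &&
           (phys & (size - 1)) == 0;
}
```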
 
  Some of the code is derived from TYPE1 iommu 
  (driver/vfio/vfio_iommu_type1.c).
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   drivers/vfio/Kconfig   |6 +
   drivers/vfio/Makefile  |1 +
   drivers/vfio/vfio_iommu_fsl_pamu.c |  952
 
   include/uapi/linux/vfio.h  |  100 
   4 files changed, 1059 insertions(+), 0 deletions(-)  create mode
  100644 drivers/vfio/vfio_iommu_fsl_pamu.c
 
  diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
  26b3d9d..7d1da26 100644
  --- a/drivers/vfio/Kconfig
  +++ b/drivers/vfio/Kconfig
  @@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
  depends on VFIO && SPAPR_TCE_IOMMU
  default n
 
  +config VFIO_IOMMU_FSL_PAMU
  +   tristate
  +   depends on VFIO
  +   default n
  +
   menuconfig VFIO
  tristate "VFIO Non-Privileged userspace driver framework"
  depends on IOMMU_API
  select VFIO_IOMMU_TYPE1 if X86
  select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
  +   select VFIO_IOMMU_FSL_PAMU if FSL_PAMU
  help
VFIO provides a framework for secure userspace device drivers.
See Documentation/vfio.txt for more details.
  diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index
  c5792ec..7461350 100644
  --- a/drivers/vfio/Makefile
  +++ b/drivers/vfio/Makefile
  @@ -1,4 +1,5 @@
   obj-$(CONFIG_VFIO) += vfio.o
   obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o
  vfio_iommu_type1.o
   obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o
  vfio_iommu_spapr_tce.o
  +obj-$(CONFIG_VFIO_IOMMU_FSL_PAMU) += vfio_iommu_common.o
  +vfio_iommu_fsl_pamu.o
   obj-$(CONFIG_VFIO_PCI) += pci/
  diff --git a/drivers/vfio/vfio_iommu_fsl_pamu.c
  b/drivers/vfio/vfio_iommu_fsl_pamu.c
  new file mode 100644
  index 000..b29365f
  --- /dev/null
  +++ b/drivers/vfio/vfio_iommu_fsl_pamu.c
  @@ -0,0 +1,952 @@
  +/*
  + * VFIO: IOMMU DMA mapping support for FSL PAMU IOMMU
  + *
  + * This program is free software; you can redistribute it and/or
  +modify
  + * it under the terms of the GNU General Public License, version 2,
  +as
  + * published by the Free Software Foundation.
  + *
  + * This program is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  + * GNU General Public License for more details.
  + *
  + * You should have received a copy of the GNU General Public License
  + * along with this program; if not, write to the Free Software
  + * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, 
  USA.
  + *
  + * Copyright (C) 2013 Freescale Semiconductor, Inc.
  + *
  + * Author: Bharat Bhushan bharat.bhus...@freescale.com
  + *
  + * This file is derived from driver/vfio/vfio_iommu_type1.c
  + *
  + * The Freescale PAMU is an aperture-based IOMMU with the following
  + * characteristics.  Each device has an entry in a table in memory
  + * describing the iova-phys mapping. The mapping has:
  + *  -an overall aperture that is power of 2 sized, and has a start iova 
  that
  + *   is naturally aligned
  + *  -has 1 or more windows within the aperture
  + * -number of windows must be power of 2, max is 256
  + * -size of each window is determined by aperture size / # of windows
  + * -iova of each window is determined by aperture start iova / # of
 windows
  + * -the mapped region in each window can be different than
  + *  the window size...mapping must power

RE: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define

2013-09-24 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Dongsheng
 Wang
 Sent: Tuesday, September 24, 2013 2:58 PM
 To: Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
 Subject: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define
 
 From: Wang Dongsheng dongsheng.w...@freescale.com
 
 E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec idle
 patches.
 
 Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
 ---
 *v3:
 Add bit definitions for PWRMGTCR0.
 
  arch/powerpc/include/asm/reg.h   | 2 ++
  arch/powerpc/include/asm/reg_booke.h | 9 +
  2 files changed, 11 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
 index 64264bf..d4160ca 100644
 --- a/arch/powerpc/include/asm/reg.h
 +++ b/arch/powerpc/include/asm/reg.h
 @@ -1053,6 +1053,8 @@
  #define PVR_8560 0x8020
  #define PVR_VER_E500V1   0x8020
  #define PVR_VER_E500V2   0x8021
 +#define PVR_VER_E65000x8040
 +
  /*
   * For the 8xx processors, all of them report the same PVR family for
   * the PowerPC core. The various versions of these processors must be diff --
 git a/arch/powerpc/include/asm/reg_booke.h
 b/arch/powerpc/include/asm/reg_booke.h
 index ed8f836..4a6457e 100644
 --- a/arch/powerpc/include/asm/reg_booke.h
 +++ b/arch/powerpc/include/asm/reg_booke.h
 @@ -170,6 +170,7 @@
  #define SPRN_L2CSR1  0x3FA   /* L2 Data Cache Control and Status Register 1
 */
  #define SPRN_DCCR0x3FA   /* Data Cache Cacheability Register */
  #define SPRN_ICCR0x3FB   /* Instruction Cache Cacheability Register */
 +#define SPRN_PWRMGTCR0   0x3FB   /* Power management control register 0 
 */

Is this generic for booke or e6500-specific? I can't see this register in either 
the ISA or the EREF.
Also, I can see SPRN_ICCR defined with the same SPRN number; how is that possible?

-Bharat

  #define SPRN_SVR 0x3FF   /* System Version Register */
 
  /*
 @@ -216,6 +217,14 @@
  #define  CCR1_DPC0x0100 /* Disable L1 I-Cache/D-Cache parity
 checking */
  #define  CCR1_TCS0x0080 /* Timer Clock Select */
 
 +/* Bit definitions for PWRMGTCR0. */
 +#define PWRMGTCR0_PW20_WAIT  (1 << 14) /* PW20 state enable bit */
 +#define PWRMGTCR0_PW20_ENT_SHIFT 8
 +#define PWRMGTCR0_PW20_ENT   0x3F00
 +#define PWRMGTCR0_AV_IDLE_PD_EN  (1 << 22) /* Altivec idle enable */
 +#define PWRMGTCR0_AV_IDLE_CNT_SHIFT  16
 +#define PWRMGTCR0_AV_IDLE_CNT0x3F
 +
  /* Bit definitions for the MCSR. */
  #define MCSR_MCS 0x8000 /* Machine Check Summary */
  #define MCSR_IB  0x4000 /* Instruction PLB Error */
 --
 1.8.0
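
As an illustration of how shift and mask definitions of this kind are typically used, here is a userspace sketch of updating one register field without disturbing its neighbours. The constants are local to the sketch and assume a 6-bit field at bit 8 next to an enable bit at bit 14; the helper names are hypothetical and nothing here is taken from the later patches in the series.

```c
#include <stdint.h>

/* Illustrative field layout, assumed for this sketch only. */
#define PW20_WAIT_EN    (1u << 14)
#define PW20_ENT_SHIFT  8
#define PW20_ENT_MASK   (0x3Fu << PW20_ENT_SHIFT)

/* Replace the entry-count field, leaving all other bits untouched. */
static uint32_t set_pw20_entry(uint32_t reg, uint32_t entry)
{
    reg &= ~PW20_ENT_MASK;
    reg |= (entry << PW20_ENT_SHIFT) & PW20_ENT_MASK;
    return reg;
}

/* Extract the entry-count field. */
static uint32_t get_pw20_entry(uint32_t reg)
{
    return (reg & PW20_ENT_MASK) >> PW20_ENT_SHIFT;
}
```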
 
 

RE: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 define

2013-09-24 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Kumar Gala [mailto:ga...@kernel.crashing.org]
 Sent: Tuesday, September 24, 2013 9:19 PM
 To: Bhushan Bharat-R65777
 Cc: Wang Dongsheng-B40534; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0 
 define
 
 
 On Sep 24, 2013, at 6:21 AM, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Linuxppc-dev [mailto:linuxppc-dev-
  bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of
  bounces+Dongsheng
  Wang
  Sent: Tuesday, September 24, 2013 2:58 PM
  To: Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org; Wang Dongsheng-B40534
  Subject: [PATCH v4 1/4] powerpc/fsl: add E6500 PVR and SPRN_PWRMGTCR0
  define
 
  From: Wang Dongsheng dongsheng.w...@freescale.com
 
  E6500 PVR and SPRN_PWRMGTCR0 will be used in subsequent pw20/altivec
  idle patches.
 
  Signed-off-by: Wang Dongsheng dongsheng.w...@freescale.com
  ---
  *v3:
  Add bit definitions for PWRMGTCR0.
 
  arch/powerpc/include/asm/reg.h   | 2 ++
  arch/powerpc/include/asm/reg_booke.h | 9 +
  2 files changed, 11 insertions(+)
 
  diff --git a/arch/powerpc/include/asm/reg.h
  b/arch/powerpc/include/asm/reg.h index 64264bf..d4160ca 100644
  --- a/arch/powerpc/include/asm/reg.h
  +++ b/arch/powerpc/include/asm/reg.h
  @@ -1053,6 +1053,8 @@
  #define PVR_8560   0x8020
  #define PVR_VER_E500V1 0x8020
  #define PVR_VER_E500V2 0x8021
  +#define PVR_VER_E6500 0x8040
  +
  /*
   * For the 8xx processors, all of them report the same PVR family for
   * the PowerPC core. The various versions of these processors must be
  diff -- git a/arch/powerpc/include/asm/reg_booke.h
  b/arch/powerpc/include/asm/reg_booke.h
  index ed8f836..4a6457e 100644
  --- a/arch/powerpc/include/asm/reg_booke.h
  +++ b/arch/powerpc/include/asm/reg_booke.h
  @@ -170,6 +170,7 @@
  #define SPRN_L2CSR10x3FA   /* L2 Data Cache Control and Status 
  Register 1
  */
  #define SPRN_DCCR  0x3FA   /* Data Cache Cacheability Register */
  #define SPRN_ICCR  0x3FB   /* Instruction Cache Cacheability Register */
  +#define SPRN_PWRMGTCR00x3FB   /* Power management control register 0 
  */
 
  Is this generic for booke or e6500 specific? I can't see this register 
  either
 in ISA and EREF.
  Also I can see SPRN_ICCR also with same SPRN, how that is possible?
 
 It's possible because the register may be in an implementation-specific region.  
 I'm guessing ICCR is a 40x-specific register.

Kumar, doesn't this create confusion? Although I do not like having so many header 
files, I still think we could have reg_4xx.h, reg_fsl_booke.h, etc. for 
implementation-specific definitions.

-Bharat

 
 - k
 



RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation

2013-09-20 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, September 20, 2013 9:48 PM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de;
 pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest
 tlb invalidation

 On Thu, 2013-09-19 at 23:19 -0500, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Wood Scott-B07421
   Sent: Friday, September 20, 2013 2:38 AM
   To: Bhushan Bharat-R65777
   Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org;
   k...@vger.kernel.org; kvm-...@vger.kernel.org;
   linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
   Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference
   flag on guest tlb invalidation

   This breaks when you have both E500_TLB_BITMAP and E500_TLB_TLB0 set.

  I do not see any case where we set both E500_TLB_BITMAP and
  E500_TLB_TLB0.

 This would happen if you have a guest TLB1 entry that is backed by some 4K 
 pages
 and some larger pages (e.g. if the guest maps CCSR with one big
 TLB1 and there are varying I/O passthrough regions mapped).  It's not common,
 but it's possible.

Agree

   Also we have not optimized that yet (keeping track of multiple shadow
  TLB0 entries for one guest TLB1 entry)

 This is about correctness, not optimization.

  We use these bit flags only for TLB1 and if size of stlbe is 4K then
  we set E500_TLB_TLB0  otherwise we set E500_TLB_BITMAP. Although I
  think that E500_TLB_BITMAP should be set only if stlbe size is less
  than gtlbe size.

 Why?  Even if there's only one bit set in the map, we need it to keep track of
 which entry was used.

If there is only one entry, wouldn't it be simpler and faster to skip the bitmap 
and guest-to-host array lookup?
A flag would indicate that it is a 1:1 map and that this is the physical address.

-Bharat

 -Scott


RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation

2013-09-20 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, September 20, 2013 11:38 PM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de;
 pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest
 tlb invalidation

 On Fri, 2013-09-20 at 13:04 -0500, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Wood Scott-B07421
   Sent: Friday, September 20, 2013 9:48 PM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; b...@kernel.crashing.org; ag...@suse.de;
   pau...@samba.org; k...@vger.kernel.org; kvm-...@vger.kernel.org;
   linuxppc- d...@lists.ozlabs.org
   Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference
   flag on guest tlb invalidation

   On Thu, 2013-09-19 at 23:19 -0500, Bhushan Bharat-R65777 wrote:
We use these bit flags only for TLB1 and if size of stlbe is 4K
then we set E500_TLB_TLB0  otherwise we set E500_TLB_BITMAP.
Although I think that E500_TLB_BITMAP should be set only if stlbe
size is less than gtlbe size.

   Why?  Even if there's only one bit set in the map, we need it to
   keep track of which entry was used.

  If there is only one entry, wouldn't it be simpler and faster to skip the
  bitmap and guest-to-host array lookup?
  A flag would indicate that it is a 1:1 map and that this is the physical address.

 The difference would be negligible, and you'd have added overhead (both 
 runtime
 and complexity) of making this a special case.

Maybe you are right; I will see if I can give it a try :)
BTW, I have already sent v6 of this patch.

-Bharat

 -Scott


RE: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest tlb invalidation

2013-09-19 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Friday, September 20, 2013 2:38 AM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org;
 k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
 Bhushan Bharat-R65777
 Subject: Re: [PATCH 5/6 v5] kvm: booke: clear host tlb reference flag on guest
 tlb invalidation

 On Thu, 2013-09-19 at 11:32 +0530, Bharat Bhushan wrote:
  On booke, struct tlbe_ref contains host tlb mapping information
  (pfn: for guest-pfn to pfn, flags: attribute associated with this
  mapping) for a guest tlb entry. So when a guest creates a TLB entry
  then struct tlbe_ref is set to point to valid pfn and set
  attributes in flags field of the above said structure. When a guest
  TLB entry is invalidated then flags field of corresponding struct
  tlbe_ref is updated to point that this is no more valid, also we
  selectively clear some other attribute bits, example: if
  E500_TLB_BITMAP was set then we clear E500_TLB_BITMAP, if E500_TLB_TLB0 is 
  set
 then we clear this.

  Ideally we should clear the flags completely, as this entry is invalid and
  has nothing to be reused. The other part of the problem is
  that when we use the same entry again we also do not clear the flags
  (we start or-ing into them, etc.).

  So far it was working because the selectively clearing mentioned above
  actually clears flags what was set during TLB mapping. But the
  problem starts coming when we add more attributes to this then we need
  to selectively clear them and which is not needed.

  In this patch we do both:
  - Clear flags when invalidating;
  - Clear flags when reusing same entry later
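
The difference between selective clearing and clearing everything can be sketched in userspace C. The flag names and bit values below are placeholders chosen for the sketch, not the kernel's definitions, and `REF_NEW_ATTR` stands in for any attribute added to the flags later.

```c
#include <stdint.h>

/* Placeholder flag bits for this sketch only. */
#define REF_VALID     (1u << 0)
#define REF_BITMAP    (1u << 1)
#define REF_TLB0      (1u << 2)
#define REF_NEW_ATTR  (1u << 3) /* stands in for a later-added attribute */

/* Selective clearing: only the named bits are dropped on invalidation. */
static uint32_t invalidate_selective(uint32_t flags)
{
    return flags & ~(REF_BITMAP | REF_VALID);
}

/* Full clearing: nothing from the old mapping survives invalidation. */
static uint32_t invalidate_full(uint32_t flags)
{
    (void)flags;
    return 0;
}
```

With selective clearing, any bit that is not listed in the mask survives invalidation, so every new attribute would need to be added to that mask; clearing the whole word does not have this maintenance cost.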

  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v3- v5
   - New patch (found this issue when doing vfio-pci development)

   arch/powerpc/kvm/e500_mmu_host.c |   12 +++-
   1 files changed, 7 insertions(+), 5 deletions(-)

  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
  b/arch/powerpc/kvm/e500_mmu_host.c
  index 1c6a9d7..60f5a3c 100644
  --- a/arch/powerpc/kvm/e500_mmu_host.c
  +++ b/arch/powerpc/kvm/e500_mmu_host.c
  @@ -217,7 +217,8 @@ void inval_gtlbe_on_host(struct kvmppc_vcpu_e500
 *vcpu_e500, int tlbsel,
  }
  mb();
  vcpu_e500->g2h_tlb1_map[esel] = 0;
  -   ref->flags &= ~(E500_TLB_BITMAP | E500_TLB_VALID);
  +   /* Clear flags as TLB is not backed by the host anymore */
  +   ref->flags = 0;
  local_irq_restore(flags);
  }

 This breaks when you have both E500_TLB_BITMAP and E500_TLB_TLB0 set.

I do not see any case where we set both E500_TLB_BITMAP and E500_TLB_TLB0. Also, 
we have not optimized that yet (keeping track of multiple shadow TLB0 entries 
for one guest TLB1 entry).

We use these bit flags only for TLB1: if the size of the stlbe is 4K then we set 
E500_TLB_TLB0, otherwise we set E500_TLB_BITMAP. Although I think that 
E500_TLB_BITMAP should be set only if the stlbe size is less than the gtlbe size.

 Instead, just convert the final E500_TLB_VALID clearing at the end into
 ref->flags = 0, and convert the early return a few lines earlier into
 conditional execution of the tlbil_one().

This looks better, will send the patch shortly.

Thanks
-Bharat

 -Scott


RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver

2013-08-19 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Chen Guangyu-B42378
 Sent: Monday, August 19, 2013 11:55 AM
 To: Bhushan Bharat-R65777
 Cc: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de;
 s.ha...@pengutronix.de; mark.rutl...@arm.com; devicet...@vger.kernel.org; 
 alsa-
 de...@alsa-project.org; swar...@wwwdotorg.org; feste...@gmail.com;
 ti...@tabi.org; rob.herr...@calxeda.com; tomasz.f...@gmail.com;
 shawn@linaro.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver

 Hi Bhushan,

I'll revise some as you suggest. Just a few replies here.

 On Mon, Aug 19, 2013 at 12:38:11PM +0800, Bhushan Bharat-R65777 wrote:
   We here suppose the reset bit would be cleared -- The software
   reset will last
   8 cycles. from RM, so if this happened to be a failure, the whole
   IP module won't be normally working as well.

  Also add a comment describing this against why cycle = 1000 is selected.

 If it is done in 8 cycles, 1000-cycle will be surely a safe value for it.
 As long as it finished in 8 cycles, it would quit anyway. Why against?

I am not against it; I am asking why it was not 200 or 50 or 20, etc. I am 
suggesting that you write a comment saying how many cycles are sufficient as per 
the specification, and that 1000 is kept as a conservative margin.

-Bharat

 +static bool fsl_spdif_volatile_reg(struct device *dev, unsigned
 +int reg) {
 + /* Sync all registers after reset */

Where is sync :) ?

   The return true would do that. For volatile registers, if no return 
   true
   here, the whole regmap would use the value in cache, while for some
   bits we need to trace its true value from the physical registers not from
 cache.

  Where will be device registers cached? Do not we program them to be non-
 cacheable in core?

 regmap has a regcache for all the mapped registers. Setting the registers as
 volatile will allow the driver to sync the regcache with physical memory each
 time when using regmap_read/write/update_bits().

 But I think I can try to use the regcache_bypass instead.

 Thank you,
 Nicolin Chen


RE: MPC8315 reboot failure, lockdep splat possibly related?

2013-08-18 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Anthony Foiani [mailto:t...@scrye.com]
 Sent: Sunday, August 18, 2013 5:37 AM
 To: Bhushan Bharat-R65777
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: Re: MPC8315 reboot failure, lockdep splat possibly related?
 
 Bhushan Bharat-R65777 r65...@freescale.com writes:
 
  You should get rid of this by changing spin_lock/unlock() in
  fsl_sata_set_irq_coalescing() to spin_lock_irqsave/restore()
 
 I can verify that the suggested change removes the lockdep warning.
 The below patch is against 3.9.7 and has been tested on hardware with that
 release.
 
 It applies with slight fuzz to linux-next; I've compile-tested that version, 
 but
 I have not booted that build on the hardware.  The linux-next patch can be 
 found
 here:
 
   
 http://scrye.com/~tkil/linux/fsl-sata-lockdep-201308/next-sata-fsl-save-irqs-while-coalescing.patch
   (or: http://preview.tinyurl.com/mpd4e9h )

Anthony, I would prefer that you send the patch (if you cannot, let me know).

Thanks
-Bharat

 
 Unfortunately, the hang on reboot was not easily repeatable; I'll report 
 whether
 it happens in the next few days or not.


 
 Thanks again,
 Anthony Foiani
 
 -- 8 --
 
 From 2abb6df770c95eb4103476c70847a78f816fe5e3 Mon Sep 17 00:00:00 2001
 From: Anthony Foiani anthony.foi...@gmail.com
 Date: Sat, 17 Aug 2013 13:28:17 -0600
 Subject: [PATCH] sata: fsl: save irqs while coalescing
 
 Before this patch, I was seeing the following lockdep splat on my
 MPC8315 (PPC32) target:
 
   [9.086051] =
   [9.090393] [ INFO: inconsistent lock state ]
   [9.094744] 3.9.7-ajf-gc39503d #1 Not tainted
   [9.099087] -
   [9.103432] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
   [9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes:
   [9.114642]  ((host-lock)-rlock){?.+...}, at: [c02f4168]
 sata_fsl_interrupt+0x50/0x250
   [9.123137] {HARDIRQ-ON-W} state was registered at:
   [9.128004]   [c006cdb8] lock_acquire+0x90/0xf4
   [9.132737]   [c043ef04] _raw_spin_lock+0x34/0x4c
   [9.137645]   [c02f3560] fsl_sata_set_irq_coalescing+0x68/0x100
   [9.143750]   [c02f36a0] sata_fsl_init_controller+0xa8/0xc0
   [9.149505]   [c02f3f10] sata_fsl_probe+0x17c/0x2e8
   [9.154568]   [c02acc90] driver_probe_device+0x90/0x248
   [9.159987]   [c02acf0c] __driver_attach+0xc4/0xc8
   [9.164964]   [c02aae74] bus_for_each_dev+0x5c/0xa8
   [9.170028]   [c02ac218] bus_add_driver+0x100/0x26c
   [9.175091]   [c02ad638] driver_register+0x88/0x198
   [9.180155]   [c0003a24] do_one_initcall+0x58/0x1b4
   [9.185226]   [c05aeeac] kernel_init_freeable+0x118/0x1c0
   [9.190823]   [c0004110] kernel_init+0x18/0x108
   [9.195542]   [c000f6b8] ret_from_kernel_thread+0x64/0x6c
   [9.201142] irq event stamp: 160
   [9.204366] hardirqs last  enabled at (159): [c043f778]
 _raw_spin_unlock_irq+0x30/0x50
   [9.212469] hardirqs last disabled at (160): [c000f414]
 reenable_mmu+0x30/0x88
   [9.219867] softirqs last  enabled at (144): [c002ae5c]
 __do_softirq+0x168/0x218
   [9.227435] softirqs last disabled at (137): [c002b0d4]
 irq_exit+0xa8/0xb4
   [9.234481]
   [9.234481] other info that might help us debug this:
   [9.240995]  Possible unsafe locking scenario:
   [9.240995]
   [9.246898]        CPU0
   [9.249337]        ----
   [9.251776]   lock(&(&host->lock)->rlock);
   [9.255878]   <Interrupt>
   [9.258492]     lock(&(&host->lock)->rlock);
   [9.262765]
   [9.262765]  *** DEADLOCK ***
   [9.262765]
   [9.268684] no locks held by scsi_eh_1/39.
   [9.272767]
   [9.272767] stack backtrace:
   [9.277117] Call Trace:
   [9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable)
   [9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c
   [9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658
   [9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840
   [9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4
   [9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c
   [9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250
   [9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254
   [9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78
   [9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104
   [9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28
   [9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8
   [9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14
   [9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50
   [9.353983] LR = _raw_spin_unlock_irq+0x30/0x50
   [9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188
   [9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0
   [9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8
   [9.382454] [cc713c20

RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver

2013-08-18 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Chen Guangyu-B42378
 Sent: Monday, August 19, 2013 8:38 AM
 To: Bhushan Bharat-R65777
 Cc: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de;
 s.ha...@pengutronix.de; mark.rutl...@arm.com; devicet...@vger.kernel.org; 
 alsa-
 de...@alsa-project.org; swar...@wwwdotorg.org; feste...@gmail.com;
 ti...@tabi.org; rob.herr...@calxeda.com; tomasz.f...@gmail.com;
 shawn@linaro.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver

 Hi Bhushan,

Thank you for the comments :)
I'll fix some in v7.

Here are my replies.

 On Sat, Aug 17, 2013 at 02:24:19AM +0800, Bhushan Bharat-R65777 wrote:
   This patch add S/PDIF controller driver for Freescale SoC.

  Please give some more description of the driver?

 I've looked at some other ASoC drivers, and all of them seem to be as brief as
 mine, so I'm not sure what other information I should provide here. It already
 seems okay to me.

That other drivers lack a description does not mean we should omit one here.
Please describe in a few lines what this driver does and which devices it
handles.

   +struct spdif_mixer_control {
   + /* buffer ptrs for writer */
   + u32 upos;
   + u32 qpos;

  They do not look like pointers?

 They are more like offsets used to get the corresponding pointer.
 But I'll change the confusing comments.

   +/* U/Q Channel receive register full */
   +static void spdif_irq_uqrx_full(struct fsl_spdif_priv *spdif_priv, char name)
   +{
   +	struct spdif_mixer_control *ctrl = &spdif_priv->fsl_spdif_control;
   +	struct regmap *regmap = spdif_priv->regmap;
   +	struct platform_device *pdev = spdif_priv->pdev;
   +	u32 *pos, size, val, reg;
   +
   +	switch (name) {
   +	case 'U':
   +		pos = &ctrl->upos;
   +		size = SPDIF_UBITS_SIZE;
   +		reg = REG_SPDIF_SRU;
   +		break;
   +	case 'Q':
   +		pos = &ctrl->qpos;
   +		size = SPDIF_QSUB_SIZE;
   +		reg = REG_SPDIF_SRQ;
   +		break;
   +	default:
   +		return;

  Should return error.

 IMHO, this should be fine. It's a void function that is used in the isr().
 The parameter 'name' is entirely controlled by the driver itself, so basically
 we don't need to worry about the default path.

Silently returning on a potential error is bad. At least add a printk/BUG_ON or
something similar that indicates an unexpected parameter was passed.
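
As a rough illustration of the point above, here is a userspace sketch in plain C of a default branch that reports the unexpected value instead of returning silently. It is not the driver code: toy_channel_size(), toy_unexpected_count and the two sizes are hypothetical names and values made up for this sketch, and in the driver the report would more likely be dev_err() or WARN_ON() than fprintf().

```c
#include <stdio.h>

/* Hypothetical sizes; the real SPDIF_UBITS_SIZE/SPDIF_QSUB_SIZE are not shown in this thread. */
enum { TOY_UBITS_SIZE = 96, TOY_QSUB_SIZE = 12 };

/* Counts how often an unexpected channel name was seen (hypothetical helper state). */
static unsigned int toy_unexpected_count;

/* Returns the buffer size for a channel, or 0 after reporting an unexpected name. */
static unsigned int toy_channel_size(char name)
{
	switch (name) {
	case 'U':
		return TOY_UBITS_SIZE;
	case 'Q':
		return TOY_QSUB_SIZE;
	default:
		/* Make the "impossible" path visible instead of returning silently. */
		fprintf(stderr, "unexpected channel name 0x%02x\n",
			(unsigned char)name);
		toy_unexpected_count++;
		return 0;
	}
}
```

Even in a void interrupt helper, a report like this costs nothing on the normal path and makes a future caller bug show up in the log.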

   +	if (*pos >= size * 2) {
   +		*pos = 0;
   +	} else if (unlikely((*pos % size) + 3 > size)) {
   +		dev_err(&pdev->dev, "User bit receivce buffer overflow\n");
   +		return;

  Should return error.

 Ditto, it's being used in isr(), we don't need to detect the return value, 
 just
 use dev_err() to warn users and let the driver clear the irq.

Same as above

   +/* U/Q Channel framing error */
   +static void spdif_irq_uq_err(struct fsl_spdif_priv *spdif_priv)
   +{
   +	struct spdif_mixer_control *ctrl = &spdif_priv->fsl_spdif_control;
   +	struct regmap *regmap = spdif_priv->regmap;
   +	struct platform_device *pdev = spdif_priv->pdev;
   +	u32 val;
   +
   +	dev_dbg(&pdev->dev, "isr: U/Q Channel framing error\n");
   +
   +	/* read U/Q data and do buffer reset */
   +	regmap_read(regmap, REG_SPDIF_SRU, &val);
   +	regmap_read(regmap, REG_SPDIF_SRQ, &val);

  The comment above says "read U/Q data and do buffer reset". What is the
 buffer reset? Is the register clear-on-read?

 That's the behavior required by the IP. According to the reference manual:
 "U Channel receive register full, can't be cleared with reg. IntClear.
 To clear it, read from U Rx reg." and "Q Channel receive register full, can't
 be cleared with reg. IntClear. To clear it, read from Q Rx reg."

Then please document this behavior in a comment.
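
To make the requested comment concrete, here is a toy userspace model in C of a read-to-clear receive register as described in the quoted manual text. It is only a sketch: struct toy_rx_reg, toy_int_clear() and toy_read_rx() are hypothetical names, and the real hardware is accessed through regmap, not through a struct like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a read-to-clear receive register; all names are hypothetical. */
struct toy_rx_reg {
	uint32_t data;
	bool full;	/* models the "receive register full" status */
};

/* Writing an IntClear-style register does not clear this status. */
static void toy_int_clear(struct toy_rx_reg *r)
{
	(void)r;	/* deliberately no effect, as the quoted manual text describes */
}

/*
 * Reading the data register is what clears "full", so a caller that does
 * not need the value must still perform the read.
 */
static uint32_t toy_read_rx(struct toy_rx_reg *r)
{
	r->full = false;
	return r->data;
}
```

A comment of that kind next to the two regmap_read() calls would explain why the value read is thrown away.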

   +static void spdif_softreset(struct fsl_spdif_priv *spdif_priv)
   +{
   +	struct regmap *regmap = spdif_priv->regmap;
   +	u32 val, cycle = 1000;
   +
   +	regmap_write(regmap, REG_SPDIF_SCR, SCR_SOFT_RESET);
   +	regcache_sync(regmap);
   +
   +	/* RESET bit would be cleared after finishing its reset procedure */
   +	do {
   +		regmap_read(regmap, REG_SPDIF_SCR, &val);
   +	} while ((val & SCR_SOFT_RESET) && cycle--);

  What if the reset bit is not cleared and the timeout expires?

 Here we assume the reset bit will be cleared ("The software reset will last
 8 cycles", per the RM), so if this were to fail, the whole IP module would not
 work normally either.

Also add a comment explaining this and why cycle = 1000 was chosen.

 Still, I don't mind adding an explicit failure return here to make it clear.
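
A bounded poll with an explicit failure return could look roughly like the userspace sketch below. This is not the driver's code: TOY_SOFT_RESET (including its bit position), toy_read_fn, toy_wait_reset_done() and the fake hardware in struct toy_hw are all hypothetical, and the register read stands in for regmap_read().

```c
#include <errno.h>
#include <stdint.h>

#define TOY_SOFT_RESET (1u << 12)	/* hypothetical bit position */

/* Hypothetical register accessor: returns the current control register value. */
typedef uint32_t (*toy_read_fn)(void *ctx);

/*
 * Poll until the reset bit clears, at most max_polls times.
 * Returns 0 on success and -ETIMEDOUT if the bit never cleared.
 */
static int toy_wait_reset_done(toy_read_fn read_scr, void *ctx,
			       unsigned int max_polls)
{
	while (max_polls--) {
		if (!(read_scr(ctx) & TOY_SOFT_RESET))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Fake hardware for the sketch: the bit reads as set for a fixed number of reads. */
struct toy_hw {
	unsigned int reads_until_clear;
};

static uint32_t toy_hw_read(void *ctx)
{
	struct toy_hw *hw = ctx;

	if (hw->reads_until_clear == 0)
		return 0;
	hw->reads_until_clear--;
	return TOY_SOFT_RESET;
}
```

With a return value the caller can at least log the failure, and the poll limit can be documented against the reset duration quoted from the manual.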

   +static u8 reverse_bits(u8 input)
   +{
   + u8 tmp = input;
   +
   + tmp = ((tmp  0b10101010)  1) | ((tmp  1)  0b10101010);
   + tmp = ((tmp  0b11001100)  2) | ((tmp  2)  0b11001100);
   + tmp = ((tmp  0b)  4) | ((tmp  4)  0b);

  What is this logic? Can the hardcoded values be removed, and can you add a
 description of the calculation above?
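
For reference, the usual way to reverse the bits of a byte is to swap adjacent bits, then bit pairs, then nibbles. The sketch below shows that technique in standalone C with hex masks. It is an assumption that the quoted function does the same thing, since the operators and the last mask did not survive in this archive; toy_reverse_bits() is a hypothetical name. Inside the kernel, bitrev8() from linux/bitrev.h provides this operation.

```c
#include <stdint.h>

/* Reverse the bit order of one byte by swapping progressively larger groups. */
static uint8_t toy_reverse_bits(uint8_t v)
{
	/* swap adjacent bits: abcdefgh -> badcfehg */
	v = (uint8_t)(((v & 0xAA) >> 1) | ((v & 0x55) << 1));
	/* swap bit pairs: badcfehg -> dcbahgfe */
	v = (uint8_t)(((v & 0xCC) >> 2) | ((v & 0x33) << 2));
	/* swap nibbles: dcbahgfe -> hgfedcba */
	v = (uint8_t)(((v & 0xF0) >> 4) | ((v & 0x0F) << 4));
	return v;
}
```

Each step is its own inverse, so applying the function twice returns the original byte.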

 This was provided by Philipp Zabel in his

RE: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver

2013-08-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Nicolin 
 Chen
 Sent: Friday, August 16, 2013 6:27 PM
 To: broo...@kernel.org; l...@metafoo.de; p.za...@pengutronix.de;
 s.ha...@pengutronix.de
 Cc: mark.rutl...@arm.com; devicet...@vger.kernel.org; alsa-devel@alsa-
 project.org; swar...@wwwdotorg.org; feste...@gmail.com; ti...@tabi.org;
 rob.herr...@calxeda.com; tomasz.f...@gmail.com; shawn@linaro.org; 
 linuxppc-
 d...@lists.ozlabs.org
 Subject: [PATCH v6 1/2] ASoC: fsl: Add S/PDIF CPU DAI driver
 
 This patch add S/PDIF controller driver for Freescale SoC.

Please give some more description of the driver?

 
 Signed-off-by: Nicolin Chen b42...@freescale.com
 ---
  .../devicetree/bindings/sound/fsl,spdif.txt|   56 +
  sound/soc/fsl/Kconfig  |3 +
  sound/soc/fsl/Makefile |2 +
  sound/soc/fsl/fsl_spdif.c  | 1272 
 
  sound/soc/fsl/fsl_spdif.h  |  224 
  5 files changed, 1557 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/devicetree/bindings/sound/fsl,spdif.txt
  create mode 100644 sound/soc/fsl/fsl_spdif.c
  create mode 100644 sound/soc/fsl/fsl_spdif.h
 
 diff --git a/Documentation/devicetree/bindings/sound/fsl,spdif.txt
 b/Documentation/devicetree/bindings/sound/fsl,spdif.txt
 new file mode 100644
 index 000..5549ce3
 --- /dev/null
 +++ b/Documentation/devicetree/bindings/sound/fsl,spdif.txt
 @@ -0,0 +1,56 @@
 +Freescale Sony/Philips Digital Interface Format (S/PDIF) Controller
 +
 +The Freescale S/PDIF audio block is a stereo transceiver that allows the
 +processor to receive and transmit digital audio via a coaxial cable or
 +a fibre cable.
 +
 +Required properties:
 +
 +  - compatible : Compatible list, contains "fsl,<chip>-spdif".
 +
 +  - reg : Offset and length of the register set for the device.
 +
 +  - interrupts : Contains spdif interrupt.
 +
 +  - dmas : Generic dma devicetree binding as described in
 +  Documentation/devicetree/bindings/dma/dma.txt.
 +
 +  - dma-names : Two dmas have to be defined, "tx" and "rx".
 +
 +  - clocks : Contains an entry for each entry in clock-names.
 +
 +  - clock-names : Includes the following entries:
 +	name		comments
 +	"core"		The core clock of spdif controller
 +	"rxtx<0-7>"	Clock source list for tx and rx clock.
 +			This clock list should be identical to
 +			the source list connecting to the spdif
 +			clock mux in SPDIF Transceiver Clock
 +			Diagram of SoC reference manual. It
 +			can also be referred to TxClk_Source
 +			bit of register SPDIF_STC.
 +
 +Example:
 +
 +spdif: spdif@02004000 {
 +	compatible = "fsl,imx6q-spdif",
 +		"fsl,imx35-spdif";
 +	reg = <0x02004000 0x4000>;
 +	interrupts = <0 52 0x04>;
 +	dmas = <&sdma 14 18 0>,
 +	       <&sdma 15 18 0>;
 +	dma-names = "rx", "tx";
 +
 +	clocks = <&clks 197>, <&clks 3>,
 +	       <&clks 197>, <&clks 107>,
 +	       <&clks 0>, <&clks 118>,
 +	       <&clks 62>, <&clks 139>,
 +	       <&clks 0>;
 +	clock-names = "core", "rxtx0",
 +		"rxtx1", "rxtx2",
 +		"rxtx3", "rxtx4",
 +		"rxtx5", "rxtx6",
 +		"rxtx7";
 +
 +	status = "okay";
 +};
 diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
 index e15f771..2c518db 100644
 --- a/sound/soc/fsl/Kconfig
 +++ b/sound/soc/fsl/Kconfig
 @@ -1,6 +1,9 @@
  config SND_SOC_FSL_SSI
   tristate
 
 +config SND_SOC_FSL_SPDIF
 + tristate
 +
  config SND_SOC_FSL_UTILS
   tristate
 
 diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile
 index d4b4aa8..4b5970e 100644
 --- a/sound/soc/fsl/Makefile
 +++ b/sound/soc/fsl/Makefile
 @@ -12,9 +12,11 @@ obj-$(CONFIG_SND_SOC_P1022_RDK) += snd-soc-p1022-rdk.o
 
  # Freescale PowerPC SSI/DMA Platform Support
  snd-soc-fsl-ssi-objs := fsl_ssi.o
 +snd-soc-fsl-spdif-objs := fsl_spdif.o
  snd-soc-fsl-utils-objs := fsl_utils.o
  snd-soc-fsl-dma-objs := fsl_dma.o
  obj-$(CONFIG_SND_SOC_FSL_SSI) += snd-soc-fsl-ssi.o
 +obj-$(CONFIG_SND_SOC_FSL_SPDIF) += snd-soc-fsl-spdif.o
  obj-$(CONFIG_SND_SOC_FSL_UTILS) += snd-soc-fsl-utils.o
  obj-$(CONFIG_SND_SOC_POWERPC_DMA) += snd-soc-fsl-dma.o
 
 diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
 new file mode 100644
 index 000..e00125e
 --- /dev/null
 +++ b/sound/soc/fsl/fsl_spdif.c
 @@ -0,0 +1,1272 @@
 +/*
 + * Freescale S/PDIF ALSA SoC Digital Audio Interface (DAI) driver
 + *
 + * Copyright (C) 2013 Freescale Semiconductor, Inc.
 + *
 + * Based on stmp3xxx_spdif_dai.c
 + * Vladimir Barinov vbari...@embeddedalley.com
 + * Copyright 2008 SigmaTel, Inc
 + * Copyright 2008 Embedded Alley Solutions, Inc
 + *
 + * This file is licensed under the terms of the GNU General Public License
 + * version 2.  This program  is licensed as is

RE: MPC8315 reboot failure, lockdep splat possibly related?

2013-08-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Anthony
 Foiani
 Sent: Saturday, August 17, 2013 7:10 AM
 To: linuxppc-dev@lists.ozlabs.org
 Subject: MPC8315 reboot failure, lockdep splat possibly related?
 
 
 Greetings.
 
 I've been experiencing occasional lockups at reboot for a few weeks, but only
 once every 10-20 boots.  A good reboot looks like this:
 
   [47529.721640] lm77 0-0048: shutdown
   [47529.725160] rtc-m41t80 0-0068: shutdown
   [47529.729169] i2c i2c-0: shutdown
   [47529.732534] fsl-ehci fsl-ehci.0: shutdown
   [47529.736842] sd 1:0:0:0: shutdown
   [47529.740239] sd 1:0:0:0: [sda] Synchronizing SCSI cache
   [47529.747091] uio_pci_generic :00:0a.0: shutdown
   [47529.752079] pci :00:00.0: shutdown
   [47529.756021] Restarting system.
 
 While a bad one fails after the EHCI shutdown:
 
   [  747.578001] lm77 0-0048: shutdown
   [  747.581522] rtc-m41t80 0-0068: shutdown
   [  747.585538] i2c i2c-0: shutdown
   [  747.588909] sd 1:0:0:0: shutdown
   [  747.592304] sd 1:0:0:0: [sda] Synchronizing SCSI cache
   [  747.597973] fsl-ehci fsl-ehci.0: shutdown
 
 I enabled lockdep, and I get this splat on every boot, regardless of whether 
 it
 locks up at reboot or not.  Could it possibly be related?
 Any other ideas on how to avoid the reboot lockup?
 
   [9.086051] =
   [9.090393] [ INFO: inconsistent lock state ]
   [9.094744] 3.9.7-ajf-gc39503d #1 Not tainted
   [9.099087] -
   [9.103432] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
   [9.109431] scsi_eh_1/39 [HC1[1]:SC0[0]:HE0:SE1] takes:
   [9.114642]  (&(&host->lock)->rlock){?.+...}, at: [c02f4168] sata_fsl_interrupt+0x50/0x250
   [9.123137] {HARDIRQ-ON-W} state was registered at:
   [9.128004]   [c006cdb8] lock_acquire+0x90/0xf4
   [9.132737]   [c043ef04] _raw_spin_lock+0x34/0x4c
   [9.137645]   [c02f3560] fsl_sata_set_irq_coalescing+0x68/0x100
   [9.143750]   [c02f36a0] sata_fsl_init_controller+0xa8/0xc0
   [9.149505]   [c02f3f10] sata_fsl_probe+0x17c/0x2e8
   [9.154568]   [c02acc90] driver_probe_device+0x90/0x248
   [9.159987]   [c02acf0c] __driver_attach+0xc4/0xc8
   [9.164964]   [c02aae74] bus_for_each_dev+0x5c/0xa8
   [9.170028]   [c02ac218] bus_add_driver+0x100/0x26c
   [9.175091]   [c02ad638] driver_register+0x88/0x198
   [9.180155]   [c0003a24] do_one_initcall+0x58/0x1b4
   [9.185226]   [c05aeeac] kernel_init_freeable+0x118/0x1c0
   [9.190823]   [c0004110] kernel_init+0x18/0x108
   [9.195542]   [c000f6b8] ret_from_kernel_thread+0x64/0x6c
   [9.201142] irq event stamp: 160
   [9.204366] hardirqs last  enabled at (159): [c043f778]
 _raw_spin_unlock_irq+0x30/0x50
   [9.212469] hardirqs last disabled at (160): [c000f414]
 reenable_mmu+0x30/0x88
   [9.219867] softirqs last  enabled at (144): [c002ae5c]
 __do_softirq+0x168/0x218
   [9.227435] softirqs last disabled at (137): [c002b0d4]
 irq_exit+0xa8/0xb4
   [9.234481]
   [9.234481] other info that might help us debug this:
   [9.240995]  Possible unsafe locking scenario:
   [9.240995]
   [9.246898]        CPU0
   [9.249337]        ----
   [9.251776]   lock(&(&host->lock)->rlock);
   [9.255878]   <Interrupt>
   [9.258492]     lock(&(&host->lock)->rlock);
   [9.262765]
   [9.262765]  *** DEADLOCK ***

You should be able to get rid of this by changing spin_lock()/spin_unlock() in
fsl_sata_set_irq_coalescing() to spin_lock_irqsave()/spin_unlock_irqrestore().
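
To show why the irqsave variant helps, here is a toy single-CPU model in plain userspace C. It is not kernel code and not the actual sata_fsl change: struct toy_cpu and every toy_* function are hypothetical, and the "lock" is just a flag. The model only captures the ordering problem lockdep reports: a lock taken with interrupts enabled can be requested again by an interrupt handler on the same CPU.

```c
#include <stdbool.h>

/* Toy single-CPU model; NOT kernel code. All names are hypothetical. */
struct toy_cpu {
	bool irqs_enabled;
	bool lock_held;
	bool irq_pending;
	int deadlocks;	/* handler found the lock already held (would spin forever) */
	int handled;	/* handler ran to completion */
};

/* Stands in for an interrupt handler that takes the same lock. */
static void toy_irq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		cpu->deadlocks++;
		return;
	}
	cpu->lock_held = true;
	cpu->handled++;
	cpu->lock_held = false;
}

/* Deliver a pending interrupt, but only if interrupts are enabled. */
static void toy_maybe_interrupt(struct toy_cpu *cpu)
{
	if (cpu->irq_pending && cpu->irqs_enabled) {
		cpu->irq_pending = false;
		toy_irq_handler(cpu);
	}
}

/* Models spin_lock()/spin_unlock(): interrupts stay enabled while the lock is held. */
static void toy_set_coalescing_plain(struct toy_cpu *cpu)
{
	cpu->lock_held = true;
	cpu->irq_pending = true;	/* device raises an interrupt right now */
	toy_maybe_interrupt(cpu);	/* handler runs and hits the held lock */
	cpu->lock_held = false;
	toy_maybe_interrupt(cpu);
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(). */
static void toy_set_coalescing_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled;	/* save the interrupt state */

	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	cpu->irq_pending = true;	/* device raises an interrupt right now */
	toy_maybe_interrupt(cpu);	/* masked, so nothing happens yet */
	cpu->lock_held = false;
	cpu->irqs_enabled = flags;	/* restore the saved state */
	toy_maybe_interrupt(cpu);	/* delivered after the lock is released */
}
```

In the plain variant the handler meets a held lock; in the irqsave variant the interrupt is deferred until after the unlock, which is the property the suggested change relies on.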

-Bharat


   [9.262765]
   [9.268684] no locks held by scsi_eh_1/39.
   [9.272767]
   [9.272767] stack backtrace:
   [9.277117] Call Trace:
   [9.279589] [cfff9da0] [c0008504] show_stack+0x48/0x150 (unreliable)
   [9.285972] [cfff9de0] [c0447d5c] print_usage_bug.part.35+0x268/0x27c
   [9.292425] [cfff9e10] [c006ace4] mark_lock+0x2ac/0x658
   [9.297660] [cfff9e40] [c006b7e4] __lock_acquire+0x754/0x1840
   [9.303414] [cfff9ee0] [c006cdb8] lock_acquire+0x90/0xf4
   [9.308745] [cfff9f20] [c043ef04] _raw_spin_lock+0x34/0x4c
   [9.314250] [cfff9f30] [c02f4168] sata_fsl_interrupt+0x50/0x250
   [9.320187] [cfff9f70] [c0079ff0] handle_irq_event_percpu+0x90/0x254
   [9.326547] [cfff9fc0] [c007a1fc] handle_irq_event+0x48/0x78
   [9.332220] [cfff9fe0] [c007c95c] handle_level_irq+0x9c/0x104
   [9.337981] [cfff9ff0] [c000d978] call_handle_irq+0x18/0x28
   [9.343568] [cc7139f0] [c000608c] do_IRQ+0xf0/0x1a8
   [9.348464] [cc713a20] [c000fc8c] ret_from_except+0x0/0x14
   [9.353983] --- Exception: 501 at _raw_spin_unlock_irq+0x40/0x50
   [9.353983] LR = _raw_spin_unlock_irq+0x30/0x50
   [9.364839] [cc713af0] [c043db10] wait_for_common+0xac/0x188
   [9.370513] [cc713b30] [c02ddee4] ata_exec_internal_sg+0x2b0/0x4f0
   [9.376699] [cc713be0] [c02de18c] ata_exec_internal+0x68/0xa8
   [

RE: [PATCH] KVM: PPC: POWERNV: move iommu_add_device earlier

2013-08-14 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Alexey
 Kardashevskiy
 Sent: Wednesday, August 14, 2013 2:55 PM
 To: linuxppc-dev@lists.ozlabs.org
 Cc: Alexey Kardashevskiy; Paul Mackerras; linux-ker...@vger.kernel.org
 Subject: [PATCH] KVM: PPC: POWERNV: move iommu_add_device earlier
 
 The current implementation of IOMMU on sPAPR does not use iommu_ops
 and therefore does not call IOMMU API's bus_set_iommu() which
 1) sets iommu_ops for a bus
 2) registers a bus notifier
 Instead, PCI devices are added to IOMMU groups from
 subsys_initcall_sync(tce_iommu_init) which does basically the same
 thing without using iommu_ops callbacks.
 
 However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
 implements iommu_ops and when tce_iommu_init is called, every PCI device
 is already added to some group so there is a conflict.
 
 This patch does 2 things:
 1. removes the loop in which PCI devices were added to groups and
 adds explicit iommu_add_device() calls to add devices as soon as they get
 the iommu_table pointer assigned to them.
 2. moves a bus notifier to powernv code in order to avoid conflict with
 the notifier from Freescale driver.
 
 iommu_add_device() and iommu_del_device() are public now.

This works for me (able to boot Linux, as expected) :-)
One question, though: why not move arch/powerpc/kernel/iommu.c into platform/,
or build it only for book3s (that is, not for book3e)?

Thanks
-Bharat

 
 Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
 ---
  arch/powerpc/include/asm/iommu.h|  2 ++
  arch/powerpc/kernel/iommu.c | 41 
 +++--
  arch/powerpc/platforms/powernv/pci-ioda.c   | 12 ++---
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |  1 +
  arch/powerpc/platforms/powernv/pci.c| 31 ++
  arch/powerpc/platforms/pseries/iommu.c  |  7 +++--
  6 files changed, 51 insertions(+), 43 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/iommu.h 
 b/arch/powerpc/include/asm/iommu.h
 index c34656a..ba74329 100644
 --- a/arch/powerpc/include/asm/iommu.h
 +++ b/arch/powerpc/include/asm/iommu.h
 @@ -103,6 +103,8 @@ extern struct iommu_table *iommu_init_table(struct
 iommu_table * tbl,
   int nid);
  extern void iommu_register_group(struct iommu_table *tbl,
int pci_domain_number, unsigned long pe_num);
 +extern int iommu_add_device(struct device *dev);
 +extern void iommu_del_device(struct device *dev);
 
  extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
   struct scatterlist *sglist, int nelems,
 diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
 index b20ff17..15f8ca8 100644
 --- a/arch/powerpc/kernel/iommu.c
 +++ b/arch/powerpc/kernel/iommu.c
 @@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
  }
  EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
 -static int iommu_add_device(struct device *dev)
 +int iommu_add_device(struct device *dev)
  {
   struct iommu_table *tbl;
   int ret = 0;
 @@ -1134,46 +1134,13 @@ static int iommu_add_device(struct device *dev)
 
   return ret;
  }
 +EXPORT_SYMBOL_GPL(iommu_add_device);
 
 -static void iommu_del_device(struct device *dev)
 +void iommu_del_device(struct device *dev)
  {
   iommu_group_remove_device(dev);
  }
 -
 -static int iommu_bus_notifier(struct notifier_block *nb,
 -   unsigned long action, void *data)
 -{
 - struct device *dev = data;
 -
 - switch (action) {
 - case BUS_NOTIFY_ADD_DEVICE:
 - return iommu_add_device(dev);
 - case BUS_NOTIFY_DEL_DEVICE:
 - iommu_del_device(dev);
 - return 0;
 - default:
 - return 0;
 - }
 -}
 -
 -static struct notifier_block tce_iommu_bus_nb = {
 - .notifier_call = iommu_bus_notifier,
 -};
 -
 -static int __init tce_iommu_init(void)
 -{
 - struct pci_dev *pdev = NULL;
 -
 - BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
 -
 - for_each_pci_dev(pdev)
 - iommu_add_device(&pdev->dev);
 -
 - bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
 - return 0;
 -}
 -
 -subsys_initcall_sync(tce_iommu_init);
 +EXPORT_SYMBOL_GPL(iommu_del_device);
 
  #else
 
 diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
 b/arch/powerpc/platforms/powernv/pci-ioda.c
 index d8140b1..a9f8fef 100644
 --- a/arch/powerpc/platforms/powernv/pci-ioda.c
 +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
 @@ -441,6 +441,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb 
 *phb,
 struct pci_dev *pdev
 
   pe = &phb->ioda.pe_array[pdn->pe_number];
   set_iommu_table_base(&pdev->dev, &pe->tce32_table);
 + iommu_add_device(&pdev->dev);
  }
 
  static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus 
 *bus)
 @@ -449,6 +450,7 @@ static void pnv_ioda_setup_bus_dma(struct

RE: Powerpc: Kernel warn_on when enabling IOMMU_API

2013-08-13 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
 Sent: Tuesday, August 13, 2013 5:41 AM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API

 On 08/13/2013 02:14 AM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
  Sent: Monday, August 12, 2013 7:44 PM
  To: Bhushan Bharat-R65777
  Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
  Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API

  On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote:
  And this simple fix work for me
  diff --git a/arch/powerpc/kernel/iommu.c
  b/arch/powerpc/kernel/iommu.c index b20ff17..8869b0d 100644
  --- a/arch/powerpc/kernel/iommu.c
  +++ b/arch/powerpc/kernel/iommu.c
  @@ -48,6 +48,8 @@
   #include <asm/vio.h>
   #include <asm/tce.h>

  +#define DEBUG
  +
   #define DBG(...)

   static int novmerge;
  @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table
  *tbl, size_t
  size,
  }
   }

  -#ifdef CONFIG_IOMMU_API
  +#ifdef SPAPR_TCE_IOMMU
   /*
* SPAPR TCE API
*/
  --

  And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print?

  It shows the list of group id and respective devices:

 Is it vanilla 3.11-rc1 kernel? Wow. What does lspci show there?

It is 3.11-rc1 + (FSL_IOMMU + VFIO-PCI : local changes).

root@p5040ds:~# lspci
00:00.0 Class 0604: 1957:0450
01:00.0 Class 0200: 8086:10fb
00:00.0 Class 0604: 1957:0450
01:00.0 Class 0200: 8086:10d3

We use bus_set_iommu() from the generic IOMMU API (drivers/iommu/iommu.c), which
creates an iommu_group for each device. It also registers a notifier to support
hot-pluggable devices.
So by the time this initcall (in arch/powerpc/kernel/iommu.c) runs, the IOMMU
group is already set up for the devices.

I think we do not need this piece of code for powerpc.
So what is the best way to stub this out for FSL PowerPC/IOMMU?

Will the above #ifdef SPAPR_TCE_IOMMU work?
Another way would be to select iommu.c and dma-iommu.c in the Makefile when
SPAPR_TCE_IOMMU is defined, rather than when CONFIG_64BIT is.

-Bharat

 --
 Alexey

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: Powerpc: Kernel warn_on when enabling IOMMU_API

2013-08-13 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
 Sent: Tuesday, August 13, 2013 6:25 PM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API

 On 08/13/2013 08:44 PM, Bhushan Bharat-R65777 wrote:

  -Original Message- From: Alexey Kardashevskiy
  [mailto:a...@ozlabs.ru] Sent: Tuesday, August 13, 2013 5:41 AM To:
  Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org;
  linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on
  when enabling IOMMU_API

  On 08/13/2013 02:14 AM, Bhushan Bharat-R65777 wrote:

  -Original Message- From: Alexey Kardashevskiy
  [mailto:a...@ozlabs.ru] Sent: Monday, August 12, 2013 7:44 PM To:
  Bhushan Bharat-R65777 Cc: b...@kernel.crashing.org;
  linuxppc-dev@lists.ozlabs.org Subject: Re: Powerpc: Kernel warn_on
  when enabling IOMMU_API

  On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote:
  And this simple fix work for me diff --git
  a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index
  b20ff17..8869b0d 100644 --- a/arch/powerpc/kernel/iommu.c
  +++ b/arch/powerpc/kernel/iommu.c @@ -48,6 +48,8 @@ #include
  asm/vio.h #include asm/tce.h

  +#define DEBUG + #define DBG(...)

  static int novmerge; @@ -871,7 +873,7 @@ void
  iommu_free_coherent(struct iommu_table *tbl, size_t
  size,
  } }

  -#ifdef CONFIG_IOMMU_API +#ifdef SPAPR_TCE_IOMMU /* * SPAPR TCE
  API */ --

  And with this fix, what does ls -laR /sys/kernel/iommu_groups/
  print?

  It shows the list of group id and respective devices:

  Is it vanilla 3.11-rc1 kernel? Wow. What does lspci show there?

  It is 3.11-rc1 + (FSL_IOMMU + VFIO-PCI : local changes).

  root@p5040ds:~# lspci
 00:00.0 Class 0604: 1957:0450
 01:00.0 Class 0200: 8086:10fb
 00:00.0 Class 0604: 1957:0450
 01:00.0 Class 0200: 8086:10d3

 Is it one PCI domain or two PCI domains? Hm.

  We uses the bus_set_iommu(), generic iommu api, which creates a
  iommu_group for a device (drivers/iommu/iommu.c) using. Also this have
  notifier to support hotplug-able device. So when this initcall (in
  arch/powerpc/kernel/iommu.c) is called, iommu group is already setup
  for the device/s.

  I think we do not need this piece of code for powerpc. So what is the
  best way to stub this out for FSL PowerPC/IOMMU?

 So you implemented iommu_ops? Can you share your code somewhere, just to have 
 a
 look?

https://lkml.org/lkml/2013/7/1/158

  Will the above #ifdef SPAPR_TCE_IOMMU work? Other way can be selecting
  iommu.c and dma-iommu.c in Makefile if SPAPR_TCE_IOMMU defined and not
  if CONFIG_64BIT.

 If SPAPR_TCE_IOMMU is enabled, the code would compile and the subsys_init 
 would
 be called anyway, so normal production kernel will fail anyway.

We will not enable this on FSL PowerPC.

-Bharat

 --
 Alexey


Powerpc: Kernel warn_on when enabling IOMMU_API

2013-08-12 Thread Bhushan Bharat-R65777

Hi Alexey/Ben,

When I enable IOMMU_API, I get a WARN_ON in arch/powerpc/kernel/iommu.c.
Here is the code snapshot:

static int iommu_add_device(struct device *dev)
{
	struct iommu_table *tbl;
	int ret = 0;

	if (WARN_ON(dev->iommu_group)) {

This is the point where the WARN_ON fires.

		pr_warn("iommu_tce: device %s is already in iommu group %d, skipping\n",
			dev_name(dev),
			iommu_group_id(dev->iommu_group));
		return -EBUSY;
	}
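
The situation can be modelled in userspace roughly as follows: whichever code path adds the device first sets its group, and a second path that tries to add the same device finds the group already set and bails out with -EBUSY. This is only a sketch with hypothetical names (struct toy_group, struct toy_device, toy_add_device()), not the kernel's data structures.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for an IOMMU group and a device; all names are hypothetical. */
struct toy_group {
	int id;
};

struct toy_device {
	const char *name;
	struct toy_group *group;	/* NULL until some path adds the device */
};

/*
 * Add a device to a group. A device that already belongs to a group is
 * left untouched and -EBUSY is returned, which is the condition the
 * WARN_ON() in the snapshot reports.
 */
static int toy_add_device(struct toy_device *dev, struct toy_group *grp)
{
	if (dev->group)
		return -EBUSY;
	dev->group = grp;
	return 0;
}
```

Calling toy_add_device() once from each of two registration paths reproduces the conflict: the first call succeeds and the second one returns -EBUSY without changing the group.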


--- This is the boot log with #define DEBUG in iommu.c ---

Using P5040 DS machine description
MMU: Supported page sizes
 4 KB as direct
  4096 KB as direct
 16384 KB as direct
 65536 KB as direct
262144 KB as direct
   1048576 KB as direct
MMU: Book3E HW tablewalk not supported
Found initrd at 0xc0002b759000:0xc00024ab
bootconsole [udbg0] enabled
CPU maps initialized for 1 thread per core
Starting Linux PPC64 #16 SMP Mon Aug 12 15:22:11 IST 2013
-
ppc64_pft_size= 0x0
physicalMemorySize= 0x2
ppc64_caches.dcache_line_size = 0x40
ppc64_caches.icache_line_size = 0x40
-
Linux version 3.11.0-rc1-10505-g8d33668-dirty (r65777@perfidc-01) (gcc version 
4.5.1 (Sourcery G++ Lite 2010.09-55) ) #16 SMP Mon Aug 12 15:22:11 IST 2013
CF12

Setup Arch
[boot]0012 Setup Arch
P5040 DS board from Freescale Semiconductor
Zone ranges:
  DMA  [mem 0x-0x1]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x-0x1]
MMU: Allocated 2112 bytes of context maps for 255 contexts
CF15

Setup Done
[boot]0015 Setup Done
PERCPU: Embedded 10 pages/cpu @cb10 s11200 r0 d29760 u262144
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 2068480
Kernel command line: console=ttyS0,115200 ramdisk_size=1000 root=/dev/ram rw
PID hash table entries: 4096 (order: 3, 32768 bytes)
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Sorting __ex_table...
Memory: 8110276K/8388608K available (6276K kernel code, 1104K rwdata, 2212K 
rodata, 268K init, 325K bss, 278332K reserved)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
Hierarchical RCU implementation.
RCU restricting CPUs from NR_CPUS=24 to nr_cpu_ids=4.
NR_IRQS:512 nr_irqs:512 16
mpic: Setting up MPIC  OpenPIC   version 1.2 at ffe04, max 4 CPUs
mpic: ISU size: 512, shift: 9, mask: 1ff
mpic: Initializing for 512 sources
clocksource: timebase mult[1400] shift[24] registered
Console: colour dummy device 80x25
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 256
mpic: requesting IPIs...
Brought up 4 CPUs
devtmpfs: initialized
NET: Registered protocol family 16
Found FSL PCI host bridge at 0x000ffe20. Firmware bus number: 0-1
PCI host bridge /pcie@ffe20 (primary) ranges:
 MEM 0x000c..0x000c1fff - 0xe000 
  IO 0x000ff800..0x000ff800 - 0x
/pcie@ffe20: PCICSRBAR @ 0xdf00
/pcie@ffe20: Setup 64-bit PCI DMA window
/pcie@ffe20: DMA window size is 0xdf00
Found FSL PCI host bridge at 0x000ffe201000. Firmware bus number: 0-1
PCI host bridge /pcie@ffe201000  ranges:
 MEM 0x000c2000..0x000c3fff - 0xe000 
  IO 0x000ff801..0x000ff801 - 0x
/pcie@ffe201000: PCICSRBAR @ 0xdf00
/pcie@ffe201000: Setup 64-bit PCI DMA window
/pcie@ffe201000: DMA window size is 0xdf00
software IO TLB [mem 0x0bdca000-0x0fdca000] (64MB) mapped at 
[cbdca000-cfdc9fff]
PCI: Probing PCI hardware
fsl-pci ffe20.pcie: PCI host bridge to bus :00
pci_bus :00: root bus resource [io  0x1-0x1] (bus address 
[0x-0x])
pci_bus :00: root bus resource [mem 0xc-0xc1fff] (bus address 
[0xe000-0x])
pci_bus :00: root bus resource [bus 00-01]
pci :00:00.0: ignoring class 0x0b2000 (doesn't match header type 01)
pci :00:00.0: PCI bridge to [bus 01-ff]
fsl-pci ffe201000.pcie: PCI host bridge to bus 0001:00
pci_bus 0001:00: root bus resource [io  0x21000-0x30fff] (bus address 
[0x-0x])
pci_bus 0001:00: root bus resource [mem 0xc2000-0xc3fff] (bus address 
[0xe000-0x])
pci_bus 0001:00: root bus resource [bus 00-01]
pci 0001:00:00.0: ignoring class 0x0b2000 (doesn't match header type 01)
pci 0001:00:00.0: PCI bridge to [bus 01-ff]
pci :00:00.0: PCI bridge to [bus 01]
pci :00:00.0:   bridge window [io  0x1-0x1]
pci :00:00.0:   bridge window [mem 0xc-0xc1fff]
pci 0001:00:00.0: BAR 9: can't assign mem pref (size 0x10)

RE: Powerpc: Kernel warn_on when enabling IOMMU_API

2013-08-12 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
 Sent: Monday, August 12, 2013 7:44 PM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API
 
 On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote:
  And this simple fix work for me
  diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
  index b20ff17..8869b0d 100644
  --- a/arch/powerpc/kernel/iommu.c
  +++ b/arch/powerpc/kernel/iommu.c
  @@ -48,6 +48,8 @@
   #include <asm/vio.h>
   #include <asm/tce.h>
 
  +#define DEBUG
  +
   #define DBG(...)
 
   static int novmerge;
  @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t
 size,
  }
   }
 
  -#ifdef CONFIG_IOMMU_API
  +#ifdef SPAPR_TCE_IOMMU
   /*
* SPAPR TCE API
*/
  --
 
 
 And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print?

It shows the list of group IDs and their respective devices:

root@p5040ds:~# ls -laR /sys/kernel/iommu_groups/
/sys/kernel/iommu_groups/:
total 0
drwxr-xr-x 15 root root 0 Sep  6 01:42 .
drwxr-xr-x  6 root root 0 Jan  1  1970 ..
drwxr-xr-x  3 root root 0 Sep  6 01:43 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 1
drwxr-xr-x  3 root root 0 Sep  6 01:43 10
drwxr-xr-x  3 root root 0 Sep  6 01:43 11
drwxr-xr-x  3 root root 0 Sep  6 01:43 12
drwxr-xr-x  3 root root 0 Sep  6 01:43 2
drwxr-xr-x  3 root root 0 Sep  6 01:43 3
drwxr-xr-x  3 root root 0 Sep  6 01:43 4
drwxr-xr-x  3 root root 0 Sep  6 01:43 5
drwxr-xr-x  3 root root 0 Sep  6 01:43 6
drwxr-xr-x  3 root root 0 Sep  6 01:43 7
drwxr-xr-x  3 root root 0 Sep  6 01:43 8
drwxr-xr-x  3 root root 0 Sep  6 01:43 9

/sys/kernel/iommu_groups/0:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/0/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe100300.dma -> ../../../../devices/ffe00.soc/ffe100300.dma

/sys/kernel/iommu_groups/1:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/1/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe101300.dma -> ../../../../devices/ffe00.soc/ffe101300.dma

/sys/kernel/iommu_groups/10:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/10/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe302000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe302000.jr

/sys/kernel/iommu_groups/11:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/11/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe303000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe303000.jr

/sys/kernel/iommu_groups/12:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/12/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe304000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe304000.jr

/sys/kernel/iommu_groups/2:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/2/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe114000.sdhc -> ../../../../devices/ffe00.soc/ffe114000.sdhc

/sys/kernel/iommu_groups/3:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/3/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe21.usb -> ../../../../devices/ffe00.soc/ffe21.usb

/sys/kernel/iommu_groups/4:
total 0
drwxr-xr-x  3 root root 0 Sep  6 01:43 .
drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
drwxr-xr-x  2 root root 0 Sep  6 01:43 devices

/sys/kernel/iommu_groups/4/devices:
total 0
drwxr-xr-x 2 root root 0 Sep  6 01:43 .
drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe211000.usb -> ../../../../devices/ffe00.soc/ffe211000.usb

/sys/kernel/iommu_groups/5:
total 0
drwxr-xr-x  3 root

RE: Powerpc: Kernel warn_on when enabling IOMMU_API

2013-08-12 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Monday, August 12, 2013 9:45 PM
 To: 'Alexey Kardashevskiy'
 Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
 Subject: RE: Powerpc: Kernel warn_on when enabling IOMMU_API
 
 
 
  -Original Message-
  From: Alexey Kardashevskiy [mailto:a...@ozlabs.ru]
  Sent: Monday, August 12, 2013 7:44 PM
  To: Bhushan Bharat-R65777
  Cc: b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org
  Subject: Re: Powerpc: Kernel warn_on when enabling IOMMU_API
 
  On 08/12/2013 08:20 PM, Bhushan Bharat-R65777 wrote:
   And this simple fix work for me
   diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
   index b20ff17..8869b0d 100644
   --- a/arch/powerpc/kernel/iommu.c
   +++ b/arch/powerpc/kernel/iommu.c
   @@ -48,6 +48,8 @@
    #include <asm/vio.h>
    #include <asm/tce.h>
  
   +#define DEBUG
   +
#define DBG(...)
  
static int novmerge;
   @@ -871,7 +873,7 @@ void iommu_free_coherent(struct iommu_table
   *tbl, size_t
  size,
   }
}
  
   -#ifdef CONFIG_IOMMU_API
   +#ifdef SPAPR_TCE_IOMMU
/*
 * SPAPR TCE API
 */
   --
 
 
  And with this fix, what does ls -laR /sys/kernel/iommu_groups/ print?
 
 It shows the list of group id and respective devices:

We use the common IOMMU code (drivers/iommu/iommu.c) to add a device to an 
iommu_group, via bus_set_iommu().
This also registers a notifier to support hotpluggable devices.
So by the time this initcall (in arch/powerpc/kernel/iommu.c) is called, the 
iommu group is already set up.

So we do not need this piece of code for powerpc. BTW, why do we need this with 
Power/TCE? Doesn't the code in drivers/iommu/iommu.c serve the purpose?
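
The conflict described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: model_group, model_device and model_add_device() are hypothetical names, not kernel structures or functions. It shows a second "add to group" attempt failing because an earlier path (for example a bus notifier) already assigned the group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_group {
	int id;
};

struct model_device {
	const char *name;
	struct model_group *group;	/* NULL until the device joins a group */
};

/*
 * Add a device to a group.
 * Returns 0 on success, -1 if the device already belongs to a group,
 * which is the situation a late initcall runs into when an earlier
 * path has already done the assignment.
 */
static int model_add_device(struct model_device *dev, struct model_group *grp)
{
	assert(dev != NULL && grp != NULL);

	if (dev->group != NULL)
		return -1;	/* someone else got here first */

	dev->group = grp;
	return 0;
}
```

In this model the first caller wins and the second sees an error, so a late pass has to either skip devices that already have a group or not run at all.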

-Bharat

 
 root@p5040ds:~# ls -laR /sys/kernel/iommu_groups/
 /sys/kernel/iommu_groups/:
 total 0
 drwxr-xr-x 15 root root 0 Sep  6 01:42 .
 drwxr-xr-x  6 root root 0 Jan  1  1970 ..
 drwxr-xr-x  3 root root 0 Sep  6 01:43 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 1
 drwxr-xr-x  3 root root 0 Sep  6 01:43 10
 drwxr-xr-x  3 root root 0 Sep  6 01:43 11
 drwxr-xr-x  3 root root 0 Sep  6 01:43 12
 drwxr-xr-x  3 root root 0 Sep  6 01:43 2
 drwxr-xr-x  3 root root 0 Sep  6 01:43 3
 drwxr-xr-x  3 root root 0 Sep  6 01:43 4
 drwxr-xr-x  3 root root 0 Sep  6 01:43 5
 drwxr-xr-x  3 root root 0 Sep  6 01:43 6
 drwxr-xr-x  3 root root 0 Sep  6 01:43 7
 drwxr-xr-x  3 root root 0 Sep  6 01:43 8
 drwxr-xr-x  3 root root 0 Sep  6 01:43 9
 
 /sys/kernel/iommu_groups/0:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/0/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe100300.dma -> ../../../../devices/ffe00.soc/ffe100300.dma
 
 /sys/kernel/iommu_groups/1:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/1/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe101300.dma -> ../../../../devices/ffe00.soc/ffe101300.dma
 
 /sys/kernel/iommu_groups/10:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/10/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe302000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe302000.jr
 
 /sys/kernel/iommu_groups/11:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/11/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe303000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe303000.jr
 
 /sys/kernel/iommu_groups/12:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/12/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe304000.jr -> ../../../../devices/ffe00.soc/ffe30.crypto/ffe304000.jr
 
 /sys/kernel/iommu_groups/2:
 total 0
 drwxr-xr-x  3 root root 0 Sep  6 01:43 .
 drwxr-xr-x 15 root root 0 Sep  6 01:42 ..
 drwxr-xr-x  2 root root 0 Sep  6 01:43 devices
 
 /sys/kernel/iommu_groups/2/devices:
 total 0
 drwxr-xr-x 2 root root 0 Sep  6 01:43 .
 drwxr-xr-x 3 root root 0 Sep  6 01:43 ..
 lrwxrwxrwx 1 root root 0 Sep  6 01:43 ffe114000.sdhc -> ../../../../devices/ffe00.soc/ffe114000.sdhc
 
 /sys/kernel/iommu_groups/3:
 total 0

RE: [PATCH 6/6 v3] kvm: powerpc: use caching attributes as per linux pte

2013-08-12 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, August 10, 2013 6:35 AM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; ag...@suse.de; pau...@samba.org;
 k...@vger.kernel.org; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
 Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v3] kvm: powerpc: use caching attributes as per linux
 pte

 On Tue, 2013-08-06 at 17:01 +0530, Bharat Bhushan wrote:
  @@ -449,7 +446,16 @@ static inline int kvmppc_e500_shadow_map(struct
 kvmppc_vcpu_e500 *vcpu_e500,
  gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
  }

  -   kvmppc_e500_ref_setup(ref, gtlbe, pfn);
  +   pgdir = vcpu_e500->vcpu.arch.pgdir;
  +   ptep = lookup_linux_pte(pgdir, hva, tsize_pages);
  +   if (pte_present(*ptep)) {
  +   wimg = (pte_val(*ptep) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
  +   } else {
  +   printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
  +   (long)gfn, pfn);
  +   return -EINVAL;

 Don't let the guest spam the host kernel console by repeatedly accessing bad
 mappings (even if it requires host userspace to assist by pointing a memslot 
 at
 a bad hva).  This should at most be printk_ratelimited(), and probably just
 pr_debug().  It should also have __func__ context.

Very good point. I will make this printk_ratelimited() in this patch, and 
convert this and the other error prints to pr_debug() once we send a machine 
check on error in this flow.

 Also, I don't see the return value getting checked (the immediate callers
 check it and propagate the error, but kvmppc_mmu_map() doesn't).
 We want to send a machine check to the guest if this happens (or possibly exit
 to userspace since it indicates a bad memslot, not just a guest bug).  We 
 don't
 want to just silently retry over and over.

I completely agree with you, but this was already missing (the error return 
from this function is not something newly added in this patch), so I would like 
to handle that separately.

 Otherwise, this series looks good to me.

Thank you. :)
-Bharat

 -Scott

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Tuesday, August 06, 2013 6:42 AM
 To: Wood Scott-B07421
 Cc: Benjamin Herrenschmidt; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 
 
  -Original Message-
  From: Wood Scott-B07421
  Sent: Tuesday, August 06, 2013 12:49 AM
  To: Bhushan Bharat-R65777
  Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
  p...@vger.kernel.org; k...@vger.kernel.org;
  linuxppc-dev@lists.ozlabs.org
  Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup
  like booke3s
 
  On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
  
-Original Message-
From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
Sent: Saturday, August 03, 2013 9:54 AM
To: Bhushan Bharat-R65777
Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
lookup like booke3s
   
On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
 One of the problem I saw was that if I put this code in
 asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and
 other friend function (on which this code depends) are defined in
 pgtable.h.
 And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
 before it defines pte_present() and friends functions.

 Ok I move wove this in asm/pgtable*.h, initially I fought with
 myself to take this code in pgtable* but finally end up doing
 here (got biased by book3s :)).
   
Is there a reason why these routines can not be completely generic
in pgtable.h ?
  
   How about the generic function:
  
   diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
   b/arch/powerpc/include/asm/pgtable-ppc64.h
   index d257d98..21daf28 100644
   --- a/arch/powerpc/include/asm/pgtable-ppc64.h
   +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
   @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct
   mm_struct
  *mm,
   return old;
}
  
   +static inline unsigned long pte_read(pte_t *p) { #ifdef
   +PTE_ATOMIC_UPDATES
   +   pte_t pte;
   +   pte_t tmp;
   +   __asm__ __volatile__ (
   +   1: ldarx   %0,0,%3\n
   +  andi.   %1,%0,%4\n
   +  bne-1b\n
   +  ori %1,%0,%4\n
   +  stdcx.  %1,0,%3\n
   +  bne-1b
   +   : =r (pte), =r (tmp), =m (*p)
   +   : r (p), i (_PAGE_BUSY)
   +   : cc);
   +
   +   return pte;
   +#else
   +   return pte_val(*p);
   +#endif
   +#endif
   +}
static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
 unsigned long addr,
   pte_t *ptep)
 
  Please leave a blank line between functions.
 
{
   diff --git a/arch/powerpc/include/asm/pgtable.h
   b/arch/powerpc/include/asm/pgtable.h
   index 690c8c2..dad712c 100644
   --- a/arch/powerpc/include/asm/pgtable.h
   +++ b/arch/powerpc/include/asm/pgtable.h
   @@ -254,6 +254,45 @@ static inline pte_t
   *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }
   #endif
   /* !CONFIG_HUGETLB_PAGE */
  
   +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
   +int writing, unsigned long
   +*pte_sizep)
 
  The name implies that it just reads the PTE.  Setting accessed/dirty
  shouldn't be an undocumented side-effect.
 
 Ok, will rename and document.
 
  Why can't the caller do that (or a different function that the caller
  calls afterward if desired)?
 
 The current implementation in book3s is;
  1) find a pte/hugepte
  2) return null if pte not present
  3) take _PAGE_BUSY lock
  4) set accessed/dirty
  5) clear _PAGE_BUSY.
 
 What I tried was
 1) find a pte/hugepte
 2) return null if pte not present
 3) return pte (not take lock by not setting _PAGE_BUSY)
 
 4) then user calls  __ptep_set_access_flags() to atomic update the
 dirty/accessed flags in pte.
 
 - but the benchmark results were not good
 - Also can there be race as we do not take lock in step 3 and update in step 
 4 ?
 
 
  Though even then you have the undocumented side effect of locking the
  PTE on certain targets.
 
   +{
   +   pte_t *ptep;
   +   pte_t pte;
   +   unsigned long ps = *pte_sizep;
   +   unsigned int shift;
   +
   +   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
   +   if (!ptep)
   +   return __pte(0);
   +   if (shift)
   +   *pte_sizep = 1ul << shift;
   +   else
   +   *pte_sizep = PAGE_SIZE;
   +
   +   if (ps > *pte_sizep)
   +   return __pte(0);
   +
   +   if (!pte_present(*ptep))
   +   return __pte(0);
   +
   +#ifdef CONFIG_PPC64
   +   /* Lock

RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-06 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, August 06, 2013 12:49 AM
 To: Bhushan Bharat-R65777
 Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
 p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
 
   -Original Message-
   From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
   Sent: Saturday, August 03, 2013 9:54 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
   k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
   Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
   lookup like booke3s
  
   On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
One of the problem I saw was that if I put this code in
asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
friend function (on which this code depends) are defined in pgtable.h.
And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
before it defines pte_present() and friends functions.
   
Ok I move wove this in asm/pgtable*.h, initially I fought with
myself to take this code in pgtable* but finally end up doing here
(got biased by book3s :)).
  
   Is there a reason why these routines can not be completely generic
   in pgtable.h ?
 
  How about the generic function:
 
  diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
  b/arch/powerpc/include/asm/pgtable-ppc64.h
  index d257d98..21daf28 100644
  --- a/arch/powerpc/include/asm/pgtable-ppc64.h
  +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
  @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
 *mm,
  return old;
   }
 
  +static inline unsigned long pte_read(pte_t *p) { #ifdef
  +PTE_ATOMIC_UPDATES
  +   pte_t pte;
  +   pte_t tmp;
  +   __asm__ __volatile__ (
  +   1: ldarx   %0,0,%3\n
  +  andi.   %1,%0,%4\n
  +  bne-1b\n
  +  ori %1,%0,%4\n
  +  stdcx.  %1,0,%3\n
  +  bne-1b
  +   : =r (pte), =r (tmp), =m (*p)
  +   : r (p), i (_PAGE_BUSY)
  +   : cc);
  +
  +   return pte;
  +#else
  +   return pte_val(*p);
  +#endif
  +#endif
  +}
   static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
unsigned long addr,
  pte_t *ptep)
 
 Please leave a blank line between functions.
 
   {
  diff --git a/arch/powerpc/include/asm/pgtable.h
  b/arch/powerpc/include/asm/pgtable.h
  index 690c8c2..dad712c 100644
  --- a/arch/powerpc/include/asm/pgtable.h
  +++ b/arch/powerpc/include/asm/pgtable.h
  @@ -254,6 +254,45 @@ static inline pte_t
  *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
  /* !CONFIG_HUGETLB_PAGE */
 
  +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  +int writing, unsigned long
  +*pte_sizep)
 
 The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
 be an undocumented side-effect.  Why can't the caller do that (or a different
 function that the caller calls afterward if desired)?

Scott, I sent the next version of the patch based on the above idea. I now 
think we do not need to update the pte flags on booke, so we do not need to 
solve the kvmppc_read_update_linux_pte() part of book3s.

-Bharat

 
 Though even then you have the undocumented side effect of locking the PTE on
 certain targets.
 
  +{
  +   pte_t *ptep;
  +   pte_t pte;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
  +   if (!ptep)
  +   return __pte(0);
  +   if (shift)
  +   *pte_sizep = 1ul << shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps > *pte_sizep)
  +   return __pte(0);
  +
  +   if (!pte_present(*ptep))
  +   return __pte(0);
  +
  +#ifdef CONFIG_PPC64
  +   /* Lock PTE (set _PAGE_BUSY) and read */
  +   pte = pte_read(ptep);
  +#else
  +   pte = pte_val(*ptep);
  +#endif
 
 What about 32-bit platforms that need atomic PTEs?
 
 -Scott
 


RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-05 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Saturday, August 03, 2013 9:54 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s
 
 On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
  One of the problem I saw was that if I put this code in
  asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
  friend function (on which this code depends) are defined in pgtable.h.
  And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
  defines pte_present() and friends functions.
 
  Ok I move wove this in asm/pgtable*.h, initially I fought with myself
  to take this code in pgtable* but finally end up doing here (got
  biased by book3s :)).
 
 Is there a reason why these routines can not be completely generic in 
 pgtable.h
 ?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct 
*mm,
return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+   pte_t pte;
+   pte_t tmp;
+   __asm__ __volatile__ (
+   "1: ldarx   %0,0,%3\n"
+   "   andi.   %1,%0,%4\n"
+   "   bne-    1b\n"
+   "   ori     %1,%0,%4\n"
+   "   stdcx.  %1,0,%3\n"
+   "   bne-    1b"
+   : "=&r" (pte), "=&r" (tmp), "=m" (*p)
+   : "r" (p), "i" (_PAGE_BUSY)
+   : "cc");
+
+   return pte;
+#else  
+   return pte_val(*p);
+#endif
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
  unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t 
*pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+int writing, unsigned long *pte_sizep)
+{
+   pte_t *ptep;
+   pte_t pte;
+   unsigned long ps = *pte_sizep;
+   unsigned int shift;
+
+   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+   if (!ptep)
+   return __pte(0);
+   if (shift)
+   *pte_sizep = 1ul << shift;
+   else
+   *pte_sizep = PAGE_SIZE;
+
+   if (ps > *pte_sizep)
+   return __pte(0);
+
+   if (!pte_present(*ptep))
+   return __pte(0);
+
+#ifdef CONFIG_PPC64
+   /* Lock PTE (set _PAGE_BUSY) and read */
+   pte = pte_read(ptep);
+#else
+   pte = pte_val(*ptep);
+#endif
+   if (pte_present(pte)) {
+   pte = pte_mkyoung(pte);
+   if (writing && pte_write(pte))
+   pte = pte_mkdirty(pte);
+   }
+
+   *ptep = __pte(pte); /* 64bit: Also unlock pte (clear _PAGE_BUSY) */
+
+   return pte;
+}
+
 #endif /* __ASSEMBLY__ */

 #endif /* __KERNEL__ */

RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-05 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, August 06, 2013 12:49 AM
 To: Bhushan Bharat-R65777
 Cc: Benjamin Herrenschmidt; Wood Scott-B07421; ag...@suse.de; kvm-
 p...@vger.kernel.org; k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s

 On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
   Sent: Saturday, August 03, 2013 9:54 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
   k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
   Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
   lookup like booke3s

   On Sat, 2013-08-03 at 02:58 +, Bhushan Bharat-R65777 wrote:
One of the problem I saw was that if I put this code in
asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
friend function (on which this code depends) are defined in pgtable.h.
And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
before it defines pte_present() and friends functions.

Ok I move wove this in asm/pgtable*.h, initially I fought with
myself to take this code in pgtable* but finally end up doing here
(got biased by book3s :)).

   Is there a reason why these routines can not be completely generic
   in pgtable.h ?

  How about the generic function:

  diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
  b/arch/powerpc/include/asm/pgtable-ppc64.h
  index d257d98..21daf28 100644
  --- a/arch/powerpc/include/asm/pgtable-ppc64.h
  +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
  @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
 *mm,
  return old;
   }

  +static inline unsigned long pte_read(pte_t *p) { #ifdef
  +PTE_ATOMIC_UPDATES
  +   pte_t pte;
  +   pte_t tmp;
  +   __asm__ __volatile__ (
  +   1: ldarx   %0,0,%3\n
  +  andi.   %1,%0,%4\n
  +  bne-1b\n
  +  ori %1,%0,%4\n
  +  stdcx.  %1,0,%3\n
  +  bne-1b
  +   : =r (pte), =r (tmp), =m (*p)
  +   : r (p), i (_PAGE_BUSY)
  +   : cc);
  +
  +   return pte;
  +#else
  +   return pte_val(*p);
  +#endif
  +#endif
  +}
   static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
unsigned long addr,
  pte_t *ptep)

 Please leave a blank line between functions.

   {
  diff --git a/arch/powerpc/include/asm/pgtable.h
  b/arch/powerpc/include/asm/pgtable.h
  index 690c8c2..dad712c 100644
  --- a/arch/powerpc/include/asm/pgtable.h
  +++ b/arch/powerpc/include/asm/pgtable.h
  @@ -254,6 +254,45 @@ static inline pte_t
  *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
  /* !CONFIG_HUGETLB_PAGE */

  +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  +int writing, unsigned long
  +*pte_sizep)

 The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
 be an undocumented side-effect.

Ok, will rename and document.

 Why can't the caller do that (or a different
 function that the caller calls afterward if desired)?

The current implementation in book3s is;
 1) find a pte/hugepte
 2) return null if pte not present
 3) take _PAGE_BUSY lock
 4) set accessed/dirty
 5) clear _PAGE_BUSY.

What I tried was:
1) find a pte/hugepte
2) return null if the pte is not present
3) return the pte (without taking the lock, i.e. without setting _PAGE_BUSY)

4) the caller then calls __ptep_set_access_flags() to atomically update the 
dirty/accessed flags in the pte.

- but the benchmark results were not good
- Also, can there be a race, since we do not take the lock in step 3 and only 
update in step 4?

 Though even then you have the undocumented side effect of locking the PTE on
 certain targets.

  +{
  +   pte_t *ptep;
  +   pte_t pte;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
  +   if (!ptep)
  +   return __pte(0);
  +   if (shift)
  +   *pte_sizep = 1ul << shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps > *pte_sizep)
  +   return __pte(0);
  +
  +   if (!pte_present(*ptep))
  +   return __pte(0);
  +
  +#ifdef CONFIG_PPC64
  +   /* Lock PTE (set _PAGE_BUSY) and read */
  +   pte = pte_read(ptep);
  +#else
  +   pte = pte_val(*ptep);
  +#endif

 What about 32-bit platforms that need atomic PTEs?

I called __ptep_set_access_flags(), which handles atomic updates, for both 
32-bit and 64-bit (for 64-bit I was not calling pte_read()). Somehow the 
benchmark results were not good; I will try again.

Thanks
-Bharat

 -Scott


RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

2013-08-02 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Saturday, August 03, 2013 4:47 AM
 To: Wood Scott-B07421
 Cc: Bhushan Bharat-R65777; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
 booke3s

 On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:

  What about 64-bit PTEs on 32-bit kernels?

  In any case, this code does not belong in KVM.  It should be in the
  main PPC mm code, even if KVM is the only user.

 Also don't we do similar things in BookS KVM ? At the very least that stuff
 should become common. And yes, I agree, it should probably also move to
 pgtable*

One of the problems I saw was that if I put this code in asm/pgtable-32.h and 
asm/pgtable-64.h, then pte_present() and its friend functions (on which this 
code depends) are defined in pgtable.h, and pgtable.h includes asm/pgtable-32.h 
and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself about 
putting this code in pgtable*, but finally ended up doing it here (got biased by 
book3s :)).

Thanks
-Bharat

 Cheers,
 Ben.


RE: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte

2013-08-02 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wood Scott-B07421
 Sent: Saturday, August 03, 2013 5:05 AM
 To: Bhushan Bharat-R65777
 Cc: b...@kernel.crashing.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux
 pte
 
 On Thu, Aug 01, 2013 at 04:42:38PM +0530, Bharat Bhushan wrote:
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  17722d8..eb2 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run,
  struct kvm_vcpu *vcpu)  #endif
 
  kvmppc_fix_ee_before_entry();
  -
  +   vcpu->arch.pgdir = current->mm->pgd;
  ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
 kvmppc_fix_ee_before_entry() is supposed to be the last thing that happens
 before __kvmppc_vcpu_run().
 
  @@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct
 kvmppc_vcpu_e500 *vcpu_e500,
  unsigned long hva;
  int pfnmap = 0;
  int tsize = BOOK3E_PAGESZ_4K;
  +   pte_t pte;
  +   int wimg = 0;
 
  /*
   * Translate guest physical to true physical, acquiring @@ -437,6
  +431,8 @@ static inline int kvmppc_e500_shadow_map(struct
  kvmppc_vcpu_e500 *vcpu_e500,
 
  if (likely(!pfnmap)) {
  unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
  +   pgd_t *pgdir;
  +
  pfn = gfn_to_pfn_memslot(slot, gfn);
  if (is_error_noslot_pfn(pfn)) {
  printk(KERN_ERR Couldn't get real page for gfn 
  %lx!\n, @@
 -447,9
  +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500
 *vcpu_e500,
  /* Align guest and physical address to page map boundaries */
  pfn &= ~(tsize_pages - 1);
  gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
  +   pgdir = vcpu_e500->vcpu.arch.pgdir;
  +   pte = lookup_linux_pte(pgdir, hva, 1, tsize_pages);
  +   if (pte_present(pte)) {
  +   wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
  +   } else {
  +   printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
  +   (long)gfn, pfn);
  +   return -EINVAL;
  +   }
  }
 
 How does wimg get set in the pfnmap case?

A pfnmap mapping does not consist of kernel-managed pages, right? So should we 
set I+G there?
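
To make the question concrete, here is a small sketch of the two cases: take the WIMGE bits from the Linux PTE with a shift and mask, or fall back to I+G for pfnmap. The constants are illustrative assumptions (the real PTE_WIMGE_SHIFT depends on the PTE format), the function names are hypothetical, and the I+G fallback is only the option raised in the question, not an agreed design.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values; treat all of these as assumptions. */
#define MODEL_WIMGE_SHIFT 6
#define MODEL_WIMGE_MASK  0x1fu
#define MODEL_I           0x08u	/* caching inhibited */
#define MODEL_G           0x02u	/* guarded */

/* Extract the five WIMGE attribute bits from a raw PTE value. */
static unsigned int model_wimge_from_pte(uint64_t pte_val)
{
	return (unsigned int)(pte_val >> MODEL_WIMGE_SHIFT) & MODEL_WIMGE_MASK;
}

/*
 * Choose the attributes for a shadow mapping: pfnmap memory gets I+G,
 * everything else inherits the attributes of the host PTE.
 */
static unsigned int model_wimge(uint64_t pte_val, int pfnmap)
{
	assert(pfnmap == 0 || pfnmap == 1);
	return pfnmap ? (MODEL_I | MODEL_G) : model_wimge_from_pte(pte_val);
}
```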

 
 Could you explain why we need to set dirty/referenced on the PTE, when we
 didn't need to do that before? All we're getting from the PTE is wimg.
 We have MMU notifiers to take care of the page being unmapped, and we've
 already marked the page itself as dirty if the TLB entry is writeable.

I pulled this code from book3s.

Ben, can you describe why we need this on book3s ?

Thanks
-Bharat
 
 -Scott


RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

2013-07-30 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Saturday, July 27, 2013 3:57 AM
 To: Bhushan Bharat-R65777
 Cc: Alexander Graf; kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
 d...@lists.ozlabs.org; Wood Scott-B07421
 Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
 
 On Fri, 2013-07-26 at 15:03 +, Bhushan Bharat-R65777 wrote:
  Won't searching the Linux PTE be overkill?
 
 That's the best approach. Also we are searching it already to resolve the page
 fault. That does mean we search twice but on the other hand that also means
 it's hot in the cache.


Below is an early git diff (not a proper, cleaned-up patch), to check that this
is what we want on PowerPC and to get early feedback. I also ran some
benchmarks to understand the overhead, if any.

Using kvm_is_mmio_pfn() (what the current patch does):

Real: 0m46.616s + 0m49.517s + 0m49.510s + 0m46.936s + 0m46.889s + 0m46.684s; Avg: 47.692s
User: 0m31.636s + 0m31.816s + 0m31.456s + 0m31.752s + 0m32.028s + 0m31.848s; Avg: 31.756s
Sys:  0m11.596s + 0m11.868s + 0m12.244s + 0m11.672s + 0m11.356s + 0m11.432s; Avg: 11.695s


Using kernel page table search (below changes):

Real: 0m46.431s + 0m50.269s + 0m46.724s + 0m46.645s + 0m46.670s + 0m50.259s; Avg: 47.833s
User: 0m31.568s + 0m31.816s + 0m31.444s + 0m31.808s + 0m31.312s + 0m31.740s; Avg: 31.614s
Sys:  0m11.516s + 0m12.060s + 0m11.872s + 0m11.476s + 0m12.000s + 0m12.152s; Avg: 11.846s
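
As a cross-check of the averages quoted above, here is a minimal C sketch that recomputes the mean of the six "Real" samples for each variant and the relative difference between them. The function and array names are made up for this illustration and are not part of the patch; only the sample values come from the mail.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n timing samples, in seconds. */
static double mean_seconds(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/* "Real" samples copied from the two benchmark runs in the mail. */
static const double real_mmio_pfn[] = {
	46.616, 49.517, 49.510, 46.936, 46.889, 46.684
};
static const double real_pte_walk[] = {
	46.431, 50.269, 46.724, 46.645, 46.670, 50.259
};

/*
 * Relative difference of the page-table-search variant over the
 * kvm_is_mmio_pfn() variant, in percent of the latter's mean.
 */
static double real_overhead_percent(void)
{
	double base = mean_seconds(real_mmio_pfn, 6);
	double walk = mean_seconds(real_pte_walk, 6);

	return (walk - base) / base * 100.0;
}
```

With these inputs the wall-clock difference comes out at roughly 0.3 percent, which is smaller than the spread between individual runs of either variant.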

--
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 3328353..d6d0dac 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -532,6 +532,7 @@ struct kvm_vcpu_arch {
u32 epr;
u32 crit_save;
struct kvmppc_booke_debug_reg dbg_reg;
+   pgd_t *pgdir;
 #endif
gpa_t paddr_accessed;
gva_t vaddr_accessed;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..eb2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 #endif
 
kvmppc_fix_ee_before_entry();
-
+   vcpu->arch.pgdir = current->mm->pgd;
ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
/* No need for kvm_guest_exit. It's done in handle_exit.
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..fc4b2f6 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
 #define E500_TLB_NUM   2
 
 /* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID (1 << 0)
+#define E500_TLB_VALID (1 << 31)
 /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP (1 << 1)
+#define E500_TLB_BITMAP (1 << 30)
 /* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0  (1 << 2)
+#define E500_TLB_TLB0  (1 << 29)
+/* Lower 5 bits have WIMGE value */
+#define E500_TLB_WIMGE_MASK (0x1f)
 
 struct tlbe_ref {
pfn_t pfn;  /* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 5cbdc8f..a48c13f 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -40,6 +40,84 @@
 
 static struct kvmppc_e500_tlb_params host_tlb_params[E500_TLB_NUM];
 
+/*
+ * find_linux_pte returns the address of a linux pte for a given
+ * effective address and directory.  If not found, it returns zero.
+ */
+static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
+{
+pgd_t *pg;
+pud_t *pu;
+pmd_t *pm;
+pte_t *pt = NULL;
+
+pg = pgdir + pgd_index(ea);
+if (!pgd_none(*pg)) {
+pu = pud_offset(pg, ea);
+if (!pud_none(*pu)) {
+pm = pmd_offset(pu, ea);
+if (pmd_present(*pm))
+pt = pte_offset_kernel(pm, ea);
+}
+}
+return pt;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+ unsigned *shift);
+#else
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+   unsigned *shift)
+{
+if (shift)
+*shift = 0;
+return find_linux_pte(pgdir, ea);
+}
+#endif /* !CONFIG_HUGETLB_PAGE */
+
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+   pte_t pte = pte_val(*p);
+
+   if (pte_present(pte)) {
+   pte = pte_mkyoung

RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

2013-07-30 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, July 31, 2013 12:19 AM
 To: Bhushan Bharat-R65777
 Cc: Benjamin Herrenschmidt; Alexander Graf; kvm-...@vger.kernel.org;
 k...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421
 Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
 
 On 07/30/2013 11:22:54 AM, Bhushan Bharat-R65777 wrote:
  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
  b/arch/powerpc/kvm/e500_mmu_host.c
  index 5cbdc8f..a48c13f 100644
  --- a/arch/powerpc/kvm/e500_mmu_host.c
  +++ b/arch/powerpc/kvm/e500_mmu_host.c
  @@ -40,6 +40,84 @@
 
   static struct kvmppc_e500_tlb_params host_tlb_params[E500_TLB_NUM];
 
  +/*
  + * find_linux_pte returns the address of a linux pte for a given
  + * effective address and directory.  If not found, it returns zero.
  + */
  +static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea) {
  +pgd_t *pg;
  +pud_t *pu;
  +pmd_t *pm;
  +pte_t *pt = NULL;
  +
  +pg = pgdir + pgd_index(ea);
  +if (!pgd_none(*pg)) {
  +pu = pud_offset(pg, ea);
  +if (!pud_none(*pu)) {
  +pm = pmd_offset(pu, ea);
  +if (pmd_present(*pm))
  +pt = pte_offset_kernel(pm, ea);
  +}
  +}
  +return pt;
  +}
 
 How is this specific to KVM or e500?
 
  +#ifdef CONFIG_HUGETLB_PAGE
  +pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
  + unsigned *shift);
  +#else
  +static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
  +   unsigned *shift)
  +{
  +if (shift)
  +*shift = 0;
  +return find_linux_pte(pgdir, ea);
  +}
  +#endif /* !CONFIG_HUGETLB_PAGE */
 
 This is already declared in asm/pgtable.h.  If we need a non-hugepage
 alternative, that should also go in asm/pgtable.h.
 
  +/*
  + * Lock and read a linux PTE.  If it's present and writable, atomically
  + * set dirty and referenced bits and return the PTE, otherwise return 0.
  + */
  +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
  +{
  +   pte_t pte = pte_val(*p);
  +
  +   if (pte_present(pte)) {
  +   pte = pte_mkyoung(pte);
  +   if (writing && pte_write(pte))
  +   pte = pte_mkdirty(pte);
  +   }
  +
  +   *p = pte;
  +
  +   return pte;
  +}
  +
  +static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
  + int writing, unsigned long *pte_sizep)
  +{
  +   pte_t *ptep;
  +   unsigned long ps = *pte_sizep;
  +   unsigned int shift;
  +
  +   ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
  +   if (!ptep)
  +   return __pte(0);
  +   if (shift)
  +   *pte_sizep = 1ul << shift;
  +   else
  +   *pte_sizep = PAGE_SIZE;
  +
  +   if (ps > *pte_sizep)
  +   return __pte(0);
  +   if (!pte_present(*ptep))
  +   return __pte(0);
  +
  +   return kvmppc_read_update_linux_pte(ptep, writing);
  +}
  +
 
 None of this belongs in this file either.
 
  @@ -326,8 +405,8 @@ static void kvmppc_e500_setup_stlbe(
 
  /* Force IPROT=0 for all guest mappings. */
  stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
  -   stlbe->mas2 = (gvaddr & MAS2_EPN) |
  - e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
  +   stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_WIMGE_MASK);
  +//   e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
 
 MAS2_E and MAS2_G should be safe to come from the guest.

This is handled when setting WIMGE in ref->flags.

 
 How does this work for TLB1?  One ref corresponds to one guest entry, which
 may correspond to multiple host entries, potentially each with different WIM
 settings.

Yes, one ref corresponds to one guest entry. Here is how this works when one
guest TLB1 entry maps to many host TLB0/1 entries: on a guest tlbwe, KVM sets
up one guest TLB entry and then pre-maps one host TLB entry (out of many), and
ref (ref->pfn etc.) points to this pre-mapped entry for that guest entry.
Now suppose a guest TLB miss happens that falls on the same guest TLB entry
but demands another host TLB entry. In that flow we overwrite ref (ref->pfn
etc.) to point to the new host mapping for the same guest mapping.
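
The overwrite described above can be sketched in plain C. This is only an illustration under assumptions: `struct tlbe_ref_sketch`, `ref_set()` and `ref_wimge()` are hypothetical names, the `*_SKETCH` attribute constants assume the usual MAS2 W/I/M/G/E bit values, and the flag layout is the one proposed in the diff earlier in this thread (status bits at the top, WIMGE in the low five bits).

```c
#include <assert.h>
#include <stdint.h>

/* Flag layout as proposed in the diff: status bits high, WIMGE low. */
#define E500_TLB_VALID      (1u << 31)
#define E500_TLB_BITMAP     (1u << 30)
#define E500_TLB_TLB0       (1u << 29)
#define E500_TLB_WIMGE_MASK 0x1fu

/* Assumed MAS2 attribute bit values, used only for this sketch. */
#define MAS2_I_SKETCH 0x08u
#define MAS2_M_SKETCH 0x04u
#define MAS2_G_SKETCH 0x02u

/* Simplified stand-in for struct tlbe_ref (the real one uses pfn_t). */
struct tlbe_ref_sketch {
	uint64_t pfn;
	uint32_t flags;
};

/*
 * Point the ref at a new host mapping. The pfn and the WIMGE bits are
 * replaced together, so the ref always describes the most recently
 * installed host entry for this guest entry.
 */
static void ref_set(struct tlbe_ref_sketch *ref, uint64_t pfn, uint32_t wimge)
{
	assert((wimge & ~E500_TLB_WIMGE_MASK) == 0);
	ref->pfn = pfn;
	ref->flags = (ref->flags & ~E500_TLB_WIMGE_MASK) | E500_TLB_VALID | wimge;
}

/* Extract the WIMGE attributes that would be ORed into MAS2. */
static uint32_t ref_wimge(const struct tlbe_ref_sketch *ref)
{
	return ref->flags & E500_TLB_WIMGE_MASK;
}
```

In this sketch, a second `ref_set()` for the same guest entry simply replaces the earlier pfn and attributes, which is the behaviour Scott's question is probing: the ref only ever remembers one host mapping.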

 
  stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
  e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
  @@ -346,6 +425,8 @@ static inline int kvmppc_e500_shadow_map(struct
  kvmppc_vcpu_e500 *vcpu_e500,
  unsigned long hva;
  int pfnmap = 0;
  int tsize = BOOK3E_PAGESZ_4K;
  +   pte_t pte;
  +   int wimg = 0;
 
  /*
   * Translate guest physical to true physical, acquiring

RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

2013-07-26 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Friday, July 26, 2013 1:57 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; 
 linuxppc-dev@lists.ozlabs.org;
 ag...@suse.de; Wood Scott-B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
 
 On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:
  If the page is RAM then map this as cacheable and coherent (set M
  bit) otherwise this page is treated as I/O and map this as cache
  inhibited and guarded (set  I + G)
 
  This helps setting proper MMU mapping for direct assigned device.
 
  NOTE: There can be devices that require cacheable mapping, which is not yet
 supported.
 
 Why don't you do like server instead and enforce the use of the same I and M
 bits as the corresponding qemu PTE ?

Ben/Alex, I will look into the code. Can you please describe how this is 
handled on server?

Thanks
-Bharat

 
 Cheers,
 Ben.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/e500_mmu_host.c |   24 +++-
   1 files changed, 19 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
  b/arch/powerpc/kvm/e500_mmu_host.c
  index 1c6a9d7..5cbdc8f 100644
  --- a/arch/powerpc/kvm/e500_mmu_host.c
  +++ b/arch/powerpc/kvm/e500_mmu_host.c
  @@ -64,13 +64,27 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int
 usermode)
  return mas3;
   }
 
  -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
  +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
   {
  +   u32 mas2_attr;
  +
  +   mas2_attr = mas2 & MAS2_ATTRIB_MASK;
  +
  +   if (kvm_is_mmio_pfn(pfn)) {
  +   /*
  +* If page is not RAM then it is treated as I/O page.
  +* Map it with cache inhibited and guarded (set I + G).
  +*/
  +   mas2_attr |= MAS2_I | MAS2_G;
  +   return mas2_attr;
  +   }
  +
  +   /* Map RAM pages as cacheable (Not setting I in MAS2) */
   #ifdef CONFIG_SMP
  -   return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
  -#else
  -   return mas2 & MAS2_ATTRIB_MASK;
  +   /* Also map as coherent (set M) in SMP */
  +   mas2_attr |= MAS2_M;
   #endif
  +   return mas2_attr;
   }
 
   /*
  @@ -313,7 +327,7 @@ static void kvmppc_e500_setup_stlbe(
  /* Force IPROT=0 for all guest mappings. */
  stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
  stlbe->mas2 = (gvaddr & MAS2_EPN) |
  - e500_shadow_mas2_attrib(gtlbe->mas2, pr);
  + e500_shadow_mas2_attrib(gtlbe->mas2, pfn);
  stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
  e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
 
 


RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

2013-07-26 Thread Bhushan Bharat-R65777



 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
 Behalf Of Alexander Graf
 Sent: Friday, July 26, 2013 2:20 PM
 To: Benjamin Herrenschmidt
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; k...@vger.kernel.org;
 linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
 
 
 On 26.07.2013, at 10:26, Benjamin Herrenschmidt wrote:
 
  On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:
  If the page is RAM then map this as cacheable and coherent (set M
  bit) otherwise this page is treated as I/O and map this as cache
  inhibited and guarded (set  I + G)
 
  This helps setting proper MMU mapping for direct assigned device.
 
  NOTE: There can be devices that require cacheable mapping, which is not yet
 supported.
 
  Why don't you do like server instead and enforce the use of the same I
  and M bits as the corresponding qemu PTE ?
 
 Specifically, Ben is talking about this code:
 
 
 /* Translate to host virtual address */
 hva = __gfn_to_hva_memslot(memslot, gfn);
 
 /* Look up the Linux PTE for the backing page */
 pte_size = psize;
 pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
 if (pte_present(pte)) {
 if (writing && !pte_write(pte))
 /* make the actual HPTE be read-only */
 ptel = hpte_make_readonly(ptel);
 is_io = hpte_cache_bits(pte_val(pte));
 pa = pte_pfn(pte) << PAGE_SHIFT;
 }
 

Ok

Thanks
-Bharat


 
 Alex
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body
 of a message to majord...@vger.kernel.org More majordomo info at
 http://vger.kernel.org/majordomo-info.html



RE: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages

2013-07-26 Thread Bhushan Bharat-R65777



 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
 Behalf Of Alexander Graf
 Sent: Friday, July 26, 2013 2:20 PM
 To: Benjamin Herrenschmidt
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; k...@vger.kernel.org;
 linuxppc-dev@lists.ozlabs.org; Wood Scott-B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/4] kvm: powerpc: set cache coherency only for RAM pages
 
 
 On 26.07.2013, at 10:26, Benjamin Herrenschmidt wrote:
 
  On Fri, 2013-07-26 at 11:16 +0530, Bharat Bhushan wrote:
  If the page is RAM then map this as cacheable and coherent (set M
  bit) otherwise this page is treated as I/O and map this as cache
  inhibited and guarded (set  I + G)
 
  This helps setting proper MMU mapping for direct assigned device.
 
  NOTE: There can be devices that require cacheable mapping, which is not yet
 supported.
 
  Why don't you do like server instead and enforce the use of the same I
  and M bits as the corresponding qemu PTE ?
 
 Specifically, Ben is talking about this code:
 
 
 /* Translate to host virtual address */
 hva = __gfn_to_hva_memslot(memslot, gfn);
 
 /* Look up the Linux PTE for the backing page */
 pte_size = psize;
 pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
 if (pte_present(pte)) {
 if (writing && !pte_write(pte))
 /* make the actual HPTE be read-only */
 ptel = hpte_make_readonly(ptel);
 is_io = hpte_cache_bits(pte_val(pte));
 pa = pte_pfn(pte) << PAGE_SHIFT;
 }
 

Won't searching the Linux PTE be overkill?

-Bharat




RE: [v3][PATCH 1/8] powerpc/book3e: rename interrupt_end_book3e with __end_interrupts

2013-07-09 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
 Chen
 Sent: Tuesday, July 09, 2013 1:33 PM
 To: b...@kernel.crashing.org
 Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: [v3][PATCH 1/8] powerpc/book3e: rename interrupt_end_book3e with
 __end_interrupts
 
 We can rename 'interrupt_end_book3e' with '__end_interrupts' then 
 book3s/book3e
 can share this unique label to make sure we can use this conveniently.

I think we should keep the start and end label names consistent, no?

-Bharat

 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
  arch/powerpc/kernel/exceptions-64e.S |8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/arch/powerpc/kernel/exceptions-64e.S
 b/arch/powerpc/kernel/exceptions-64e.S
 index 645170a..a518e48 100644
 --- a/arch/powerpc/kernel/exceptions-64e.S
 +++ b/arch/powerpc/kernel/exceptions-64e.S
 @@ -309,8 +309,8 @@ interrupt_base_book3e: /* fake trap */
   EXCEPTION_STUB(0x300, hypercall)
   EXCEPTION_STUB(0x320, ehpriv)
 
 - .globl interrupt_end_book3e
 -interrupt_end_book3e:
 + .globl __end_interrupts
 +__end_interrupts:
 
  /* Critical Input Interrupt */
   START_EXCEPTION(critical_input);
 @@ -493,7 +493,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
   beq+1f
 
   LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
 - LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
 + LOAD_REG_IMMEDIATE(r15,__end_interrupts)
   cmpld   cr0,r10,r14
   cmpld   cr1,r10,r15
   blt+cr0,1f
 @@ -559,7 +559,7 @@ kernel_dbg_exc:
   beq+1f
 
   LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
 - LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
 + LOAD_REG_IMMEDIATE(r15,__end_interrupts)
   cmpld   cr0,r10,r14
   cmpld   cr1,r10,r15
   blt+cr0,1f
 --
 1.7.9.5
 

RE: [v3][PATCH 7/8] book3e/kexec/kdump: redefine VIRT_PHYS_OFFSET

2013-07-09 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
 Chen
 Sent: Tuesday, July 09, 2013 1:33 PM
 To: b...@kernel.crashing.org
 Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: [v3][PATCH 7/8] book3e/kexec/kdump: redefine VIRT_PHYS_OFFSET
 
 Book3e always creates its TLB mapping with 1GB alignment, so we should
 use (KERNELBASE - MEMORY_START) as VIRT_PHYS_OFFSET to
 get __pa/__va right while booting a kdump kernel.
 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
  arch/powerpc/include/asm/page.h |2 ++
  1 file changed, 2 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
 index 988c812..5b00081 100644
 --- a/arch/powerpc/include/asm/page.h
 +++ b/arch/powerpc/include/asm/page.h
 @@ -112,6 +112,8 @@ extern long long virt_phys_offset;
  /* See Description below for VIRT_PHYS_OFFSET */
  #ifdef CONFIG_RELOCATABLE_PPC32
  #define VIRT_PHYS_OFFSET virt_phys_offset
 +#elif defined(CONFIG_PPC_BOOK3E_64)
 +#define VIRT_PHYS_OFFSET (KERNELBASE - MEMORY_START)

Can you please explain this code a bit more? I do not understand this part :)

-Bharat

  #else
  #define VIRT_PHYS_OFFSET (KERNELBASE - PHYSICAL_START)
  #endif
 --
 1.7.9.5
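
For readers following the question above, here is a minimal sketch of what an offset-based `__pa`/`__va` pair does with the two candidate offsets. It is an illustration under assumptions, not the kernel's implementation: the `*_SKETCH` constants are example values chosen for this note (a kernel linked at 0xc000000000000000, memory starting at 0, and a kdump kernel loaded 32MB into memory), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Example values, assumed for illustration only. */
#define KERNELBASE_SKETCH     0xc000000000000000ULL
#define MEMORY_START_SKETCH   0x0ULL
#define PHYSICAL_START_SKETCH 0x2000000ULL  /* kdump kernel at 32MB */

/* Offset-based translation, in the style of __pa(). */
static uint64_t virt_to_phys_sketch(uint64_t va, uint64_t virt_phys_offset)
{
	return va - virt_phys_offset;
}

/* Offset-based translation, in the style of __va(). */
static uint64_t phys_to_virt_sketch(uint64_t pa, uint64_t virt_phys_offset)
{
	return pa + virt_phys_offset;
}

/* The offset proposed by the patch for 64-bit Book3E. */
static uint64_t offset_from_memory_start(void)
{
	return KERNELBASE_SKETCH - MEMORY_START_SKETCH;
}

/* The generic offset from the #else branch. */
static uint64_t offset_from_physical_start(void)
{
	return KERNELBASE_SKETCH - PHYSICAL_START_SKETCH;
}
```

If the linear mapping really starts at MEMORY_START, as the commit message suggests, then the virtual address 0xc000000002000000 corresponds to physical 0x2000000. The MEMORY_START-based offset yields exactly that, while the PHYSICAL_START-based offset yields 0x4000000 for the same address once the kernel is loaded away from the start of memory.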
 

RE: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE

2013-07-01 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
 Chen
 Sent: Thursday, June 20, 2013 1:23 PM
 To: b...@kernel.crashing.org
 Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: [v2][PATCH 1/7] powerpc/book3e: support CONFIG_RELOCATABLE
 
 Book3e differs from book3s: 3s includes the exception
 vector code in head_64.S, as it relies on absolute addressing
 which is only possible within this compilation unit. So we have
 to get that label address via the GOT.
 
 And when booting a relocated kernel, we should set IVPR properly again
 after .relocate.
 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
  arch/powerpc/include/asm/exception-64e.h |8 
  arch/powerpc/kernel/exceptions-64e.S |   15 ++-
  arch/powerpc/kernel/head_64.S|   22 ++
  arch/powerpc/lib/feature-fixups.c|7 +++
  4 files changed, 51 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/include/asm/exception-64e.h
 b/arch/powerpc/include/asm/exception-64e.h
 index 51fa43e..89e940d 100644
 --- a/arch/powerpc/include/asm/exception-64e.h
 +++ b/arch/powerpc/include/asm/exception-64e.h
 @@ -214,10 +214,18 @@ exc_##label##_book3e:
  #define TLB_MISS_STATS_SAVE_INFO_BOLTED
  #endif
 
 +#ifndef CONFIG_RELOCATABLE
  #define SET_IVOR(vector_number, vector_offset)   \
   li  r3,vector_offset@l; \
   ori r3,r3,interrupt_base_book3e@l;  \
   mtspr   SPRN_IVOR##vector_number,r3;
 +#else
 +#define SET_IVOR(vector_number, vector_offset)   \
 + LOAD_REG_ADDR(r3,interrupt_base_book3e);\
 + rlwinm  r3,r3,0,15,0;   \
 + ori r3,r3,vector_offset@l;  \
 + mtspr   SPRN_IVOR##vector_number,r3;
 +#endif
 
  #endif /* _ASM_POWERPC_EXCEPTION_64E_H */
 
 diff --git a/arch/powerpc/kernel/exceptions-64e.S
 b/arch/powerpc/kernel/exceptions-64e.S
 index 645170a..4b23119 100644
 --- a/arch/powerpc/kernel/exceptions-64e.S
 +++ b/arch/powerpc/kernel/exceptions-64e.S
 @@ -1097,7 +1097,15 @@ skpinv: addi r6,r6,1 /* Increment */
   * r4 = MAS0 w/TLBSEL  ESEL for the temp mapping
   */
   /* Now we branch the new virtual address mapped by this entry */
 +#ifdef CONFIG_RELOCATABLE
 + /* We have to find out address from lr. */
 + bl  1f  /* Find our address */
 +1:   mflr r6
 + addi r6,r6,(2f - 1b)
 + tovirt(r6,r6)
 +#else
   LOAD_REG_IMMEDIATE(r6,2f)
 +#endif
   lis r7,MSR_KERNEL@h
   ori r7,r7,MSR_KERNEL@l
   mtspr   SPRN_SRR0,r6
 @@ -1348,9 +1356,14 @@ _GLOBAL(book3e_secondary_thread_init)
   mflr r28
   b   3b
 
 -_STATIC(init_core_book3e)
 +_GLOBAL(init_core_book3e)
   /* Establish the interrupt vector base */
 +#ifdef CONFIG_RELOCATABLE
 + tovirt(r2,r2)
 + LOAD_REG_ADDR(r3, interrupt_base_book3e)
 +#else
   LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e)
 +#endif
   mtspr   SPRN_IVPR,r3
   sync
   blr
 diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
 index b61363d..0942f3a 100644
 --- a/arch/powerpc/kernel/head_64.S
 +++ b/arch/powerpc/kernel/head_64.S
 @@ -414,12 +414,22 @@ _STATIC(__after_prom_start)
   /* process relocations for the final address of the kernel */
   lis r25,PAGE_OFFSET@highest /* compute virtual base of kernel */
   sldi r25,r25,32
 +#if defined(CONFIG_PPC_BOOK3E)
 + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
 +#endif
   lwz r7,__run_at_load-_stext(r26)
 +#if defined(CONFIG_PPC_BOOK3E)
 + tophys(r26,r26) /* Restore for the remains. */
 +#endif
   cmplwi  cr0,r7,1/* flagged to stay where we are ? */
   bne 1f
   add r25,r25,r26
  1:   mr  r3,r25
   bl  .relocate
 +#if defined(CONFIG_PPC_BOOK3E)
 + /* We should set ivpr again after .relocate. */
 + bl  .init_core_book3e
 +#endif
  #endif
 
  /*
 @@ -447,12 +457,24 @@ _STATIC(__after_prom_start)
   * variable __run_at_load, if it is set the kernel is treated as relocatable
   * kernel, otherwise it will be moved to PHYSICAL_START
   */
 +#if defined(CONFIG_PPC_BOOK3E)
 + tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
 +#endif
   lwz r7,__run_at_load-_stext(r26)
 +#if defined(CONFIG_PPC_BOOK3E)
 + tophys(r26,r26) /* Restore for the remains. */
 +#endif
   cmplwi  cr0,r7,1
   bne 3f
 
 +#ifdef CONFIG_PPC_BOOK3E
 + LOAD_REG_ADDR(r5, interrupt_end_book3e)
 + LOAD_REG_ADDR(r11, _stext)
 + sub r5,r5,r11
 +#else
   /* just copy interrupts */
   LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext)
 +#endif
   b   5f
  3:
  #endif
 diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-
 fixups.c

RE: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel

2013-07-01 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
 Chen
 Sent: Thursday, June 20, 2013 1:23 PM
 To: b...@kernel.crashing.org
 Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: [v2][PATCH 2/7] book3e/kexec/kdump: enable kexec for kernel
 
 We need to activate KEXEC for book3e and bypass or convert non-book3e stuff
 in kexec coverage.
 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
  arch/powerpc/Kconfig   |2 +-
  arch/powerpc/kernel/machine_kexec_64.c |6 ++
  arch/powerpc/kernel/misc_64.S  |6 ++
  3 files changed, 13 insertions(+), 1 deletion(-)
 
 diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
 index c33e3ad..6ecf3c9 100644
 --- a/arch/powerpc/Kconfig
 +++ b/arch/powerpc/Kconfig
 @@ -364,7 +364,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 
  config KEXEC
   bool "kexec system call"
 - depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP))
 + depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E
   help
 kexec is a system call that implements the ability to shutdown your
 current kernel, and to start another kernel.  It is like a reboot
 diff --git a/arch/powerpc/kernel/machine_kexec_64.c
 b/arch/powerpc/kernel/machine_kexec_64.c
 index 611acdf..ef39271 100644
 --- a/arch/powerpc/kernel/machine_kexec_64.c
 +++ b/arch/powerpc/kernel/machine_kexec_64.c
 @@ -33,6 +33,7 @@
  int default_machine_kexec_prepare(struct kimage *image)
  {
   int i;
 +#ifndef CONFIG_PPC_BOOK3E
   unsigned long begin, end;   /* limits of segment */
   unsigned long low, high;/* limits of blocked memory range */
   struct device_node *node;
 @@ -41,6 +42,7 @@ int default_machine_kexec_prepare(struct kimage *image)
 
   if (!ppc_md.hpte_clear_all)
   return -ENOENT;
 +#endif

Do we really need this function for book3e? Can we have a separate function
rather than multiple confusing ifdefs?

-Bharat

 
   /*
* Since we use the kernel fault handlers and paging code to
 @@ -51,6 +53,7 @@ int default_machine_kexec_prepare(struct kimage *image)
   if (image->segment[i].mem < __pa(_end))
   return -ETXTBSY;
 
 +#ifndef CONFIG_PPC_BOOK3E
   /*
* For non-LPAR, we absolutely can not overwrite the mmu hash
* table, since we are still using the bolted entries in it to
 @@ -92,6 +95,7 @@ int default_machine_kexec_prepare(struct kimage *image)
   return -ETXTBSY;
   }
   }
 +#endif
 
   return 0;
  }
 @@ -367,6 +371,7 @@ void default_machine_kexec(struct kimage *image)
   /* NOTREACHED */
  }
 
 +#ifndef CONFIG_PPC_BOOK3E
  /* Values we need to export to the second kernel via the device tree. */
  static unsigned long htab_base;
 
 @@ -411,3 +416,4 @@ static int __init export_htab_values(void)
   return 0;
  }
  late_initcall(export_htab_values);
 +#endif
 diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
 index 6820e45..f1a7ce7 100644
 --- a/arch/powerpc/kernel/misc_64.S
 +++ b/arch/powerpc/kernel/misc_64.S
 @@ -543,9 +543,13 @@ _GLOBAL(kexec_sequence)
   lhz r25,PACAHWCPUID(r13)/* get our phys cpu from paca */
 
   /* disable interrupts, we are overwriting kernel data next */
 +#ifndef CONFIG_PPC_BOOK3E
   mfmsr   r3
   rlwinm  r3,r3,0,17,15
   mtmsrd  r3,1
 +#else
 + wrteei  0
 +#endif
 
   /* copy dest pages, flush whole dest image */
   mr  r3,r29
 @@ -567,10 +571,12 @@ _GLOBAL(kexec_sequence)
   li  r6,1
   stw r6,kexec_flag-1b(5)
 
 +#ifndef CONFIG_PPC_BOOK3E
   /* clear out hardware hash page table and tlb */
   ld  r5,0(r27)   /* deref function descriptor */
   mtctr   r5
   bctrl   /* ppc_md.hpte_clear_all(void); */
 +#endif
 
  /*
   *   kexec image calling is:
 --
 1.7.9.5
 

RE: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag

2013-07-01 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Tiejun 
 Chen
 Sent: Thursday, June 20, 2013 1:23 PM
 To: b...@kernel.crashing.org
 Cc: linuxppc-dev@lists.ozlabs.org; linux-ker...@vger.kernel.org
 Subject: [v2][PATCH 4/7] book3e/kexec/kdump: introduce a kexec kernel flag
 
 We need to introduce a flag to indicate that we're already running
 a kexec kernel, so that we can take the proper path. For example, we
 shouldn't access spin_table from the bootloader to bring up any secondary
 cpu for a kexec kernel, as the kexec kernel already knows how to jump to
 generic_secondary_smp_init.
 
 Signed-off-by: Tiejun Chen tiejun.c...@windriver.com
 ---
  arch/powerpc/include/asm/smp.h|3 +++
  arch/powerpc/kernel/head_64.S |   12 
  arch/powerpc/kernel/misc_64.S |6 ++
  arch/powerpc/platforms/85xx/smp.c |   14 ++
  4 files changed, 35 insertions(+)
 
 diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
 index ffbaabe..fbc3d9b 100644
 --- a/arch/powerpc/include/asm/smp.h
 +++ b/arch/powerpc/include/asm/smp.h
 @@ -200,6 +200,9 @@ extern void generic_secondary_thread_init(void);
  extern unsigned long __secondary_hold_spinloop;
  extern unsigned long __secondary_hold_acknowledge;
  extern char __secondary_hold;
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 +extern unsigned long __run_at_kexec;
 +#endif
 
  extern void __early_start(void);
  #endif /* __ASSEMBLY__ */
 diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
 index 3e19ba2..ffa4b18 100644
 --- a/arch/powerpc/kernel/head_64.S
 +++ b/arch/powerpc/kernel/head_64.S
 @@ -89,6 +89,12 @@ __secondary_hold_spinloop:
  __secondary_hold_acknowledge:
   .llong  0x0
 
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 + .globl  __run_at_kexec
 +__run_at_kexec:
 + .llong  0x0 /* Flag for the secondary kernel from kexec. */
 +#endif
 +
  #ifdef CONFIG_RELOCATABLE
   /* This flag is set to 1 by a loader if the kernel should run
* at the loaded address instead of the linked address.  This
 @@ -417,6 +423,12 @@ _STATIC(__after_prom_start)
  #if defined(CONFIG_PPC_BOOK3E)
   tovirt(r26,r26) /* on booke, we already run at PAGE_OFFSET */
  #endif
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 + /* If relocated we need to restore this flag on that relocated address. */
 + ld  r7,__run_at_kexec-_stext(r26)
 + std r7,__run_at_kexec-_stext(r26)
 +#endif
 +
   lwz r7,__run_at_load-_stext(r26)
  #if defined(CONFIG_PPC_BOOK3E)
   tophys(r26,r26) /* Restore for the remains. */
 diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
 index 20cbb98..c89aead 100644
 --- a/arch/powerpc/kernel/misc_64.S
 +++ b/arch/powerpc/kernel/misc_64.S
 @@ -619,6 +619,12 @@ _GLOBAL(kexec_sequence)
   bl  .copy_and_flush /* (dest, src, copy limit, start offset) */
  1:   /* assume normal blr return */
 
 + /* notify we're going into kexec kernel for SMP. */
 + LOAD_REG_ADDR(r3,__run_at_kexec)
 + li  r4,1
 + std r4,0(r3)
 + sync
 +
   /* release other cpus to the new kernel secondary start at 0x60 */
   mflr r5
   li  r6,1
 diff --git a/arch/powerpc/platforms/85xx/smp.c
 b/arch/powerpc/platforms/85xx/smp.c
 index 6a17599..b308373 100644
 --- a/arch/powerpc/platforms/85xx/smp.c
 +++ b/arch/powerpc/platforms/85xx/smp.c
 @@ -150,6 +150,9 @@ static int __cpuinit smp_85xx_kick_cpu(int nr)
   int hw_cpu = get_hard_smp_processor_id(nr);
   int ioremappable;
   int ret = 0;
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 + unsigned long *ptr;
 +#endif

How about removing the #ifdef around *ptr ...

 
   WARN_ON(nr < 0 || nr >= NR_CPUS);
   WARN_ON(hw_cpu < 0 || hw_cpu >= NR_CPUS);
 @@ -238,11 +241,22 @@ out:
  #else
   smp_generic_kick_cpu(nr);
 
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 + ptr  = (unsigned long *)((unsigned long)__run_at_kexec);

... #endif here ...

 + /* We shouldn't access spin_table from the bootloader to up any
 +  * secondary cpu for kexec kernel, and kexec kernel already
 +  * know how to jump to generic_secondary_smp_init.
 +  */
 + if (!*ptr) {
 +#endif

... remove #endif ...

   flush_spin_table(spin_table);
   out_be32(&spin_table->pir, hw_cpu);
   out_be64((u64 *)(&spin_table->addr_h),
 __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
   flush_spin_table(spin_table);
 +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
 + }
 +#endif

... and remove the above 3 lines.

-Bharat

  #endif
 
   local_irq_restore(flags);
 --
 1.7.9.5
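
One way to get the effect the review comments ask for (fewer #ifdef blocks inside the function body) is to hide the configuration test in a small helper. This is a swapped-in technique for illustration, not what the patch or the review literally proposes. Everything here is hypothetical: `run_at_kexec_sketch` stands in for the `__run_at_kexec` flag, `booted_via_kexec()` and `kick_cpu_sketch()` are made-up names, and a counter replaces the real spin-table writes. CONFIG_KEXEC is defined locally only so that the kexec branch is compiled in this standalone example.

```c
#include <assert.h>
#include <stdbool.h>

/* Defined here only so the kexec branch is built in this sketch. */
#define CONFIG_KEXEC 1

/* Hypothetical stand-in for the __run_at_kexec flag from head_64.S. */
static unsigned long run_at_kexec_sketch;

/*
 * Hypothetical helper: the only place that needs an #ifdef. When
 * neither option is configured it is a constant false, so callers
 * can test it unconditionally.
 */
#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
static inline bool booted_via_kexec(void)
{
	return run_at_kexec_sketch != 0;
}
#else
static inline bool booted_via_kexec(void)
{
	return false;
}
#endif

/* Counts how often the stand-in spin-table write was performed. */
static int spin_table_writes;

/* Stand-in for the tail of smp_85xx_kick_cpu(): no #ifdef needed here. */
static void kick_cpu_sketch(void)
{
	assert(spin_table_writes >= 0);
	if (!booted_via_kexec())
		spin_table_writes++;  /* real code would program spin_table */
}
```

With the flag clear, `kick_cpu_sketch()` performs the spin-table write; once the flag is set, the write is skipped, which matches the intent stated in the patch comment.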
 

RE: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

2013-06-26 Thread Bhushan Bharat-R65777



 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Wednesday, June 26, 2013 12:25 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
 B07421; b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; mi...@neuling.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv 
 instruction
 
 On 06/26/2013 01:42 PM, Bharat Bhushan wrote:
  The ehpriv instruction is used by user space to set software
  breakpoints. This patch adds support for exiting to user space
  with run->debug holding the relevant information.
 
  As this is the first place where run->debug is used, this patch
  also defines the run->debug structure.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
arch/powerpc/include/asm/disassemble.h |4 
arch/powerpc/include/uapi/asm/kvm.h|   21 +
arch/powerpc/kvm/e500_emulate.c|   27 +++
3 files changed, 48 insertions(+), 4 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/disassemble.h
 b/arch/powerpc/include/asm/disassemble.h
  index 9b198d1..856f8de 100644
  --- a/arch/powerpc/include/asm/disassemble.h
  +++ b/arch/powerpc/include/asm/disassemble.h
  @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
  return inst  0x;
}
 
  +static inline unsigned int get_oc(u32 inst)
  +{
  +   return (inst >> 11) & 0x7fff;
  +}
#endif /* __ASM_PPC_DISASSEMBLE_H__ */
  diff --git a/arch/powerpc/include/uapi/asm/kvm.h
 b/arch/powerpc/include/uapi/asm/kvm.h
  index 0fb1a6e..ded0607 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -269,7 +269,24 @@ struct kvm_fpu {
  __u64 fpr[32];
};
 
  +/*
  + * Defines for h/w breakpoint, watchpoint (read, write or both) and
  + * software breakpoint.
  + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
  + * for KVM_DEBUG_EXIT.
  + */
  +#define KVMPPC_DEBUG_NONE  0x0
  +#define KVMPPC_DEBUG_BREAKPOINT    (1UL << 1)
  +#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
  +#define KVMPPC_DEBUG_WATCH_READ    (1UL << 3)
struct kvm_debug_exit_arch {
  +   __u64 address;
  +   /*
  +* exiting to userspace because of h/w breakpoint, watchpoint
  +* (read, write or both) and software breakpoint.
  +*/
  +   __u32 status;
  +   __u32 reserved;
};
 
/* for KVM_SET_GUEST_DEBUG */
  @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
   * Type denotes h/w breakpoint, read watchpoint, write
   * watchpoint or watchpoint (both read and write).
   */
  -#define KVMPPC_DEBUG_NONE  0x0
  -#define KVMPPC_DEBUG_BREAKPOINT    (1UL << 1)
  -#define KVMPPC_DEBUG_WATCH_WRITE   (1UL << 2)
  -#define KVMPPC_DEBUG_WATCH_READ    (1UL << 3)
  __u32 type;
  __u32 reserved;
  } bp[16];
  diff --git a/arch/powerpc/kvm/e500_emulate.c 
  b/arch/powerpc/kvm/e500_emulate.c
  index b10a012..dab9d07 100644
  --- a/arch/powerpc/kvm/e500_emulate.c
  +++ b/arch/powerpc/kvm/e500_emulate.c
  @@ -26,6 +26,8 @@
#define XOP_TLBRE   946
#define XOP_TLBWE   978
#define XOP_TLBILX  18
  +#define XOP_EHPRIV  270
  +#define EHPRIV_OC_DEBUG 0
 
 I think the case OC = 0 is a bit special since, IIRC, if the OC operand is
 omitted it defaults to 0. So I think we should start this OC value from 1
 or some other magic number.

The ehpriv instruction is defined to be used as:
ehpriv OC // where OC can be 0, 1, ... n
and in its extended form it can be used as:
ehpriv // with no OC, in which case OC = 0 is assumed
So OC = 0 is not special; ehpriv is simply the same as ehpriv 0.

I cannot think of any special reason to reserve ehpriv and ehpriv 0.
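
(A minimal sketch of why a bare ehpriv and ehpriv 0 decode identically. It assumes the usual X-form layout: primary opcode 31 in the top 6 bits, the 15-bit OC field below it, and extended opcode 270 above the reserved low bit. get_oc() mirrors the helper added by the patch; encode_ehpriv() and this get_xop() are hypothetical helpers written only for illustration.)

```c
#include <assert.h>
#include <stdint.h>

#define PRIMARY_OPCODE_31  31u   /* ehpriv lives under primary opcode 31 */
#define XOP_EHPRIV         270u  /* extended opcode, as in the patch */
#define EHPRIV_OC_DEBUG    0u    /* OC value the patch uses for debug */

/* Same extraction as the get_oc() helper in the patch. */
static inline unsigned int get_oc(uint32_t inst)
{
	return (inst >> 11) & 0x7fff;
}

/* Extended opcode: 10 bits just above the lowest bit. */
static inline unsigned int get_xop(uint32_t inst)
{
	return (inst >> 1) & 0x3ff;
}

/* Hypothetical encoder, for illustration only. */
static inline uint32_t encode_ehpriv(unsigned int oc)
{
	assert(oc <= 0x7fff);  /* OC is a 15-bit field */
	return (PRIMARY_OPCODE_31 << 26) | ((uint32_t)oc << 11) |
	       (XOP_EHPRIV << 1);
}
```

Under these assumptions encode_ehpriv(0) yields 0x7c00021c, so an assembler that emits a bare ehpriv with OC defaulted to 0 produces exactly the word that get_oc() maps to EHPRIV_OC_DEBUG.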

Thanks
-Bharat

 
 And if possible, we'd better add some comments to describe this to make the OC
 definition readable.
 
 Tiejun
 
 
#ifdef CONFIG_KVM_E500MC
static int dbell2prio(ulong param)
  @@ -82,6 +84,26 @@ static int kvmppc_e500_emul_msgsnd(struct kvm_vcpu *vcpu,
 int rb)
}
#endif
 
  +static int kvmppc_e500_emul_ehpriv(struct kvm_run *run, struct kvm_vcpu
 *vcpu,
  +  unsigned int inst, int *advance)
  +{
  +   int emulated = EMULATE_DONE;
  +
  +   switch (get_oc(inst)) {
  +   case EHPRIV_OC_DEBUG:
  +   run->exit_reason = KVM_EXIT_DEBUG;
  +   run->debug.arch.address = vcpu->arch.pc;
  +   run->debug.arch.status = 0;
  +   kvmppc_account_exit(vcpu, DEBUG_EXITS);
  +   emulated = EMULATE_EXIT_USER;
  +   *advance = 0;
  +   break;
  +   default:
  +   emulated = EMULATE_FAIL;
  +   }
  +   return emulated;
  +}
  +
int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
   unsigned int inst, int *advance)
{
  @@ -130,6 +152,11 @@ int kvmppc_core_emulate_op(struct

RE: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv instruction

2013-06-26 Thread Bhushan Bharat-R65777



 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Wednesday, June 26, 2013 2:47 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood Scott-
 B07421; b...@kernel.crashing.org; linuxppc-dev@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; mi...@neuling.org
 Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv 
 instruction
 
 On 06/26/2013 04:44 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: tiejun.chen [mailto:tiejun.c...@windriver.com]
  Sent: Wednesday, June 26, 2013 12:25 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; ag...@suse.de; Wood
  Scott- B07421; b...@kernel.crashing.org;
  linuxppc-dev@lists.ozlabs.org; linux- ker...@vger.kernel.org;
  mi...@neuling.org; Bhushan Bharat-R65777
  Subject: Re: [PATCH 4/6 v5] KVM: PPC: exit to user space on ehpriv
  instruction
 
  On 06/26/2013 01:42 PM, Bharat Bhushan wrote:
  The ehpriv instruction is used by user space to set software
  breakpoints. This patch adds support for exiting to user space with
  run->debug holding the relevant information.
 
  As this is the first place where run->debug is used, this patch also
  defines the run->debug structure.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
 arch/powerpc/include/asm/disassemble.h |4 
 arch/powerpc/include/uapi/asm/kvm.h|   21 +
 arch/powerpc/kvm/e500_emulate.c|   27 
  +++
 3 files changed, 48 insertions(+), 4 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/disassemble.h
  b/arch/powerpc/include/asm/disassemble.h
  index 9b198d1..856f8de 100644
  --- a/arch/powerpc/include/asm/disassemble.h
  +++ b/arch/powerpc/include/asm/disassemble.h
  @@ -77,4 +77,8 @@ static inline unsigned int get_d(u32 inst)
return inst  0x;
 }
 
  +static inline unsigned int get_oc(u32 inst) {
  + return (inst >> 11) & 0x7fff;
  +}
 #endif /* __ASM_PPC_DISASSEMBLE_H__ */ diff --git
  a/arch/powerpc/include/uapi/asm/kvm.h
  b/arch/powerpc/include/uapi/asm/kvm.h
  index 0fb1a6e..ded0607 100644
  --- a/arch/powerpc/include/uapi/asm/kvm.h
  +++ b/arch/powerpc/include/uapi/asm/kvm.h
  @@ -269,7 +269,24 @@ struct kvm_fpu {
__u64 fpr[32];
 };
 
  +/*
  + * Defines for h/w breakpoint, watchpoint (read, write or both) and
  + * software breakpoint.
  + * These are used as type in KVM_SET_GUEST_DEBUG ioctl and status
  + * for KVM_DEBUG_EXIT.
  + */
  +#define KVMPPC_DEBUG_NONE0x0
  +#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
  +#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
  +#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
 struct kvm_debug_exit_arch {
  + __u64 address;
  + /*
  +  * exiting to userspace because of h/w breakpoint, watchpoint
  +  * (read, write or both) and software breakpoint.
  +  */
  + __u32 status;
  + __u32 reserved;
 };
 
 /* for KVM_SET_GUEST_DEBUG */
  @@ -281,10 +298,6 @@ struct kvm_guest_debug_arch {
 * Type denotes h/w breakpoint, read watchpoint, write
 * watchpoint or watchpoint (both read and write).
 */
  -#define KVMPPC_DEBUG_NONE0x0
  -#define KVMPPC_DEBUG_BREAKPOINT  (1UL << 1)
  -#define KVMPPC_DEBUG_WATCH_WRITE (1UL << 2)
  -#define KVMPPC_DEBUG_WATCH_READ  (1UL << 3)
__u32 type;
__u32 reserved;
} bp[16];
  diff --git a/arch/powerpc/kvm/e500_emulate.c
  b/arch/powerpc/kvm/e500_emulate.c index b10a012..dab9d07 100644
  --- a/arch/powerpc/kvm/e500_emulate.c
  +++ b/arch/powerpc/kvm/e500_emulate.c
  @@ -26,6 +26,8 @@
 #define XOP_TLBRE   946
 #define XOP_TLBWE   978
 #define XOP_TLBILX  18
  +#define XOP_EHPRIV  270
  +#define EHPRIV_OC_DEBUG 0
 
  I think the case OC = 0 is a bit special since, IIRC, if the OC
  operand is omitted it defaults to 0. So I think we should start
  this OC value from 1 or some other magic number.
 
  The ehpriv instruction is defined to be used as:
  ehpriv OC // where OC can be 0, 1, ... n
  and in its extended form it can be used as:
  ehpriv // with no OC, in which case OC = 0 is assumed
  So OC = 0 is not special; ehpriv is simply the same as ehpriv 0.
 
 Yes, this is just what I mean.
 
 
  I cannot think of any special reason to reserve ehpriv and ehpriv 0.
 
 So I would still prefer to reserve 'ehpriv' without an OC operand as a simple
 way to quickly test or develop something for KVM, because it is really
 convenient to trap into the hypervisor with just a single 'ehpriv'
 instruction.
 
 But I have no further objection if you guys are fine to this ;-)

I can see that a bare ehpriv can be a convenient default choice. But all ehpriv
traps are handled in one place (in a single function), so an accidental overlap
with debug should not be an issue.
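
(A rough user-space sketch of that single dispatch point, modeled on kvmppc_e500_emul_ehpriv() from the patch. The enum values are stand-ins and may not match the kernel's EMULATE_* constants; emul_ehpriv() is a simplified, hypothetical version that leaves out the kvm_run and vcpu updates.)

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in values; the kernel's real EMULATE_* constants may differ. */
enum emulation_result { EMULATE_DONE, EMULATE_FAIL, EMULATE_EXIT_USER };

#define EHPRIV_OC_DEBUG 0u

static unsigned int get_oc(uint32_t inst)
{
	return (inst >> 11) & 0x7fff;
}

/*
 * Every ehpriv trap goes through this one switch, so claiming a new
 * OC value later means adding a case here and nowhere else.
 */
static enum emulation_result emul_ehpriv(uint32_t inst, int *advance)
{
	assert(advance);

	switch (get_oc(inst)) {
	case EHPRIV_OC_DEBUG:
		*advance = 0;            /* do not step past the breakpoint */
		return EMULATE_EXIT_USER;
	default:
		return EMULATE_FAIL;     /* unclaimed OC values are rejected */
	}
}
```

In this sketch an instruction word with OC = 0 exits to user space without advancing, while any other OC value fails emulation and leaves *advance untouched.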

I too do not have any strong opinion to keep either way, so want

RE: [PATCH 0/2 v3] powerpc: Make ptrace work reliably

2013-06-11 Thread Bhushan Bharat-R65777

Hi Ben,

Ping; 
Please review this patchset .. 

Thanks
-Bharat

 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Wednesday, May 22, 2013 9:51 AM
 To: ga...@kernel.crashing.org; b...@kernel.crashing.org; linuxppc-
 d...@lists.ozlabs.org; Wood Scott-B07421; Yoder Stuart-B08248; Yang 
 James-RA8135
 Cc: Bhushan Bharat-R65777
 Subject: [PATCH 0/2 v3] powerpc: Make ptrace work reliably
 
 From: Bharat Bhushan bharat.bhus...@freescale.com
 
 v2->v3
  - Load PACACURRENT immediately after _MSR(r1), and load DBCR0
    just after beq resume_kernel
  - Added lat_syscall results before and after the patch
 
 v1->v2
  - Subject line was missing 0/2, 1/2, 2/2
 
 Bharat Bhushan (2):
   powerpc: debug control and status registers are 32bit => This patch makes
 the debug control and status registers 32-bit, as they are in hardware.
 It does not fix anything.
 
   powerpc: restore dbcr0 on user space exit => This patch fixes the ptrace
 reliability issue. The description in the patch describes one of the
 cases where ptrace does not work reliably.
 
  arch/powerpc/include/asm/processor.h |8 
  arch/powerpc/kernel/asm-offsets.c|1 +
  arch/powerpc/kernel/entry_64.S   |   28 
  3 files changed, 29 insertions(+), 8 deletions(-)



RE: SATA FSL and upstreaming

2013-05-16 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Benjamin
 Herrenschmidt
 Sent: Thursday, May 16, 2013 11:16 AM
 To: Liu Qiang-B32616
 Cc: linuxppc-dev@lists.ozlabs.org; Fleming Andy-AFLEMING; Xie Shaohui-B21989
 Subject: Re: SATA FSL and upstreaming

 On Thu, 2013-05-16 at 14:47 +1000, Benjamin Herrenschmidt wrote:
  Hi folks !

  So I was trying to use my 5020ds to test some stuff today. Since I
  hadn't used it in a while, I decided to upgrade it to the latest NOR
  etc...

 On another note, I can't seem to get any PCIe card recognized in any slot...

 Can you give me an example config of the DIP switches that is known to work 
 with
 some slots ? Is there some EEPROM config needed ? If yes, any pointers ? (I
 can't quite make sense of either u-boot or the doc there).

Can you provide an RCW dump?
Or you can try the attached RCW.

Thanks
-Bharat

 Thanks,
 Ben.


rcw_15g_2000mhz.rcw
Description: rcw_15g_2000mhz.rcw

RE: SATA FSL and upstreaming

2013-05-16 Thread Bhushan Bharat-R65777

Try:

From bank 0


tftp 0x100  rcw_2sgmii_1500mhz.bin
protect off 0xec00 +$filesize; erase 0xec00 +$filesize; cp.b 0x100 
0xec00 $filesize


Thanks
-Bharat

 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Thursday, May 16, 2013 11:54 AM
 To: Zang Roy-R61911
 Cc: Bhushan Bharat-R65777; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-
 d...@lists.ozlabs.org; Xie Shaohui-B21989
 Subject: Re: SATA FSL and upstreaming
 
 On Thu, 2013-05-16 at 06:17 +, Zang Roy-R61911 wrote:
  Do you try slot7?
  PCIe1 connects to slot7 directly.
 
 I tried all slots. None of them sees any card. The card also doesn't seem to 
 be
 powered up (none of the LEDs blink, it's an e1000 since I don't have 
 networking
 with upstream).
 
 I also tried a different card and uboot is pretty adamant at saying no link 
 :-
 )
 
 I'll try to update the RCW when I know how to do it :-)
 
 Cheers,
 Ben.
 
  Roy
 
   -Original Message-
   From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
   Sent: Thursday, May 16, 2013 2:09 PM
   To: Zang Roy-R61911
   Cc: Bhushan Bharat-R65777; Liu Qiang-B32616; Fleming Andy-AFLEMING;
   linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989
   Subject: Re: SATA FSL and upstreaming
  
   On Thu, 2013-05-16 at 06:05 +, Zang Roy-R61911 wrote:
I do not suggest changing the RCW. If the RCW is broken on Ben's
side, it is not easy to recover for him.
Let's check the U-boot output first.
  
   U-Boot 2013.01-9-g7bcd7f4 (Mar 14 2013 - 14:23:16)
  
   CPU0:  P5020E, Version: 1.0, (0x82280010)
   Core:  E5500, Version: 1.0, (0x80240010) Clock Configuration:
  CPU0:2000 MHz, CPU1:2000 MHz,
  CCB:800  MHz,
  DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous),
   LBC:100 MHz
  FMAN1: 600 MHz
  QMAN:  400 MHz
  PME:   400 MHz
   L1:D-cache 32 kB enabled
  I-cache 32 kB enabled
   Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank:
   0 Reset Configuration Word (RCW):
  : 0c54  1e12 
  0010: d8984a01 03002000 de80 4100
  0020:    1007
  0030:     SERDES
   Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz
   I2C:   ready
   SPI:   ready
   DRAM:  Initializingusing SPD
   Detected UDIMM i-DIMM
   Detected UDIMM i-DIMM
   2 GiB left unmapped
   4 GiB (DDR3, 64-bit, CL=9, ECC on)
  DDR Controller Interleaving Mode: cache line
  DDR Chip-Select Interleaving Mode: CS0+CS1 Testing 0x
   - 0x7fff Testing 0x8000 - 0x Remap DDR 2 GiB left
   unmapped
  
   POST memory PASSED
   Flash: 128 MiB
   L2:512 KB enabled
   Corenet Platform Cache: 2048 KB enabled
   SRIO1: disabled
   SRIO2: disabled
   NAND:  1024 MiB
   MMC:  FSL_SDHC: 0
   EEPROM: NXID v1
   PCIe1: Root Complex, no link, regs @ 0xfe20
   PCIe1: Bus 00 - 00
   PCIe2: disabled
   PCIe3: Root Complex, no link, regs @ 0xfe202000
   PCIe3: Bus 01 - 01
   PCIe4: disabled
   In:serial
   Out:   serial
   Err:   serial
   Net:   Initializing Fman
   Fman1: Uploading microcode version 106.1.7 PHY reset timed out PHY
   reset timed out PHY reset timed out PHY reset timed out FM1@DTSEC1,
   FM1@DTSEC2, FM1@DTSEC3, FM1@DTSEC4, FM1@DTSEC5, FM1@TGEC1 Hit any
   key to stop autoboot:  0 =
  
   Cheers,
   Ben.
  
  
 
 
 


RE: SATA FSL and upstreaming

2013-05-16 Thread Bhushan Bharat-R65777



 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, May 16, 2013 12:13 PM
 To: Benjamin Herrenschmidt
 Cc: Zang Roy-R61911; Liu Qiang-B32616; Fleming Andy-AFLEMING; linuxppc-
 d...@lists.ozlabs.org; Xie Shaohui-B21989; Bhushan Bharat-R65777
 Subject: Re: SATA FSL and upstreaming
 
 On 05/16/2013 02:40 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-05-16 at 14:35 +0800, tiejun.chen wrote:
  On 05/16/2013 02:21 PM, Benjamin Herrenschmidt wrote:
  On Thu, 2013-05-16 at 14:17 +0800, tiejun.chen wrote:
  I think you can use Bharat's RCW, which seems RR_HXAPNSP_0x36, then
  please take a look at this:
 
  Ok, how do I update my RCW to bse Bharat's ?
 
 
  Firstly please check which flash bank is used since we have to know
  where should be updated RCW.
 
  What is SW7[1:4]?
 
  Or we have another simple way in u-boot prompt:
 
  = md.b ffdf002c
  ffdf002c: 4f 00 fe 00 39 00 00 00 00 00 00 00 00 00 00 00
  O...9...
  ...
 
  ffdf002c: 0f 00 fe 00 00 00 00 00 00 00 00 00 00 00 00 00
  
 
  This means we're on bank4.
 
  I assume that means bank0 ?
 
 Yes, RCW should be burned to 0xec00.
 
 In u-boot prompt:
 
 = loady
 ## Ready for binary (ymodem) download to 0x0100 at 115200 bps...
 C
 
 Then send that RCW with ymodem in your terminal client.

1) Load the RCW, as Tiejun described, to some address in DDR.

2) Burn the RCW at 0xec00:
protect off 0xec00 +$filesize; erase 0xec00 +$filesize; cp.b 0x100 
0xec00 $filesize

3) Run the pix altbak command.

4) Check that you are on bank4.

5) If you are lucky, networking will then work for you.

Thanks
-Bharat

 
 Tiejun


RE: SATA FSL and upstreaming

2013-05-16 Thread Bhushan Bharat-R65777

Ben, which SDK are you using?

-Bharat

 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Thursday, May 16, 2013 12:36 PM
 To: Zang Roy-R61911
 Cc: Bhushan Bharat-R65777; tiejun.chen; Liu Qiang-B32616; Fleming 
 Andy-AFLEMING;
 linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989
 Subject: Re: SATA FSL and upstreaming
 
 On Thu, 2013-05-16 at 07:01 +, Zang Roy-R61911 wrote:
 
  I just tried your RCW. one e1000 card works in slot7.
  we may need to check others ...
 
 Tried 4 and 7 ...
 
 Note that this *used* to work. Last year I had this machine up with 2 cards
 doing things. Not sure what changed, it's possible that the DIP got
 inadvertently changed. Or somebody stole a jumper from it in the lab :-)
 
  U-Boot 2013.01-00078-g2741c99 (May 03 2013 - 00:20:41)
 
  CPU0:  P5020E, Version: 2.0, (0x82280020)
  Core:  E5500, Version: 1.2, (0x80240012) Clock Configuration:
 CPU0:2000 MHz, CPU1:2000 MHz,
 CCB:800  MHz,
 DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous), LBC:100  
  MHz
 FMAN1: 600 MHz
 QMAN:  400 MHz
 PME:   400 MHz
  L1:D-cache 32 kB enabled
 I-cache 32 kB enabled
  Reset Configuration Word (RCW):
 : 0c54  1e12 
 0010: d8984a01 03002000 de80 4100
 0020:    1007
 0030:    
 
 My RCW is identical
 
  Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x02, FPGA Ver: 0x04, vBank: 4
 
 Mine is:
 Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank: 4
 
  SERDES Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz
 
 Same.
 
  I2C:   ready
  SPI:   ready
  DRAM:  Initializingusing SPD
  Detected UDIMM i-DIMM
  Detected UDIMM i-DIMM
  2 GiB left unmapped
  4 GiB (DDR3, 64-bit, CL=9, ECC on)
 DDR Controller Interleaving Mode: cache line
 DDR Chip-Select Interleaving Mode: CS0+CS1 Testing 0x -
  0x7fff Testing 0x8000 - 0x Remap DDR 2 GiB left
  unmapped
 
  POST memory PASSED
  Flash: 128 MiB
  L2:512 KB enabled
  Corenet Platform Cache: 2048 KB enabled
  SRIO1: disabled
  SRIO2: disabled
  NAND:  1024 MiB
  MMC:  FSL_SDHC: 0
  EEPROM: Invalid ID (ff ff ff ff)
  PCIe1: Root Complex, x2, regs @ 0xfe20
01:00.0 - 8086:105e - Network controller
01:00.1 - 8086:105e - Network controller
  PCIe1: Bus 00 - 01
  PCIe2: disabled
  PCIe3: Root Complex, no link, regs @ 0xfe202000
  PCIe3: Bus 02 - 02
  PCIe4: disabled
 
 And I never see anything here anymore...
 
  In:serial
  Out:   serial
  Err:   serial
  Net:   Initializing Fman
  Fman1: Uploading microcode version 106.1.6 PHY reset timed out PHY
  reset timed out PHY reset timed out PHY reset timed out
  e1000: 00:15:17:16:ce:b8
 e1000: 00:15:17:16:ce:b9
 FM1@DTSEC1, FM1@DTSEC2, FM1@DTSEC3, FM1@DTSEC4 [PRIME],
  FM1@DTSEC5, FM1@TGEC1, e1000#0
  Warning: e1000#0 MAC addresses don't match:
  Address in SROM is 00:15:17:16:ce:b8
  Address in environment is  00:1b:21:68:5e:d4 , e1000#1
  Warning: e1000#1 using MAC address from net device
 
  =
 
 


RE: SATA FSL and upstreaming

2013-05-16 Thread Bhushan Bharat-R65777

Ben,

If you are using SDK 1.3 or later, support for p5020ds rev 1.0 has been
removed.
So use an earlier SDK for rev 1.0 or wait for rev 2.0 :)

Thanks
-Bharat


 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Thursday, May 16, 2013 12:36 PM
 To: Zang Roy-R61911
 Cc: Bhushan Bharat-R65777; tiejun.chen; Liu Qiang-B32616; Fleming 
 Andy-AFLEMING;
 linuxppc-dev@lists.ozlabs.org; Xie Shaohui-B21989
 Subject: Re: SATA FSL and upstreaming
 
 On Thu, 2013-05-16 at 07:01 +, Zang Roy-R61911 wrote:
 
  I just tried your RCW. one e1000 card works in slot7.
  we may need to check others ...
 
 Tried 4 and 7 ...
 
 Note that this *used* to work. Last year I had this machine up with 2 cards
 doing things. Not sure what changed, it's possible that the DIP got
 inadvertently changed. Or somebody stole a jumper from it in the lab :-)
 
  U-Boot 2013.01-00078-g2741c99 (May 03 2013 - 00:20:41)
 
  CPU0:  P5020E, Version: 2.0, (0x82280020)
  Core:  E5500, Version: 1.2, (0x80240012) Clock Configuration:
 CPU0:2000 MHz, CPU1:2000 MHz,
 CCB:800  MHz,
 DDR:666.667 MHz (1333.333 MT/s data rate) (Asynchronous), LBC:100  
  MHz
 FMAN1: 600 MHz
 QMAN:  400 MHz
 PME:   400 MHz
  L1:D-cache 32 kB enabled
 I-cache 32 kB enabled
  Reset Configuration Word (RCW):
 : 0c54  1e12 
 0010: d8984a01 03002000 de80 4100
 0020:    1007
 0030:    
 
 My RCW is identical
 
  Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x02, FPGA Ver: 0x04, vBank: 4
 
 Mine is:
 Board: P5020DS, Sys ID: 0x1c, Sys Ver: 0x12, FPGA Ver: 0x05, vBank: 4
 
  SERDES Reference Clocks: Bank1=100Mhz Bank2=125Mhz Bank3=125Mhz
 
 Same.
 
  I2C:   ready
  SPI:   ready
  DRAM:  Initializingusing SPD
  Detected UDIMM i-DIMM
  Detected UDIMM i-DIMM
  2 GiB left unmapped
  4 GiB (DDR3, 64-bit, CL=9, ECC on)
 DDR Controller Interleaving Mode: cache line
 DDR Chip-Select Interleaving Mode: CS0+CS1 Testing 0x -
  0x7fff Testing 0x8000 - 0x Remap DDR 2 GiB left
  unmapped
 
  POST memory PASSED
  Flash: 128 MiB
  L2:512 KB enabled
  Corenet Platform Cache: 2048 KB enabled
  SRIO1: disabled
  SRIO2: disabled
  NAND:  1024 MiB
  MMC:  FSL_SDHC: 0
  EEPROM: Invalid ID (ff ff ff ff)
  PCIe1: Root Complex, x2, regs @ 0xfe20
01:00.0 - 8086:105e - Network controller
01:00.1 - 8086:105e - Network controller
  PCIe1: Bus 00 - 01
  PCIe2: disabled
  PCIe3: Root Complex, no link, regs @ 0xfe202000
  PCIe3: Bus 02 - 02
  PCIe4: disabled
 
 And I never see anything here anymore...
 
  In:serial
  Out:   serial
  Err:   serial
  Net:   Initializing Fman
  Fman1: Uploading microcode version 106.1.6 PHY reset timed out PHY
  reset timed out PHY reset timed out PHY reset timed out
  e1000: 00:15:17:16:ce:b8
 e1000: 00:15:17:16:ce:b9
 FM1@DTSEC1, FM1@DTSEC2, FM1@DTSEC3, FM1@DTSEC4 [PRIME],
  FM1@DTSEC5, FM1@TGEC1, e1000#0
  Warning: e1000#0 MAC addresses don't match:
  Address in SROM is 00:15:17:16:ce:b8
  Address in environment is  00:1b:21:68:5e:d4 , e1000#1
  Warning: e1000#1 using MAC address from net device
 
  =
 
 


RE: [PATCH 2/2 v2] powerpc: restore dbcr0 on user space exit

2013-05-16 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Thursday, May 16, 2013 10:24 PM
 To: Bhushan Bharat-R65777
 Cc: ga...@kernel.crashing.org; b...@kernel.crashing.org; linuxppc-
 d...@lists.ozlabs.org; Yoder Stuart-B08248; Yang James-RA8135; Bhushan Bharat-
 R65777
 Subject: Re: [PATCH 2/2 v2] powerpc: restore dbcr0 on user space exit

 On 05/16/2013 12:34:32 AM, Bharat Bhushan wrote:
  On BookE, (Branch Taken + Single Step) is the same as Branch Taken on
  BookS, and in Linux we simulate the BookS behavior on BookE as well.
  When doing so, in Branch Taken handling we want to set DBCR0_IC, but we
  update current->thread.dbcr0 and not DBCR0.

  Now on 64-bit, current->thread.dbcr0 (and the other debug registers) is
  synchronized ONLY in the context switch flow. But if, after handling
  Branch Taken in the debug exception, we return to user space without a
  context switch, then the single stepping change (DBCR0_ICMP) does not get
  written to the h/w DBCR0 and the Instruction Complete exception does not
  happen.

  This fixes using ptrace reliably on BookE-PowerPC

  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v1-v2
   - Subject line was not having 2/2

   arch/powerpc/kernel/asm-offsets.c |1 +
   arch/powerpc/kernel/entry_64.S|   24 
   2 files changed, 21 insertions(+), 4 deletions(-)

  diff --git a/arch/powerpc/kernel/asm-offsets.c
  b/arch/powerpc/kernel/asm-offsets.c
  index b51a97c..1e2f450 100644
  --- a/arch/powerpc/kernel/asm-offsets.c
  +++ b/arch/powerpc/kernel/asm-offsets.c
  @@ -103,6 +103,7 @@ int main(void)
   #endif /* CONFIG_VSX */
   #ifdef CONFIG_PPC64
  DEFINE(KSP_VSID, offsetof(struct thread_struct, ksp_vsid));
  +   DEFINE(THREAD_DBCR0, offsetof(struct thread_struct, dbcr0));
   #else /* CONFIG_PPC64 */
  DEFINE(PGDIR, offsetof(struct thread_struct, pgdir));
   #if defined(CONFIG_4xx) || defined(CONFIG_BOOKE)
  diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
  index 794889b..561630d 100644
  --- a/arch/powerpc/kernel/entry_64.S
  +++ b/arch/powerpc/kernel/entry_64.S
  @@ -614,7 +614,9 @@ _GLOBAL(ret_from_except_lite)
   * from the interrupt.
   */
   #ifdef CONFIG_PPC_BOOK3E
  +   ld  r3,PACACURRENT(r13)
  wrteei  0
  +   lwz r10,(THREAD+THREAD_DBCR0)(r3)

 I know I asked you to move these earlier, but this is probably too early...
 wrteei has synchronization, so it will probably have to wait until the ld
 completes, defeating the purpose of moving it earlier.

 Ideal would probably be to load PACACURRENT immediately after _MSR(r1), and 
 load
 DBCR0 just after beq resume_kernel.

ok

 Or, move DBCR0 to thread_info as I suggested internally.

If no one has any objection to moving dbcr0 to thread_info, then I am happy to
do that.
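
(For anyone following along, the problem from the quoted commit message can be modeled in plain C roughly as below. This is only a conceptual sketch, not the entry_64.S change itself. The bit values are illustrative, hw_dbcr0 stands in for the hardware register, and the DBCR0_IDM check on the exit path is my assumption about how the reload would be gated.)

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; see the kernel's reg_booke.h for the real ones. */
#define DBCR0_IDM 0x40000000u  /* internal debug mode */
#define DBCR0_IC  0x08000000u  /* instruction completion debug event */

struct thread_struct {
	uint32_t dbcr0;        /* software shadow of the register */
};

static uint32_t hw_dbcr0;      /* stand-in for the hardware DBCR0 */

/* Branch Taken handling arms single step in the shadow copy only. */
static void handle_branch_taken(struct thread_struct *t)
{
	t->dbcr0 |= DBCR0_IDM | DBCR0_IC;
}

/* Context switch: the one place 64-bit synced shadow to hardware. */
static void context_switch_to(const struct thread_struct *t)
{
	hw_dbcr0 = t->dbcr0;
}

/* Exit to user space; with the fix, reload DBCR0 when debugging. */
static void return_to_user(const struct thread_struct *t, int with_fix)
{
	if (with_fix && (t->dbcr0 & DBCR0_IDM))
		hw_dbcr0 = t->dbcr0;
}
```

Without the reload, the hardware copy keeps DBCR0_IC clear until the next context switch, so the Instruction Complete exception never fires; with it, the shadow value reaches the hardware on the way back to user space.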

 Regardless of what you do, could you run a basic syscall benchmark (e.g. from
 lmbench) before and after the patch?

Sure.

-Bharat

 -Scott


RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
 Mihai Claudiu-B02008
 Sent: Wednesday, May 08, 2013 6:44 PM
 To: Wood Scott-B07421; tiejun.chen
 Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org
 Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

   This only disable soft interrupt for kvmppc_restart_interrupt() that
   restarts interrupts if they were meant for the host:

   a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
   BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

   And shouldn't we handle kvmppc_restart_interrupt() like the original
   HOST flow?

   #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
   ack)   \

   START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
   PROLOG_ADDITION_MASKABLE)\
   EXCEPTION_COMMON(trapnum, PACA_EXGEN,
   *INTS_DISABLE*) \
 ...

  Could you elaborate on what you mean?

 I think Tiejun was saying that host has flags and replays only EE/DEC/DBELL
 interrupts. There is special macro masked_interrupt_book3e in those exception
 handlers that sets paca->irq_happened.

 The list of replayed interrupts is limited to asynchronous noncritical
 interrupts which can be masked by MSR[EE] (therefore no TLB miss).

The embedded perfmon interrupt is also asynchronous. Why is it not in the list
of masked interrupts?

-Bharat

 Now on KVM book3e we
 don't want to put them in the irq_happened lazy state but rather to execute 
 them
 directly, so there is no reason for exception handling symmetry between host 
 and
 guest.

 -Mike


RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, May 09, 2013 1:18 PM
 To: Bhushan Bharat-R65777
 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
 d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

 On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Linuxppc-dev [mailto:linuxppc-dev-
  bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of
  bounces+Caraman
  Mihai Claudiu-B02008
  Sent: Wednesday, May 08, 2013 6:44 PM
  To: Wood Scott-B07421; tiejun.chen
  Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de;
  kvm-...@vger.kernel.org; k...@vger.kernel.org
  Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  This only disable soft interrupt for kvmppc_restart_interrupt()
  that restarts interrupts if they were meant for the host:

  a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
  BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

  And shouldn't we handle kvmppc_restart_interrupt() like the
  original HOST flow?

  #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr,
  ack)   \

  START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum,
  PROLOG_ADDITION_MASKABLE)\
   EXCEPTION_COMMON(trapnum, PACA_EXGEN,
  *INTS_DISABLE*) \
   ...

  Could you elaborate on what you mean?

  I think Tiejun was saying that host has flags and replays only
  EE/DEC/DBELL interrupts. There is special macro
  masked_interrupt_book3e in those exception handlers that sets
  paca->irq_happened.

  The list of replayed interrupts is limited to asynchronous noncritical
  interrupts which can be masked by MSR[EE] (therefore no TLB miss).

  The embedded perfmon interrupt is also asynchronous. Why is it not in the
  list of masked interrupts?

 Do you mean perfmon? If so, it's also in that list:

  START_EXCEPTION(perfmon);
  NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
  PROLOG_ADDITION_NONE)
  EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)

Where is it recorded in paca->irq_happened to be replayed later?
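
(To make the question concrete, here is a small model of the record-and-replay scheme as Mike described it. It covers only the three sources named in this thread and does not settle whether perfmon is recorded anywhere. The flag values, struct paca layout and function names are illustrative stand-ins, not the kernel's definitions.)

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; the kernel's PACA_IRQ_* values may differ. */
#define IRQ_DBELL 0x02u
#define IRQ_EE    0x04u
#define IRQ_DEC   0x08u

struct paca {
	bool soft_enabled;          /* lazy "interrupts enabled" state */
	unsigned int irq_happened;  /* sources seen while soft-disabled */
};

static unsigned int handled;        /* bitmask of interrupts actually run */

/* Arrival of a maskable interrupt: run it now, or only note it. */
static void interrupt_arrives(struct paca *p, unsigned int irq)
{
	if (p->soft_enabled)
		handled |= irq;
	else
		p->irq_happened |= irq;  /* recorded for later replay */
}

/* Re-enabling interrupts replays whatever was recorded. */
static void soft_enable(struct paca *p)
{
	p->soft_enabled = true;
	handled |= p->irq_happened;
	p->irq_happened = 0;
}
```

An interrupt that is masked without setting a bit in irq_happened would simply be lost in this model, which is why I am asking where perfmon gets recorded.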

 Tiejun

  -Bharat

  Now on KVM book3e we
  don't want to put them in the irq_happened lazy state but rather to
  execute them directly, so there is no reason for exception handling
  symmetry between host and guest.

  -Mike


RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Kevin Hao [mailto:haoke...@gmail.com]
 Sent: Thursday, May 09, 2013 1:38 PM
 To: Bhushan Bharat-R65777
 Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood 
 Scott-
 B07421; ag...@suse.de; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

 On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: tiejun.chen [mailto:tiejun.c...@windriver.com]
   Sent: Thursday, May 09, 2013 1:18 PM
   To: Bhushan Bharat-R65777
   Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
   d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
   k...@vger.kernel.org
   Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
   interrupts

   On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:

-Original Message-
From: Linuxppc-dev [mailto:linuxppc-dev-
bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
Mihai Claudiu-B02008
Sent: Wednesday, May 08, 2013 6:44 PM
To: Wood Scott-B07421; tiejun.chen
Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de;
kvm-...@vger.kernel.org; k...@vger.kernel.org
Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
interrupts

This only disable soft interrupt for kvmppc_restart_interrupt()
that restarts interrupts if they were meant for the host:

a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

Those aren't the only exceptions that can end up going to the host.
We could get a TLB miss that results in a heavyweight MMIO exit, etc.

And shouldn't we handle kvmppc_restart_interrupt() like the
original HOST flow?

#define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
START_EXCEPTION(label); \
 NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE) \
 EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
 ...

Could you elaborate on what you mean?

I think Tiejun was saying that the host has flags and replays only
EE/DEC/DBELL interrupts. There is a special macro,
masked_interrupt_book3e, in those exception handlers that sets
paca->irq_happened.

The list of replayed interrupts is limited to asynchronous
noncritical interrupts which can be masked by MSR[EE] (therefore no TLB
miss).

The embedded perfmon interrupt is also asynchronous. Why is it not
in the list of masked interrupts?

   Are you asking about perfmon? If so, it's also in that list:

START_EXCEPTION(perfmon);
NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
PROLOG_ADDITION_NONE)
EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)

  Where is it recorded in paca->irq_happened to be replayed later?

 Actually we don't want to replay the perfmon interrupt later. We run it even
 when soft IRQs are disabled and just treat it as an NMI. Please see the
 following function quoted from arch/powerpc/perf/core-fsl-emb.c:
   /*
    * If interrupts were soft-disabled when a PMU interrupt occurs, treat
    * it as an NMI.
    */
   static inline int perf_intr_is_nmi(struct pt_regs *regs)
   {
   #ifdef __powerpc64__
           return !regs->softe;
   #else
           return 0;
   #endif
   }

Is that because we cannot afford to lose a perfmon interrupt if the data is
to be captured accurately?

-Bharat
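
To make the quoted helper concrete, here is a small user-space sketch of the
decision it encodes. It is only an illustration: struct toy_regs, the
toy_context enum and both toy_* functions are hypothetical stand-ins, and only
the "softe == 0 means treat as NMI" rule comes from the quoted
perf_intr_is_nmi().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the single pt_regs field the helper reads. */
struct toy_regs {
    unsigned long softe;   /* nonzero if interrupts were soft-enabled */
};

/* Hypothetical labels for the context a PMU handler would run in. */
enum toy_context { TOY_CTX_IRQ, TOY_CTX_NMI };

/* Same rule as the quoted 64-bit branch: soft-disabled means NMI. */
static bool toy_perf_intr_is_nmi(const struct toy_regs *regs)
{
    return !regs->softe;
}

/* Pick the context instead of deferring the interrupt for replay. */
static enum toy_context toy_perf_interrupt_context(const struct toy_regs *regs)
{
    return toy_perf_intr_is_nmi(regs) ? TOY_CTX_NMI : TOY_CTX_IRQ;
}
```

The point of the sketch is that the perfmon interrupt is never latched for
later: it is handled immediately, and only the context label changes.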

 Thanks,
 Kevin

   Tiejun

-Bharat

Now on KVM book3e we
don't want to put them in the irq_happened lazy state but rather
to execute them directly, so there is no reason for exception
handling symmetry between host and guest.

-Mike

RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
 Mihai Claudiu-B02008
 Sent: Wednesday, May 08, 2013 6:44 PM
 To: Wood Scott-B07421; tiejun.chen
 Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org
 Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

   This only disable soft interrupt for kvmppc_restart_interrupt() that
   restarts interrupts if they were meant for the host:

   a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
   BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

   And shouldn't we handle kvmppc_restart_interrupt() like the original
   HOST flow?

   #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
   START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE) \
   EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
   ...

  Could you elaborate on what you mean?

 I think Tiejun was saying that the host has flags and replays only
 EE/DEC/DBELL interrupts. There is a special macro, masked_interrupt_book3e,
 in those exception handlers that sets paca->irq_happened.

 The list of replayed interrupts is limited to asynchronous noncritical
 interrupts which can be masked by MSR[EE] (therefore no TLB miss). Now on KVM
 book3e we don't want to put them in the irq_happened lazy state but rather to
 execute them directly, so there is no reason for exception handling symmetry
 between host and guest.

Another question. Consider two cases.

Case 1)
 - local_irq_disable() sets soft_enabled = 0.
 - An external interrupt happens. We set PACA_IRQ_EE in irq_happened, clear
   EE in SRR1 and rfi, so interrupts are now hard-disabled and no other
   interrupt gated by MSR.EE can occur. It looks like the idea is to stop a
   device from asserting its interrupt over and over until the interrupt
   condition on the device is cleared, right?
 - local_irq_enable() sees that irq_happened is set and replays.

Case 2)
 - local_irq_disable() sets soft_enabled = 0.
 - A DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened but do not
   clear EE in SRR1 before rfi, so interrupts are not hard-disabled.
 - An EE interrupt then happens. We set PACA_IRQ_EE in irq_happened, clear
   EE in SRR1 and rfi, so interrupts are hard-disabled.
 - local_irq_enable() sees that irq_happened is set.
IIUC, it replays only one interrupt, doesn't it?

-Bharat
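
The two cases above can be walked through with a toy user-space model. This
is a sketch under stated assumptions, not the kernel implementation: struct
toy_paca and the toy_* functions are hypothetical, the flag values are
illustrative, and the loop that replays every latched source on enable is an
assumption made for the demonstration. Only the masking rules (DEC is latched
without touching EE; an external interrupt is latched and also clears EE) are
taken from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names follow the thread; the numeric values are illustrative only. */
#define PACA_IRQ_DBELL 0x02
#define PACA_IRQ_EE    0x04
#define PACA_IRQ_DEC   0x08

/* Hypothetical per-CPU state for the toy model. */
struct toy_paca {
    bool soft_enabled;     /* toggled by local_irq_disable()/enable() */
    bool msr_ee;           /* models the hardware MSR[EE] bit */
    uint8_t irq_happened;  /* sources latched while soft-disabled */
    unsigned handled;      /* handler invocations, for the demo */
};

static void toy_local_irq_disable(struct toy_paca *p)
{
    p->soft_enabled = false;   /* MSR[EE] is deliberately left alone */
}

/* An interrupt arrives. Returns true if the CPU took the exception. */
static bool toy_irq_arrives(struct toy_paca *p, uint8_t flag)
{
    if (!p->msr_ee)
        return false;              /* hard-disabled: stays pending */
    if (p->soft_enabled) {
        p->handled++;              /* normal delivery */
        return true;
    }
    p->irq_happened |= flag;       /* latch for later replay */
    if (flag == PACA_IRQ_EE)
        p->msr_ee = false;         /* external source: also clear EE */
    return true;
}

/* Assumed replay policy: drain every latched source, then re-enable. */
static unsigned toy_local_irq_enable(struct toy_paca *p)
{
    unsigned replayed = 0;
    uint8_t pending = p->irq_happened;

    p->irq_happened = 0;
    while (pending) {
        pending &= (uint8_t)(pending - 1);  /* clear lowest set bit */
        p->handled++;
        replayed++;
    }
    p->soft_enabled = true;
    p->msr_ee = true;
    return replayed;
}
```

Driving the model through Case 2 (disable, DEC arrives, EE arrives, enable)
leaves two bits latched in irq_happened at enable time, which is exactly the
situation the question is about.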

RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, May 09, 2013 1:48 PM
 To: Bhushan Bharat-R65777
 Cc: Kevin Hao; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org; Wood Scott-
 B07421; ag...@suse.de; kvm-...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

 On 05/09/2013 04:12 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Kevin Hao [mailto:haoke...@gmail.com]
  Sent: Thursday, May 09, 2013 1:38 PM
  To: Bhushan Bharat-R65777
  Cc: tiejun.chen; Caraman Mihai Claudiu-B02008; k...@vger.kernel.org;
  Wood Scott-B07421; ag...@suse.de; kvm-...@vger.kernel.org;
  linuxppc-dev@lists.ozlabs.org
  Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  On Thu, May 09, 2013 at 07:51:09AM +, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: tiejun.chen [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, May 09, 2013 1:18 PM
  To: Bhushan Bharat-R65777
  Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
  d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
  k...@vger.kernel.org
  Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  On 05/09/2013 03:33 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Linuxppc-dev [mailto:linuxppc-dev-
  bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
  Mihai Claudiu-B02008
  Sent: Wednesday, May 08, 2013 6:44 PM
  To: Wood Scott-B07421; tiejun.chen
  Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de;
  kvm-...@vger.kernel.org; k...@vger.kernel.org
  Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  This only disable soft interrupt for kvmppc_restart_interrupt()
  that restarts interrupts if they were meant for the host:

  a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
  BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

  And shouldn't we handle kvmppc_restart_interrupt() like the
  original HOST flow?

  #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
  START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE) \
   EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
   ...

  Could you elaborate on what you mean?

  I think Tiejun was saying that the host has flags and replays only
  EE/DEC/DBELL interrupts. There is a special macro,
  masked_interrupt_book3e, in those exception handlers that sets
  paca->irq_happened.

  The list of replayed interrupts is limited to asynchronous
  noncritical interrupts which can be masked by MSR[EE] (therefore
  no TLB miss).

  The embedded perfmon interrupt is also asynchronous. Why is it not
  in the list of masked interrupts?

  Are you asking about perfmon? If so, it's also in that list:

  START_EXCEPTION(perfmon);
  NORMAL_EXCEPTION_PROLOG(0x260, BOOKE_INTERRUPT_PERFORMANCE_MONITOR,
  PROLOG_ADDITION_NONE)
  EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)

  Where is it recorded in paca->irq_happened to be replayed later?

  Actually we don't want to replay the perfmon interrupt later. We run it
  even when soft IRQs are disabled and just treat it as an NMI. Please see
  the following function quoted from arch/powerpc/perf/core-fsl-emb.c:
 /*
  * If interrupts were soft-disabled when a PMU interrupt occurs, treat
  * it as an NMI.
  */
 static inline int perf_intr_is_nmi(struct pt_regs *regs)
 {
 #ifdef __powerpc64__
         return !regs->softe;
 #else
         return 0;
 #endif
 }

  Is that because we cannot afford to lose a perfmon interrupt if the data
  is to be captured accurately?

  powerpc/perf: e500 support

  This implements perf_event support for the Freescale embedded performance
  monitor, based on the existing perf_event.c that supports server/classic
  chips.

  Some limitations:
  - Performance monitor interrupts are regular EE interrupts, and thus you
    can't profile places with interrupts disabled.  We may want to implement
    soft IRQ-disabling, with perfmon interrupts exempted and treated as NMIs.

Ahh, that answers it, and it is what I expected :)

-Bharat

 Tiejun

  -Bharat

  Thanks,
  Kevin

  Tiejun

  -Bharat

  Now on KVM book3e we
  don't want to put them in the irq_happened lazy state but rather
  to execute them directly, so there is no reason for exception
  handling symmetry between host and guest.

  -Mike

RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, May 09, 2013 3:15 PM
 To: Bhushan Bharat-R65777
 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
 d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

 On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Linuxppc-dev [mailto:linuxppc-dev-
  bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
  Mihai Claudiu-B02008
  Sent: Wednesday, May 08, 2013 6:44 PM
  To: Wood Scott-B07421; tiejun.chen
  Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de;
  kvm-...@vger.kernel.org; k...@vger.kernel.org
  Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  This only disable soft interrupt for kvmppc_restart_interrupt()
  that restarts interrupts if they were meant for the host:

  a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
  BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

  And shouldn't we handle kvmppc_restart_interrupt() like the
  original HOST flow?

  #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
  START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE) \
   EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
   ...

  Could you elaborate on what you mean?

  I think Tiejun was saying that the host has flags and replays only
  EE/DEC/DBELL interrupts. There is a special macro,
  masked_interrupt_book3e, in those exception handlers that sets
  paca->irq_happened.

  The list of replayed interrupts is limited to asynchronous noncritical
  interrupts which can be masked by MSR[EE] (therefore no TLB miss).
  Now on KVM book3e we don't want to put them in the irq_happened lazy
  state but rather to execute them directly, so there is no reason for
  exception handling symmetry between host and guest.

  Another question. Consider two cases.

 Actually, in the case GS=1, EXT/DEC/DBELL still occur even if EE=0, as I
 recall.

  Case 1)
  - local_irq_disable() sets soft_enabled = 0.
  - An external interrupt happens. We set PACA_IRQ_EE in irq_happened, clear
    EE in SRR1 and rfi, so interrupts are now hard-disabled and no other
    interrupt gated by MSR.EE can occur. It looks like the idea is to stop a
    device from asserting its interrupt over and over until the interrupt
    condition on the device is cleared, right?

 I don't understand what "the interrupt condition on the device is cleared"
 means here.

 I think that regardless of whether you clear the device interrupt status,
 the system still receives a pending interrupt once EE or GS = 1.

Once, yes, but I think we disable MSR.EE while soft-disabled to avoid a flood
of device interrupts.

  - local_irq_enable() sees that irq_happened is set and replays.

 ret_from_except also checks whether to replay.

  Case 2)
  - local_irq_disable() sets soft_enabled = 0.
  - A DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened but do not
    clear EE in SRR1 before rfi, so interrupts are not hard-disabled.
  - An EE interrupt then happens. We set PACA_IRQ_EE in irq_happened, clear
    EE in SRR1 and rfi, so interrupts are hard-disabled.
  - local_irq_enable() sees that irq_happened is set.
  IIUC, it replays only one interrupt, doesn't it?

 After any one of them is replayed in arch_local_irq_restore(), we set the
 soft/hard interrupt state there:

 set_soft_enabled(1);
 __hard_irq_enable();

 Then any pending interrupt can be executed.

Do you mean that the interrupt should fire again?

 Additionally, ret_from_except probably checks and replays them all.

local_irq_enable() will not take us to ret_from_except.

-Bharat

 Tiejun

RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

2013-05-09 Thread Bhushan Bharat-R65777

 -Original Message-
 From: tiejun.chen [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, May 09, 2013 3:48 PM
 To: Bhushan Bharat-R65777
 Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
 d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
 k...@vger.kernel.org
 Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable interrupts

 On 05/09/2013 06:00 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: tiejun.chen [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, May 09, 2013 3:15 PM
  To: Bhushan Bharat-R65777
  Cc: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; linuxppc-
  d...@lists.ozlabs.org; ag...@suse.de; kvm-...@vger.kernel.org;
  k...@vger.kernel.org
  Subject: Re: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  On 05/09/2013 04:23 PM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Linuxppc-dev [mailto:linuxppc-dev-
  bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Caraman
  Mihai Claudiu-B02008
  Sent: Wednesday, May 08, 2013 6:44 PM
  To: Wood Scott-B07421; tiejun.chen
  Cc: linuxppc-dev@lists.ozlabs.org; ag...@suse.de;
  kvm-...@vger.kernel.org; k...@vger.kernel.org
  Subject: RE: [RFC][KVM][PATCH 1/1] kvm:ppc:booke-64: soft-disable
  interrupts

  This only disable soft interrupt for kvmppc_restart_interrupt()
  that restarts interrupts if they were meant for the host:

  a. SOFT_DISABLE_INTS() only for BOOKE_INTERRUPT_EXTERNAL |
  BOOKE_INTERRUPT_DECREMENTER | BOOKE_INTERRUPT_DOORBELL

  Those aren't the only exceptions that can end up going to the host.
  We could get a TLB miss that results in a heavyweight MMIO exit, etc.

  And shouldn't we handle kvmppc_restart_interrupt() like the
  original HOST flow?

  #define MASKABLE_EXCEPTION(trapnum, intnum, label, hdlr, ack)   \
  START_EXCEPTION(label); \
   NORMAL_EXCEPTION_PROLOG(trapnum, intnum, PROLOG_ADDITION_MASKABLE) \
   EXCEPTION_COMMON(trapnum, PACA_EXGEN, *INTS_DISABLE*) \
   ...

  Could you elaborate on what you mean?

  I think Tiejun was saying that the host has flags and replays only
  EE/DEC/DBELL interrupts. There is a special macro,
  masked_interrupt_book3e, in those exception handlers that sets
  paca->irq_happened.

  The list of replayed interrupts is limited to asynchronous
  noncritical interrupts which can be masked by MSR[EE] (therefore no TLB
  miss).
  Now on KVM book3e we don't want to put them in the irq_happened
  lazy state but rather to execute them directly, so there is no
  reason for exception handling symmetry between host and guest.

  Another question. Consider two cases.

  Actually, in the case GS=1, EXT/DEC/DBELL still occur even if EE=0, as I
  recall.

  Case 1)
  - local_irq_disable() sets soft_enabled = 0.
  - An external interrupt happens. We set PACA_IRQ_EE in irq_happened, clear
    EE in SRR1 and rfi, so interrupts are now hard-disabled and no other
    interrupt gated by MSR.EE can occur. It looks like the idea is to stop a
    device from asserting its interrupt over and over until the interrupt
    condition on the device is cleared, right?

  I don't understand what "the interrupt condition on the device is cleared"
  means here.

  I think that regardless of whether you clear the device interrupt status,
  the system still receives a pending interrupt once EE or GS = 1.

  Once, yes, but I think we disable MSR.EE while soft-disabled to avoid a
  flood of device interrupts.

 But we neither ACK nor send EOI for that IRQ in the interrupt controller, so
 it should remain in the pending state.

  - local_irq_enable() sees that irq_happened is set and replays.

  ret_from_except also checks whether to replay.

  Case 2)
  - local_irq_disable() sets soft_enabled = 0.
  - A DEC interrupt happens. We set PACA_IRQ_DEC in irq_happened but do not
    clear EE in SRR1 before rfi, so interrupts are not hard-disabled.
  - An EE interrupt then happens. We set PACA_IRQ_EE in irq_happened, clear
    EE in SRR1 and rfi, so interrupts are hard-disabled.
  - local_irq_enable() sees that irq_happened is set.
  IIUC, it replays only one interrupt, doesn't it?

  After any one of them is replayed in arch_local_irq_restore(), we set the
  soft/hard interrupt state there:

  set_soft_enabled(1);
  __hard_irq_enable();

  Then any pending interrupt can be executed.

  Do you mean that the interrupt should fire again?

 I mean that a pending exception (external interrupt, decrementer or
 doorbell) can trap the CPU once EE=1 after __hard_irq_enable() here. The
 kernel can then handle that exception, since soft-enable is also 1 by now.

  Additionally, ret_from_except probably checks and replays them all.

  local_irq_enable() will not take us to ret_from_except.

 Yes. I am just saying that ret_from_except provides a way to replay them all.

RE: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry

2013-05-09 Thread Bhushan Bharat-R65777



 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf
 Of Scott Wood
 Sent: Friday, May 10, 2013 8:40 AM
 To: Alexander Graf; Benjamin Herrenschmidt
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; 
 linuxppc-dev@lists.ozlabs.org;
 Wood Scott-B07421
 Subject: [PATCH v2 3/4] kvm/ppc: Call trace_hardirqs_on before entry
 
 Currently this is only being done on 64-bit.  Rather than just move it
 out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is
 consistent with lazy ee state, and so that we don't track more host
 code as interrupts-enabled than necessary.
 
 Rename kvm_lazy_ee_enable() to kvm_fix_ee_before_entry() to reflect
 that this function now has a role on 32-bit as well.
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
  arch/powerpc/include/asm/kvm_ppc.h |   11 ---
  arch/powerpc/kvm/book3s_pr.c   |4 ++--
  arch/powerpc/kvm/booke.c   |4 ++--
  arch/powerpc/kvm/powerpc.c |2 --
  4 files changed, 12 insertions(+), 9 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h
 b/arch/powerpc/include/asm/kvm_ppc.h
 index a5287fe..6885846 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -394,10 +394,15 @@ static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
   }
  }
 
 -/* Please call after prepare_to_enter. This function puts the lazy ee state
 -   back to normal mode, without actually enabling interrupts. */
 -static inline void kvmppc_lazy_ee_enable(void)
 +/*
 + * Please call after prepare_to_enter. This function puts the lazy ee and irq
 + * disabled tracking state back to normal mode, without actually enabling
 + * interrupts.
 + */
 +static inline void kvmppc_fix_ee_before_entry(void)
  {
 + trace_hardirqs_on();
 +
  #ifdef CONFIG_PPC64
   /* Only need to enable IRQs by hard enabling them after this */
   local_paca->irq_happened = 0;
 diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
 index bdc40b8..0b97ce4 100644
 --- a/arch/powerpc/kvm/book3s_pr.c
 +++ b/arch/powerpc/kvm/book3s_pr.c
 @@ -890,7 +890,7 @@ program_interrupt:
   local_irq_enable();
   r = s;
   } else {
 - kvmppc_lazy_ee_enable();
 + kvmppc_fix_ee_before_entry();
   }
   }
 
 @@ -1161,7 +1161,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
 kvm_vcpu *vcpu)
   if (vcpu->arch.shared->msr & MSR_FP)
   kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 
 - kvmppc_lazy_ee_enable();
 + kvmppc_fix_ee_before_entry();
 
   ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 705fc5c..eb89b83 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -673,7 +673,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
 kvm_vcpu
 *vcpu)
   ret = s;
   goto out;
   }
 - kvmppc_lazy_ee_enable();
 + kvmppc_fix_ee_before_entry();

local_irq_disable() is called before kvmppc_prepare_to_enter().
Now we put irq_happened and soft_enabled back to their previous state without
checking whether any interrupt happened in between. If an interrupt did happen
in between, won't it be lost?

-Bharat

 
   kvm_guest_enter();
 
 @@ -1154,7 +1154,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
 kvm_vcpu *vcpu,
   local_irq_enable();
   r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
   } else {
 - kvmppc_lazy_ee_enable();
 + kvmppc_fix_ee_before_entry();
   }
   }
 
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 6316ee3..4e05f8c 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -117,8 +117,6 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
   kvm_guest_exit();
   continue;
   }
 -
 - trace_hardirqs_on();
  #endif
 
   kvm_guest_enter();
 --
 1.7.10.4
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()

2013-05-09 Thread Bhushan Bharat-R65777



 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
 Behalf Of Scott Wood
 Sent: Friday, May 10, 2013 8:40 AM
 To: Alexander Graf; Benjamin Herrenschmidt
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; 
 linuxppc-dev@lists.ozlabs.org;
 Wood Scott-B07421
 Subject: [PATCH v2 2/4] kvm/ppc/booke64: Fix lazy ee handling in
 kvmppc_handle_exit()
 
 EE is hard-disabled on entry to kvmppc_handle_exit(), so call
 hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
 is unset.
 
 Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
 and sometimes host kernel hangs.
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
  arch/powerpc/kvm/booke.c |5 +
  1 file changed, 5 insertions(+)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index 1020119..705fc5c 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -833,6 +833,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu
 *vcpu,
   int r = RESUME_HOST;
   int s;
 
 +#ifdef CONFIG_PPC64
 + WARN_ON(local_paca->irq_happened != 0);
 +#endif
 + hard_irq_disable();

This does not actually hard-disable anything, since EE is already clear; it
only makes the state look hard-disabled to the host. Right?
If so, should we add a comment here explaining why we are doing this?

-Bharat

 +
   /* update before a new last_exit_type is rewritten */
   kvmppc_update_timing_stats(vcpu);
 
 --
 1.7.10.4
 
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup

2013-05-09 Thread Bhushan Bharat-R65777



 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
 Behalf Of Scott Wood
 Sent: Friday, May 10, 2013 8:40 AM
 To: Alexander Graf; Benjamin Herrenschmidt
 Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; 
 linuxppc-dev@lists.ozlabs.org;
 Wood Scott-B07421
 Subject: [PATCH v2 4/4] kvm/ppc: IRQ disabling cleanup
 
 Simplify the handling of lazy EE by going directly from fully-enabled
 to hard-disabled.  This replaces the lazy_irq_pending() check
 (including its misplaced kvm_guest_exit() call).
 
 As suggested by Tiejun Chen, move the interrupt disabling into
 kvmppc_prepare_to_enter() rather than have each caller do it.  Also
 move the IRQ enabling on heavyweight exit into
 kvmppc_prepare_to_enter().
 
 Don't move kvmppc_fix_ee_before_entry() into kvmppc_prepare_to_enter(),
 so that the caller can avoid marking interrupts enabled earlier than
 necessary (e.g. book3s_pr waits until after FP save/restore is done).
 
 Signed-off-by: Scott Wood scottw...@freescale.com
 ---
  arch/powerpc/include/asm/kvm_ppc.h |6 ++
  arch/powerpc/kvm/book3s_pr.c   |   12 +++-
  arch/powerpc/kvm/booke.c   |9 ++---
  arch/powerpc/kvm/powerpc.c |   21 -
  4 files changed, 19 insertions(+), 29 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h
 b/arch/powerpc/include/asm/kvm_ppc.h
 index 6885846..e4474f8 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -404,6 +404,12 @@ static inline void kvmppc_fix_ee_before_entry(void)
   trace_hardirqs_on();
 
  #ifdef CONFIG_PPC64
 + /*
 +  * To avoid races, the caller must have gone directly from having
 +  * interrupts fully-enabled to hard-disabled.
 +  */
 + WARN_ON(local_paca->irq_happened != PACA_IRQ_HARD_DIS);
 +
   /* Only need to enable IRQs by hard enabling them after this */
   local_paca->irq_happened = 0;
   local_paca->soft_enabled = 1;
 diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
 index 0b97ce4..e61e39e 100644
 --- a/arch/powerpc/kvm/book3s_pr.c
 +++ b/arch/powerpc/kvm/book3s_pr.c
 @@ -884,14 +884,11 @@ program_interrupt:
* and if we really did time things so badly, then we just exit
* again due to a host external interrupt.
*/
 - local_irq_disable();
   s = kvmppc_prepare_to_enter(vcpu);
 - if (s <= 0) {
 - local_irq_enable();
 + if (s <= 0)
   r = s;
 - } else {
 + else
   kvmppc_fix_ee_before_entry();
 - }
   }
 
   trace_kvm_book3s_reenter(r, vcpu);
 @@ -1121,12 +1118,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
 kvm_vcpu *vcpu)
* really did time things so badly, then we just exit again due to
* a host external interrupt.
*/
 - local_irq_disable();
   ret = kvmppc_prepare_to_enter(vcpu);
 - if (ret <= 0) {
 - local_irq_enable();
 + if (ret <= 0)
   goto out;
 - }
 
   /* Save FPU state in stack */
   if (current->thread.regs->msr & MSR_FP)
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index eb89b83..f7c0111 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -666,10 +666,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct
 kvm_vcpu *vcpu)
   return -EINVAL;
   }
 
 - local_irq_disable();
   s = kvmppc_prepare_to_enter(vcpu);
   if (s <= 0) {
 - local_irq_enable();
   ret = s;
   goto out;
   }
 @@ -1148,14 +1146,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct
 kvm_vcpu *vcpu,
* aren't already exiting to userspace for some other reason.
*/
   if (!(r & RESUME_HOST)) {
 - local_irq_disable();

OK, so now we no longer soft-disable before kvmppc_prepare_to_enter().

   s = kvmppc_prepare_to_enter(vcpu);
 - if (s <= 0) {
 - local_irq_enable();
 + if (s <= 0)
   r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
 - } else {
 + else
   kvmppc_fix_ee_before_entry();
 - }
   }
 
   return r;
 diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
 index 4e05f8c..f8659aa 100644
 --- a/arch/powerpc/kvm/powerpc.c
 +++ b/arch/powerpc/kvm/powerpc.c
 @@ -64,12 +64,14 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
  {
   int r = 1;
 
 - WARN_ON_ONCE(!irqs_disabled());
 + WARN_ON(irqs_disabled());
 + hard_irq_disable();

Here we hard-disable in kvmppc_prepare_to_enter(), so my comment on the other
patch about losing an interrupt is no longer valid.

So at this point:
  MSR.EE = 0
  local_paca->soft_enabled = 0
  local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
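
The state listed above, and the check the patch adds to
kvmppc_fix_ee_before_entry(), can be modelled in user space as follows. This
is a sketch only: struct toy_lazy_state and the toy_* functions are
hypothetical, the flag values are illustrative, and a boolean return value
stands in for the kernel's WARN_ON().

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values; only the names echo the flags in the thread. */
#define TOY_IRQ_HARD_DIS 0x01
#define TOY_IRQ_EE       0x04

/* Hypothetical model of the three pieces of state listed above. */
struct toy_lazy_state {
    bool msr_ee;
    bool soft_enabled;
    uint8_t irq_happened;
};

/* Models the three effects attributed to hard_irq_disable() above. */
static void toy_hard_irq_disable(struct toy_lazy_state *s)
{
    s->msr_ee = false;
    s->soft_enabled = false;
    s->irq_happened |= TOY_IRQ_HARD_DIS;
}

/*
 * Models the check added in the patch: the only bit allowed in
 * irq_happened is HARD_DIS. Returns false where the kernel would warn.
 * As in the patch, the lazy state is reset either way and MSR[EE] is
 * left for the actual guest entry to restore.
 */
static bool toy_fix_ee_before_entry(struct toy_lazy_state *s)
{
    bool clean = (s->irq_happened == TOY_IRQ_HARD_DIS);

    s->irq_happened = 0;
    s->soft_enabled = true;
    return clean;
}
```

Going straight from fully enabled to hard-disabled leaves nothing but
HARD_DIS in irq_happened, so the check passes. If the caller had been
soft-disabled first and an interrupt was latched in between, the extra bit
makes the check fail instead of the interrupt being dropped silently.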

 +
   while (true)

RE: [PATCH] bookehv: Handle debug exception on guest exit

2013-04-05 Thread Bhushan Bharat-R65777

Hi Kumar/Benh,

After looking further into the code, I think that if we correct the vector
range check below in the DebugDebug handler, then we do not need the change I
provided in this patch.

Here is the snapshot for 32 bit (head_booke.h, same will be true for 64 bit):

#define DEBUG_DEBUG_EXCEPTION \
START_EXCEPTION(DebugDebug);  \
DEBUG_EXCEPTION_PROLOG;   \
  \
/*\
 * If there is a single step or branch-taken exception in an  \
 * exception entry sequence, it was probably meant to apply to\
 * the code where the exception occurred (since exception entry   \
 * doesn't turn off DE automatically).  We simulate the effect\
 * of turning off DE on entry to an exception handler by turning  \
 * off DE in the DSRR1 value and clearing the debug status.   \
 */   \
mfspr   r10,SPRN_DBSR;  /* check single-step/branch taken */  \
andis.  r10,r10,(DBSR_IC|DBSR_BT)@h;  \
beq+2f;   \
  \
lis r10,KERNELBASE@h;   /* check if exception in vectors */   \
ori r10,r10,KERNELBASE@l; \
cmplw   r12,r10;  \
blt+2f; /* addr below exception vectors */\
  \
lis r10,DebugDebug@h; \
ori r10,r10,DebugDebug@l; \


Here we assume all exception vectors end at DebugDebug, which is not correct.
We should probably get the proper end by using some start_vector and
end_vector labels, or at least use Ehvpriv as the end (it is the last vector
defined in head_fsl_booke.S for PowerPC). Is that correct?


        cmplw   r12,r10;                                                      \
        bgt+    2f;                     /* addr above exception vectors */    \

Thanks
-Bharat
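
The range check discussed above can be sketched in C to show why the choice of end label matters. This is a hedged model, not the kernel macro: `struct vector_range` and `addr_in_vectors()` are hypothetical names, and the addresses used with them are made up. The comparison mirrors the two branches in the macro (below the start, above the end).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds of the exception-vector region. */
struct vector_range {
    uintptr_t start;  /* first vector address (KERNELBASE in the macro) */
    uintptr_t end;    /* address of the label taken as the last vector */
};

/*
 * Returns true when addr lies inside the vector region.
 * "addr < start" corresponds to the blt branch, "addr > end" to bgt.
 */
static bool addr_in_vectors(const struct vector_range *r, uintptr_t addr)
{
    assert(r != NULL);
    return addr >= r->start && addr <= r->end;
}
```

If `end` is set to a label that is not really the last vector, an exception taken in a later vector falls outside the range and is treated as an ordinary debug event, which is the concern raised in the message.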


 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On
 Behalf Of Bhushan Bharat-R65777
 Sent: Thursday, April 04, 2013 8:29 PM
 To: Alexander Graf
 Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org; 
 kvm-...@vger.kernel.org;
 Wood Scott-B07421
 Subject: RE: [PATCH] bookehv: Handle debug exception on guest exit
 
 
 
  -Original Message-
  From: Alexander Graf [mailto:ag...@suse.de]
  Sent: Thursday, April 04, 2013 6:55 PM
  To: Bhushan Bharat-R65777
  Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org;
  kvm-...@vger.kernel.org; Wood Scott-B07421; Bhushan Bharat-R65777
  Subject: Re: [PATCH] bookehv: Handle debug exception on guest exit
 
 
  On 20.03.2013, at 18:45, Bharat Bhushan wrote:
 
   EPCR.DUVD controls whether debug events can occur in hypervisor
   mode. When a KVM guest is using the debug resources, we do not
   want debug events to be captured in the guest entry/exit path. So we
   set EPCR.DUVD when entering the guest and clear it when exiting.
  
   Instruction complete is a post-completion debug exception, but the
   debug event gets posted on the basis of the MSR before the instruction
   is executed. If the instruction switches the context from guest
   mode (MSR.GS = 1) to hypervisor mode (MSR.GS = 0), then xSRR0
   points to the first instruction of the KVM handler and xSRR1 shows
   that MSR.GS is clear (hypervisor context). Because xSRR1.GS is used to
   decide whether the KVM handler or the host kernel debug handler is
   invoked, the host kernel debug handler ends up handling an exception
   that should be handled by KVM.
  
   This is tested on e500mc in 32 bit mode
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
   v0:
   - Do not apply this change for debug_crit as we do not know those
   chips have
  issue or not.
   - corrected 64bit case branching
  
   arch/powerpc/kernel/exceptions-64e.S |   29 -
   arch/powerpc/kernel/head_booke.h |   26 ++
   2 files changed, 54 insertions(+), 1 deletions(-)
  
   diff --git a/arch/powerpc/kernel/exceptions-64e.S
   b/arch/powerpc/kernel/exceptions-64e.S
   index 4684e33..8b26294 100644
   --- a/arch/powerpc/kernel/exceptions-64e.S
   +++ b/arch/powerpc/kernel

RE: [PATCH] bookehv: Handle debug exception on guest exit

2013-04-04 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Thursday, April 04, 2013 6:55 PM
 To: Bhushan Bharat-R65777
 Cc: linuxppc-dev@lists.ozlabs.org; k...@vger.kernel.org; 
 kvm-...@vger.kernel.org;
 Wood Scott-B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH] bookehv: Handle debug exception on guest exit
 
 
 On 20.03.2013, at 18:45, Bharat Bhushan wrote:
 
  EPCR.DUVD controls whether debug events can occur in hypervisor
  mode. When a KVM guest is using the debug resources, we do not
  want debug events to be captured in the guest entry/exit path. So we set
  EPCR.DUVD when entering the guest and clear it when exiting.
 
  Instruction complete is a post-completion debug exception, but the
  debug event gets posted on the basis of the MSR before the instruction is
  executed. If the instruction switches the context from guest mode
  (MSR.GS = 1) to hypervisor mode (MSR.GS = 0), then xSRR0 points to
  the first instruction of the KVM handler and xSRR1 shows that MSR.GS is
  clear (hypervisor context). Because xSRR1.GS is used to decide whether
  the KVM handler or the host kernel debug handler is invoked, the host
  kernel debug handler ends up handling an exception that should be
  handled by KVM.
 
  This is tested on e500mc in 32 bit mode
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  v0:
  - Do not apply this change for debug_crit as we do not know those chips have
 issue or not.
  - corrected 64bit case branching
 
  arch/powerpc/kernel/exceptions-64e.S |   29 -
  arch/powerpc/kernel/head_booke.h |   26 ++
  2 files changed, 54 insertions(+), 1 deletions(-)
 
  diff --git a/arch/powerpc/kernel/exceptions-64e.S
  b/arch/powerpc/kernel/exceptions-64e.S
  index 4684e33..8b26294 100644
  --- a/arch/powerpc/kernel/exceptions-64e.S
  +++ b/arch/powerpc/kernel/exceptions-64e.S
  @@ -516,6 +516,33 @@ kernel_dbg_exc:
  andis.  r15,r14,DBSR_IC@h
  beq+    1f
 
  +#ifdef CONFIG_KVM_BOOKE_HV
  +   /*
  +* EPCR.DUVD controls whether the debug events can come in
  +* hypervisor mode or not. When KVM guest is using the debug
  +* resource then we do not want debug events to be captured
  +* in guest entry/exit path. So we set EPCR.DUVD when entering
  +* and clears EPCR.DUVD when exiting from guest.
  +* Debug instruction complete is a post-completion debug
  +* exception but debug event gets posted on the basis of MSR
  +* before the instruction is executed. Now if the instruction
  +* switches the context from guest mode (MSR.GS = 1) to hypervisor
  +* mode (MSR.GS = 0) then the xSRR0 points to first instruction of
 
 Can't we just execute that code path with MSR.DE=0?

Single stepping uses DBCR0.IC (instruction complete).
Can you describe how MSR.DE = 0 will work?

 
 
 Alex
 
  +* KVM handler and xSRR1 shows that MSR.GS is clear
  +* (hypervisor context). Now as xSRR1.GS is used to decide whether
  +* the KVM handler or the host kernel debug handler will be
  +* invoked to handle the exception, this leads to the host kernel
  +* debug handler handling the exception,
  +* which should be handled by KVM.
  +*/
  +   mfspr   r10, SPRN_EPCR
  +   andis.  r10,r10,SPRN_EPCR_DUVD@h
  +   beq+    2f
  +
  +   andis.  r10,r9,MSR_GS@h
  +   beq+    3f
  +2:
  +#endif
  LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e)
  LOAD_REG_IMMEDIATE(r15,interrupt_end_book3e)
  cmpld   cr0,r10,r14
  @@ -523,7 +550,7 @@ kernel_dbg_exc:
  blt+    cr0,1f
  bge+    cr1,1f
 
  -   /* here it looks like we got an inappropriate debug exception. */
  +3: /* here it looks like we got an inappropriate debug exception. */
  lis r14,DBSR_IC@h   /* clear the IC event */
  rlwinm  r11,r11,0,~MSR_DE   /* clear DE in the DSRR1 value */
  mtspr   SPRN_DBSR,r14
  diff --git a/arch/powerpc/kernel/head_booke.h
  b/arch/powerpc/kernel/head_booke.h
  index 5f051ee..edc6a3b 100644
  --- a/arch/powerpc/kernel/head_booke.h
  +++ b/arch/powerpc/kernel/head_booke.h
  @@ -285,7 +285,33 @@ label:
  mfspr   r10,SPRN_DBSR;  /* check single-step/branch taken */  \
  andis.  r10,r10,(DBSR_IC|DBSR_BT)@h;  \
  beq+    2f;   \
  +#ifdef CONFIG_KVM_BOOKE_HV   \
  +   /*\
  +* EPCR.DUVD controls whether the debug events can come in\
  +* hypervisor mode or not. When KVM guest is using the debug  \
  +* resource then we do not want debug events to be captured   \
  +* in guest entry/exit path. So we set EPCR.DUVD when entering\
  +* and clears

Clearing DBSR and DBCR0 in host handler.

2013-04-03 Thread Bhushan Bharat-R65777

Hi All,

The kernel exception handling code for 32-bit (transfer_to_handler in 
entry_32.S) clears DBSR and loads DBCR0 with 0 (the global_dbcr0 variable, which 
is zero) if user space used debug (DBCR0.IDM set).

But I do not see the same (clearing DBCR0 and DBSR) in the 64-bit exception 
handler. Is this an issue, or am I missing something?

Thanks
-Bharat

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: Clearing DBSR and DBCR0 in host handler.

2013-04-03 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Kumar Gala [mailto:ga...@kernel.crashing.org]
 Sent: Wednesday, April 03, 2013 9:41 PM
 To: Bhushan Bharat-R65777
 Cc: linuxppc-dev@lists.ozlabs.org; Benjamin Herrenschmidt; Alexander Graf; 
 Wood
 Scott-B07421
 Subject: Re: Clearing DBSR and DBCR0 in host handler.

 On Apr 3, 2013, at 10:24 AM, Bhushan Bharat-R65777 wrote:

  Hi All,

  The kernel exception handling code for 32-bit (transfer_to_handler in
 entry_32.S) clears DBSR and loads DBCR0 with 0 (the global_dbcr0 variable, which
 is zero) if user space used debug (DBCR0.IDM set).

  But I do not see the same (clearing DBCR0 and DBSR) in the 64-bit exception
 handler. Is this an issue, or am I missing something?

  Thanks
  -Bharat

 Are you having a problem with debug w/the 64-bit kernel?

No, not any issue. I was looking into the code that saves/restores the debug 
registers, and I observed the above inconsistency between 32-bit and 64-bit.

  The 32-bit kernel
 supports several kernel level debug features that the 64-bit doesn't support.

I am talking about user process debugging:
-  A user process is being debugged using gdb, so the h/w debug 
registers hold the thread context.
-  An interrupt/exception happens in the user process.
-  On 32-bit we now clear DBSR (pending events) and DBCR0 (so no 
new events get captured). But on 64-bit we do not do the same. Why is that?
   Are we doing something extra on 32-bit, or is something missing on 
64-bit? 
   Can it happen that on 64-bit some more debug events get captured and 
debug interrupts fire if MSR.DE is set, which is undesired? 

Or am I missing something here?

Thanks
-Bharat

 So if you are having an issue that might be more helpful to convey that just
 asking about exception code path.

 - k
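
The 32-bit behaviour described in this thread (clear pending events and disable new ones on exception entry when user space was using debug) can be modeled as below. This is a sketch under stated assumptions, not the code from entry_32.S: `struct debug_regs_model` and `exception_entry_model()` are hypothetical, the bit position given to `DBCR0_IDM` is assumed for illustration, and clearing DBSR is modeled as a plain store of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position, for illustration only. */
#define DBCR0_IDM 0x40000000u

/* Hypothetical model of the two debug registers discussed above. */
struct debug_regs_model {
    uint32_t dbcr0;  /* debug control: which events are enabled */
    uint32_t dbsr;   /* debug status: which events are pending */
};

/* Stand-in for the kernel-wide DBCR0 value, which the thread says is zero. */
static const uint32_t global_dbcr0_model = 0;

/*
 * Models exception entry on 32-bit as described in the thread:
 * if user space was using debug (IDM set), drop pending events
 * and load the kernel's DBCR0 so no new events are captured.
 */
static void exception_entry_model(struct debug_regs_model *r)
{
    assert(r != NULL);
    if (r->dbcr0 & DBCR0_IDM) {
        r->dbsr = 0;
        r->dbcr0 = global_dbcr0_model;
    }
}
```

A 64-bit path without this step would correspond to skipping `exception_entry_model()` entirely, leaving both registers holding the user thread's values while the kernel runs.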


using request_irq_percpu()

2013-03-19 Thread Bhushan Bharat-R65777

Hi All,

request_percpu_irq() is defined in kernel/irq/manage.c. It takes a percpu 
pointer, so the cookie the handler receives is unique to the CPU on which the 
handler executes.

So it looks like we can use this to have multiple bottom-half interrupt 
handlers executing at the same time on different CPUs, each handling its work 
independently.

The flow will be like:
-- Interrupt occurs on CPU 1: the handler saves some context for the bottom half, 
clears the interrupt condition, and returns (in between, the interrupt 
affinity is moved to the next CPU in round-robin fashion).

-- CPU 1 executes its bottom half.

-- The interrupt occurs again, and this time it comes in on CPU 2.

-- The CPU 2 handler does the same as CPU 1, and so on.

This way multiple similar bottom halves can run at the same time on different CPUs.

Thanks
-Bharat


RE: BOOKE KVM calling load_up_fpu from C?

2013-02-12 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Michael Neuling [mailto:mi...@neuling.org]
 Sent: Tuesday, February 12, 2013 9:46 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?
 
 Bhushan Bharat-R65777 r65...@freescale.com wrote:
 
 
 
   -Original Message-
   From: Michael Neuling [mailto:mi...@neuling.org]
   Sent: Tuesday, February 12, 2013 9:16 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
   Subject: Re: BOOKE KVM calling load_up_fpu from C?
  
   Bhushan Bharat-R65777 r65...@freescale.com wrote:
  
   
   
 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael
 Neuling
 Sent: Tuesday, February 12, 2013 8:59 AM
 To: Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: BOOKE KVM calling load_up_fpu from C?

 Scott,

 I was looking at changing how load_up_fpu works and I found this
 in
 arch/powerpc/kvm/booke.h:

 static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) {
 #ifdef CONFIG_PPC_FPU
   if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
     load_up_fpu();
     current->thread.regs->msr |= MSR_FP;
   }
 #endif
 }

 I'm wondering how this is supposed to work since load_up_fpu is
 supposed to have MSR in R12?
   
Doesn't load_up_fpu() do an mfmsr:
   
_GLOBAL(load_up_fpu)
        mfmsr   r5
        ori     r5,r5,MSR_FP
#ifdef CONFIG_VSX
BEGIN_FTR_SECTION
        oris    r5,r5,MSR_VSX@h
END_FTR_SECTION_IFSET(CPU_FTR_VSX)
#endif
        SYNC
        MTMSRD(r5)                      /* enable use of fpu now */
        isync
<snip>
  
   Look further down...
  
   #ifdef CONFIG_PPC32
     mfspr   r5,SPRN_SPRG_THREAD     /* current task's THREAD (phys) */
     lwz     r4,THREAD_FPEXC_MODE(r5)
     ori     r9,r9,MSR_FP            /* enable FP for current */
     or      r9,r9,r4
   #else
     ld      r4,PACACURRENT(r13)
     addi    r5,r4,THREAD            /* Get THREAD */
     lwz     r4,THREAD_FPEXC_MODE(r5)
     ori     r12,r12,MSR_FP
     or      r12,r12,r4
     std     r12,_MSR(r1)
   #endif
  
   R12 is loaded with SRR1 in the exception prolog before load_up_fpu is
 called.
 
  Yes it is SRR1 not MSR.
 
 Yes, SRR1 == the MSR of the user process, not the current MSR.
 
  Also on 32bit it looks like that R9 is assumed to have SRR1.
 
 Yep that too.
 
 So any idea how it's supposed to work or is it broken?

To me this looks wrong. It seems to work because thread->regs->msr is 
not actually used to write SRR1 (and eventually the thread MSR) when doing rfi 
to enter the guest. In fact the guest MSR (shadow_msr) is used as SRR1, and it 
will have the proper MSR (including FP set).

But yes, Scott is the right person to comment, so let us wait for his comment.

Thanks
-Bharat



RE: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree

2013-02-12 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Stefan 
 Roese
 Sent: Tuesday, February 12, 2013 2:38 PM
 To: net...@vger.kernel.org
 Cc: linuxppc-...@ozlabs.org; Anatolij Gustschin
 Subject: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree
 
 Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
 (U-Boot) to write the MAC address into the ethernet controller registers. The
 Linux driver should not rely on such a thing. So lets read the MAC address 
 from
 the DT as it should be done here.
 
 The following priority is now used to read the MAC address:
 
 1) First, try OF node MAC address, if not present or invalid, then:
 
 2) Read from MAC address registers, if invalid, then:

Why do we read from the MAC registers if Linux should not rely on the bootloader?

-Bharat


 
 3) Log a warning message, and choose a random MAC address.
 
 This fixes a problem with a MPC5200 board that uses the SPL U-Boot version
 without FEC initialization before Linux booting for boot speedup.
 
 Additionally a status line is now be printed upon successful driver probing,
 also displaying this MAC address.
 
 Signed-off-by: Stefan Roese s...@denx.de
 Cc: Anatolij Gustschin ag...@denx.de
 ---
 v2:
 - Remove module parameter mpc52xx_fec_mac_addr
 - Priority for MAC address probing now is DT, controller regs
   If the resulting MAC address is invalid, a random address will
   be generated and used with a warning message
 - Use np variable to simplify the code
 
  drivers/net/ethernet/freescale/fec_mpc52xx.c | 61 
 +---
  1 file changed, 37 insertions(+), 24 deletions(-)
 
 diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c
 b/drivers/net/ethernet/freescale/fec_mpc52xx.c
 index 817d081..8b725f4 100644
 --- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
 +++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
 @@ -76,10 +76,6 @@ static void mpc52xx_fec_stop(struct net_device *dev);
  static void mpc52xx_fec_start(struct net_device *dev);
  static void mpc52xx_fec_reset(struct net_device *dev);
 
 -static u8 mpc52xx_fec_mac_addr[6];
 -module_param_array_named(mac, mpc52xx_fec_mac_addr, byte, NULL, 0);
 -MODULE_PARM_DESC(mac, "six hex digits, ie. 0x1,0x2,0xc0,0x01,0xba,0xbe");
 -
  #define MPC52xx_MESSAGES_DEFAULT ( NETIF_MSG_DRV | NETIF_MSG_PROBE | \
    NETIF_MSG_LINK | NETIF_MSG_IFDOWN | NETIF_MSG_IFUP)
  static int debug = -1;   /* the above default */
 @@ -110,15 +106,6 @@ static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac)
   out_be32(&fec->paddr2, (*(u16 *)(&mac[4]) << 16) | FEC_PADDR2_TYPE);
  }
 
 -static void mpc52xx_fec_get_paddr(struct net_device *dev, u8 *mac)
 -{
 - struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 - struct mpc52xx_fec __iomem *fec = priv->fec;
 -
 - *(u32 *)(&mac[0]) = in_be32(&fec->paddr1);
 - *(u16 *)(&mac[4]) = in_be32(&fec->paddr2) >> 16;
 -}
 -
  static int mpc52xx_fec_set_mac_address(struct net_device *dev, void *addr)
  {
   struct sockaddr *sock = addr;
 @@ -853,6 +840,8 @@ static int mpc52xx_fec_probe(struct platform_device *op)
   struct resource mem;
   const u32 *prop;
   int prop_size;
 + struct device_node *np = op->dev.of_node;
 + const void *p;
 
   phys_addr_t rx_fifo;
   phys_addr_t tx_fifo;
 @@ -866,7 +855,7 @@ static int mpc52xx_fec_probe(struct platform_device *op)
   priv->ndev = ndev;
 
   /* Reserve FEC control zone */
 - rv = of_address_to_resource(op->dev.of_node, 0, &mem);
 + rv = of_address_to_resource(np, 0, &mem);
   if (rv) {
   printk(KERN_ERR DRIVER_NAME ": "
   "Error while parsing device node resource\n" );
 @@ -919,7 +908,7 @@ static int mpc52xx_fec_probe(struct platform_device *op)
 
   /* Get the IRQ we need one by one */
   /* Control */
 - ndev->irq = irq_of_parse_and_map(op->dev.of_node, 0);
 + ndev->irq = irq_of_parse_and_map(np, 0);
 
   /* RX */
   priv->r_irq = bcom_get_task_irq(priv->rx_dmatsk);
 @@ -927,11 +916,33 @@ static int mpc52xx_fec_probe(struct platform_device *op)
   /* TX */
   priv->t_irq = bcom_get_task_irq(priv->tx_dmatsk);
 
 - /* MAC address init */
 - if (!is_zero_ether_addr(mpc52xx_fec_mac_addr))
 - memcpy(ndev->dev_addr, mpc52xx_fec_mac_addr, 6);
 - else
 - mpc52xx_fec_get_paddr(ndev, ndev->dev_addr);
 + /*
 +  * MAC address init:
 +  *
 +  * First try to read MAC address from DT
 +  */
 + p = of_get_property(np, "local-mac-address", NULL);
 + if (p != NULL) {
 + memcpy(ndev->dev_addr, p, 6);
 + } else {
 + struct mpc52xx_fec __iomem *fec = priv->fec;
 +
 + /*
 +  * If the MAC addresse is not provided via DT then read
 +  * it back from the controller regs
 +  */
 + *(u32 *)(&ndev->dev_addr[0])

RE: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree

2013-02-12 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Stefan Roese [mailto:s...@denx.de]
 Sent: Tuesday, February 12, 2013 4:34 PM
 To: Bhushan Bharat-R65777
 Cc: net...@vger.kernel.org; linuxppc-...@ozlabs.org; Anatolij Gustschin; David
 S. Miller
 Subject: Re: [PATCH v2] net: fec_mpc52xx: Read MAC address from device-tree

 On 12.02.2013 11:56, Bhushan Bharat-R65777 wrote:
  Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
  (U-Boot) to write the MAC address into the ethernet controller
  registers. The Linux driver should not rely on such a thing. So lets
  read the MAC address from the DT as it should be done here.

  The following priority is now used to read the MAC address:

  1) First, try OF node MAC address, if not present or invalid, then:

  2) Read from MAC address registers, if invalid, then:

  Why do we read from the MAC registers if Linux should not rely on the bootloader?

 It was suggested by David. Backwards compatibility. Here is David's comment on my
 original patch, which removed this register reading completely:

 "I don't think this is a conservative enough change.

 You have to keep the MAC register reading code around, as a backup code path
 in case the OF device node lacks a MAC address."

OK.

But is this really backward compatibility, or is it hiding some bug? My thought is 
that if the DT does not have a valid MAC address then it is a bug and should be 
fixed. Isn't it?

-Bharat

 Thanks,
 Stefan
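
The probing priority discussed in this thread (device tree first, then controller registers, then a random address) can be sketched as a small stand-alone C function. This is an illustration only, not the driver code: `pick_mac()`, `mac_is_valid()` and `enum mac_source` are hypothetical names, and the random fallback is passed in by the caller so that the sketch stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Which source the chosen address came from. */
enum mac_source { MAC_FROM_DT, MAC_FROM_REGS, MAC_RANDOM };

/* A usable station address is not all zeros and has the multicast bit clear. */
static bool mac_is_valid(const uint8_t *mac)
{
    static const uint8_t zero[MAC_LEN] = { 0 };

    return memcmp(mac, zero, MAC_LEN) != 0 && (mac[0] & 0x01) == 0;
}

/*
 * Tries each source in priority order and copies the first valid one to out.
 * dt_mac or reg_mac may be NULL when that source is not available.
 */
static enum mac_source pick_mac(const uint8_t *dt_mac, const uint8_t *reg_mac,
                                const uint8_t *random_mac, uint8_t *out)
{
    assert(random_mac != NULL && out != NULL);

    if (dt_mac != NULL && mac_is_valid(dt_mac)) {
        memcpy(out, dt_mac, MAC_LEN);
        return MAC_FROM_DT;
    }
    if (reg_mac != NULL && mac_is_valid(reg_mac)) {
        memcpy(out, reg_mac, MAC_LEN);
        return MAC_FROM_REGS;
    }
    /* Last resort: the caller-supplied random address (a warning would be logged here). */
    memcpy(out, random_mac, MAC_LEN);
    return MAC_RANDOM;
}
```

Returning the source alongside the address makes the question raised above visible to the caller: a result of `MAC_FROM_REGS` means the device tree did not supply a usable address, which a driver could log instead of silently accepting.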


RE: BOOKE KVM calling load_up_fpu from C?

2013-02-12 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, February 13, 2013 12:03 AM
 To: Bhushan Bharat-R65777
 Cc: Michael Neuling; Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?

 On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Michael Neuling [mailto:mi...@neuling.org]
   Sent: Tuesday, February 12, 2013 9:46 AM
   To: Bhushan Bharat-R65777
   Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
   Subject: Re: BOOKE KVM calling load_up_fpu from C?

   Bhushan Bharat-R65777 r65...@freescale.com wrote:

 -Original Message-
 From: Michael Neuling [mailto:mi...@neuling.org]
 Sent: Tuesday, February 12, 2013 9:16 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?

 Look further down...

 #ifdef CONFIG_PPC32
   mfspr   r5,SPRN_SPRG_THREAD     /* current task's THREAD (phys) */
   lwz     r4,THREAD_FPEXC_MODE(r5)
   ori     r9,r9,MSR_FP            /* enable FP for current */
   or      r9,r9,r4
 #else
   ld      r4,PACACURRENT(r13)
   addi    r5,r4,THREAD            /* Get THREAD */
   lwz     r4,THREAD_FPEXC_MODE(r5)
   ori     r12,r12,MSR_FP
   or      r12,r12,r4
   std     r12,_MSR(r1)
 #endif

 R12 is loaded with SRR1 in the exception prolog before load_up_fpu is
 called.

Yes it is SRR1 not MSR.

   Yes, SRR1 == the MSR of the user process, not the current MSR.

Also on 32bit it looks like that R9 is assumed to have SRR1.

   Yep that too.

   So any idea how it's supposed to work or is it broken?

  To me this looks wrong. It seems to work because
  thread->regs->msr is not actually used to write SRR1 (and eventually
  the thread MSR) when doing rfi to enter the guest. In fact the
  guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR
  (including FP set).

  But yes, Scott is the right person to comment, so let us wait for his
  comment.

 I don't think it's actually a problem on 32-bit, since r9 is modified but 
 never
 actually used for anything.

Doesn't the epilog load srr1 into r9, load_up_fpu() change r9, and then r9 get 
written back to srr1?

  On 64-bit, though, there's a store to the caller's
 stack frame (yuck) which the kvm/booke.h caller is not prepared for.

So if the caller is using r12 then it can lead to some corruption, right?

  Indeed,
 book3s's kvmppc_load_up_fpu creates an interrupt-like stack frame, but does 
 not
 load r9 or r12.

 It would be really nice if assumptions like these were put in a code comment
 above load_up_fpu...  and if we didn't have so many random differences between
 32-bit and 64-bit. :-P

:)

Thanks
-Bharat

 -Scott


RE: BOOKE KVM calling load_up_fpu from C?

2013-02-12 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, February 13, 2013 6:53 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Michael Neuling; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?

 On 02/12/2013 07:18:14 PM, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Wood Scott-B07421
   Sent: Wednesday, February 13, 2013 12:03 AM
   To: Bhushan Bharat-R65777
   Cc: Michael Neuling; Wood Scott-B07421;
  linuxppc-dev@lists.ozlabs.org
   Subject: Re: BOOKE KVM calling load_up_fpu from C?

   On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote:
    To me this looks wrong. It seems to work because
    thread->regs->msr is not actually used to write SRR1 (and eventually
    the thread MSR) when doing rfi to enter the guest. In fact the
    guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR
    (including FP set).

    But yes, Scott is the right person to comment, so let us wait for his
    comment.

   I don't think it's actually a problem on 32-bit, since r9 is
  modified but never
   actually used for anything.

  Doesn't the epilog load srr1 into r9, load_up_fpu() change r9, and
  then r9 get written back to srr1?

 What epilog?  We're talking about the case where it's called from C code.

 When it's called from an exception handler, then r9 is used, but in that case
 it's also initialized before calling load_up_fpu, by the prolog.

Agree. Was just confirming the exception handler case.

On 64-bit, though, there's a store to the caller's stack frame
   (yuck) which the kvm/booke.h caller is not prepared for.

  So if the caller is using r12 then it can lead to some corruption, right?

 No, r12 is a volatile register in the ABI, as is r9.  The issue is that the
 stack can be corrupted.

Ok, Thanks
-Bharat

 -Scott


RE: BOOKE KVM calling load_up_fpu from C?

2013-02-12 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Wednesday, February 13, 2013 6:53 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; Michael Neuling; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?

 On 02/12/2013 07:18:14 PM, Bhushan Bharat-R65777 wrote:

   -Original Message-
   From: Wood Scott-B07421
   Sent: Wednesday, February 13, 2013 12:03 AM
   To: Bhushan Bharat-R65777
   Cc: Michael Neuling; Wood Scott-B07421;
  linuxppc-dev@lists.ozlabs.org
   Subject: Re: BOOKE KVM calling load_up_fpu from C?

   On 02/12/2013 03:01:07 AM, Bhushan Bharat-R65777 wrote:
    To me this looks wrong. It seems to work because
    thread->regs->msr is not actually used to write SRR1 (and eventually
    the thread MSR) when doing rfi to enter the guest. In fact the
    guest MSR (shadow_msr) is used as SRR1, and it will have the proper MSR
    (including FP set).

    But yes, Scott is the right person to comment, so let us wait for his
    comment.

   I don't think it's actually a problem on 32-bit, since r9 is
  modified but never
   actually used for anything.

  Doesn't the epilog load srr1 into r9, load_up_fpu() change r9, and
  then r9 get written back to srr1?

 What epilog?  We're talking about the case where it's called from C code.

 When it's called from an exception handler, then r9 is used, but in that case
 it's also initialized before calling load_up_fpu, by the prolog.

On 64-bit, though, there's a store to the caller's stack frame
   (yuck) which the kvm/booke.h caller is not prepared for.

  So if the caller is using r12 then it can lead to some corruption, right?

 No, r12 is a volatile register in the ABI, as is r9.  The issue is that the
 stack can be corrupted.

What do you mean by the stack being corrupted?
My understanding is that when an assembly function is called from a C function, 
no new stack frame is pushed and the assembly function uses the caller's stack 
frame. Example: function1() calls function2(), which calls assembly_routine().

function1():

|----------------------|
| Stack Frame 1        |
| function1 caller     |
| registers etc        |
|----------------------|

function1() calls function2():

|----------------------|
| Stack Frame 2        |
| function1 registers  |
| etc                  |
|----------------------|
| Stack Frame 1        |
| function1 caller     |
| registers etc        |
|----------------------|

function2() calls assembly_routine():
Now no stack frame is pushed, and assembly_routine() changes register values 
saved on the stack. So when the stack is unwound, the registers of function1() 
will get corrupted, right?

Thanks
-Bharat
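
One way to read the concern about the 64-bit `std r12,_MSR(r1)` store is sketched below as a user-space model. Everything here is hypothetical: `struct stack_model`, `asm_routine_model()`, `frame_owns_msr_slot()` and the word offset `MSR_SLOT` are made-up stand-ins, and real frame layouts are defined by the ABI and the kernel's interrupt frame, not by this sketch. The point it illustrates is only that a store at a fixed offset from the stack pointer is safe when the current frame is large enough to contain that offset, and lands in someone else's frame when it is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 64
/* Hypothetical word offset standing in for _MSR; not the real value. */
#define MSR_SLOT ((size_t)40)

/* Toy stack: words[sp] is the base of the current frame, callers sit above it. */
struct stack_model {
    uint64_t words[STACK_WORDS];
    size_t   sp;
};

/* Models a store at a fixed offset from the stack pointer, with no new frame pushed. */
static void asm_routine_model(struct stack_model *s, uint64_t r12)
{
    assert(s != NULL && s->sp + MSR_SLOT < STACK_WORDS);
    s->words[s->sp + MSR_SLOT] = r12;
}

/* True when a frame of frame_words words is large enough to contain the slot. */
static bool frame_owns_msr_slot(size_t frame_words)
{
    return MSR_SLOT < frame_words;
}
```

With a large interrupt-style frame the slot belongs to the current frame, so the store is harmless; with a small frame set up by an ordinary C caller, the same store overwrites a word owned by a frame further up the stack.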


RE: BOOKE KVM calling load_up_fpu from C?

2013-02-11 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael
 Neuling
 Sent: Tuesday, February 12, 2013 8:59 AM
 To: Wood Scott-B07421
 Cc: linuxppc-dev@lists.ozlabs.org
 Subject: BOOKE KVM calling load_up_fpu from C?
 
 Scott,
 
 I was looking at changing how load_up_fpu works and I found this in
 arch/powerpc/kvm/booke.h:
 
 static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) {
 #ifdef CONFIG_PPC_FPU
   if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
     load_up_fpu();
     current->thread.regs->msr |= MSR_FP;
   }
 #endif
 }
 
 I'm wondering how this is supposed to work since load_up_fpu is supposed to have
 MSR in R12?

Doesn't load_up_fpu() do an mfmsr:

_GLOBAL(load_up_fpu)
        mfmsr   r5
        ori     r5,r5,MSR_FP
#ifdef CONFIG_VSX
BEGIN_FTR_SECTION
        oris    r5,r5,MSR_VSX@h
END_FTR_SECTION_IFSET(CPU_FTR_VSX)
#endif
        SYNC
        MTMSRD(r5)                      /* enable use of fpu now */
        isync
<snip>

-Bharat

 
 Mikey

RE: BOOKE KVM calling load_up_fpu from C?

2013-02-11 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Michael Neuling [mailto:mi...@neuling.org]
 Sent: Tuesday, February 12, 2013 9:16 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; linuxppc-dev@lists.ozlabs.org
 Subject: Re: BOOKE KVM calling load_up_fpu from C?
 
 Bhushan Bharat-R65777 r65...@freescale.com wrote:
 
 
 
   -Original Message-
   From: Linuxppc-dev [mailto:linuxppc-dev-
   bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Michael
   Neuling
   Sent: Tuesday, February 12, 2013 8:59 AM
   To: Wood Scott-B07421
   Cc: linuxppc-dev@lists.ozlabs.org
   Subject: BOOKE KVM calling load_up_fpu from C?
  
   Scott,
  
   I was looking at changing how load_up_fpu works and I found this in
   arch/powerpc/kvm/booke.h:
  
   static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) {
   #ifdef CONFIG_PPC_FPU
     if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) {
       load_up_fpu();
       current->thread.regs->msr |= MSR_FP;
     }
   #endif
   }
  
   I'm wondering how this is supposed to work since load_up_fpu is
   supposed to have MSR in R12?
 
  Doesn't load_up_fpu() do an mfmsr:
 
  _GLOBAL(load_up_fpu)
          mfmsr   r5
          ori     r5,r5,MSR_FP
  #ifdef CONFIG_VSX
  BEGIN_FTR_SECTION
          oris    r5,r5,MSR_VSX@h
  END_FTR_SECTION_IFSET(CPU_FTR_VSX)
  #endif
          SYNC
          MTMSRD(r5)                      /* enable use of fpu now */
          isync
  <snip>
 
 Look further down...
 
 #ifdef CONFIG_PPC32
   mfspr   r5,SPRN_SPRG_THREAD     /* current task's THREAD (phys) */
   lwz     r4,THREAD_FPEXC_MODE(r5)
   ori     r9,r9,MSR_FP            /* enable FP for current */
   or      r9,r9,r4
 #else
   ld      r4,PACACURRENT(r13)
   addi    r5,r4,THREAD            /* Get THREAD */
   lwz     r4,THREAD_FPEXC_MODE(r5)
   ori     r12,r12,MSR_FP
   or      r12,r12,r4
   std     r12,_MSR(r1)
 #endif
 
 R12 is loaded with SRR1 in the exception prolog before load_up_fpu is called.

Yes, it is SRR1, not MSR.
Also, on 32-bit it looks like R9 is assumed to hold SRR1.

-Bharat

 It's the MSR of the user process, not the current MSR.
 
 Mikey



RE: [PATCH] using get/put_user64 apis on 64bit machine

2012-09-10 Thread Bhushan Bharat-R65777



 -Original Message-
 From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
 Sent: Monday, September 10, 2012 10:12 AM
 To: Bhushan Bharat-R65777
 Cc: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org; ag...@suse.de;
 Wood Scott-B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH] using get/put_user64 apis on 64bit machine
 
 On Mon, 2012-07-23 at 15:46 +0530, Bharat Bhushan wrote:
  On powerpc64 machine get/put_user64() is same as get/put_user() while
  on powerpc32 machine get_user64 is different. With this patch we can
  use get_user64() and put_user64() on 32 and 64 bit machines.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
 
 There appear to be no users of any of these APIs in the tree. There are also no
 get_user64 -> __get_user64() macros either. Why not just remove the whole lot?

When I sent the patch I did not search for users. I agree that we can remove 
__get_user64 and __put_user64 altogether.

Thanks
-Bharat 

 
 Cheers,
 Ben.
 
   arch/powerpc/include/asm/uaccess.h |7 +++
   1 files changed, 7 insertions(+), 0 deletions(-)
 
  diff --git a/arch/powerpc/include/asm/uaccess.h
  b/arch/powerpc/include/asm/uaccess.h
  index 17bb40c..01743aa 100644
  --- a/arch/powerpc/include/asm/uaccess.h
  +++ b/arch/powerpc/include/asm/uaccess.h
  @@ -114,10 +114,17 @@ struct exception_table_entry {
   #define __put_user(x, ptr) \
   	__put_user_nocheck((__typeof__(*(ptr)))(x), (ptr), sizeof(*(ptr)))
 
  +/*
  + * On powerpc64 machine get/put_user64() is same as get/put_user() while
  + * on powerpc32 machine get_user64 is different.
  + */
   #ifndef __powerpc64__
   #define __get_user64(x, ptr) \
   	__get_user64_nocheck((x), (ptr), sizeof(*(ptr)))
   #define __put_user64(x, ptr) __put_user(x, ptr)
  +#else
  +#define __get_user64(x, ptr) __get_user(x, ptr)
  +#define __put_user64(x, ptr) __put_user(x, ptr)
   #endif
 
   #define __get_user_inatomic(x, ptr) \
 
 


RE: [PATCH] powerpc/mm: add ZONE_NORMAL zone for 64 bit kernel

2012-07-24 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Benjamin
 Herrenschmidt
 Sent: Tuesday, July 24, 2012 10:16 AM
 To: Tabi Timur-B04825
 Cc: Wood Scott-B07421; Hu Mingkai-B21284; linuxppc-dev@lists.ozlabs.org; Xie
 Shaohui-B21989; Chen Yuanquan-B41889
 Subject: Re: [PATCH] powerpc/mm: add ZONE_NORMAL zone for 64 bit kernel

 On Tue, 2012-07-24 at 04:04 +, Tabi Timur-B04825 wrote:
  Benjamin Herrenschmidt wrote:
   Sure but I don't want to create the zones in the first place (and
   thus introduce the added pressure on the memory management) on
   machines that don't need it.

  One thing that does confuse me -- by default, we don't create a
  ZONE_NORMAL.  We only create a ZONE_DMA.  Why is that?  Shouldn't it
  be the other way around?

 Because ZONE_NORMAL allocations can be serviced from the ZONE_DMA while the
 other way isn't possible.

Say we have defined only one zone (ZONE_DMA), to which we give all memory
(more than 4G), and a device sets its DMA mask to 4G or less.

dma_alloc_coherent() will set the GFP_DMA flag, but that is of no use, because
the memory allocator has only one zone, which holds all memory (and assumes all
of it is DMA-able). It can return memory at an address above 4G, which will
crash!

I think we have to have at least one zone that provides memory that is DMA-able
for all devices (the limit should be set by the platform, because different
platforms have different devices with different limits), and another zone (one
or more) that covers the rest of memory.

Thanks
-Bharat
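
The concern above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct zone, zone_is_dma_safe() and pick_dma_zone() are hypothetical names, not kernel APIs, and a zone is modelled simply as a half-open physical address range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical zone: a physical address range [start, end). */
struct zone {
	uint64_t start;
	uint64_t end;
};

/* True if every address in the zone is reachable with the device's DMA mask. */
static bool zone_is_dma_safe(const struct zone *z, uint64_t dma_mask)
{
	assert(z->end > z->start);
	return (z->end - 1) <= dma_mask;
}

/* Return the index of the first zone that is safe for the device, or -1. */
static int pick_dma_zone(const struct zone *zones, int nr, uint64_t dma_mask)
{
	for (int i = 0; i < nr; i++)
		if (zone_is_dma_safe(&zones[i], dma_mask))
			return i;
	return -1;
}
```

With a single zone covering 8G and a 32-bit DMA mask, no zone qualifies, so an allocation from it may land out of the device's reach. Splitting the same memory at the 4G boundary gives the allocator a zone it can safely use for that device.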

 Especially in the old days, there were quite a few cases of drivers and/or
 subsystems who were a bit heavy handed at using ZONE_DMA, so not having one
 would essentially make them not work at all.

 Cheers,
 Ben.



RE: [PATCH][v2] powerpc/watchdog: move booke watchdog param related code to setup-common.c

2012-07-11 Thread Bhushan Bharat-R65777

ACK:

 -Original Message-
 From: Linuxppc-dev [mailto:linuxppc-dev-
 bounces+bharat.bhushan=freescale@lists.ozlabs.org] On Behalf Of Shaohui Xie
 Sent: Wednesday, July 11, 2012 3:26 PM
 To: linux-watch...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Cc: Xie Shaohui-B21989
 Subject: [PATCH][v2] powerpc/watchdog: move booke watchdog param related code to setup-common.c
 
 Currently, the BOOKE watchdog code for checking wdt and wdt_period is in
 setup_32.c, so it cannot be used on 64-bit. Move it to a common place,
 setup-common.c, which is shared by 32-bit and 64-bit.
 
 Also, replace the simple_strtoul with kstrtol.
 
 Signed-off-by: Shaohui Xie shaohui@freescale.com
 ---
 changes for v2:
 use setup-common.c instead of prom.c
 
  arch/powerpc/kernel/setup-common.c |   27 +++
  arch/powerpc/kernel/setup_32.c |   24 
  2 files changed, 27 insertions(+), 24 deletions(-)
 
 diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
 index afd4f05..bdc499c 100644
 --- a/arch/powerpc/kernel/setup-common.c
 +++ b/arch/powerpc/kernel/setup-common.c
 @@ -720,6 +720,33 @@ static int powerpc_debugfs_init(void)
 arch_initcall(powerpc_debugfs_init);
  #endif
 
 +#ifdef CONFIG_BOOKE_WDT
 +extern u32 booke_wdt_enabled;
 +extern u32 booke_wdt_period;
 +
 +/* Checks wdt=x and wdt_period=xx command-line option */
 +notrace int __init early_parse_wdt(char *p)
 +{
 +	if (p && strncmp(p, "0", 1) != 0)
 +		booke_wdt_enabled = 1;
 +
 +	return 0;
 +}
 +early_param("wdt", early_parse_wdt);
 +
 +int __init early_parse_wdt_period(char *p)
 +{
 +	unsigned long ret;
 +	if (p) {
 +		if (!kstrtol(p, 0, &ret))
 +			booke_wdt_period = ret;
 +	}
 +
 +	return 0;
 +}
 +early_param("wdt_period", early_parse_wdt_period);
 +#endif	/* CONFIG_BOOKE_WDT */
 +
  void ppc_printk_progress(char *s, unsigned short hex)
  {
  	pr_info("%s\n", s);
 diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
 index ec8a53f..a8f54ec 100644
 --- a/arch/powerpc/kernel/setup_32.c
 +++ b/arch/powerpc/kernel/setup_32.c
 @@ -149,30 +149,6 @@ notrace void __init machine_init(u64 dt_ptr)
  	ppc_md.progress("id mach(): done", 0x200);
  }
 
 -#ifdef CONFIG_BOOKE_WDT
 -extern u32 booke_wdt_enabled;
 -extern u32 booke_wdt_period;
 -
 -/* Checks wdt=x and wdt_period=xx command-line option */
 -notrace int __init early_parse_wdt(char *p)
 -{
 -	if (p && strncmp(p, "0", 1) != 0)
 -		booke_wdt_enabled = 1;
 -
 -	return 0;
 -}
 -early_param("wdt", early_parse_wdt);
 -
 -int __init early_parse_wdt_period (char *p)
 -{
 -	if (p)
 -		booke_wdt_period = simple_strtoul(p, NULL, 0);
 -
 -	return 0;
 -}
 -early_param("wdt_period", early_parse_wdt_period);
 -#endif	/* CONFIG_BOOKE_WDT */
 -
  /* Checks l2cr= command-line option */
  int __init ppc_setup_l2cr(char *str)
  {
 --
 1.6.4
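
As a rough user-space illustration of what the two handlers in the patch above do, the sketch below parses wdt and wdt_period values with strict number checking. It is not the kernel code: parse_ulong() is a hypothetical stand-in for the kstrto*() helpers built on strtoul(), and wdt_enabled / wdt_period stand in for booke_wdt_enabled / booke_wdt_period.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static unsigned int wdt_enabled;  /* stand-in for booke_wdt_enabled */
static unsigned int wdt_period;   /* stand-in for booke_wdt_period */

/* Stand-in for a strict parser: 0 on success, negative errno on failure. */
static int parse_ulong(const char *s, int base, unsigned long *res)
{
	char *end;
	unsigned long val;

	assert(res != NULL);
	if (s == NULL || *s == '\0' || *s == '-' || isspace((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	val = strtoul(s, &end, base);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == s || *end != '\0')
		return -EINVAL;  /* reject trailing garbage such as "30s" */
	*res = val;
	return 0;
}

/* wdt=x: any value not starting with '0' enables the watchdog. */
static int parse_wdt(const char *p)
{
	if (p && strncmp(p, "0", 1) != 0)
		wdt_enabled = 1;
	return 0;
}

/* wdt_period=xx: keep the previous value if the string is not a clean number. */
static int parse_wdt_period(const char *p)
{
	unsigned long val;

	if (p && parse_ulong(p, 0, &val) == 0)
		wdt_period = (unsigned int)val;
	return 0;
}
```

One visible difference from a lenient parser is that a value with trailing characters is rejected outright instead of being accepted up to the first invalid character, so the period keeps its previous value in that case.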
 
 



RE: [PATCH 0/6] Description for PCI patches using platform driver

2012-06-14 Thread Bhushan Bharat-R65777

Hello Ben, Kumar, and others,

Could you please share your comments or thoughts on this?

Thanks
-Bharat

   
 -Original Message-
 From: Jia Hongtao-B38951
 Sent: Friday, June 08, 2012 3:12 PM
 To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
 Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
 Subject: [PATCH 0/6] Description for PCI patches using platform driver

 This series of patches unifies the PCI initialization code and adds PM support
 for all 85xx/86xx powerpc boards. But two side effects are introduced by this
 mechanism, which are listed below:

 1. of_platform_bus_probe() will be called twice, and in some cases a
    duplication warning occurred. We fix this in [PATCH 5/6].

 2. The EDAC driver failed to register PCI nodes as platform devices. We fix
    this in [PATCH 6/6].

With these patches, won't SWIOTLB fail to be initialized even if PCI/PCIe
demands it?
   
Thanks
-Bharat
   
  
   These patches still have the swiotlb init problem if
  ppc_swiotlb_enable is
   only demanded by PCI/PCIe. One of the purposes of sending out these
  patches is
   to let us start a discussion for this problem in upstream.
 
  OK, I did not find any mention of that, so I thought you had resolved the
  issue by some means in these patches that I did not catch.
 
  So these patches introduce the issue that SWIOTLB will not be initialized
  if requested by PCI/PCIe. The request is raised by setting the flag
  ppc_swiotlb_enable, and swiotlb_init() is called in mem_init() if
  ppc_swiotlb_enable is set. With these patches, the request is raised after
  mem_init() has been called, so the request is not handled :).
 
  Following are the solutions we have thought of during our internal
  discussions (if I did not miss any):
 
  1. These patches move the code from platform init to device init
  (arch_initcall()). Rather than moving the whole code, let us divide the
  code into two parts: first, the part needed to raise the swiotlb init
  request, and second, the rest. Define the first part as a function in
  arch/powerpc/sysdev/fsl_pci.c and call it from the platform init code of
  the SoCs.
 
  2. On all known devices, the lowest PCIe outbound range starts at
  0x8000, but there's nothing above 0xc000. So the inbound of
  size 0x8000_ is always available on all devices. Hardcode the check
  in platform code to compare memblock_end_of_DRAM() against 0x8000.
 
  Something like this:
 
  diff --git a/arch/powerpc/platforms/85xx/corenet_ds.c
  b/arch/powerpc/platforms/85xx/corenet_ds.c
  index 1f7028e..ef4e215 100644
  --- a/arch/powerpc/platforms/85xx/corenet_ds.c
  +++ b/arch/powerpc/platforms/85xx/corenet_ds.c
  @@ -79,7 +79,7 @@ void __init corenet_ds_setup_arch(void)  #endif
 
  #ifdef CONFIG_SWIOTLB
  -   if (memblock_end_of_DRAM()  0x)
  +   if (memblock_end_of_DRAM()  0xff00)
   ppc_swiotlb_enable = 1;  #endif
   pr_info(%s board from Freescale Semiconductor\n,
  ppc_md.name);
 
  -
 
  3. Always do swiotlb_init() in mem_init() and later, after PCI init, free
  it (swiotlb_free()) if the swiotlb turns out not to be needed.
 
  4. Etc.; please suggest any other, better way.
 
  Thanks
  -Bharat
 
 Thanks.
 From my point of view, the 2nd solution is better because it does not treat
 PCI/PCIe devices as a special kind of device.
 
 -Jia Hongtao.


RE: [PATCH 0/6] Description for PCI patches using platform driver

2012-06-11 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Jia Hongtao-B38951
 Sent: Monday, June 11, 2012 8:03 AM
 To: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org;
 ga...@kernel.crashing.org
 Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421
 Subject: RE: [PATCH 0/6] Description for PCI patches using platform driver

  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Friday, June 08, 2012 6:47 PM
  To: Jia Hongtao-B38951; linuxppc-dev@lists.ozlabs.org;
  ga...@kernel.crashing.org
  Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421
  Subject: RE: [PATCH 0/6] Description for PCI patches using platform
  driver

   -Original Message-
   From: Jia Hongtao-B38951
   Sent: Friday, June 08, 2012 3:12 PM
   To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
   Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
   Subject: [PATCH 0/6] Description for PCI patches using platform driver

   This series of patches unifies the PCI initialization code and adds PM
   support for all 85xx/86xx powerpc boards. But two side effects are
   introduced by this mechanism, which are listed below:

   1. of_platform_bus_probe() will be called twice, and in some cases a
      duplication warning occurred. We fix this in [PATCH 5/6].

   2. The EDAC driver failed to register PCI nodes as platform devices. We
      fix this in [PATCH 6/6].

  With these patches, won't SWIOTLB fail to be initialized even if PCI/PCIe
  demands it?

  Thanks
  -Bharat

 These patches still have the swiotlb init problem if ppc_swiotlb_enable is
 only demanded by PCI/PCIe. One of the purposes of sending out these patches is
 to let us start a discussion for this problem in upstream.

OK, I did not find any mention of that, so I thought you had resolved the
issue by some means in these patches that I did not catch.

So these patches introduce the issue that SWIOTLB will not be initialized if
requested by PCI/PCIe. The request is raised by setting the flag
ppc_swiotlb_enable, and swiotlb_init() is called in mem_init() if
ppc_swiotlb_enable is set. With these patches, the request is raised after
mem_init() has been called, so the request is not handled :).
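
The ordering problem described above can be modelled in a few lines of C. This is a simplified sketch, not kernel code: struct boot_state and the model_*() helpers are hypothetical, and swiotlb_requested merely stands in for ppc_swiotlb_enable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the boot-time state discussed in this thread. */
struct boot_state {
	bool swiotlb_requested;    /* stands in for ppc_swiotlb_enable */
	bool swiotlb_initialized;  /* set when swiotlb_init() would have run */
};

/* mem_init() only sets up the bounce buffer if the request is already raised. */
static void model_mem_init(struct boot_state *b)
{
	assert(b != NULL);
	if (b->swiotlb_requested)
		b->swiotlb_initialized = true;
}

/* PCI setup raises the request when it finds that bounce buffering is needed. */
static void model_pci_setup(struct boot_state *b, bool pci_needs_swiotlb)
{
	assert(b != NULL);
	if (pci_needs_swiotlb)
		b->swiotlb_requested = true;
}

/* Old order: PCI setup runs from platform init, before mem_init(). */
static bool boot_pci_before_mem_init(bool pci_needs_swiotlb)
{
	struct boot_state b = { false, false };

	model_pci_setup(&b, pci_needs_swiotlb);
	model_mem_init(&b);
	return b.swiotlb_initialized;
}

/* New order: PCI setup runs at arch_initcall() time, after mem_init(). */
static bool boot_pci_after_mem_init(bool pci_needs_swiotlb)
{
	struct boot_state b = { false, false };

	model_mem_init(&b);
	model_pci_setup(&b, pci_needs_swiotlb);
	return b.swiotlb_initialized;
}
```

In this model the request raised by PCI is honoured only in the first ordering. In the second, the flag ends up set but nothing ever acts on it.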

Following are the solutions we have thought of during our internal discussions
(if I did not miss any):

1. These patches move the code from platform init to device init
(arch_initcall()). Rather than moving the whole code, let us divide the code
into two parts: first, the part needed to raise the swiotlb init request, and
second, the rest. Define the first part as a function in
arch/powerpc/sysdev/fsl_pci.c and call it from the platform init code of the
SoCs.

2. On all known devices, the lowest PCIe outbound range starts at 0x8000, but
there's nothing above 0xc000. So the inbound of size 0x8000_ is always
available on all devices. Hardcode the check in platform code to compare
memblock_end_of_DRAM() against 0x8000.

Something like this:

diff --git a/arch/powerpc/platforms/85xx/corenet_ds.c
b/arch/powerpc/platforms/85xx/corenet_ds.c
index 1f7028e..ef4e215 100644
--- a/arch/powerpc/platforms/85xx/corenet_ds.c
+++ b/arch/powerpc/platforms/85xx/corenet_ds.c
@@ -79,7 +79,7 @@ void __init corenet_ds_setup_arch(void)  #endif

#ifdef CONFIG_SWIOTLB
-   if (memblock_end_of_DRAM()  0x)
+   if (memblock_end_of_DRAM()  0xff00)
 ppc_swiotlb_enable = 1;  #endif
 pr_info(%s board from Freescale Semiconductor\n, ppc_md.name);

-

3. Always do swiotlb_init() in mem_init() and later, after PCI init, free it
(swiotlb_free()) if the swiotlb turns out not to be needed.

4. Etc.; please suggest any other, better way.
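
To make option 3 from the list above concrete, here is a minimal sketch of the "allocate unconditionally, free later if unused" pattern. The names are hypothetical and the model only tracks whether the buffer exists; in the kernel the two steps would correspond to swiotlb_init() in mem_init() and swiotlb_free() after PCI init.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: tracks only whether the bounce buffer is allocated. */
struct swiotlb_model {
	bool allocated;
};

/* Early boot: always reserve the bounce buffer, before anyone can ask for it. */
static void model_mem_init_always(struct swiotlb_model *s)
{
	assert(s != NULL);
	s->allocated = true;
}

/* Late PCI init: give the buffer back if PCI turned out not to need it. */
static void model_late_pci_init(struct swiotlb_model *s, bool pci_needs_swiotlb)
{
	assert(s != NULL);
	if (!pci_needs_swiotlb)
		s->allocated = false;  /* stands in for swiotlb_free() */
}

/* Run both steps in boot order; report whether the buffer survives. */
static bool boot_with_option3(bool pci_needs_swiotlb)
{
	struct swiotlb_model s = { false };

	model_mem_init_always(&s);
	model_late_pci_init(&s, pci_needs_swiotlb);
	return s.allocated;
}
```

The trade-off is that the buffer is reserved on every boot, even on systems that release it again shortly afterwards.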

Thanks
-Bharat


RE: [PATCH 0/6] Description for PCI patches using platform driver

2012-06-08 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Jia Hongtao-B38951
 Sent: Friday, June 08, 2012 3:12 PM
 To: linuxppc-dev@lists.ozlabs.org; ga...@kernel.crashing.org
 Cc: Li Yang-R58472; b...@kernel.crashing.org; Wood Scott-B07421; Bhushan Bharat-R65777; Jia Hongtao-B38951
 Subject: [PATCH 0/6] Description for PCI patches using platform driver

 This series of patches unifies the PCI initialization code and adds PM support
 for all 85xx/86xx powerpc boards. But two side effects are introduced by this
 mechanism, which are listed below:

 1. of_platform_bus_probe() will be called twice, and in some cases a
    duplication warning occurred. We fix this in [PATCH 5/6].

 2. The EDAC driver failed to register PCI nodes as platform devices. We fix
    this in [PATCH 6/6].

With these patches, won't SWIOTLB fail to be initialized even if PCI/PCIe
demands it?

Thanks
-Bharat

 These patches are against 'next' branch on:
 http://git.kernel.org/?p=linux/kernel/git/galak/powerpc.git


RE: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end address

2012-06-05 Thread Bhushan Bharat-R65777

 -Original Message-
 From: David Miller [mailto:da...@davemloft.net]
 Sent: Wednesday, June 06, 2012 3:51 AM
 To: b...@kernel.crashing.org
 Cc: Bhushan Bharat-R65777; linuxppc-dev@lists.ozlabs.org; linux-
 ker...@vger.kernel.org; ga...@kernel.crashing.org; Bhushan Bharat-R65777
 Subject: Re: [PATCH] powerpc: Fix assmption of end_of_DRAM() returns end 
 address

 From: Benjamin Herrenschmidt b...@kernel.crashing.org
 Date: Wed, 06 Jun 2012 08:17:39 +1000

  On Tue, 2012-06-05 at 19:25 +0530, Bharat Bhushan wrote:
  memblock_end_of_DRAM() returns end_address + 1, not end address.
  While some code assumes that it returns end address.

  Shouldn't we instead fix it the other way around ? IE, make
  memblock_end_of_DRAM() does what the name implies, which is to return
  the last byte of DRAM, and fix the -other- callers not to make bad
  assumptions ?

 That was my impression too when I saw this patch.

Initially I also intended to do so. I started an email thread on linux-mm@ with
the subject "memblock_end_of_DRAM() return end address + 1", and the only
response I received, from Andrea, was:

It's normal that end means first byte offset out of the range. End = not ok.
end = start+size.
This is true for vm_end too. So it's better to keep it that way.
My suggestion is to just fix point 1 below and audit the rest :)
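
The convention Andrea describes (end = start + size, i.e. the first byte past the range) can be sketched as below. The struct and helper names are hypothetical; the point is only which comparisons are correct when the end is exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open memory range [start, start + size). */
struct mem_range {
	uint64_t start;
	uint64_t size;
};

/* Exclusive end: the first address past the range. */
static uint64_t range_end(const struct mem_range *r)
{
	return r->start + r->size;
}

/* Last valid byte: only meaningful for a non-empty range. */
static uint64_t range_last_byte(const struct mem_range *r)
{
	assert(r->size > 0);
	return range_end(r) - 1;
}

/* Membership test uses '<' against the exclusive end. */
static bool range_contains(const struct mem_range *r, uint64_t addr)
{
	return addr >= r->start && addr < range_end(r);
}

/* Does the whole range fit below an inclusive highest address? */
static bool range_fits_limit(const struct mem_range *r, uint64_t max_addr)
{
	return range_last_byte(r) <= max_addr;
}
```

For a 4G range starting at 0, the exclusive end compares greater than 0xffffffff even though every byte of the range is addressable with 32 bits, which is exactly the kind of off-by-one a caller hits when it treats the returned value as the last address.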

Thanks
-Bharat


RE: [PATCH 2/2] powerpc/watchdog: replace CONFIG_FSL_BOOKE with CONFIG_FSL_SOC_BOOKE

2012-05-08 Thread Bhushan Bharat-R65777



 -Original Message-
 From: linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org
 [mailto:linuxppc-dev-bounces+bharat.bhushan=freescale@lists.ozlabs.org] On
 Behalf Of Shaohui Xie
 Sent: Tuesday, May 08, 2012 11:38 AM
 To: linux-watch...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
 Cc: Xie Shaohui-B21989
 Subject: [PATCH 2/2] powerpc/watchdog: replace CONFIG_FSL_BOOKE with
 CONFIG_FSL_SOC_BOOKE

CONFIG_FSL_SOC_BOOKE looks like an SoC config option, whereas the watchdog is a
CPU feature.

Shouldn't we use PPC_FSL_BOOK3E instead?

Thanks
-Bharat

