Re: [PATCH v9 2/6] clk: Move all drivers to use internal API

2014-09-08 Thread Mike Turquette
Quoting Tomeu Vizoso (2014-09-03 08:31:57)
 In preparation to change the public API to return a per-user clk structure,
 remove any usage of this public API from the clock implementations.
 
 The reason for having this in a separate commit from the one that introduces
 the implementation of the new functions is to separate the changes generated
 with Coccinelle from the rest, and keep the patches' size reasonable.
 
 Signed-off-by: Tomeu Vizoso tomeu.viz...@collabora.com
 Tested-by: Boris Brezillon boris.brezil...@free-electrons.com
 Tested-by: Heiko Stuebner he...@sntech.de
 Acked-by: Boris Brezillon boris.brezil...@free-electrons.com

Hi Tomeu,

Looks like the Coccinelle script had a false positive.
asm-generic/clkdev.h was converted from clk to clk_core and this blew up
clock drivers for architectures that don't provide an asm-specific
clkdev.h implementation. This fixes x86's LPSS and a Microblaze driver.

I've rolled the following fix into your 2/6 patch. No action is
necessary.

Regards,
Mike



diff --git a/include/asm-generic/clkdev.h b/include/asm-generic/clkdev.h
index 4320225..90a32a6 100644
--- a/include/asm-generic/clkdev.h
+++ b/include/asm-generic/clkdev.h
@@ -15,10 +15,10 @@

 #include <linux/slab.h>

-struct clk_core;
+struct clk;

-static inline int __clk_get(struct clk_core *clk) { return 1; }
-static inline void __clk_put(struct clk_core *clk) { }
+static inline int __clk_get(struct clk *clk) { return 1; }
+static inline void __clk_put(struct clk *clk) { }

 static inline struct clk_lookup_alloc *__clkdev_alloc(size_t size)
 {
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] drivers: char: hw_random: printk replacement

2014-09-08 Thread Sudip Mukherjee
On Thu, Aug 28, 2014 at 08:32:58PM +0530, Sudip Mukherjee wrote:
 as pr_* macros are preferred over printk, printk has been replaced with 
 the corresponding pr_* macros
 
 Signed-off-by: Sudip Mukherjee su...@vectorindia.org
 ---
 
 The replacement was done by a bash script to avoid copy-paste errors. The 
 script is as follows:
 
 OLD="printk(KERN_ERR \?"
 OLD1="printk(KERN_NOTICE \?"
 OLD2="printk(KERN_WARNING \?"
 OLD3="printk(KERN_INFO \?"
 NEW="pr_err("
 NEW1="pr_notice("
 NEW2="pr_warn("
 NEW3="pr_info("
 TFILE=/tmp/out.tmp.$$
 for f in *.c
 do
 sed -e "s/$OLD/$NEW/g" -e "s/$OLD1/$NEW1/g" -e "s/$OLD2/$NEW2/g" -e \
 "s/$OLD3/$NEW3/g" $f > $TFILE && mv $TFILE $f
 done
 
 Frankly speaking, this script missed one instance of printk(warning) in 
 the intel-rng.c file, and that one was edited manually.
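For reference, the substitution can be exercised on a throwaway file. This is a minimal sketch assuming GNU sed (where `\?` makes the trailing space optional in a basic regex); the sample input lines and the use of mktemp are illustrative, not part of the original script:

```shell
# Sample input mimicking the two printk styles changed in the diff below.
TFILE=$(mktemp)
cat > "$TFILE" <<'EOF'
printk(KERN_INFO "AMD768 RNG detected\n");
printk(KERN_ERR PFX "RNG registering failed (%d)\n", err);
EOF

OLD="printk(KERN_ERR \?"
OLD3="printk(KERN_INFO \?"
NEW="pr_err("
NEW3="pr_info("

# The level macro (and the space after it) is folded into the pr_* name;
# the format string, including any PFX prefix macro, is left untouched.
sed -e "s/$OLD/$NEW/g" -e "s/$OLD3/$NEW3/g" "$TFILE"
# prints:
#   pr_info("AMD768 RNG detected\n");
#   pr_err(PFX "RNG registering failed (%d)\n", err);
rm -f "$TFILE"
```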
 
  drivers/char/hw_random/amd-rng.c |  4 ++--
  drivers/char/hw_random/geode-rng.c   |  4 ++--
  drivers/char/hw_random/intel-rng.c   | 12 ++--
  drivers/char/hw_random/pasemi-rng.c  |  2 +-
  drivers/char/hw_random/pseries-rng.c |  2 +-
  drivers/char/hw_random/via-rng.c |  8 
  6 files changed, 16 insertions(+), 16 deletions(-)
 
 diff --git a/drivers/char/hw_random/amd-rng.c 
 b/drivers/char/hw_random/amd-rng.c
 index c6af038..48f6a83 100644
 --- a/drivers/char/hw_random/amd-rng.c
 +++ b/drivers/char/hw_random/amd-rng.c
 @@ -142,10 +142,10 @@ found:
   amd_rng.priv = (unsigned long)pmbase;
   amd_pdev = pdev;
  
 - printk(KERN_INFO "AMD768 RNG detected\n");
 + pr_info("AMD768 RNG detected\n");
  err = hwrng_register(&amd_rng);
  if (err) {
 - printk(KERN_ERR PFX "RNG registering failed (%d)\n",
 + pr_err(PFX "RNG registering failed (%d)\n",
  err);
   release_region(pmbase + 0xF0, 8);
   goto out;
 diff --git a/drivers/char/hw_random/geode-rng.c 
 b/drivers/char/hw_random/geode-rng.c
 index 4c4d4e1..0d0579f 100644
 --- a/drivers/char/hw_random/geode-rng.c
 +++ b/drivers/char/hw_random/geode-rng.c
 @@ -109,10 +109,10 @@ found:
   goto out;
   geode_rng.priv = (unsigned long)mem;
  
 - printk(KERN_INFO "AMD Geode RNG detected\n");
 + pr_info("AMD Geode RNG detected\n");
  err = hwrng_register(&geode_rng);
  if (err) {
 - printk(KERN_ERR PFX "RNG registering failed (%d)\n",
 + pr_err(PFX "RNG registering failed (%d)\n",
  err);
   goto err_unmap;
   }
 diff --git a/drivers/char/hw_random/intel-rng.c 
 b/drivers/char/hw_random/intel-rng.c
 index 86fe45c..3122376 100644
 --- a/drivers/char/hw_random/intel-rng.c
 +++ b/drivers/char/hw_random/intel-rng.c
 @@ -199,7 +199,7 @@ static int intel_rng_init(struct hwrng *rng)
  if ((hw_status & INTEL_RNG_ENABLED) == 0)
  hw_status = hwstatus_set(mem, hw_status | INTEL_RNG_ENABLED);
  if ((hw_status & INTEL_RNG_ENABLED) == 0) {
 - printk(KERN_ERR PFX "cannot enable RNG, aborting\n");
 + pr_err(PFX "cannot enable RNG, aborting\n");
   goto out;
   }
   err = 0;
 @@ -216,7 +216,7 @@ static void intel_rng_cleanup(struct hwrng *rng)
  if (hw_status & INTEL_RNG_ENABLED)
  hwstatus_set(mem, hw_status & ~INTEL_RNG_ENABLED);
  else
 - printk(KERN_WARNING PFX "unusual: RNG already disabled\n");
 + pr_warn(PFX "unusual: RNG already disabled\n");
  }
  
  
 @@ -274,7 +274,7 @@ static int __init intel_rng_hw_init(void *_intel_rng_hw)
  if (mfc != INTEL_FWH_MANUFACTURER_CODE ||
  (dvc != INTEL_FWH_DEVICE_CODE_8M &&
dvc != INTEL_FWH_DEVICE_CODE_4M)) {
 - printk(KERN_NOTICE PFX "FWH not detected\n");
 + pr_notice(PFX "FWH not detected\n");
   return -ENODEV;
   }
  
 @@ -314,7 +314,7 @@ PFX "RNG, try using the 'no_fwh_detect' option.\n";
  
   if (no_fwh_detect)
   return -ENODEV;
 - printk(warning);
 + pr_warn(warning);
   return -EBUSY;
   }
  
 @@ -392,10 +392,10 @@ fwh_done:
   goto out;
   }
  
 - printk(KERN_INFO "Intel 82802 RNG detected\n");
 + pr_info("Intel 82802 RNG detected\n");
  err = hwrng_register(&intel_rng);
  if (err) {
 - printk(KERN_ERR PFX "RNG registering failed (%d)\n",
 + pr_err(PFX "RNG registering failed (%d)\n",
  err);
   iounmap(mem);
   }
 diff --git a/drivers/char/hw_random/pasemi-rng.c 
 b/drivers/char/hw_random/pasemi-rng.c
 index c66279b..c0347d1 100644
 --- a/drivers/char/hw_random/pasemi-rng.c
 +++ b/drivers/char/hw_random/pasemi-rng.c
 @@ -113,7 +113,7 @@ static int rng_probe(struct platform_device *ofdev)
  
   pasemi_rng.priv = (unsigned long)rng_regs;
  
 - printk(KERN_INFO "Registering PA Semi RNG\n");
 + pr_info("Registering PA Semi RNG\n");
 
  err = hwrng_register(&pasemi_rng);
  
 diff --git a/drivers/char/hw_random/pseries-rng.c 
 

[PATCH v2 2/5] Documentation: Add isl1208 and isl12022 to trivial-devices list

2014-09-08 Thread Philipp Zabel
This patch adds the Intersil ISL1208 and ISL12022 I2C RTCs to the
trivial-devices list. For ISL1208 a deprecated intersil,isl1208
entry is added since that is used in ppa8548.dts.

Signed-off-by: Philipp Zabel p.za...@pengutronix.de
---
Changes since v1:
 - Added deprecated entries that are still in use
---
 Documentation/devicetree/bindings/i2c/trivial-devices.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt 
b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
index 72e399c..60ecea5 100644
--- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt
+++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
@@ -56,6 +56,9 @@ fsl,sgtl5000  SGTL5000: Ultra Low-Power Audio Codec
 gmt,g751   G751: Digital Temperature Sensor and Thermal Watchdog 
with Two-Wire Interface
 infineon,slb9635tt Infineon SLB9635 (Soft-) I2C TPM (old protocol, max 
100khz)
 infineon,slb9645tt Infineon SLB9645 I2C TPM (new protocol, max 400khz)
+intersil,isl1208   Intersil ISL1208 I2C RTC Chip (deprecated, use 
isil,isl1208)
+isil,isl1208   Intersil ISL1208 I2C RTC Chip
+isil,isl12022  Intersil ISL12022 I2C RTC Chip
 isil,isl12057  Intersil ISL12057 I2C RTC Chip
 isl,isl12057   Intersil ISL12057 I2C RTC Chip (deprecated, use 
isil,isl12057)
 maxim,ds1050   5 Bit Programmable, Pulse-Width Modulator
-- 
2.1.0.rc1


[PATCH v2 4/5] rtc: rtc-isl12022: Change vendor prefix for Intersil Corporation to isil

2014-09-08 Thread Philipp Zabel
Currently there is a wild mixture of isl, isil, and intersil compatibles
in the kernel. At this point, changing the vendor symbol to the most often
used variant, which is equal to the NASDAQ symbol, isil, should not hurt.

Patch db04d6284e2a ("drivers/rtc/rtc-isl12022.c: device tree support")
added device tree support using the then documented "isl" vendor prefix,
so we keep that around for backwards compatibility.

Signed-off-by: Philipp Zabel p.za...@pengutronix.de
---
 drivers/rtc/rtc-isl12022.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-isl12022.c b/drivers/rtc/rtc-isl12022.c
index aa55f08..df20f18 100644
--- a/drivers/rtc/rtc-isl12022.c
+++ b/drivers/rtc/rtc-isl12022.c
@@ -275,7 +275,8 @@ static int isl12022_probe(struct i2c_client *client,
 
 #ifdef CONFIG_OF
 static struct of_device_id isl12022_dt_match[] = {
-   { .compatible = "isl,isl12022" },
+   { .compatible = "isil,isl12022" },
+   { .compatible = "isl,isl12022" }, /* for backwards compatibility */
{ },
 };
 #endif
-- 
2.1.0.rc1


[PATCH v2 0/5] Change vendor prefix for Intersil Corporation

2014-09-08 Thread Philipp Zabel
Hi,

currently there is a wild mixture of isl, isil, and intersil
compatibles in the kernel. I figure at this point it is still
possible to change this to use isil everywhere without too
much pain, but it might be preferred to keep the already
documented isl prefix, even though it doesn't follow the
convention to use the NASDAQ symbol where available.

This is an updated version of the RFC I've sent previously:
https://www.mail-archive.com/devicetree@vger.kernel.org/msg38505.html

Changes since RFC:
 - Drop the powerpc/85xx patch; that device tree is set in stone.
 - Added commit summaries where referencing commit ids
 - Added deprecated names to the documentation

regards
Philipp

Philipp Zabel (5):
  of: Change vendor prefix for Intersil Corporation to isil
  Documentation: Add isl1208 and isl12022 to trivial-devices list
  ARM: mvebu: Change vendor prefix for Intersil Corporation to isil
  rtc: rtc-isl12022: Change vendor prefix for Intersil Corporation to
isil
  rtc: rtc-isl12057: Change vendor prefix for Intersil Corporation to
isil

 Documentation/devicetree/bindings/i2c/trivial-devices.txt | 6 +-
 Documentation/devicetree/bindings/vendor-prefixes.txt | 4 +++-
 arch/arm/boot/dts/armada-370-netgear-rn102.dts| 2 +-
 arch/arm/boot/dts/armada-370-netgear-rn104.dts| 2 +-
 arch/arm/boot/dts/armada-xp-netgear-rn2120.dts| 2 +-
 drivers/rtc/rtc-isl12022.c| 3 ++-
 drivers/rtc/rtc-isl12057.c| 3 ++-
 7 files changed, 15 insertions(+), 7 deletions(-)

-- 
2.1.0.rc1


[PATCH v2 3/5] ARM: mvebu: Change vendor prefix for Intersil Corporation to isil

2014-09-08 Thread Philipp Zabel
Currently there is a wild mixture of isl, isil, and intersil
compatibles in the kernel. At this point, changing the vendor
symbol to the most often used variant, which is equal to the
NASDAQ symbol, isil, should not hurt.

Signed-off-by: Philipp Zabel p.za...@pengutronix.de
---
 arch/arm/boot/dts/armada-370-netgear-rn102.dts | 2 +-
 arch/arm/boot/dts/armada-370-netgear-rn104.dts | 2 +-
 arch/arm/boot/dts/armada-xp-netgear-rn2120.dts | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/armada-370-netgear-rn102.dts 
b/arch/arm/boot/dts/armada-370-netgear-rn102.dts
index d6d572e..edc381c 100644
--- a/arch/arm/boot/dts/armada-370-netgear-rn102.dts
+++ b/arch/arm/boot/dts/armada-370-netgear-rn102.dts
@@ -122,7 +122,7 @@
status = "okay";
 
isl12057: isl12057@68 {
-   compatible = "isl,isl12057";
+   compatible = "isil,isl12057";
reg = <0x68>;
};
 
diff --git a/arch/arm/boot/dts/armada-370-netgear-rn104.dts 
b/arch/arm/boot/dts/armada-370-netgear-rn104.dts
index c5fe8b5..7367b4c 100644
--- a/arch/arm/boot/dts/armada-370-netgear-rn104.dts
+++ b/arch/arm/boot/dts/armada-370-netgear-rn104.dts
@@ -117,7 +117,7 @@
status = "okay";
 
isl12057: isl12057@68 {
-   compatible = "isl,isl12057";
+   compatible = "isil,isl12057";
reg = <0x68>;
};
 
diff --git a/arch/arm/boot/dts/armada-xp-netgear-rn2120.dts 
b/arch/arm/boot/dts/armada-xp-netgear-rn2120.dts
index 0cf999a..252def8 100644
--- a/arch/arm/boot/dts/armada-xp-netgear-rn2120.dts
+++ b/arch/arm/boot/dts/armada-xp-netgear-rn2120.dts
@@ -174,7 +174,7 @@
status = "okay";
 
isl12057: isl12057@68 {
-   compatible = "isl,isl12057";
+   compatible = "isil,isl12057";
reg = <0x68>;
};
 
-- 
2.1.0.rc1


[PATCH v2 5/5] rtc: rtc-isl12057: Change vendor prefix for Intersil Corporation to isil

2014-09-08 Thread Philipp Zabel
Currently there is a wild mixture of isl, isil, and intersil compatibles
in the kernel. At this point, changing the vendor symbol to the most often
used variant, which is equal to the NASDAQ symbol, isil, should not hurt.

Patch 70e123373c05 ("rtc: Add support for Intersil ISL12057 I2C RTC chip")
added this driver with device tree support using the then documented "isl"
vendor prefix, so we keep that around for backwards compatibility.

Signed-off-by: Philipp Zabel p.za...@pengutronix.de
---
 drivers/rtc/rtc-isl12057.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-isl12057.c b/drivers/rtc/rtc-isl12057.c
index 455b601..8276bd6 100644
--- a/drivers/rtc/rtc-isl12057.c
+++ b/drivers/rtc/rtc-isl12057.c
@@ -279,7 +279,8 @@ static int isl12057_probe(struct i2c_client *client,
 
 #ifdef CONFIG_OF
 static const struct of_device_id isl12057_dt_match[] = {
-   { .compatible = "isl,isl12057" },
+   { .compatible = "isil,isl12057" },
+   { .compatible = "isl,isl12057" }, /* for backwards compatibility */
{ },
 };
 #endif
-- 
2.1.0.rc1


[PATCH v2 1/5] of: Change vendor prefix for Intersil Corporation to isil

2014-09-08 Thread Philipp Zabel
Currently there is a wild mixture of isl, isil, and intersil
compatibles in the kernel. At this point, changing the vendor
symbol to the most often used variant, which is equal to the
NASDAQ symbol, isil, should not hurt.

This patch marks both intersil and isl prefix as deprecated
and documents the isil prefix in the vendor prefix list.

Signed-off-by: Philipp Zabel p.za...@pengutronix.de
---
Changes since v1:
 - Added deprecated entries that are still in use
---
 Documentation/devicetree/bindings/i2c/trivial-devices.txt | 3 ++-
 Documentation/devicetree/bindings/vendor-prefixes.txt | 4 +++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/i2c/trivial-devices.txt 
b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
index 6af570e..72e399c 100644
--- a/Documentation/devicetree/bindings/i2c/trivial-devices.txt
+++ b/Documentation/devicetree/bindings/i2c/trivial-devices.txt
@@ -56,7 +56,8 @@ fsl,sgtl5000  SGTL5000: Ultra Low-Power Audio Codec
 gmt,g751   G751: Digital Temperature Sensor and Thermal Watchdog 
with Two-Wire Interface
 infineon,slb9635tt Infineon SLB9635 (Soft-) I2C TPM (old protocol, max 
100khz)
 infineon,slb9645tt Infineon SLB9645 I2C TPM (new protocol, max 400khz)
-isl,isl12057   Intersil ISL12057 I2C RTC Chip
+isil,isl12057  Intersil ISL12057 I2C RTC Chip
+isl,isl12057   Intersil ISL12057 I2C RTC Chip (deprecated, use 
isil,isl12057)
 maxim,ds1050   5 Bit Programmable, Pulse-Width Modulator
 maxim,max1237  Low-Power, 4-/12-Channel, 2-Wire Serial, 12-Bit ADCs
 maxim,max6625  9-Bit/12-Bit Temperature Sensors with I²C-Compatible 
Serial Interface
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt 
b/Documentation/devicetree/bindings/vendor-prefixes.txt
index ac7269f..4941ac2 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -68,7 +68,9 @@ img   Imagination Technologies Ltd.
 intel  Intel Corporation
 intercontrol   Inter Control Group
 isee   ISEE 2007 S.L.
-islIntersil
+intersil   Intersil Corporation (deprecated, use isil)
+isil   Intersil Corporation
+islIntersil Corporation (deprecated, use isil)
 karo   Ka-Ro electronics GmbH
 keymileKeymile GmbH
 lacie  LaCie
-- 
2.1.0.rc1


Re: [RFC PATCH 0/6] Change vendor prefix for Intersil Corporation

2014-09-08 Thread Philipp Zabel
Hi Jason,

Am Sonntag, den 07.09.2014, 09:32 -0400 schrieb Jason Cooper:
 Just to dot our i's and cross our t's, would you mind sending out a
 non-rfc version of this series?

Thank you for the reminder, I have sent an updated version.

regards
Philipp


Re: [PATCH 1/2 v5] powerpc/kvm: support to handle sw breakpoint

2014-09-08 Thread Alexander Graf


On 07.09.14 18:31, Madhavan Srinivasan wrote:
 This patch adds kernel side support for software breakpoint.
 Design is that, by using an illegal instruction, we trap to hypervisor
 via Emulation Assistance interrupt, where we check for the illegal instruction
 and accordingly we return to Host or Guest. Patch also adds support for
 software breakpoint in PR KVM.
 
 Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
 ---
  arch/powerpc/include/asm/kvm_ppc.h |  6 ++
  arch/powerpc/kvm/book3s.c  |  3 ++-
  arch/powerpc/kvm/book3s_hv.c   | 41 
 ++
  arch/powerpc/kvm/book3s_pr.c   |  3 +++
  arch/powerpc/kvm/emulate.c | 18 +
  5 files changed, 66 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
 b/arch/powerpc/include/asm/kvm_ppc.h
 index fb86a22..dd83c9a 100644
 --- a/arch/powerpc/include/asm/kvm_ppc.h
 +++ b/arch/powerpc/include/asm/kvm_ppc.h
 @@ -38,6 +38,12 @@
  #include <asm/paca.h>
  #endif
  
 +/*
 + * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
 + * for supporting software breakpoint.
 + */
 +#define KVMPPC_INST_SW_BREAKPOINT 0x0000
 +
  enum emulation_result {
   EMULATE_DONE, /* no further processing */
   EMULATE_DO_MMIO,  /* kvm_run filled with MMIO request */
 diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
 index dd03f6b..00e9c9f 100644
 --- a/arch/powerpc/kvm/book3s.c
 +++ b/arch/powerpc/kvm/book3s.c
 @@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg)
  {
 - return -EINVAL;
 + vcpu->guest_debug = dbg->control;
 + return 0;
  }
  
  void kvmppc_decrementer_func(unsigned long data)
 diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
 index 27cced9..3a2414c 100644
 --- a/arch/powerpc/kvm/book3s_hv.c
 +++ b/arch/powerpc/kvm/book3s_hv.c
 @@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
   return kvmppc_hcall_impl_hv_realmode(cmd);
  }
  
 +static int kvmppc_emulate_debug_inst(struct kvm_run *run,
 + struct kvm_vcpu *vcpu)
 +{
 + u32 last_inst;
 +
 + if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
 + EMULATE_DONE) {
 + /*
 +  * Fetch failed, so return to guest and
 +  * try executing it again.
 +  */
 + return RESUME_GUEST;

Please end the if() here. In the kernel we usually treat abort
situations as separate.

 + } else {
 + if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
 + run->exit_reason = KVM_EXIT_DEBUG;
 + run->debug.arch.address = kvmppc_get_pc(vcpu);
 + return RESUME_HOST;
 + } else {
 + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
 + return RESUME_GUEST;
 + }
 + }
 +}
 +
  static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct task_struct *tsk)
  {
 @@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
 struct kvm_vcpu *vcpu,
   break;
   /*
* This occurs if the guest executes an illegal instruction.
 -  * We just generate a program interrupt to the guest, since
 -  * we don't emulate any guest instructions at this stage.
 +  * If the guest debug is disabled, generate a program interrupt
 +  * to the guest. If guest debug is enabled, we need to check
 +  * whether the instruction is a software breakpoint instruction.
 +  * Accordingly return to Guest or Host.
*/
   case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
 - kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
 - r = RESUME_GUEST;
 + if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
 + r = kvmppc_emulate_debug_inst(run, vcpu);
 + } else {
 + kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
 + r = RESUME_GUEST;
 + }
   break;
   /*
* This occurs if the guest (kernel or userspace), does something that
 @@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, 
 u64 id,
   long int i;
  
   switch (id) {
 + case KVM_REG_PPC_DEBUG_INST:
 + *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
 + break;
   case KVM_REG_PPC_HIOR:
   *val = get_reg_val(id, 0);
   break;
 diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
 index faffb27..6d73708 100644
 --- a/arch/powerpc/kvm/book3s_pr.c
 +++ b/arch/powerpc/kvm/book3s_pr.c
 @@ -1319,6 +1319,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, 
 u64 id,
 

Re: [PATCH 2/2 v5] powerpc/kvm: common sw breakpoint instr across ppc

2014-09-08 Thread Alexander Graf


On 07.09.14 18:31, Madhavan Srinivasan wrote:
 This patch extends the use of illegal instruction as software
 breakpoint instruction across the ppc platform. Patch extends
 booke program interrupt code to support software breakpoint.
 
 Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
 ---
 
 Patch is only compile tested. Will really help if
 someone can try it out and let me know comments.
 
  arch/powerpc/kvm/booke.c | 18 --
  1 file changed, 16 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
 index b4c89fa..1b84853 100644
 --- a/arch/powerpc/kvm/booke.c
 +++ b/arch/powerpc/kvm/booke.c
 @@ -870,6 +870,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   case BOOKE_INTERRUPT_HV_PRIV:
  emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
   break;
 + case BOOKE_INTERRUPT_PROGRAM:
 + /*SW breakpoints arrive as illegal instructions on HV */

Is it my email client or is there a space missing again? ;)

Also, please only fetch the last instruction if debugging is active.

 + emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
 + break;
   default:
   break;
   }
 @@ -947,7 +951,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
   break;
  
   case BOOKE_INTERRUPT_PROGRAM:
 - if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) {
 + if ((vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) &&
 + (last_inst == KVMPPC_INST_SW_BREAKPOINT)) {

I think this is changing the logic from if the guest is in user mode or
we're in HV, deflect to if the guest is in user mode or an HV guest
and the instruction is a breakpoint, treat it as debug. Otherwise
deflect. So you're essentially breaking PR KVM here from what I can tell.

Why don't you just split the whole thing out to the beginning of
BOOKE_INTERRUPT_PROGRAM and check for

  a) debug is enabled
  b) instruction is sw breakpoint

instead?

 + /*
 +  * We are here because of an SW breakpoint instr,
 +  * so lets return to host to handle.
 +  */
 + r = kvmppc_handle_debug(run, vcpu);
 + run->exit_reason = KVM_EXIT_DEBUG;
 + kvmppc_account_exit(vcpu, DEBUG_EXITS);
 + break;
 + } else {
   /*
* Program traps generated by user-level software must
* be handled by the guest kernel.
 @@ -1505,7 +1519,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, 
 struct kvm_one_reg *reg)
  val = get_reg_val(reg->id, vcpu->arch.tsr);
   break;
   case KVM_REG_PPC_DEBUG_INST:
 - val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG);

Please also remove the definition of EHPRIV_DEBUG.


Alex

 + val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT);
   break;
   case KVM_REG_PPC_VRSAVE:
  val = get_reg_val(reg->id, vcpu->arch.vrsave);
 

Re: [PATCH] of: make sure of_alias is initialized before accessing it

2014-09-08 Thread Grant Likely
On Wed, 27 Aug 2014 17:09:39 +0300, Laurentiu Tudor b10...@freescale.com 
wrote:
 Simply swap of_alias and of_chosen initialization so
 that of_alias ends up read first. This must be done
 because it is accessed a couple of lines below when
 trying to initialize the of_stdout using the alias
 based legacy method.
 
 [Fixes a752ee5 - "tty: Update hypervisor tty drivers to
 use core stdout parsing code"]
 
 Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
 Cc: Grant Likely grant.lik...@linaro.org
 ---
  drivers/of/base.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/of/base.c b/drivers/of/base.c
 index d8574ad..52f8506 100644
 --- a/drivers/of/base.c
 +++ b/drivers/of/base.c
 @@ -1847,6 +1847,10 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
 align))
  {
   struct property *pp;
  
 + of_aliases = of_find_node_by_path("/aliases");
 + if (!of_aliases)
 + return;
 +
  of_chosen = of_find_node_by_path("/chosen");
  if (of_chosen == NULL)
  of_chosen = of_find_node_by_path("/chosen@0");
 @@ -1862,10 +1866,6 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
 align))
   of_stdout = of_find_node_by_path(name);
   }
  
 - of_aliases = of_find_node_by_path("/aliases");
 - if (!of_aliases)
 - return;
 -

Close, but not quite. The 'if (!of_aliases)' test should not be moved.
Only the search for of_find_node_by_path(). I've fixed it up and
applied.

g.

   for_each_property_of_node(of_aliases, pp) {
  const char *start = pp->name;
   const char *end = start + strlen(start);
 -- 
 1.8.3.1


Re: [PATCH] pseries: Fix endianness in cpu hotplug and hotremove

2014-09-08 Thread Nathan Fontenot
It looks like you have a lot of the same changes as the patch Bharata
sent out last week. Including the one issue I saw in Bharata's patch
below.

On 09/05/2014 02:09 PM, Thomas Falcon wrote:
 This patch attempts to ensure that all values are in the proper
 endianness format when both hotadding and hotremoving cpus.
 
 Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com
 ---
  arch/powerpc/platforms/pseries/dlpar.c   | 56 
 ++--
  arch/powerpc/platforms/pseries/hotplug-cpu.c | 20 +-
  2 files changed, 38 insertions(+), 38 deletions(-)
 
 diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
 b/arch/powerpc/platforms/pseries/dlpar.c
 index a2450b8..c1d7e40 100644
 --- a/arch/powerpc/platforms/pseries/dlpar.c
 +++ b/arch/powerpc/platforms/pseries/dlpar.c
 @@ -24,11 +24,11 @@
  #include asm/rtas.h
  
  struct cc_workarea {
 - u32 drc_index;
 - u32 zero;
 - u32 name_offset;
 - u32 prop_length;
 - u32 prop_offset;
 + __be32  drc_index;
 + __be32  zero;
 + __be32  name_offset;
 + __be32  prop_length;
 + __be32  prop_offset;
  };
  
  void dlpar_free_cc_property(struct property *prop)
 @@ -48,11 +48,11 @@ static struct property *dlpar_parse_cc_property(struct 
 cc_workarea *ccwa)
   if (!prop)
   return NULL;
  
 - name = (char *)ccwa + ccwa->name_offset;
 + name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
  prop->name = kstrdup(name, GFP_KERNEL);
  
 - prop->length = ccwa->prop_length;
 - value = (char *)ccwa + ccwa->prop_offset;
 + prop->length = be32_to_cpu(ccwa->prop_length);
 + value = (char *)ccwa + be32_to_cpu(ccwa->prop_offset);
  prop->value = kmemdup(value, prop->length, GFP_KERNEL);
  if (!prop->value) {
  dlpar_free_cc_property(prop);
 @@ -78,7 +78,7 @@ static struct device_node *dlpar_parse_cc_node(struct 
 cc_workarea *ccwa,
   if (!dn)
   return NULL;
  
 - name = (char *)ccwa + ccwa->name_offset;
 + name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
  dn->full_name = kasprintf(GFP_KERNEL, "%s/%s", path, name);
  if (!dn->full_name) {
  kfree(dn);
 @@ -148,7 +148,7 @@ struct device_node *dlpar_configure_connector(u32 
 drc_index,
   return NULL;
  
   ccwa = (struct cc_workarea *)data_buf[0];
 - ccwa->drc_index = drc_index;
 + ccwa->drc_index = cpu_to_be32(drc_index);

This will break partition migration.

The drc index valued passed into dlpar_configure_connector() from the migration
path, pseries_devicetree_update(), is already in BE format.

-Nathan

  ccwa->zero = 0;
  
   do {
 @@ -363,10 +363,10 @@ static int dlpar_online_cpu(struct device_node *dn)
   int rc = 0;
   unsigned int cpu;
   int len, nthreads, i;
 - const u32 *intserv;
 + const __be32 *intserv_be;
  
 - intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
 - if (!intserv)
 + intserv_be = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
 + if (!intserv_be)
   return -EINVAL;
  
   nthreads = len / sizeof(u32);
 @@ -374,7 +374,7 @@ static int dlpar_online_cpu(struct device_node *dn)
   cpu_maps_update_begin();
  for (i = 0; i < nthreads; i++) {
  for_each_present_cpu(cpu) {
 - if (get_hard_smp_processor_id(cpu) != intserv[i])
 + if (get_hard_smp_processor_id(cpu) !=
 + be32_to_cpu(intserv_be[i]))
   continue;
   BUG_ON(get_cpu_current_state(cpu)
   != CPU_STATE_OFFLINE);
 @@ -388,7 +388,7 @@ static int dlpar_online_cpu(struct device_node *dn)
   }
   if (cpu == num_possible_cpus())
  printk(KERN_WARNING "Could not find cpu to online "
 -"with physical id 0x%x\n", intserv[i]);
 +"with physical id 0x%x\n", be32_to_cpu(intserv_be[i]));
   }
   cpu_maps_update_done();
  
 @@ -442,18 +442,17 @@ static int dlpar_offline_cpu(struct device_node *dn)
   int rc = 0;
   unsigned int cpu;
   int len, nthreads, i;
 - const u32 *intserv;
 + const __be32 *intserv_be;
  
 - intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
 - if (!intserv)
 + intserv_be = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
 + if (!intserv_be)
   return -EINVAL;
  
   nthreads = len / sizeof(u32);
 -
   cpu_maps_update_begin();
  for (i = 0; i < nthreads; i++) {
  for_each_present_cpu(cpu) {
 - if (get_hard_smp_processor_id(cpu) != intserv[i])
 + if (get_hard_smp_processor_id(cpu) !=
 + be32_to_cpu(intserv_be[i]))
   continue;
  
   if (get_cpu_current_state(cpu) == CPU_STATE_OFFLINE)
 @@ -469,20 +468,19 @@ static int dlpar_offline_cpu(struct 

Re: [PATCH] of: make sure of_alias is initialized before accessing it

2014-09-08 Thread Laurentiu Tudor
On 09/08/2014 04:29 PM, Grant Likely wrote:
 On Wed, 27 Aug 2014 17:09:39 +0300, Laurentiu Tudor b10...@freescale.com 
 wrote:
 Simply swap of_alias and of_chosen initialization so
 that of_alias ends up read first. This must be done
 because it is accessed a couple of lines below when
 trying to initialize the of_stdout using the alias
 based legacy method.

 [Fixes a752ee5 - "tty: Update hypervisor tty drivers to
 use core stdout parsing code"]

 Signed-off-by: Laurentiu Tudor laurentiu.tu...@freescale.com
 Cc: Grant Likely grant.lik...@linaro.org
 ---
  drivers/of/base.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/drivers/of/base.c b/drivers/of/base.c
 index d8574ad..52f8506 100644
 --- a/drivers/of/base.c
 +++ b/drivers/of/base.c
 @@ -1847,6 +1847,10 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
 align))
  {
  struct property *pp;
  
 +of_aliases = of_find_node_by_path("/aliases");
 +if (!of_aliases)
 +return;
 +
 of_chosen = of_find_node_by_path("/chosen");
 if (of_chosen == NULL)
 of_chosen = of_find_node_by_path("/chosen@0");
 @@ -1862,10 +1866,6 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 
 align))
  of_stdout = of_find_node_by_path(name);
  }
  
 -of_aliases = of_find_node_by_path("/aliases");
 -if (!of_aliases)
 -return;
 -
 
 Close, but not quite. The 'if (!of_aliases)' test should not be moved.
 Only the search for of_find_node_by_path().

Eek, completely missed this. Sorry.

 I've fixed it up and applied.
 
Thanks!

---
Best Regards, Laurentiu
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v1 09/21] Irq_remapping/MSI: Use MSI chip framework to configure MSI/MSI-X irq

2014-09-08 Thread wangyijing

 On Sep 5, 2014, at 18:42, Sergei Shtylyov sergei.shtyl...@cogentembedded.com wrote:
 
 Hello.
 
 On 9/5/2014 2:09 PM, Yijing Wang wrote:
 
 Use MSI chip framework instead of arch MSI functions to configure
 MSI/MSI-X irq. So we can manage MSI/MSI-X irq in a unified framework.
 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  drivers/iommu/irq_remapping.c |8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
 index 33c4395..e75026e 100644
 --- a/drivers/iommu/irq_remapping.c
 +++ b/drivers/iommu/irq_remapping.c
 [...]
 @@ -165,9 +170,10 @@ static void __init irq_remapping_modify_x86_ops(void)
  x86_io_apic_ops.set_affinity= set_remapped_irq_affinity;
  x86_io_apic_ops.setup_entry= setup_ioapic_remapped_entry;
  x86_io_apic_ops.eoi_ioapic_pin= eoi_ioapic_pin_remapped;
 -x86_msi.setup_msi_irqs= irq_remapping_setup_msi_irqs;
 +x86_msi.setup_msi_irqs  = irq_remapping_setup_msi_irqs;
 
   AFAICS, this change only converts tabs to spaces, so not needed at all.

Will update,  thanks.

 
  x86_msi.setup_hpet_msi= setup_hpet_msi_remapped;
  x86_msi.compose_msi_msg= compose_remapped_msi_msg;
 +x86_msi_chip = remap_msi_chip;
 
   Please align = with the rest of assignments.

Ok.

Thanks!
Yijing.

 
 WBR, Sergei
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v1 15/21] Powerpc/MSI: Use MSI chip framework to configure MSI/MSI-X irq

2014-09-08 Thread wangyijing


 On Sep 5, 2014, at 18:47, Sergei Shtylyov sergei.shtyl...@cogentembedded.com wrote:
 
 Hello.
 
 On 9/5/2014 2:10 PM, Yijing Wang wrote:
 
 Use MSI chip framework instead of arch MSI functions to configure
 MSI/MSI-X irq. So we can manage MSI/MSI-X irq in a unified framework.
 
 Signed-off-by: Yijing Wang wangyij...@huawei.com
 ---
  arch/powerpc/kernel/msi.c |   14 --
  1 files changed, 12 insertions(+), 2 deletions(-)
 
 diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
 index 71bd161..01781a4 100644
 --- a/arch/powerpc/kernel/msi.c
 +++ b/arch/powerpc/kernel/msi.c
 [...]
 @@ -27,7 +27,17 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, 
 int type)
  return ppc_md.setup_msi_irqs(dev, nvec, type);
  }
 
 -void arch_teardown_msi_irqs(struct pci_dev *dev)
 +static void ppc_teardown_msi_irqs(struct pci_dev *dev)
 
   Shouldn't this function take IRQ # instead?

This function needs to tear down all MSI IRQs of the PCI dev, so we should
pass the PCI dev as the argument.

Thanks!
Yijing.

 
  {
  ppc_md.teardown_msi_irqs(dev);
  }
 +
 +static struct msi_chip ppc_msi_chip = {
 +.setup_irqs = ppc_setup_msi_irqs,
 +.teardown_irqs = ppc_teardown_msi_irqs,
 +};
 +
 +struct msi_chip *arch_find_msi_chip(struct pci_dev *dev)
 +{
 +	return &ppc_msi_chip;
 +}
 
 WBR, Sergei
 
 --
 To unsubscribe from this list: send the line unsubscribe linux-pci in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: bit fields data tearing

2014-09-08 Thread Marc Gauthier
Paul E. McKenney wrote:
On Fri, Sep 05, 2014 at 02:50:31PM -0400, Peter Hurley wrote:
On 09/05/2014 02:09 PM, Paul E. McKenney wrote:
 This commit documents the fact that it is not safe to use bitfields as
 shared variables in synchronization algorithms.  It also documents that
 CPUs must provide one-byte and two-byte load and store instructions
^
 atomic
 
 Here you meant non-atomic?  My guess is that you are referring to the
 fact that you could emulate a one-byte store on pre-EV56 Alpha CPUs
 using the ll and sc atomic-read-modify-write instructions, correct?
 
 in order to be supported by the Linux kernel.  (Michael Cree
 has agreed to the resulting non-support of pre-EV56 Alpha CPUs:
 https://lkml.org/lkml/2014/9/5/143.
[...]

 + and 64-bit systems, respectively.  Note that this means that the
 + Linux kernel does not support pre-EV56 Alpha CPUs, because these
 + older CPUs do not provide one-byte and two-byte loads and stores.
  ^
 non-atomic
 
 I took this, thank you!

Eum, am I totally lost, or aren't both of these supposed to say "atomic"?

Can't imagine requiring a CPU to provide non-atomic loads and stores
(i.e. requiring old Alpha behavior?).

-Marc
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] pseries: Fix endianness in cpu hotplug and hotremove

2014-09-08 Thread Thomas Falcon
I guess we were both working on it independently.  I had made the 
changes to hotplug a cpu a few weeks ago, but was blocked on removing a 
cpu.  Last week I realized what was blocking me and fixed cpu removal, 
so I sent the patch along.  Since Bharata has already submitted a patch 
that will handle hotplugging, I'll resubmit this with only the changes 
needed to remove a cpu.


Tom

On 09/08/2014 09:07 AM, Nathan Fontenot wrote:

It looks like you have a lot of the same changes as the patch Bharata
sent out last week. Including the one issue I saw in Bharata's patch
below.

On 09/05/2014 02:09 PM, Thomas Falcon wrote:

This patch attempts to ensure that all values are in the proper
endianness format when both hotadding and hotremoving cpus.

Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com
---
  arch/powerpc/platforms/pseries/dlpar.c   | 56 ++--
  arch/powerpc/platforms/pseries/hotplug-cpu.c | 20 +-
  2 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dlpar.c 
b/arch/powerpc/platforms/pseries/dlpar.c
index a2450b8..c1d7e40 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -24,11 +24,11 @@
  #include asm/rtas.h
  
  struct cc_workarea {

-   u32 drc_index;
-   u32 zero;
-   u32 name_offset;
-   u32 prop_length;
-   u32 prop_offset;
+   __be32  drc_index;
+   __be32  zero;
+   __be32  name_offset;
+   __be32  prop_length;
+   __be32  prop_offset;
  };
  
  void dlpar_free_cc_property(struct property *prop)

@@ -48,11 +48,11 @@ static struct property *dlpar_parse_cc_property(struct 
cc_workarea *ccwa)
if (!prop)
return NULL;
  
-	name = (char *)ccwa + ccwa->name_offset;

+	name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
	prop->name = kstrdup(name, GFP_KERNEL);
  
-	prop->length = ccwa->prop_length;

-	value = (char *)ccwa + ccwa->prop_offset;
+	prop->length = be32_to_cpu(ccwa->prop_length);
+	value = (char *)ccwa + be32_to_cpu(ccwa->prop_offset);
	prop->value = kmemdup(value, prop->length, GFP_KERNEL);
	if (!prop->value) {
		dlpar_free_cc_property(prop);
@@ -78,7 +78,7 @@ static struct device_node *dlpar_parse_cc_node(struct 
cc_workarea *ccwa,
if (!dn)
return NULL;
  
-	name = (char *)ccwa + ccwa->name_offset;

+	name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
	dn->full_name = kasprintf(GFP_KERNEL, "%s/%s", path, name);
	if (!dn->full_name) {
kfree(dn);
@@ -148,7 +148,7 @@ struct device_node *dlpar_configure_connector(u32 drc_index,
return NULL;
  
  	ccwa = (struct cc_workarea *)data_buf[0];

-	ccwa->drc_index = drc_index;
+	ccwa->drc_index = cpu_to_be32(drc_index);

This will break partition migration.

The drc index valued passed into dlpar_configure_connector() from the migration
path, pseries_devicetree_update(), is already in BE format.

-Nathan


	ccwa->zero = 0;
  
  	do {

@@ -363,10 +363,10 @@ static int dlpar_online_cpu(struct device_node *dn)
int rc = 0;
unsigned int cpu;
int len, nthreads, i;
-   const u32 *intserv;
+   const __be32 *intserv_be;
  
-	intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);

-	if (!intserv)
+	intserv_be = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+	if (!intserv_be)
return -EINVAL;
  
  	nthreads = len / sizeof(u32);

@@ -374,7 +374,7 @@ static int dlpar_online_cpu(struct device_node *dn)
cpu_maps_update_begin();
for (i = 0; i  nthreads; i++) {
for_each_present_cpu(cpu) {
-   if (get_hard_smp_processor_id(cpu) != intserv[i])
+   if (get_hard_smp_processor_id(cpu) != 
be32_to_cpu(intserv_be[i]))
continue;
BUG_ON(get_cpu_current_state(cpu)
!= CPU_STATE_OFFLINE);
@@ -388,7 +388,7 @@ static int dlpar_online_cpu(struct device_node *dn)
}
if (cpu == num_possible_cpus())
			printk(KERN_WARNING "Could not find cpu to online "
-			       "with physical id 0x%x\n", intserv[i]);
+			       "with physical id 0x%x\n",
+			       be32_to_cpu(intserv_be[i]));
}
cpu_maps_update_done();
  
@@ -442,18 +442,17 @@ static int dlpar_offline_cpu(struct device_node *dn)

int rc = 0;
unsigned int cpu;
int len, nthreads, i;
-   const u32 *intserv;
+   const __be32 *intserv_be;
  
-	intserv = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);

-	if (!intserv)
+	intserv_be = of_get_property(dn, "ibm,ppc-interrupt-server#s", &len);
+	if (!intserv_be)
return -EINVAL;
  
  	nthreads = len / sizeof(u32);

-

Re: bit fields data tearing

2014-09-08 Thread One Thousand Gnomes
On Fri, 05 Sep 2014 08:41:52 -0700
H. Peter Anvin h...@zytor.com wrote:

 On 09/05/2014 08:31 AM, Peter Hurley wrote:
  
  Which is a bit ironic because I remember when Digital had a team
  working on emulating native x86 apps on Alpha/NT.
  
 
 Right, because the x86 architecture was obsolete and would never scale...

Talking about not scaling can anyone explain how a "you need to use
set_bit() and friends" bug report scaled into a hundred message plus
discussion about ambiguous properties of processors (and nobody has
audited all the embedded platforms we support yet, or the weirder ARMs)
and a proposal to remove Alpha support.

Wouldn't it be *much* simpler to do what I suggested in the first place
and use the existing intended for purpose, deliberately put there,
functions for atomic bitops, because they are fast on sane processors and
they work on everything else.

I think the whole removing Alpha EV5 support is basically bonkers. Just
use set_bit in the tty layer. Alpha will continue to work as well as it
always has done and you won't design out support for any future processor
that turns out not to do byte aligned stores.

Alan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 10:52 AM, One Thousand Gnomes wrote:
 On Fri, 05 Sep 2014 08:41:52 -0700
 H. Peter Anvin h...@zytor.com wrote:
 
 On 09/05/2014 08:31 AM, Peter Hurley wrote:

 Which is a bit ironic because I remember when Digital had a team
 working on emulating native x86 apps on Alpha/NT.


 Right, because the x86 architecture was obsolete and would never scale...
 
 Talking about not scaling can anyone explain how a "you need to use
 set_bit() and friends" bug report scaled into a hundred message plus
 discussion about ambiguous properties of processors (and nobody has
 audited all the embedded platforms we support yet, or the weirder ARMs)
 and a proposal to remove Alpha support.
 
 Wouldn't it be *much* simpler to do what I suggested in the first place
 and use the existing intended for purpose, deliberately put there,
 functions for atomic bitops, because they are fast on sane processors and
 they work on everything else.
 
 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.
 
 Alan
 

Is *that* what we are talking about?  I was added to this conversation
in the middle where it had already generalized, so I had no idea.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/07/2014 10:56 PM, James Bottomley wrote:
 On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
 How many PARISC systems do we have that actually do real work on Linux?
 
 I'd be very surprised if this problem didn't exist on all alignment
 requiring architectures, like PPC and Sparc as well.  I know it would be
 very convenient if all the world were an x86 ... but it would also be
 very boring as well.

I wouldn't be so sure about that.  That is a pretty aggressive
relaxation of ordering that PARISC has enacted here, kind of like the
Alpha "we don't need no stinking byte accesses."

 The rules for coping with it are well known and a relaxation of what we
 currently do in the kernel, so I don't see what the actual problem is.
 
 In the context of this thread, PA can't do atomic bit sets (no atomic
 RMW except the ldcw operation) it can do atomic writes to fundamental
 sizes (byte, short, int, long) provided gcc emits the correct primitive
 (there are lots of gotchas in this, but that's not an architectural
 problem).  These atomicity guarantees depend on the underlying storage
 and are respected for main memory but not for any other type of bus.

So I'm not trying to push the "all the world is an x86"... certainly not
given that x86 has abnormally strict ordering rules and so is itself an
outlier.  What I *don't* really want to have to deal with is going
through more than casual effort to accommodate outliers which no longer
have any real value -- we have too much work to do.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 18:52 +0100, One Thousand Gnomes wrote:
 On Fri, 05 Sep 2014 08:41:52 -0700
 H. Peter Anvin h...@zytor.com wrote:
 
  On 09/05/2014 08:31 AM, Peter Hurley wrote:
   
   Which is a bit ironic because I remember when Digital had a team
   working on emulating native x86 apps on Alpha/NT.
   
  
  Right, because the x86 architecture was obsolete and would never scale...
 
 Talking about not scaling can anyone explain how a "you need to use
 set_bit() and friends" bug report scaled into a hundred message plus
 discussion about ambiguous properties of processors (and nobody has
 audited all the embedded platforms we support yet, or the weirder ARMs)
 and a proposal to remove Alpha support.
 
 Wouldn't it be *much* simpler to do what I suggested in the first place
 and use the existing intended for purpose, deliberately put there,
 functions for atomic bitops, because they are fast on sane processors and
 they work on everything else.
 
 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.

Seconded.  We implement via hashed spinlocks on PA ... but hey, we're
not the fastest architecture anyway and semantically it just works.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 11:12 -0700, H. Peter Anvin wrote:
 On 09/07/2014 10:56 PM, James Bottomley wrote:
  On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
  How many PARISC systems do we have that actually do real work on Linux?
  
  I'd be very surprised if this problem didn't exist on all alignment
  requiring architectures, like PPC and Sparc as well.  I know it would be
  very convenient if all the world were an x86 ... but it would also be
  very boring as well.
 
 I wouldn't be so sure about that.  That is a pretty aggressive
 relaxation of ordering that PARISC has enacted here, kind of like the
 Alpha "we don't need no stinking byte accesses."

Um, I think you need to re-read the thread; that's not what I said at
all. It's even written lower down: "PA can't do atomic bit sets (no
atomic RMW except the ldcw operation); it can do atomic writes to
fundamental sizes (byte, short, int, long) provided gcc emits the
correct primitive."  The original question was whether atomicity
required native bus width access, which we currently assume, so there's
no extant problem.

  The rules for coping with it are well known and a relaxation of what we
  currently do in the kernel, so I don't see what the actual problem is.
  
  In the context of this thread, PA can't do atomic bit sets (no atomic
  RMW except the ldcw operation) it can do atomic writes to fundamental
  sizes (byte, short, int, long) provided gcc emits the correct primitive
  (there are lots of gotchas in this, but that's not an architectural
  problem).  These atomicity guarantees depend on the underlying storage
  and are respected for main memory but not for any other type of bus.
 
 So I'm not trying to push the "all the world is an x86"... certainly not
 given that x86 has abnormally strict ordering rules and so is itself an
 outlier.  What I *don't* really want to have to deal with is going
 through more than casual effort to accommodate outliers which no longer
 have any real value -- we have too much work to do.

What accommodation?  I was just explaining the architectural atomicity
rules on PA.  How they're bus dependent (which isn't a PA restriction)
and how the compiler can mess them up.  None of the current PA atomicity
rules conflicts with anything we do today ... the danger is that moving
to implicit atomicity gives us more scope for compiler problems ... but
that's not PA specific either; any architecture requiring alignment has
this problem.

James


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 12:09 PM, James Bottomley wrote:
 
 Um, I think you need to re-read the thread; that's not what I said at
 all. It's even written lower down: PA can't do atomic bit sets (no
 atomic RMW except the ldcw operation) it can do atomic writes to
 fundamental sizes (byte, short, int, long) provided gcc emits the
 correct primitive.  The original question was whether atomicity
 required native bus width access, which we currently assume, so there's
 no extant problem.
 

The issue at hand was whether or not partially overlapped (but naturally
aligned) writes can pass each other.  *This* is the aggressive
relaxation to which I am referring.

I would guess that that is a very unusual constraint.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread One Thousand Gnomes
  I think the whole removing Alpha EV5 support is basically bonkers. Just
  use set_bit in the tty layer. Alpha will continue to work as well as it
  always has done and you won't design out support for any future processor
  that turns out not to do byte aligned stores.
  
  Alan
  
 
 Is *that* what we are talking about?  I was added to this conversation
 in the middle where it had already generalized, so I had no idea.
 
   -hpa

Yes there are some flags in the tty layer that are vulnerable to this
(although they've been vulnerable to it and missing a lock since last
century with no known ill effects).

It's not as if this is causing any work to anyone beyond using the
standard API, no API needs to change.

Alan
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 1/3] init/main.c: Give init_task a canary

2014-09-08 Thread Aaron Tomlin
Tasks get their end of stack set to STACK_END_MAGIC with the
aim to catch stack overruns. Currently this feature does not
apply to init_task. This patch removes this restriction.

Note that a similar patch was posted by Prarit Bhargava [1]
some time ago but was never merged.

[1]: http://marc.info/?l=linux-kernelm=127144305403241w=2

Signed-off-by: Aaron Tomlin atom...@redhat.com
---
 arch/powerpc/mm/fault.c|  3 +--
 arch/x86/mm/fault.c|  3 +--
 include/linux/sched.h  |  2 ++
 init/main.c|  1 +
 kernel/fork.c  | 12 +---
 kernel/trace/trace_stack.c |  4 +---
 6 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 51ab9e7..35d0760c 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -30,7 +30,6 @@
 #include <linux/kprobes.h>
 #include <linux/kdebug.h>
 #include <linux/perf_event.h>
-#include <linux/magic.h>
 #include <linux/ratelimit.h>
 #include <linux/context_tracking.h>
 
@@ -538,7 +537,7 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 		regs->nip);
 
 	stackend = end_of_stack(current);
-	if (current != init_task && *stackend != STACK_END_MAGIC)
+	if (*stackend != STACK_END_MAGIC)
 		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
 
 	die("Kernel access of bad area", regs, sig);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a241946..bc23a70 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -3,7 +3,6 @@
  *  Copyright (C) 2001, 2002 Andi Kleen, SuSE Labs.
  *  Copyright (C) 2008-2009, Red Hat Inc., Ingo Molnar
  */
-#include <linux/magic.h>		/* STACK_END_MAGIC		*/
 #include <linux/sched.h>		/* test_thread_flag(), ...	*/
 #include <linux/kdebug.h>		/* oops_begin/end, ...		*/
 #include <linux/module.h>		/* search_exception_table	*/
@@ -710,7 +709,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
show_fault_oops(regs, error_code, address);
 
stackend = end_of_stack(tsk);
-	if (tsk != init_task && *stackend != STACK_END_MAGIC)
+	if (*stackend != STACK_END_MAGIC)
 		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
 
 	tsk->thread.cr2 = address;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..4dee9d7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -57,6 +57,7 @@ struct sched_param {
 #include <linux/llist.h>
 #include <linux/uidgid.h>
 #include <linux/gfp.h>
+#include <linux/magic.h>
 
 #include <asm/processor.h>
 
@@ -2636,6 +2637,7 @@ static inline unsigned long stack_not_used(struct 
task_struct *p)
return (unsigned long)n - (unsigned long)end_of_stack(p);
 }
 #endif
+extern void task_stack_end_magic(struct task_struct *tsk);
 
 /* set thread flags in other task's structures
  * - see asm/thread_info.h for TIF_ flags available
diff --git a/init/main.c b/init/main.c
index bb1aed9..bcdabf3a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -508,6 +508,7 @@ asmlinkage __visible void __init start_kernel(void)
 * lockdep hash:
 */
lockdep_init();
+	task_stack_end_magic(&init_task);
smp_setup_processor_id();
debug_objects_early_init();
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 0cf9cdb..67cb1e3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -294,11 +294,18 @@ int __weak arch_dup_task_struct(struct task_struct *dst,
return 0;
 }
 
+void task_stack_end_magic(struct task_struct *tsk)
+{
+   unsigned long *stackend;
+
+   stackend = end_of_stack(tsk);
+   *stackend = STACK_END_MAGIC;/* for overflow detection */
+}
+
 static struct task_struct *dup_task_struct(struct task_struct *orig)
 {
struct task_struct *tsk;
struct thread_info *ti;
-   unsigned long *stackend;
int node = tsk_fork_get_node(orig);
int err;
 
@@ -328,8 +335,7 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig)
setup_thread_stack(tsk, orig);
clear_user_return_notifier(tsk);
clear_tsk_need_resched(tsk);
-   stackend = end_of_stack(tsk);
-   *stackend = STACK_END_MAGIC;/* for overflow detection */
+   task_stack_end_magic(tsk);
 
 #ifdef CONFIG_CC_STACKPROTECTOR
tsk-stack_canary = get_random_int();
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 8a4e5cb..1636e41 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -13,7 +13,6 @@
 #include <linux/sysctl.h>
 #include <linux/init.h>
 #include <linux/fs.h>
-#include <linux/magic.h>
 
 #include <asm/setup.h>
 
@@ -171,8 +170,7 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}
 
-	if ((current != init_task &&
-	    *(end_of_stack(current)) != STACK_END_MAGIC)) {
+   if (*end_of_stack(current) != 

[PATCH 2/3] sched: Add helper for task stack page overrun checking

2014-09-08 Thread Aaron Tomlin
This facility is used in a few places so let's introduce
a helper function to improve code readability.

Signed-off-by: Aaron Tomlin atom...@redhat.com
---
 arch/powerpc/mm/fault.c| 4 +---
 arch/x86/mm/fault.c| 4 +---
 include/linux/sched.h  | 2 ++
 kernel/trace/trace_stack.c | 2 +-
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 35d0760c..99b2f27 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -507,7 +507,6 @@ bail:
 void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
 {
const struct exception_table_entry *entry;
-   unsigned long *stackend;
 
/* Are we prepared to handle this fault?  */
	if ((entry = search_exception_tables(regs->nip)) != NULL) {
@@ -536,8 +535,7 @@ void bad_page_fault(struct pt_regs *regs, unsigned long 
address, int sig)
	printk(KERN_ALERT "Faulting instruction address: 0x%08lx\n",
		regs->nip);
 
-	stackend = end_of_stack(current);
-	if (*stackend != STACK_END_MAGIC)
+	if (task_stack_end_corrupted(current))
 		printk(KERN_ALERT "Thread overran stack, or stack corrupted\n");
 
 	die("Kernel access of bad area", regs, sig);
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bc23a70..6240bc7 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -648,7 +648,6 @@ no_context(struct pt_regs *regs, unsigned long error_code,
   unsigned long address, int signal, int si_code)
 {
struct task_struct *tsk = current;
-   unsigned long *stackend;
unsigned long flags;
int sig;
 
@@ -708,8 +707,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 
show_fault_oops(regs, error_code, address);
 
-	stackend = end_of_stack(tsk);
-	if (*stackend != STACK_END_MAGIC)
+	if (task_stack_end_corrupted(tsk))
 		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
 
 	tsk->thread.cr2 = address;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4dee9d7..6e07cb9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2615,6 +2615,8 @@ static inline unsigned long *end_of_stack(struct 
task_struct *p)
 }
 
 #endif
+#define task_stack_end_corrupted(task) \
+   (*(end_of_stack(task)) != STACK_END_MAGIC)
 
 static inline int object_is_on_stack(void *obj)
 {
diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 1636e41..16eddb3 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -170,7 +170,7 @@ check_stack(unsigned long ip, unsigned long *stack)
i++;
}
 
-   if (*end_of_stack(current) != STACK_END_MAGIC) {
+   if (task_stack_end_corrupted(current)) {
print_max_stack();
BUG();
}
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v2 0/3] sched: Always check the integrity of the canary

2014-09-08 Thread Aaron Tomlin
Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

The first patch adds a canary to init_task's end of stack,
while the second patch provides a helper to determine the
integrity of the canary. The third checks for a stack
overrun and takes appropriate action; since the damage
is already done, there is no point in continuing.

Changes since v1:

 * Rebased against v3.17-rc4
 * Add a canary to init_task - Oleg Nesterov
 * Fix various code formatting issues - Peter Zijlstra
 * Introduce Kconfig option - Peter Zijlstra

Aaron Tomlin (3):
  init/main.c: Give init_task a canary
  sched: Add helper for task stack page overrun checking
  sched: BUG when stack end location is over written

 arch/powerpc/mm/fault.c|  5 +
 arch/x86/mm/fault.c|  5 +
 include/linux/sched.h  |  4 
 init/main.c|  1 +
 kernel/fork.c  | 12 +---
 kernel/sched/core.c|  4 
 kernel/trace/trace_stack.c |  4 +---
 lib/Kconfig.debug  | 12 
 8 files changed, 33 insertions(+), 14 deletions(-)

-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 3/3] sched: BUG when stack end location is over written

2014-09-08 Thread Aaron Tomlin
Currently in the event of a stack overrun a call to schedule()
does not check for this type of corruption. This corruption is
often silent and can go unnoticed. However once the corrupted
region is examined at a later stage, the outcome is undefined
and often results in a sporadic page fault which cannot be
handled.

This patch checks for a stack overrun and takes appropriate
action; since the damage is already done, there is no point
in continuing.

Signed-off-by: Aaron Tomlin atom...@redhat.com
---
 kernel/sched/core.c |  4 
 lib/Kconfig.debug   | 12 
 2 files changed, 16 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..182b2d1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2660,6 +2660,10 @@ static noinline void __schedule_bug(struct task_struct 
*prev)
  */
 static inline void schedule_debug(struct task_struct *prev)
 {
+#ifdef CONFIG_SCHED_STACK_END_CHECK
+   if (unlikely(task_stack_end_corrupted(prev)))
+   BUG();
+#endif
/*
 * Test if we are atomic. Since do_exit() needs to call into
 * schedule() atomically, we ignore that path. Otherwise whine
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a285900..2a8280a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -824,6 +824,18 @@ config SCHEDSTATS
  application, you can say N to avoid the very slight overhead
  this adds.
 
+config SCHED_STACK_END_CHECK
+	bool "Detect stack corruption on calls to schedule()"
+   depends on DEBUG_KERNEL
+   default y
+   help
+	  This option checks for a stack overrun on calls to schedule().
+	  If the stack end location is found to be overwritten, always panic, as
+	  the content of the corrupted region can no longer be trusted.
+	  This is to ensure no erroneous behaviour occurs which could result in
+	  data corruption or a sporadic crash at a later stage once the region
+	  is examined. The runtime overhead introduced is minimal.
+
 config TIMER_STATS
	bool "Collect kernel timers statistics"
	depends on DEBUG_KERNEL && PROC_FS
-- 
1.9.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread Chris Metcalf

On 9/8/2014 1:50 AM, James Bottomley wrote:

Actual alignment is pretty irrelevant.  That's why all architectures
which require alignment also have to implement misaligned traps ... this
is a fundamental requirement of the networking code, for instance.


Can you clarify what you think the requirement is?  The tile architecture
doesn't support misaligned load/store in general, but we do support it for
userspace (using a nifty JIT approach with a direct-map hash table kept
in userspace), and also for get_user/put_user.  But that's it, and
the networking subsystem works fine for us.

Occasionally we report bugs for driver code that doesn't use the
get_unaligned_xxx() macros and friends, and our fixes are generally taken
upstream.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 16:45 -0400, Chris Metcalf wrote:
 On 9/8/2014 1:50 AM, James Bottomley wrote:
  Actual alignment is pretty irrelevant.  That's why all architectures
  which require alignment also have to implement misaligned traps ... this
  is a fundamental requirement of the networking code, for instance.
 
 Can you clarify what you think the requirement is?  The tile architecture
 doesn't support misaligned load/store in general, but we do support it for
 userspace (using a nifty JIT approach with a direct-map hash table kept
 in userspace), and also for get_user/put_user.  But that's it, and,
 the networking subsystem works fine for us.

This was years ago (possibly decades).  We had to implement in-kernel
unaligned traps for the networking layer because it could access short
and int fields that weren't of the correct alignment when processing
packets.  If that's all corrected now, we wouldn't really notice (except
a bit of a speed up since an unaligned trap effectively places the
broken out instructions into the bit stream).

James


 Occasionally we report bugs for driver code that doesn't use the
 get_unaligned_xxx() macros and friends, and our fixes are generally taken
 upstream.



Re: bit fields data tearing

2014-09-08 Thread Peter Hurley
On 09/08/2014 01:59 PM, H. Peter Anvin wrote:
 On 09/08/2014 10:52 AM, One Thousand Gnomes wrote:
 On Fri, 05 Sep 2014 08:41:52 -0700
 H. Peter Anvin h...@zytor.com wrote:

 On 09/05/2014 08:31 AM, Peter Hurley wrote:

 Which is a bit ironic because I remember when Digital had a team
 working on emulating native x86 apps on Alpha/NT.


 Right, because the x86 architecture was obsolete and would never scale...

Talking about not scaling, can anyone explain how a "you need to use
set_bit() and friends" bug report scaled into a hundred-message-plus
discussion about ambiguous properties of processors (and nobody has
audited all the embedded platforms we support yet, or the weirder ARMs)
and a proposal to remove Alpha support.

 Wouldn't it be *much* simpler to do what I suggested in the first place
 and use the existing intended for purpose, deliberately put there,
 functions for atomic bitops, because they are fast on sane processors and
 they work on everything else.

 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.

 Alan

 
 Is *that* what we are talking about?  I was added to this conversation
 in the middle where it had already generalized, so I had no idea.

No, this is just what brought this craziness to my attention.

For example, byte- and short-sized circular buffers could not possibly
be safe either, when the head nears the tail.

Who has audited global storage and ensured that _every_ byte-sized write
doesn't happen to be adjacent to some other storage that may not happen
to be protected by the same (or any) lock?

Regards,
Peter Hurley




Re: bit fields data tearing

2014-09-08 Thread Peter Hurley
On 09/08/2014 01:50 AM, James Bottomley wrote:
 On Sun, 2014-09-07 at 16:41 -0400, Peter Hurley wrote:
 On 09/07/2014 03:04 PM, James Bottomley wrote:
 On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
 On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
 On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
 Hi James,

 On 09/04/2014 10:11 PM, James Bottomley wrote:
 On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
 +And there are anti-guarantees:
 +
 + (*) These guarantees do not apply to bitfields, because compilers often
 + generate code to modify these using non-atomic read-modify-write
 + sequences.  Do not attempt to use bitfields to synchronize parallel
 + algorithms.
 +
 + (*) Even in cases where bitfields are protected by locks, all fields
 + in a given bitfield must be protected by one lock.  If two fields
 + in a given bitfield are protected by different locks, the compiler's
 + non-atomic read-modify-write sequences can cause an update to one
 + field to corrupt the value of an adjacent field.
 +
 + (*) These guarantees apply only to properly aligned and sized scalar
 + variables.  "Properly sized" currently means "int" and "long",
 + because some CPU families do not support loads and stores of
 + other sizes.  ("Some CPU families" is currently believed to
 + be only Alpha 21064.  If this is actually the case, a different
 + non-guarantee is likely to be formulated.)

 This is a bit unclear.  Presumably you're talking about definiteness of
 the outcome (as in what's seen after multiple stores to the same
 variable).

 No, the last conditions refers to adjacent byte stores from different
 cpu contexts (either interrupt or SMP).

 The guarantees are only for natural width on Parisc as well,
 so you would get a mess if you did byte stores to adjacent memory
 locations.

 For a simple test like:

 struct x {
 long a;
 char b;
 char c;
 char d;
 char e;
 };

 void store_bc(struct x *p) {
 p->b = 1;
 p->c = 2;
 }

 on parisc, gcc generates separate byte stores

 void store_bc(struct x *p) {
0:   34 1c 00 02 ldi 1,ret0
4:   0f 5c 12 08 stb ret0,4(r26)
8:   34 1c 00 04 ldi 2,ret0
c:   e8 40 c0 00 bv r0(rp)
   10:   0f 5c 12 0a stb ret0,5(r26)

 which appears to confirm that on parisc adjacent byte data
 is safe from corruption by concurrent cpu updates; that is,

 CPU 0| CPU 1
  |
 p->b = 1 | p->c = 2
  |

 will result in p->b == 1 && p->c == 2 (assume both values
 were 0 before the call to store_bc()).

 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.

 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

 This isn't universally true according to the architecture manual.  The
 PARISC CPU can make byte to long word stores atomic against the memory
 bus but not against the I/O bus for instance.  Atomicity is a property
 of the underlying substrate, not of the CPU.  Implying that atomicity is
 a CPU property is incorrect.

 To go back to this briefly, while it's true that atomicity is not
 a CPU property, atomicity is not possible if the CPU is not cooperating.
 
 You mean if it doesn't have the bus logic to emit?  Like ia32 mostly
 can't do 64 bit writes ... sure.

Yeah, or Alphas that do rmw for byte storage.

 OK, fair point.

 But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
 not I/O) that do not support single-byte and double-byte stores?

 For aligned access, I believe that's always the case for the memory bus
 (on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width..  If you mix the widths
 on the loads and stores, all bets are off.  That means you have to
 beware of the gcc penchant for coalescing loads and stores: if it sees
 two adjacent byte stores it can coalesce them into a short store
 instead ... that screws up the atomicity guarantees.

 Two things: I think that gcc has given up on combining adjacent writes,
 perhaps because unaligned writes on some arches are prohibitive, so
 whatever minor optimization was believed to be gained was quickly lost,
 multi-fold. (Although this might not be the reason since one would
 expect that gcc would know the alignment of the promoted store).
 
 Um, gcc assumes architecturally correct alignment; that's why it pads
 structures.  Only when accessing packed structures will it use the
 lowest unit load/store.
 
 if you have a struct { char a, char b }; and load first a then b with a
 constant gcc will obligingly optimise to a short store.

Re: [PATCH V2 0/2] cpufreq/powernv: Set core pstate to a minimum just before hotplugging it out

2014-09-08 Thread Rafael J. Wysocki
On Friday, September 05, 2014 01:11:22 PM Viresh Kumar wrote:
 On 5 September 2014 13:09, Preeti U Murthy pre...@linux.vnet.ibm.com wrote:
  Today cpus go to winkle when they are offlined. Since it is the deepest
  idle state that we have, it is expected to save a good amount of power
  as compared
  to online state, where cores can enter nap/fastsleep only which are
  shallower idle states.
  However we observed no powersavings with winkle as compared to nap/fastsleep
  and traced the problem to the pstate of the core being kept at a high even
  when the core is offline. This can keep the socket pstate high, thus burning
  power unnecessarily. This patchset fixes this issue.
 
  Changes in V2: Changed smp_call_function_any() to 
  smp_call_function_single() in Patch[2/2]
 
 Acked-by: Viresh Kumar viresh.ku...@linaro.org

I've queued up the two patches for 3.18, thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH 0/2] PCI/MSI: Remove arch_msi_check_device()

2014-09-08 Thread Michael Ellerman
On Fri, 2014-09-05 at 15:27 -0600, Bjorn Helgaas wrote:
 On Fri, Sep 5, 2014 at 3:25 PM, Bjorn Helgaas bhelg...@google.com wrote:
  On Sat, Jul 12, 2014 at 01:21:06PM +0200, Alexander Gordeev wrote:
  Hello,
 
  This is a cleanup effort to get rid of useless arch_msi_check_device().
  I am not sure what were the reasons for its existence in the first place,
  but at the moment it appears totally unnecessary.
 
  Thanks!
 
  Cc: linuxppc-dev@lists.ozlabs.org
  Cc: linux-...@vger.kernel.org
 
  Alexander Gordeev (2):
PCI/MSI/PPC: Remove arch_msi_check_device()
PCI/MSI: Remove arch_msi_check_device()
 
  I applied these (with Michael's ack on the first, and v2 of the second) to
  pci/msi for v3.18, thanks!
 
 Oh, I forgot -- if you'd rather take the first one through the PPC
 tree, you can do that and I can merge the second one later.  Let me
 know if you want to do that.

You take them all, that code shouldn't be changing much in the powerpc tree.

If the merge gets messy closer to the merge window we can always take the first
patch into the powerpc tree then.

cheers



Re: [PATCH] powerpc: Add macros for the ibm_architecture_vec[] lengths

2014-09-08 Thread Stewart Smith
Michael Ellerman m...@ellerman.id.au writes:
 The encoding of the lengths in the ibm_architecture_vec array is
 interesting to say the least. It's non-obvious how the number of bytes
 we provide relates to the length value.

 In fact we already got it wrong once, see 11e9ed43ca8a Fix up
 ibm_architecture_vec definition.

 So add some macros to make it (hopefully) clearer. These at least have
 the property that the integer present in the code is equal to the number
 of bytes that follows it.

 Signed-off-by: Michael Ellerman m...@ellerman.id.au

Seems at least as correct as the code was before... so,

Reviewed-by: Stewart Smith stew...@linux.vnet.ibm.com

(not actually compiled or tested or anything, but my internal C
preprocessor says it looks okay :)


Re: bit fields data tearing

2014-09-08 Thread Paul E. McKenney
On Mon, Sep 08, 2014 at 06:47:35PM -0400, Peter Hurley wrote:
 On 09/08/2014 01:59 PM, H. Peter Anvin wrote:
  On 09/08/2014 10:52 AM, One Thousand Gnomes wrote:
  On Fri, 05 Sep 2014 08:41:52 -0700
  H. Peter Anvin h...@zytor.com wrote:
 
  On 09/05/2014 08:31 AM, Peter Hurley wrote:
 
  Which is a bit ironic because I remember when Digital had a team
  working on emulating native x86 apps on Alpha/NT.
 
 
  Right, because the x86 architecture was obsolete and would never scale...
 
  Talking about not scaling can anyone explain how a you need to use
  set_bit() and friends bug report scaled into a hundred message plus
  discussion about ambiguous properties of processors (and nobody has
  audited all the embedded platforms we support yet, or the weirder ARMs)
  and a proposal to remove Alpha support.
 
  Wouldn't it be *much* simpler to do what I suggested in the first place
  and use the existing intended for purpose, deliberately put there,
  functions for atomic bitops, because they are fast on sane processors and
  they work on everything else.
 
  I think the whole removing Alpha EV5 support is basically bonkers. Just
  use set_bit in the tty layer. Alpha will continue to work as well as it
  always has done and you won't design out support for any future processor
  that turns out not to do byte aligned stores.
 
  Alan
 
  
  Is *that* what we are talking about?  I was added to this conversation
  in the middle where it had already generalized, so I had no idea.
 
 No, this is just what brought this craziness to my attention.
 
 For example, byte- and short-sized circular buffers could not possibly
 be safe either, when the head nears the tail.
 
 Who has audited global storage and ensured that _every_ byte-sized write
 doesn't happen to be adjacent to some other storage that may not happen
 to be protected by the same (or any) lock?

This was my concern as well.

Thanx, Paul


Re: [Xen-devel] [PATCH v1 08/21] x86/xen/MSI: Use MSI chip framework to configure MSI/MSI-X irq

2014-09-08 Thread Yijing Wang
On 2014/9/5 22:29, David Vrabel wrote:
 On 05/09/14 11:09, Yijing Wang wrote:
 Use MSI chip framework instead of arch MSI functions to configure
 MSI/MSI-X irq. So we can manage MSI/MSI-X irq in a unified framework.
 [...]
 --- a/arch/x86/pci/xen.c
 +++ b/arch/x86/pci/xen.c
 [...]
 @@ -418,9 +430,9 @@ int __init pci_xen_init(void)
  #endif
  
  #ifdef CONFIG_PCI_MSI
 -x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
 -x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
 -x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
 +xen_msi_chip.setup_irqs = xen_setup_msi_irqs;
 +xen_msi_chip.teardown_irqs = xen_teardown_msi_irqs;
 +x86_msi_chip = xen_msi_chip;
  msi_chip.irq_mask = xen_nop_msi_mask;
  msi_chip.irq_unmask = xen_nop_msi_mask;
 
 Why have these not been changed to set the x86_msi_chip.mask/unmask
 fields instead?

Hi David, x86_msi_chip here is of type struct msi_chip, used to configure
MSI/MSI-X irqs. msi_chip above is of type struct irq_chip, representing
the MSI irq controller. They are not the same object; their names easily
confuse people.

Defined in arch/x86/kernel/apic/io_apic.c
/*
 * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
 * which implement the MSI or MSI-X Capability Structure.
 */
static struct irq_chip msi_chip = {
.name   = "PCI-MSI",
.irq_unmask = unmask_msi_irq,
.irq_mask   = mask_msi_irq,
.irq_ack= ack_apic_edge,
.irq_set_affinity   = msi_set_affinity,
.irq_retrigger  = ioapic_retrigger_irq,
};


Defined in arch/x86/kernel/apic/io_apic.c, introduced in patch 7/21
struct msi_chip apic_msi_chip = {
.setup_irqs = native_setup_msi_irqs,
.teardown_irq = native_teardown_msi_irq,
};
[...]
struct msi_chip *x86_msi_chip = apic_msi_chip;


Thanks!
Yijing.

 
 David
 
 .
 


-- 
Thanks!
Yijing


Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 03:43 PM, James Bottomley wrote:
 
 This was years ago (possibly decades).  We had to implement in-kernel
 unaligned traps for the networking layer because it could access short
 and int fields that weren't of the correct alignment when processing
 packets.  If that's all corrected now, we wouldn't really notice (except
 a bit of a speed up since an unaligned trap effectively places the
 broken out instructions into the bit stream).
 
 James
 

Well, ARM doesn't trap, it just silently gives garbage on unaligned
memory references.

-hpa



Re: bit fields data tearing

2014-09-08 Thread James Bottomley
On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
 On 09/08/2014 01:50 AM, James Bottomley wrote:
  Two things: I think that gcc has given up on combining adjacent writes,
  perhaps because unaligned writes on some arches are prohibitive, so
  whatever minor optimization was believed to be gained was quickly lost,
  multi-fold. (Although this might not be the reason since one would
  expect that gcc would know the alignment of the promoted store).
  
  Um, gcc assumes architecturally correct alignment; that's why it pads
  structures.  Only when accessing packed structures will it use the
  lowest unit load/store.
  
  if you have a struct { char a, char b }; and load first a then b with a
  constant gcc will obligingly optimise to a short store.
 
 Um, please re-look at the test above. The exact test you describe is
 coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
 the kernel compiler options.
 
 In the generated code, please note the _absence_ of a combined write
 to two adjacent byte stores.

So one version of gcc doesn't do it.  Others do because I've been
surprised seeing it in assembly.

  But additionally, even if gcc combines adjacent writes _that are part
  of the program flow_ then I believe the situation is no worse than
  would otherwise exist.
 
  For instance, given the following:
 
  struct x {
 spinlock_t lock;
 long a;
 byte b;
 byte c;
  };
 
  void locked_store_b(struct x *p)
  {
	spin_lock(&p->lock);
	p->b = 1;
	spin_unlock(&p->lock);
	p->c = 2;
  }
 
  Granted, the author probably expects ordered writes of
 STORE B
 STORE C
  but that's not guaranteed because there is no memory barrier
  ordering B before C.
  
  Yes, there is: loads and stores may not migrate into or out of critical
  sections.
 
 That's a common misconception.
 
 The processor is free to re-order this to:
 
   STORE C
   STORE B
   UNLOCK
 
 That's because the unlock() only guarantees that:
 
 Stores before the unlock in program order are guaranteed to complete
 before the unlock completes. Stores after the unlock _may_ complete
 before the unlock completes.
 
 My point was that even if compiler barriers had the same semantics
 as memory barriers, the situation would be no worse. That is, code
 that is sensitive to memory barriers (like the example I gave above)
 would merely have the same fragility with one-way compiler barriers
 (with respect to the compiler combining writes).
 
 That's what I meant by no worse than would otherwise exist.

Actually, that's not correct.  This is actually deja vu with me on the
other side of the argument.  When we first did spinlocks on PA, I argued
as you did: lock only a barrier for code after and unlock for code
before.  The failing case is that you can have a critical section which
performs an atomically required operation and a following unit which
depends on it being performed.  If you begin the following unit before
the atomic requirement, you may end up losing.  It turns out this kind
of pattern is inherent in a lot of mail box device drivers: you need to
set up the mailbox atomically then poke it.  Setup is usually atomic,
deciding which mailbox to prime and actually poking it is in the
following unit.  Priming often involves an I/O bus transaction and if
you poke before priming, you get a misfire.

  I see no problem with gcc write-combining in the absence of
  memory barriers (as long as alignment is still respected,
  ie., the size-promoted write is still naturally aligned). The
  combined write is still atomic.
  
  Actual alignment is pretty irrelevant.  That's why all architectures
  which require alignment also have to implement misaligned traps ... this
  is a fundamental requirement of the networking code, for instance.
  
  The main problem is gcc thinking there's a misalignment (or a packed
  structure).  That causes splitting of loads and stores and that destroys
  atomicity.
 
 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.

Yes, I think I can agree that the only real problem is gcc thinking the
store or load needs splitting.

James



Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 07:56 PM, James Bottomley wrote:

 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.
 
 Yes, I think I can agree that the only real problem is gcc thinking the
 store or load needs splitting.
 

That seems much saner, and yes, that applies to any architecture which
needs unaligned references.  Now, if the references are *actually*
unaligned they can end up being split even on x86, especially if they
cross cache line boundaries.

-hpa



Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
Add a short member for proper alignment and one will probably pop out.

Sent from my tablet, pardon any formatting problems.

 On Sep 8, 2014, at 19:56, James Bottomley 
 james.bottom...@hansenpartnership.com wrote:
 
 On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
 On 09/08/2014 01:50 AM, James Bottomley wrote:
 Two things: I think that gcc has given up on combining adjacent writes,
 perhaps because unaligned writes on some arches are prohibitive, so
 whatever minor optimization was believed to be gained was quickly lost,
 multi-fold. (Although this might not be the reason since one would
 expect that gcc would know the alignment of the promoted store).
 
 Um, gcc assumes architecturally correct alignment; that's why it pads
 structures.  Only when accessing packed structures will it use the
 lowest unit load/store.
 
 if you have a struct { char a, char b }; and load first a then b with a
 constant gcc will obligingly optimise to a short store.
 
 Um, please re-look at the test above. The exact test you describe is
 coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
 the kernel compiler options.
 
 In the generated code, please note the _absence_ of a combined write
 to two adjacent byte stores.
 
 So one version of gcc doesn't do it.  Others do because I've been
 surprised seeing it in assembly.
 
 But additionally, even if gcc combines adjacent writes _that are part
 of the program flow_ then I believe the situation is no worse than
 would otherwise exist.
 
 For instance, given the following:
 
 struct x {
spinlock_t lock;
long a;
byte b;
byte c;
 };
 
 void locked_store_b(struct x *p)
 {
 spin_lock(&p->lock);
 p->b = 1;
 spin_unlock(&p->lock);
 p->c = 2;
 }
 
 Granted, the author probably expects ordered writes of
STORE B
STORE C
 but that's not guaranteed because there is no memory barrier
 ordering B before C.
 
 Yes, there is: loads and stores may not migrate into or out of critical
 sections.
 
 That's a common misconception.
 
 The processor is free to re-order this to:
 
STORE C
STORE B
UNLOCK
 
 That's because the unlock() only guarantees that:
 
 Stores before the unlock in program order are guaranteed to complete
 before the unlock completes. Stores after the unlock _may_ complete
 before the unlock completes.
 
 My point was that even if compiler barriers had the same semantics
 as memory barriers, the situation would be no worse. That is, code
 that is sensitive to memory barriers (like the example I gave above)
 would merely have the same fragility with one-way compiler barriers
 (with respect to the compiler combining writes).
 
 That's what I meant by no worse than would otherwise exist.
 
 Actually, that's not correct.  This is actually deja vu with me on the
 other side of the argument.  When we first did spinlocks on PA, I argued
 as you did: lock only a barrier for code after and unlock for code
 before.  The failing case is that you can have a critical section which
 performs an atomically required operation and a following unit which
 depends on it being performed.  If you begin the following unit before
 the atomic requirement, you may end up losing.  It turns out this kind
 of pattern is inherent in a lot of mail box device drivers: you need to
 set up the mailbox atomically then poke it.  Setup is usually atomic,
 deciding which mailbox to prime and actually poking it is in the
 following unit.  Priming often involves an I/O bus transaction and if
 you poke before priming, you get a misfire.
 
 I see no problem with gcc write-combining in the absence of
 memory barriers (as long as alignment is still respected,
 ie., the size-promoted write is still naturally aligned). The
 combined write is still atomic.
 
 Actual alignment is pretty irrelevant.  That's why all architectures
 which require alignment also have to implement misaligned traps ... this
 is a fundamental requirement of the networking code, for instance.
 
 The main problem is gcc thinking there's a misalignment (or a packed
 structure).  That causes splitting of loads and stores and that destroys
 atomicity.
 
 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.
 
 Yes, I think I can agree that the only real problem is gcc thinking the
 store or load needs splitting.
 
 James
 
 