Re: [RFC PATCH 0/6] Change vendor prefix for Intersil Corporation

2014-09-07 Thread Jason Cooper
On Wed, Aug 20, 2014 at 08:16:43AM -0400, Jason Cooper wrote:
 Philipp,
 
 On Wed, Aug 20, 2014 at 10:42:58AM +0200, Philipp Zabel wrote:
  Hi,
  
  currently there is a wild mixture of isl, isil, and intersil
  compatibles in the kernel. I figure at this point it is still
  possible to change this to use isil everywhere without too
  much pain, but it might be preferred to keep the already
  documented isl prefix, even though it doesn't follow the
  convention to use the NASDAQ symbol where available.
  
  The current users of the isil prefix are the following drivers
  and device trees, so we could just as well change those instead:
  arch/arm/boot/dts/tegra20-seaboard.dts
  arch/arm/boot/dts/tegra20-ventana.dts
  arch/arm/boot/dts/tegra30-cardhu.dtsi
  arch/powerpc/boot/dts/p1022rdk.dts
  drivers/staging/iio/light/isl29018.c
  drivers/staging/iio/light/isl29028.c
  
  regards
  Philipp
  
  Philipp Zabel (6):
of: Change vendor prefix for Intersil Corporation to isil
Documentation: Add isl1208 and isl12022 to trivial-devices list
ARM: mvebu: Change vendor prefix for Intersil Corporation to isil
powerpc/85xx: Change vendor prefix for Intersil Corporation to isil
rtc: rtc-isl12022: Change vendor prefix for Intersil Corporation to isil
rtc: rtc-isl12057: Change vendor prefix for Intersil Corporation to isil
  
 Documentation/devicetree/bindings/i2c/trivial-devices.txt | 4 +++-
 Documentation/devicetree/bindings/vendor-prefixes.txt     | 2 +-
 arch/arm/boot/dts/armada-370-netgear-rn102.dts            | 2 +-
 arch/arm/boot/dts/armada-370-netgear-rn104.dts            | 2 +-
 arch/arm/boot/dts/armada-xp-netgear-rn2120.dts            | 2 +-
 arch/powerpc/boot/dts/ppa8548.dts                         | 2 +-
 drivers/rtc/rtc-isl12022.c                                | 3 ++-
 drivers/rtc/rtc-isl12057.c                                | 3 ++-
 8 files changed, 12 insertions(+), 8 deletions(-)
 
 This looks good overall.  My only nit is that I'd like to see the legacy
 name(s) captured in the binding docs.
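
 For illustration, capturing the legacy name would let drivers keep
 matching both prefixes during the transition; a minimal sketch (not
 part of this series; isl12022 chosen arbitrarily, includes from
 linux/module.h and linux/of.h assumed):

	static const struct of_device_id isl12022_dt_match[] = {
		{ .compatible = "isil,isl12022" },  /* new NASDAQ-based prefix */
		{ .compatible = "isl,isl12022" },   /* legacy prefix, keeps old DTBs working */
		{ /* sentinel */ },
	};
	MODULE_DEVICE_TABLE(of, isl12022_dt_match);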
 
 I'll take the Armada dts changes through mvebu/arm-soc after a few days.

Just to dot our i's and cross our t's, would you mind sending out a
non-rfc version of this series?

thx,

Jason.

Re: bit fields data tearing

2014-09-07 Thread Paul E. McKenney
On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
 On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
  On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
   Hi James,
   
   On 09/04/2014 10:11 PM, James Bottomley wrote:
On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
+And there are anti-guarantees:
+
+ (*) These guarantees do not apply to bitfields, because compilers often
+     generate code to modify these using non-atomic read-modify-write
+     sequences.  Do not attempt to use bitfields to synchronize parallel
+     algorithms.
+
+ (*) Even in cases where bitfields are protected by locks, all fields
+     in a given bitfield must be protected by one lock.  If two fields
+     in a given bitfield are protected by different locks, the compiler's
+     non-atomic read-modify-write sequences can cause an update to one
+     field to corrupt the value of an adjacent field.
+
+ (*) These guarantees apply only to properly aligned and sized scalar
+     variables.  "Properly sized" currently means "int" and "long",
+     because some CPU families do not support loads and stores of
+     other sizes.  ("Some CPU families" is currently believed to
+     be only Alpha 21064.  If this is actually the case, a different
+     non-guarantee is likely to be formulated.)
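
As a minimal sketch of the second anti-guarantee above (an
illustration, not text from the patch): two bitfields sharing one
word, guarded by different locks, can corrupt each other:

	struct s {
		int a:4;	/* protected by lock_a */
		int b:4;	/* protected by lock_b -- the bug */
	};

	void set_a(struct s *p)	/* caller holds lock_a */
	{
		p->a = 1;	/* compiles to: load word, mask, store word */
	}

	void set_b(struct s *p)	/* caller holds lock_b */
	{
		p->b = 2;	/* same word: can overwrite a racing set_a() */
	}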

This is a bit unclear.  Presumably you're talking about definiteness of
the outcome (as in what's seen after multiple stores to the same
variable).
   
   No, the last condition refers to adjacent byte stores from different
   cpu contexts (either interrupt or SMP).
   
The guarantees are only for natural width on Parisc as well,
so you would get a mess if you did byte stores to adjacent memory
locations.
   
   For a simple test like:
   
   struct x {
 long a;
 char b;
 char c;
 char d;
 char e;
   };
   
   void store_bc(struct x *p) {
 p->b = 1;
 p->c = 2;
   }
   
   on parisc, gcc generates separate byte stores
   
   void store_bc(struct x *p) {
  0: 34 1c 00 02 ldi 1,ret0
  4: 0f 5c 12 08 stb ret0,4(r26)
  8: 34 1c 00 04 ldi 2,ret0
  c: e8 40 c0 00 bv r0(rp)
 10: 0f 5c 12 0a stb ret0,5(r26)
   
   which appears to confirm that on parisc adjacent byte data
   is safe from corruption by concurrent cpu updates; that is,
   
   CPU 0    | CPU 1
            |
   p->b = 1 | p->c = 2
            |
   
   will result in p->b == 1 && p->c == 2 (assume both values
   were 0 before the call to store_bc()).
  
  What Peter said.  I would ask for suggestions for better wording, but
  I would much rather be able to say that single-byte reads and writes
  are atomic and that aligned-short reads and writes are also atomic.
  
  Thus far, it looks like we lose only very old Alpha systems, so unless
  I hear otherwise, I update my patch to outlaw these very old systems.
 
 This isn't universally true according to the architecture manual.  The
 PARISC CPU can make byte to long word stores atomic against the memory
 bus but not against the I/O bus for instance.  Atomicity is a property
 of the underlying substrate, not of the CPU.  Implying that atomicity is
 a CPU property is incorrect.

OK, fair point.

But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
not I/O) that do not support single-byte and double-byte stores?

Thanx, Paul


[PATCH 0/2 v5] powerpc/kvm: support to handle sw breakpoint

2014-09-07 Thread Madhavan Srinivasan
This patchset adds ppc64 server-side support for software breakpoints
and extends the use of an illegal instruction as the software
breakpoint instruction across the ppc platform.

Patch 1 adds kernel-side support for software breakpoints.
The design is that, by using an illegal instruction, we trap to
the hypervisor via the Emulation Assistance interrupt, where we check
for the illegal instruction and accordingly return to the Host
or the Guest. The patch also adds support for software breakpoints
in PR KVM.

Patch 2 extends the use of the illegal instruction as the software
breakpoint instruction across the ppc platform. The patch extends the
booke program interrupt code to support software breakpoints.

Patch 2 is only compile-tested. It would really help if
someone could try it out and let me know their comments.
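
For reference, a rough sketch of the intended userspace flow (assuming
an open vcpu fd; names from the KVM uapi; error handling elided):

	#include <linux/kvm.h>
	#include <sys/ioctl.h>
	#include <stdint.h>

	static int enable_sw_bp(int vcpu_fd, uint32_t *bp_inst)
	{
		struct kvm_one_reg reg = {
			.id   = KVM_REG_PPC_DEBUG_INST,
			.addr = (uintptr_t)bp_inst,
		};
		struct kvm_guest_debug dbg = {
			.control = KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP,
		};

		/* fetch the opcode the host wants planted as a breakpoint */
		if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
			return -1;
		/* the caller then writes *bp_inst over the guest instruction */
		return ioctl(vcpu_fd, KVM_SET_GUEST_DEBUG, &dbg);
	}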

Changes v4-v5:
 Made changes to code comments and commit messages
 Added debugging active checks for illegal instr comparison
 Added debug instruction check in emulate code
 Extended SW breakpoint to booke

Changes v3-v4:
 Made changes to code comments and removed #define of zero opcode
 Added a new function to handle the debug instruction emulation in book3s_hv
 Rebased the code to latest upstream source.

Changes v2-v3:
 Changed the debug instruction: using the all-zero opcode in the instruction
  word as the illegal instruction, as mentioned in the Power ISA, instead of ABS
 Removed the reg update in emulation assist and added a call to
  kvmppc_emulate_instruction for the reg update.

Changes v1-v2:

 Moved the debug instruction #define to kvm_book3s.h. This way PR KVM can
  also share it.
 Added code to use the KVM get-one-reg infrastructure to get the debug opcode.
 Updated emulate.c to include emulation of the debug instruction in case of PR KVM.
 Made changes to the commit message.

Madhavan Srinivasan (2):
  powerpc/kvm: support to handle sw breakpoint
  powerpc/kvm: common sw breakpoint instr across ppc

 arch/powerpc/include/asm/kvm_ppc.h |  6 ++
 arch/powerpc/kvm/book3s.c  |  3 ++-
 arch/powerpc/kvm/book3s_hv.c   | 41 ++
 arch/powerpc/kvm/book3s_pr.c   |  3 +++
 arch/powerpc/kvm/booke.c   | 18 +++--
 arch/powerpc/kvm/emulate.c | 18 +
 6 files changed, 82 insertions(+), 7 deletions(-)

-- 
1.7.11.4


[PATCH 1/2 v5] powerpc/kvm: support to handle sw breakpoint

2014-09-07 Thread Madhavan Srinivasan
This patch adds kernel-side support for software breakpoints.
The design is that, by using an illegal instruction, we trap to the hypervisor
via the Emulation Assistance interrupt, where we check for the illegal
instruction and accordingly return to the Host or the Guest. The patch also
adds support for software breakpoints in PR KVM.

Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/kvm_ppc.h |  6 ++
 arch/powerpc/kvm/book3s.c  |  3 ++-
 arch/powerpc/kvm/book3s_hv.c   | 41 ++
 arch/powerpc/kvm/book3s_pr.c   |  3 +++
 arch/powerpc/kvm/emulate.c | 18 +
 5 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index fb86a22..dd83c9a 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -38,6 +38,12 @@
 #include <asm/paca.h>
 #endif
 
+/*
+ * KVMPPC_INST_SW_BREAKPOINT is debug Instruction
+ * for supporting software breakpoint.
+ */
+#define KVMPPC_INST_SW_BREAKPOINT  0x0000
+
 enum emulation_result {
EMULATE_DONE, /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with MMIO request */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index dd03f6b..00e9c9f 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -778,7 +778,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
 {
-   return -EINVAL;
+   vcpu->guest_debug = dbg->control;
+   return 0;
 }
 
 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 27cced9..3a2414c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,6 +725,30 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
return kvmppc_hcall_impl_hv_realmode(cmd);
 }
 
+static int kvmppc_emulate_debug_inst(struct kvm_run *run,
+   struct kvm_vcpu *vcpu)
+{
+   u32 last_inst;
+
+   if (kvmppc_get_last_inst(vcpu, INST_GENERIC, &last_inst) !=
+   EMULATE_DONE) {
+   /*
+* Fetch failed, so return to guest and
+* try executing it again.
+*/
+   return RESUME_GUEST;
+   } else {
+   if (last_inst == KVMPPC_INST_SW_BREAKPOINT) {
+   run->exit_reason = KVM_EXIT_DEBUG;
+   run->debug.arch.address = kvmppc_get_pc(vcpu);
+   return RESUME_HOST;
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   return RESUME_GUEST;
+   }
+   }
+}
+
 static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
 struct task_struct *tsk)
 {
@@ -807,12 +831,18 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
/*
 * This occurs if the guest executes an illegal instruction.
-* We just generate a program interrupt to the guest, since
-* we don't emulate any guest instructions at this stage.
+* If the guest debug is disabled, generate a program interrupt
+* to the guest. If guest debug is enabled, we need to check
+* whether the instruction is a software breakpoint instruction.
+* Accordingly return to Guest or Host.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
-   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
-   r = RESUME_GUEST;
+   if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) {
+   r = kvmppc_emulate_debug_inst(run, vcpu);
+   } else {
+   kvmppc_core_queue_program(vcpu, SRR1_PROGILL);
+   r = RESUME_GUEST;
+   }
break;
/*
 * This occurs if the guest (kernel or userspace), does something that
@@ -922,6 +952,9 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
long int i;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
+   break;
case KVM_REG_PPC_HIOR:
*val = get_reg_val(id, 0);
break;
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index faffb27..6d73708 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1319,6 +1319,9 @@ static int kvmppc_get_one_reg_pr(struct kvm_vcpu *vcpu, u64 id,
int r = 0;
 
switch (id) {
+   case KVM_REG_PPC_DEBUG_INST:
+   *val = get_reg_val(id, KVMPPC_INST_SW_BREAKPOINT);
+   break;

[PATCH 2/2 v5] powerpc/kvm: common sw breakpoint instr across ppc

2014-09-07 Thread Madhavan Srinivasan
This patch extends the use of the illegal instruction as the software
breakpoint instruction across the ppc platform. The patch extends the
booke program interrupt code to support software breakpoints.

Signed-off-by: Madhavan Srinivasan ma...@linux.vnet.ibm.com
---

The patch is only compile-tested. It would really help if
someone could try it out and let me know their comments.

 arch/powerpc/kvm/booke.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index b4c89fa..1b84853 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -870,6 +870,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case BOOKE_INTERRUPT_HV_PRIV:
emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
break;
+   case BOOKE_INTERRUPT_PROGRAM:
+   /* SW breakpoints arrive as illegal instructions on HV */
+   emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
+   break;
default:
break;
}
@@ -947,7 +951,17 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
 
case BOOKE_INTERRUPT_PROGRAM:
-   if (vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) {
+   if ((vcpu->arch.shared->msr & (MSR_PR | MSR_GS)) &&
+   (last_inst == KVMPPC_INST_SW_BREAKPOINT)) {
+   /*
+* We are here because of an SW breakpoint instr,
+* so lets return to host to handle.
+*/
+   r = kvmppc_handle_debug(run, vcpu);
+   run->exit_reason = KVM_EXIT_DEBUG;
+   kvmppc_account_exit(vcpu, DEBUG_EXITS);
+   break;
+   } else {
/*
 * Program traps generated by user-level software must
 * be handled by the guest kernel.
@@ -1505,7 +1519,7 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg)
val = get_reg_val(reg->id, vcpu->arch.tsr);
break;
case KVM_REG_PPC_DEBUG_INST:
-   val = get_reg_val(reg->id, KVMPPC_INST_EHPRIV_DEBUG);
+   val = get_reg_val(reg->id, KVMPPC_INST_SW_BREAKPOINT);
break;
case KVM_REG_PPC_VRSAVE:
val = get_reg_val(reg->id, vcpu->arch.vrsave);
-- 
1.7.11.4


Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
 On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
  On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
   On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
Hi James,

On 09/04/2014 10:11 PM, James Bottomley wrote:
 On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
 +And there are anti-guarantees:
 +
 + (*) These guarantees do not apply to bitfields, because compilers often
 +     generate code to modify these using non-atomic read-modify-write
 +     sequences.  Do not attempt to use bitfields to synchronize parallel
 +     algorithms.
 +
 + (*) Even in cases where bitfields are protected by locks, all fields
 +     in a given bitfield must be protected by one lock.  If two fields
 +     in a given bitfield are protected by different locks, the compiler's
 +     non-atomic read-modify-write sequences can cause an update to one
 +     field to corrupt the value of an adjacent field.
 +
 + (*) These guarantees apply only to properly aligned and sized scalar
 +     variables.  "Properly sized" currently means "int" and "long",
 +     because some CPU families do not support loads and stores of
 +     other sizes.  ("Some CPU families" is currently believed to
 +     be only Alpha 21064.  If this is actually the case, a different
 +     non-guarantee is likely to be formulated.)
 
 This is a bit unclear.  Presumably you're talking about definiteness of
 the outcome (as in what's seen after multiple stores to the same
 variable).

No, the last condition refers to adjacent byte stores from different
cpu contexts (either interrupt or SMP).

 The guarantees are only for natural width on Parisc as well,
 so you would get a mess if you did byte stores to adjacent memory
 locations.

For a simple test like:

struct x {
long a;
char b;
char c;
char d;
char e;
};

void store_bc(struct x *p) {
p->b = 1;
p->c = 2;
}

on parisc, gcc generates separate byte stores

void store_bc(struct x *p) {
   0:   34 1c 00 02 ldi 1,ret0
   4:   0f 5c 12 08 stb ret0,4(r26)
   8:   34 1c 00 04 ldi 2,ret0
   c:   e8 40 c0 00 bv r0(rp)
  10:   0f 5c 12 0a stb ret0,5(r26)

which appears to confirm that on parisc adjacent byte data
is safe from corruption by concurrent cpu updates; that is,

CPU 0    | CPU 1
         |
p->b = 1 | p->c = 2
         |

will result in p->b == 1 && p->c == 2 (assume both values
were 0 before the call to store_bc()).
   
   What Peter said.  I would ask for suggestions for better wording, but
   I would much rather be able to say that single-byte reads and writes
   are atomic and that aligned-short reads and writes are also atomic.
   
   Thus far, it looks like we lose only very old Alpha systems, so unless
   I hear otherwise, I update my patch to outlaw these very old systems.
  
  This isn't universally true according to the architecture manual.  The
  PARISC CPU can make byte to long word stores atomic against the memory
  bus but not against the I/O bus for instance.  Atomicity is a property
  of the underlying substrate, not of the CPU.  Implying that atomicity is
  a CPU property is incorrect.
 
 OK, fair point.
 
 But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
 not I/O) that do not support single-byte and double-byte stores?

For aligned access, I believe that's always the case for the memory bus
(on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width.  If you mix the widths
on the loads and stores, all bets are off.  That means you have to
beware of the gcc penchant for coalescing loads and stores: if it sees
two adjacent byte stores it can coalesce them into a short store
instead ... that screws up the atomicity guarantees.
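
(A minimal sketch of that hazard and the usual kernel countermeasure:
ACCESS_ONCE() from linux/compiler.h forces a volatile access, so gcc
must keep the two byte stores separate instead of merging them into
one halfword store.)

	struct y {
		char b;
		char c;
	};

	void store_merged(struct y *p)
	{
		p->b = 1;		/* gcc may fuse these two ... */
		p->c = 2;		/* ... into a single 16-bit store */
	}

	void store_separate(struct y *p)
	{
		ACCESS_ONCE(p->b) = 1;	/* volatile: stays a byte store */
		ACCESS_ONCE(p->c) = 2;	/* volatile: stays a byte store */
	}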

James



Re: [PATCH 0/2] PCI/MSI: Remove arch_msi_check_device()

2014-09-07 Thread Alexander Gordeev
On Fri, Sep 05, 2014 at 03:27:49PM -0600, Bjorn Helgaas wrote:
  I applied these (with Michael's ack on the first, and v2 of the second) to
  pci/msi for v3.18, thanks!

Hi Bjorn,

I resent a series with updates that fix kbuild robot errors.
Hopefully, the rebase of pci/msi will not cause trouble for anyone.

 Oh, I forgot -- if you'd rather take the first one through the PPC
 tree, you can do that and I can merge the second one later.  Let me
 know if you want to do that.

Nah, your tree is just fine.

Thanks!

 Bjorn

-- 
Regards,
Alexander Gordeev
agord...@redhat.com

Re: bit fields data tearing

2014-09-07 Thread Peter Hurley
On 09/07/2014 03:04 PM, James Bottomley wrote:
 On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
 On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
 On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
 Hi James,

 On 09/04/2014 10:11 PM, James Bottomley wrote:
 On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
 +And there are anti-guarantees:
 +
 + (*) These guarantees do not apply to bitfields, because compilers often
 +     generate code to modify these using non-atomic read-modify-write
 +     sequences.  Do not attempt to use bitfields to synchronize parallel
 +     algorithms.
 +
 + (*) Even in cases where bitfields are protected by locks, all fields
 +     in a given bitfield must be protected by one lock.  If two fields
 +     in a given bitfield are protected by different locks, the compiler's
 +     non-atomic read-modify-write sequences can cause an update to one
 +     field to corrupt the value of an adjacent field.
 +
 + (*) These guarantees apply only to properly aligned and sized scalar
 +     variables.  "Properly sized" currently means "int" and "long",
 +     because some CPU families do not support loads and stores of
 +     other sizes.  ("Some CPU families" is currently believed to
 +     be only Alpha 21064.  If this is actually the case, a different
 +     non-guarantee is likely to be formulated.)

 This is a bit unclear.  Presumably you're talking about definiteness of
 the outcome (as in what's seen after multiple stores to the same
 variable).

 No, the last condition refers to adjacent byte stores from different
 cpu contexts (either interrupt or SMP).

 The guarantees are only for natural width on Parisc as well,
 so you would get a mess if you did byte stores to adjacent memory
 locations.

 For a simple test like:

 struct x {
   long a;
   char b;
   char c;
   char d;
   char e;
 };

 void store_bc(struct x *p) {
   p->b = 1;
   p->c = 2;
 }

 on parisc, gcc generates separate byte stores

 void store_bc(struct x *p) {
0: 34 1c 00 02 ldi 1,ret0
4: 0f 5c 12 08 stb ret0,4(r26)
8: 34 1c 00 04 ldi 2,ret0
c: e8 40 c0 00 bv r0(rp)
   10: 0f 5c 12 0a stb ret0,5(r26)

 which appears to confirm that on parisc adjacent byte data
 is safe from corruption by concurrent cpu updates; that is,

 CPU 0    | CPU 1
          |
 p->b = 1 | p->c = 2
          |

 will result in p->b == 1 && p->c == 2 (assume both values
 were 0 before the call to store_bc()).

 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.

 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

 This isn't universally true according to the architecture manual.  The
 PARISC CPU can make byte to long word stores atomic against the memory
 bus but not against the I/O bus for instance.  Atomicity is a property
 of the underlying substrate, not of the CPU.  Implying that atomicity is
 a CPU property is incorrect.

To go back to this briefly, while it's true that atomicity is not
a CPU property, atomicity is not possible if the CPU is not cooperating.

 OK, fair point.

 But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
 not I/O) that do not support single-byte and double-byte stores?
 
 For aligned access, I believe that's always the case for the memory bus
 (on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width.  If you mix the widths
 on the loads and stores, all bets are off.  That means you have to
 beware of the gcc penchant for coalescing loads and stores: if it sees
 two adjacent byte stores it can coalesce them into a short store
 instead ... that screws up the atomicity guarantees.

Two things: I think that gcc has given up on combining adjacent writes,
perhaps because unaligned writes on some arches are prohibitive, so
whatever minor optimization was believed to be gained was quickly lost,
multi-fold. (Although this might not be the reason since one would
expect that gcc would know the alignment of the promoted store).

But additionally, even if gcc combines adjacent writes _that are part
of the program flow_ then I believe the situation is no worse than
would otherwise exist.

For instance, given the following:

struct x {
spinlock_t lock;
long a;
byte b;
byte c;
};

void locked_store_b(struct x *p)
{
	spin_lock(&p->lock);
	p->b = 1;
	spin_unlock(&p->lock);
	p->c = 2;
}

Granted, the author probably expects ordered writes of
STORE B
STORE C
but that's not guaranteed because there is no memory barrier
ordering B before C.
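
(If that ordering actually matters, a minimal fix under the same
assumptions is an explicit write barrier after the unlock, or moving
the second store inside the critical section:)

	void locked_store_bc(struct x *p)
	{
		spin_lock(&p->lock);
		p->b = 1;
		spin_unlock(&p->lock);
		smp_wmb();	/* order STORE b before STORE c */
		p->c = 2;
	}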

I 

Re: bit fields data tearing

2014-09-07 Thread Paul E. McKenney
On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
 On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
  On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
   On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
 Hi James,
 
 On 09/04/2014 10:11 PM, James Bottomley wrote:
  On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
  +And there are anti-guarantees:
  +
  + (*) These guarantees do not apply to bitfields, because compilers often
  +     generate code to modify these using non-atomic read-modify-write
  +     sequences.  Do not attempt to use bitfields to synchronize parallel
  +     algorithms.
  +
  + (*) Even in cases where bitfields are protected by locks, all fields
  +     in a given bitfield must be protected by one lock.  If two fields
  +     in a given bitfield are protected by different locks, the compiler's
  +     non-atomic read-modify-write sequences can cause an update to one
  +     field to corrupt the value of an adjacent field.
  +
  + (*) These guarantees apply only to properly aligned and sized scalar
  +     variables.  "Properly sized" currently means "int" and "long",
  +     because some CPU families do not support loads and stores of
  +     other sizes.  ("Some CPU families" is currently believed to
  +     be only Alpha 21064.  If this is actually the case, a different
  +     non-guarantee is likely to be formulated.)
  
  This is a bit unclear.  Presumably you're talking about definiteness of
  the outcome (as in what's seen after multiple stores to the same
  variable).
 
 No, the last condition refers to adjacent byte stores from different
 cpu contexts (either interrupt or SMP).
 
  The guarantees are only for natural width on Parisc as well,
  so you would get a mess if you did byte stores to adjacent memory
  locations.
 
 For a simple test like:
 
 struct x {
   long a;
   char b;
   char c;
   char d;
   char e;
 };
 
 void store_bc(struct x *p) {
  p->b = 1;
  p->c = 2;
 }
 
 on parisc, gcc generates separate byte stores
 
 void store_bc(struct x *p) {
0: 34 1c 00 02 ldi 1,ret0
4: 0f 5c 12 08 stb ret0,4(r26)
8: 34 1c 00 04 ldi 2,ret0
c: e8 40 c0 00 bv r0(rp)
   10: 0f 5c 12 0a stb ret0,5(r26)
 
 which appears to confirm that on parisc adjacent byte data
 is safe from corruption by concurrent cpu updates; that is,
 
 CPU 0    | CPU 1
          |
 p->b = 1 | p->c = 2
          |
 
 will result in p->b == 1 && p->c == 2 (assume both values
 were 0 before the call to store_bc()).

What Peter said.  I would ask for suggestions for better wording, but
I would much rather be able to say that single-byte reads and writes
are atomic and that aligned-short reads and writes are also atomic.

Thus far, it looks like we lose only very old Alpha systems, so unless
I hear otherwise, I update my patch to outlaw these very old systems.
   
   This isn't universally true according to the architecture manual.  The
   PARISC CPU can make byte to long word stores atomic against the memory
   bus but not against the I/O bus for instance.  Atomicity is a property
   of the underlying substrate, not of the CPU.  Implying that atomicity is
   a CPU property is incorrect.
  
  OK, fair point.
  
  But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
  not I/O) that do not support single-byte and double-byte stores?
 
 For aligned access, I believe that's always the case for the memory bus
 (on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width.  If you mix the widths
 on the loads and stores, all bets are off.  That means you have to
 beware of the gcc penchant for coalescing loads and stores: if it sees
 two adjacent byte stores it can coalesce them into a short store
 instead ... that screws up the atomicity guarantees.

OK, that means that to make PARISC work reliably, we need to use
ACCESS_ONCE() for loads and stores that could have racing accesses.
If I understand correctly, this will -not- be needed for code guarded
by locks, even with Peter's examples.

So if we have something like this:

struct foo {
char a;
char b;
};
struct foo *fp;

then this code would be bad:

	fp->a = 1;
	fp->b = 2;

The reason is (as you say) that GCC would be happy to store 0x0102
(or vice versa, depending on endianness) to the pair.  We instead
need:

	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 2;

Re: bit fields data tearing

2014-09-07 Thread H. Peter Anvin
I'm confused why storing 0x0102 would be a problem.  I think gcc does that even 
on other cpus.

More atomicity can't hurt, can it?

On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney 
paul...@linux.vnet.ibm.com wrote:
On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
 On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
  On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
   On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
 Hi James,
 
 On 09/04/2014 10:11 PM, James Bottomley wrote:
  On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
  +And there are anti-guarantees:
  +
  + (*) These guarantees do not apply to bitfields, because compilers often
  +     generate code to modify these using non-atomic read-modify-write
  +     sequences.  Do not attempt to use bitfields to synchronize parallel
  +     algorithms.
  +
  + (*) Even in cases where bitfields are protected by locks, all fields
  +     in a given bitfield must be protected by one lock.  If two fields
  +     in a given bitfield are protected by different locks, the compiler's
  +     non-atomic read-modify-write sequences can cause an update to one
  +     field to corrupt the value of an adjacent field.
  +
  + (*) These guarantees apply only to properly aligned and sized scalar
  +     variables.  "Properly sized" currently means "int" and "long",
  +     because some CPU families do not support loads and stores of
  +     other sizes.  ("Some CPU families" is currently believed to
  +     be only Alpha 21064.  If this is actually the case, a different
  +     non-guarantee is likely to be formulated.)
  
  This is a bit unclear.  Presumably you're talking about definiteness of
  the outcome (as in what's seen after multiple stores to the same
  variable).
 
 No, the last condition refers to adjacent byte stores from different
 cpu contexts (either interrupt or SMP).
 
  The guarantees are only for natural width on Parisc as well,
  so you would get a mess if you did byte stores to adjacent memory
  locations.
 
 For a simple test like:
 
 struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
 };
 
 void store_bc(struct x *p) {
  p->b = 1;
  p->c = 2;
 }
 
 on parisc, gcc generates separate byte stores
 
 void store_bc(struct x *p) {
 0:	34 1c 00 02	ldi 1,ret0
 4:	0f 5c 12 08	stb ret0,4(r26)
 8:	34 1c 00 04	ldi 2,ret0
 c:	e8 40 c0 00	bv r0(rp)
10:	0f 5c 12 0a	stb ret0,5(r26)
 
 which appears to confirm that on parisc adjacent byte data
 is safe from corruption by concurrent cpu updates; that is,
 
 CPU 0    | CPU 1
          |
 p->b = 1 | p->c = 2
          |
 
 will result in p->b == 1 && p->c == 2 (assume both values
 were 0 before the call to store_bc()).

What Peter said.  I would ask for suggestions for better wording, but
I would much rather be able to say that single-byte reads and writes
are atomic and that aligned-short reads and writes are also atomic.

Thus far, it looks like we lose only very old Alpha systems, so unless
I hear otherwise, I update my patch to outlaw these very old systems.
   
   This isn't universally true according to the architecture manual.  The
   PARISC CPU can make byte to long word stores atomic against the memory
   bus but not against the I/O bus for instance.  Atomicity is a property
   of the underlying substrate, not of the CPU.  Implying that atomicity is
   a CPU property is incorrect.
  
  OK, fair point.
  
  But are there in-use-for-Linux PARISC memory fabrics (for normal
memory,
  not I/O) that do not support single-byte and double-byte stores?
 
 For aligned access, I believe that's always the case for the memory bus
 (on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width.  If you mix the widths
 on the loads and stores, all bets are off.  That means you have to
 beware of the gcc penchant for coalescing loads and stores: if it sees
 two adjacent byte stores it can coalesce them into a short store
 instead ... that screws up the atomicity guarantees.

OK, that means that to make PARISC work reliably, we need to use
ACCESS_ONCE() for loads and stores that could have racing accesses.
If I understand correctly, this will -not- be needed for code guarded
by locks, even with Peter's examples.

So if we have something like this:

   struct foo {
   char a;
   char b;
   };
   struct foo *fp;

then this code would be bad:

   fp->a = 1;
   

Re: bit fields data tearing

2014-09-07 Thread Paul E. McKenney
On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
 I'm confused why storing 0x0102 would be a problem.  I think gcc does that 
 even on other cpus.
 
 More atomicity can't hurt, can it?

I must defer to James for any additional details on why PARISC systems
don't provide atomicity for partially overlapping stores.  ;-)

Thanx, Paul

 On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney 
 paul...@linux.vnet.ibm.com wrote:
 On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
  On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
   On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
  Hi James,
  
  On 09/04/2014 10:11 PM, James Bottomley wrote:
   On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
   +And there are anti-guarantees:
   +
   + (*) These guarantees do not apply to bitfields, because compilers often
   +     generate code to modify these using non-atomic read-modify-write
   +     sequences.  Do not attempt to use bitfields to synchronize parallel
   +     algorithms.
   +
   + (*) Even in cases where bitfields are protected by locks, all fields
   +     in a given bitfield must be protected by one lock.  If two fields
   +     in a given bitfield are protected by different locks, the compiler's
   +     non-atomic read-modify-write sequences can cause an update to one
   +     field to corrupt the value of an adjacent field.
   +
   + (*) These guarantees apply only to properly aligned and sized scalar
   +     variables.  "Properly sized" currently means "int" and "long",
   +     because some CPU families do not support loads and stores of
   +     other sizes.  ("Some CPU families" is currently believed to
   +     be only Alpha 21064.  If this is actually the case, a different
   +     non-guarantee is likely to be formulated.)
   
   This is a bit unclear.  Presumably you're talking about definiteness of
   the outcome (as in what's seen after multiple stores to the same
   variable).
  
  No, the last condition refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
  
   The guarantees are only for natural width on Parisc as well,
   so you would get a mess if you did byte stores to adjacent memory
   locations.
  
  For a simple test like:
  
  struct x {
 long a;
 char b;
 char c;
 char d;
 char e;
  };
  
  void store_bc(struct x *p) {
 p->b = 1;
 p->c = 2;
  }
  
  on parisc, gcc generates separate byte stores
  
  void store_bc(struct x *p) {
 0:  34 1c 00 02 ldi 1,ret0
 4:  0f 5c 12 08 stb ret0,4(r26)
 8:  34 1c 00 04 ldi 2,ret0
 c:  e8 40 c0 00 bv r0(rp)
10:  0f 5c 12 0a stb ret0,5(r26)
  
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that is,
  
  CPU 0    | CPU 1
           |
  p->b = 1 | p->c = 2
           |
  
  will result in p->b == 1 && p->c == 2 (assume both values
  were 0 before the call to store_bc()).
 
 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.
 
 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

This isn't universally true according to the architecture manual.  The
PARISC CPU can make byte to long word stores atomic against the memory
bus but not against the I/O bus for instance.  Atomicity is a property
of the underlying substrate, not of the CPU.  Implying that atomicity is
a CPU property is incorrect.
   
   OK, fair point.
   
   But are there in-use-for-Linux PARISC memory fabrics (for normal
 memory,
   not I/O) that do not support single-byte and double-byte stores?
  
  For aligned access, I believe that's always the case for the memory bus
  (on both 32 and 64 bit systems).  However, it only applies to machine
  instruction loads and stores of the same width.  If you mix the widths
  on the loads and stores, all bets are off.  That means you have to
  beware of the gcc penchant for coalescing loads and stores: if it sees
  two adjacent byte stores it can coalesce them into a short store
  instead ... that screws up the atomicity guarantees.
 
 OK, that means that to make PARISC work reliably, we 

Re: bit fields data tearing

2014-09-07 Thread H. Peter Anvin
How many PARISC systems do we have that actually do real work on Linux?

On September 7, 2014 4:36:55 PM PDT, Paul E. McKenney 
paul...@linux.vnet.ibm.com wrote:
On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
 I'm confused why storing 0x0102 would be a problem.  I think gcc does
that even on other cpus.
 
 More atomicity can't hurt, can it?

I must defer to James for any additional details on why PARISC systems
don't provide atomicity for partially overlapping stores.  ;-)

   Thanx, Paul

 On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
  On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
   On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley
wrote:
  Hi James,
  
  On 09/04/2014 10:11 PM, James Bottomley wrote:
   On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney
wrote:
   +And there are anti-guarantees:
   +
   + (*) These guarantees do not apply to bitfields, because compilers often
   +     generate code to modify these using non-atomic read-modify-write
   +     sequences.  Do not attempt to use bitfields to synchronize parallel
   +     algorithms.
   +
   + (*) Even in cases where bitfields are protected by locks, all fields
   +     in a given bitfield must be protected by one lock.  If two fields
   +     in a given bitfield are protected by different locks, the compiler's
   +     non-atomic read-modify-write sequences can cause an update to one
   +     field to corrupt the value of an adjacent field.
   +
   + (*) These guarantees apply only to properly aligned and sized scalar
   +     variables.  "Properly sized" currently means "int" and "long",
   +     because some CPU families do not support loads and stores of
   +     other sizes.  ("Some CPU families" is currently believed to
   +     be only Alpha 21064.  If this is actually the case, a different
   +     non-guarantee is likely to be formulated.)
   
   This is a bit unclear.  Presumably you're talking about definiteness of
   the outcome (as in what's seen after multiple stores to the same
   variable).
  
  No, the last condition refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
  
   The guarantees are only for natural width on Parisc as well,
   so you would get a mess if you did byte stores to adjacent memory
   locations.
  
  For a simple test like:
  
  struct x {
long a;
char b;
char c;
char d;
char e;
  };
  
  void store_bc(struct x *p) {
p->b = 1;
p->c = 2;
  }
  
  on parisc, gcc generates separate byte stores
  
  void store_bc(struct x *p) {
 0: 34 1c 00 02 ldi 1,ret0
 4: 0f 5c 12 08 stb ret0,4(r26)
 8: 34 1c 00 04 ldi 2,ret0
 c: e8 40 c0 00 bv r0(rp)
10: 0f 5c 12 0a stb ret0,5(r26)
  
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that
is,
  
  CPU 0    | CPU 1
           |
  p->b = 1 | p->c = 2
           |
  
  will result in p->b == 1 && p->c == 2 (assume both values
  were 0 before the call to store_bc()).
 
 What Peter said.  I would ask for suggestions for better wording, but
 I would much rather be able to say that single-byte reads and writes
 are atomic and that aligned-short reads and writes are also atomic.
 
 Thus far, it looks like we lose only very old Alpha systems, so unless
 I hear otherwise, I update my patch to outlaw these very old systems.

This isn't universally true according to the architecture manual.  The
PARISC CPU can make byte to long word stores atomic against the memory
bus but not against the I/O bus for instance.  Atomicity is a property
of the underlying substrate, not of the CPU.  Implying that atomicity is
a CPU property is incorrect.
   
   OK, fair point.
   
   But are there in-use-for-Linux PARISC memory fabrics (for normal
 memory,
   not I/O) that do not support single-byte and double-byte stores?
  
  For aligned access, I believe that's always the case for the memory bus
  (on both 32 and 64 bit systems).  However, it only applies to machine
  instruction loads and stores of the same width.  If you mix the widths
  on the loads and stores, all bets are off.  That means you have to
  beware of the gcc penchant for coalescing loads and stores: if it sees
  two adjacent byte stores it can 

[PATCH v2 3/3] PCI/MSI: Remove arch_msi_check_device()

2014-09-07 Thread Alexander Gordeev
There are no archs that override arch_msi_check_device()
hook. Remove it as it is completely redundant.

If an arch would need to check MSI/MSI-X possibility for a
device it should make it within arch_setup_msi_irqs() hook.

Cc: Michael Ellerman m...@ellerman.id.au
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 drivers/pci/msi.c   | 49 +
 include/linux/msi.h |  3 ---
 2 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 5a40516..6c2cc41 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -56,16 +56,6 @@ void __weak arch_teardown_msi_irq(unsigned int irq)
chip->teardown_irq(chip, irq);
 }
 
-int __weak arch_msi_check_device(struct pci_dev *dev, int nvec, int type)
-{
-   struct msi_chip *chip = dev->bus->msi;
-
-   if (!chip || !chip->check_device)
-   return 0;
-
-   return chip->check_device(chip, dev, nvec, type);
-}
-
 int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
struct msi_desc *entry;
@@ -806,22 +796,23 @@ out_free:
 }
 
 /**
- * pci_msi_check_device - check whether MSI may be enabled on a device
+ * pci_msi_supported - check whether MSI may be enabled on a device
  * @dev: pointer to the pci_dev data structure of MSI device function
  * @nvec: how many MSIs have been requested ?
- * @type: are we checking for MSI or MSI-X ?
  *
  * Look at global flags, the device itself, and its parent buses
  * to determine if MSI/-X are supported for the device. If MSI/-X is
  * supported return 0, else return an error code.
  **/
-static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
+static int pci_msi_supported(struct pci_dev *dev, int nvec)
 {
struct pci_bus *bus;
-   int ret;
 
/* MSI must be globally enabled and supported by the device */
-   if (!pci_msi_enable || !dev || dev->no_msi)
+   if (!pci_msi_enable)
+   return -EINVAL;
+
+   if (!dev || dev->no_msi || dev->current_state != PCI_D0)
return -EINVAL;
 
/*
@@ -843,10 +834,6 @@ static int pci_msi_check_device(struct pci_dev *dev, int nvec, int type)
if (bus->bus_flags & PCI_BUS_FLAGS_NO_MSI)
return -EINVAL;
 
-   ret = arch_msi_check_device(dev, nvec, type);
-   if (ret)
-   return ret;
-
return 0;
 }
 
@@ -949,13 +936,13 @@ int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
int status, nr_entries;
int i, j;
 
-   if (!entries || !dev->msix_cap || dev->current_state != PCI_D0)
-   return -EINVAL;
-
-   status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSIX);
+   status = pci_msi_supported(dev, nvec);
if (status)
return status;
 
+   if (!entries)
+   return -EINVAL;
+
nr_entries = pci_msix_vec_count(dev);
if (nr_entries < 0)
return nr_entries;
@@ -1062,8 +1049,9 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
int nvec;
int rc;
 
-   if (dev->current_state != PCI_D0)
-   return -EINVAL;
+   rc = pci_msi_supported(dev, minvec);
+   if (rc)
+   return rc;
 
WARN_ON(!!dev->msi_enabled);
 
@@ -1086,17 +1074,6 @@ int pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec)
nvec = maxvec;
 
do {
-   rc = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
-   if (rc < 0) {
-   return rc;
-   } else if (rc > 0) {
-   if (rc < minvec)
-   return -ENOSPC;
-   nvec = rc;
-   }
-   } while (rc);
-
-   do {
rc = msi_capability_init(dev, nvec);
if (rc < 0) {
return rc;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 8103f32..dbf7cc9 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -60,7 +60,6 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
-int arch_msi_check_device(struct pci_dev* dev, int nvec, int type);
 void arch_restore_msi_irqs(struct pci_dev *dev);
 
 void default_teardown_msi_irqs(struct pci_dev *dev);
@@ -77,8 +76,6 @@ struct msi_chip {
int (*setup_irq)(struct msi_chip *chip, struct pci_dev *dev,
 struct msi_desc *desc);
void (*teardown_irq)(struct msi_chip *chip, unsigned int irq);
-   int (*check_device)(struct msi_chip *chip, struct pci_dev *dev,
-   int nvec, int type);
 };
 
 #endif /* LINUX_MSI_H */
-- 
1.9.3


[PATCH v2 1/3] PCI/MSI/PPC: Remove arch_msi_check_device()

2014-09-07 Thread Alexander Gordeev
Moving MSI checks from arch_msi_check_device() function to
arch_setup_msi_irqs() function makes code more compact and
allows removing unnecessary hook arch_msi_check_device()
from generic MSI code.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@vger.kernel.org
Cc: Michael Ellerman m...@ellerman.id.au
Signed-off-by: Alexander Gordeev agord...@redhat.com
---
 arch/powerpc/include/asm/machdep.h     |  2 --
 arch/powerpc/kernel/msi.c              | 12 +-
 arch/powerpc/platforms/cell/axon_msi.c |  9 
 arch/powerpc/platforms/powernv/pci.c   | 19 ---
 arch/powerpc/platforms/pseries/msi.c   | 42 +-
 arch/powerpc/sysdev/fsl_msi.c          | 12 +++---
 arch/powerpc/sysdev/mpic_pasemi_msi.c  | 11 ++---
 arch/powerpc/sysdev/mpic_u3msi.c       | 28 +--
 arch/powerpc/sysdev/ppc4xx_hsta_msi.c  | 18 +--
 arch/powerpc/sysdev/ppc4xx_msi.c       | 19 +--
 10 files changed, 50 insertions(+), 122 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index b125cea..3af7216 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -136,8 +136,6 @@ struct machdep_calls {
int (*pci_setup_phb)(struct pci_controller *host);
 
 #ifdef CONFIG_PCI_MSI
-   int (*msi_check_device)(struct pci_dev* dev,
-   int nvec, int type);
int (*setup_msi_irqs)(struct pci_dev *dev,
  int nvec, int type);
void    (*teardown_msi_irqs)(struct pci_dev *dev);
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..71bd161 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -13,7 +13,7 @@
 
 #include asm/machdep.h
 
-int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
+int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
if (!ppc_md.setup_msi_irqs || !ppc_md.teardown_msi_irqs) {
pr_debug("msi: Platform doesn't provide MSI callbacks.\n");
@@ -24,16 +24,6 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
if (type == PCI_CAP_ID_MSI && nvec > 1)
return 1;
 
-   if (ppc_md.msi_check_device) {
pr_debug("msi: Using platform check routine.\n");
-   return ppc_md.msi_check_device(dev, nvec, type);
-   }
-
-   return 0;
-}
-
-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
return ppc_md.setup_msi_irqs(dev, nvec, type);
 }
 
diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c
index 85825b5..862b327 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -199,14 +199,6 @@ out_error:
return msic;
 }
 
-static int axon_msi_check_device(struct pci_dev *dev, int nvec, int type)
-{
-   if (!find_msi_translator(dev))
-   return -ENODEV;
-
-   return 0;
-}
-
 static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg)
 {
struct device_node *dn;
@@ -416,7 +408,6 @@ static int axon_msi_probe(struct platform_device *device)
 
ppc_md.setup_msi_irqs = axon_msi_setup_msi_irqs;
ppc_md.teardown_msi_irqs = axon_msi_teardown_msi_irqs;
-   ppc_md.msi_check_device = axon_msi_check_device;
 
axon_msi_debug_setup(dn, msic);
 
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index b854b57..b45c492 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -46,29 +46,21 @@
 //#define cfg_dbg(fmt...)  printk(fmt)
 
 #ifdef CONFIG_PCI_MSI
-static int pnv_msi_check_device(struct pci_dev* pdev, int nvec, int type)
-{
-   struct pci_controller *hose = pci_bus_to_host(pdev->bus);
-   struct pnv_phb *phb = hose->private_data;
-   struct pci_dn *pdn = pci_get_pdn(pdev);
-
-   if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
-   return -ENODEV;
-
-   return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
-}
-
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct pnv_phb *phb = hose->private_data;
+   struct pci_dn *pdn = pci_get_pdn(pdev);
struct msi_desc *entry;
struct msi_msg msg;
int hwirq;
unsigned int virq;
int rc;
 
-   if (WARN_ON(!phb))
+   if (WARN_ON(!phb) || !phb->msi_bmp.bitmap)
+   return -ENODEV;
+
+   if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
return -ENODEV;
 
list_for_each_entry(entry, &pdev->msi_list, list) {
@@ -860,7 +852,6 @@ void __init pnv_pci_init(void)
 
/* Configure MSIs */
 #ifdef CONFIG_PCI_MSI
-   ppc_md.msi_check_device = pnv_msi_check_device;

[PATCH v2 0/3] PCI/MSI: Remove arch_msi_check_device()

2014-09-07 Thread Alexander Gordeev
Hello,

This is a cleanup effort to get rid of arch_msi_check_device() function.

I am sending the v2 series, since kbuild reported compile errors for v1 on
ppc4xx and Armada 370. Still, I have not checked the fixes on these
platforms.

Changes since v1:
  - patch 1: 'pdev' undeclared compile error fixed on ppc4xx. I updated
 changelog and removed ACK from this patch in case there are
 any objections;

  - patch 2: 'struct msi_chip' has no 'check_device' error fixed on
 Armada 370 - armada_370_xp_check_msi_device() hook removed;

  - patch 3: msi_check_device() renamed to pci_msi_supported(). This is
 the very same patch I sent earlier;

Thanks!

Cc: Thomas Gleixner t...@linutronix.de
Cc: Jason Cooper ja...@lakedaemon.net
Cc: Michael Ellerman m...@ellerman.id.au
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-...@vger.kernel.org

Alexander Gordeev (3):
  patch 1 - PCI/MSI/PPC: Remove arch_msi_check_device()
  patch 2 - PCI/MSI/Armada-370-xp: Remove arch_msi_check_device()
  patch 3 - PCI/MSI: Remove arch_msi_check_device()

 arch/powerpc/include/asm/machdep.h     |  2 --
 arch/powerpc/kernel/msi.c              | 12 +
 arch/powerpc/platforms/cell/axon_msi.c |  9 ---
 arch/powerpc/platforms/powernv/pci.c   | 19 -
 arch/powerpc/platforms/pseries/msi.c   | 42 +++--
 arch/powerpc/sysdev/fsl_msi.c          | 12 +++--
 arch/powerpc/sysdev/mpic_pasemi_msi.c  | 11 ++--
 arch/powerpc/sysdev/mpic_u3msi.c       | 28 ---
 arch/powerpc/sysdev/ppc4xx_hsta_msi.c  | 18 +
 arch/powerpc/sysdev/ppc4xx_msi.c       | 19 +
 drivers/irqchip/irq-armada-370-xp.c    | 14 +++---
 drivers/pci/msi.c                      | 49 +-
 include/linux/msi.h                    |  3 ---
 13 files changed, 67 insertions(+), 171 deletions(-)

-- 
1.9.3


Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 16:41 -0400, Peter Hurley wrote:
 On 09/07/2014 03:04 PM, James Bottomley wrote:
  On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
  On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
  On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
  On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
  Hi James,
 
  On 09/04/2014 10:11 PM, James Bottomley wrote:
  On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
  +And there are anti-guarantees:
  +
  + (*) These guarantees do not apply to bitfields, because compilers often
  +     generate code to modify these using non-atomic read-modify-write
  +     sequences.  Do not attempt to use bitfields to synchronize parallel
  +     algorithms.
  +
  + (*) Even in cases where bitfields are protected by locks, all fields
  +     in a given bitfield must be protected by one lock.  If two fields
  +     in a given bitfield are protected by different locks, the compiler's
  +     non-atomic read-modify-write sequences can cause an update to one
  +     field to corrupt the value of an adjacent field.
  +
  + (*) These guarantees apply only to properly aligned and sized scalar
  +     variables.  "Properly sized" currently means "int" and "long",
  +     because some CPU families do not support loads and stores of
  +     other sizes.  ("Some CPU families" is currently believed to
  +     be only Alpha 21064.  If this is actually the case, a different
  +     non-guarantee is likely to be formulated.)
 
  This is a bit unclear.  Presumably you're talking about definiteness of
  the outcome (as in what's seen after multiple stores to the same
  variable).
 
  No, the last condition refers to adjacent byte stores from different
  cpu contexts (either interrupt or SMP).
 
  The guarantees are only for natural width on Parisc as well,
  so you would get a mess if you did byte stores to adjacent memory
  locations.
 
  For a simple test like:
 
  struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
  };
 
  void store_bc(struct x *p) {
  p->b = 1;
  p->c = 2;
  }
 
  on parisc, gcc generates separate byte stores
 
  void store_bc(struct x *p) {
 0:   34 1c 00 02 ldi 1,ret0
 4:   0f 5c 12 08 stb ret0,4(r26)
 8:   34 1c 00 04 ldi 2,ret0
 c:   e8 40 c0 00 bv r0(rp)
10:   0f 5c 12 0a stb ret0,5(r26)
 
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that is,
 
  CPU 0    | CPU 1
           |
  p->b = 1 | p->c = 2
           |
 
  will result in p->b == 1 && p->c == 2 (assume both values
  were 0 before the call to store_bc()).
 
  What Peter said.  I would ask for suggestions for better wording, but
  I would much rather be able to say that single-byte reads and writes
  are atomic and that aligned-short reads and writes are also atomic.
 
  Thus far, it looks like we lose only very old Alpha systems, so unless
  I hear otherwise, I update my patch to outlaw these very old systems.
 
  This isn't universally true according to the architecture manual.  The
  PARISC CPU can make byte to long word stores atomic against the memory
  bus but not against the I/O bus for instance.  Atomicity is a property
  of the underlying substrate, not of the CPU.  Implying that atomicity is
  a CPU property is incorrect.
 
 To go back to this briefly, while it's true that atomicity is not
 a CPU property, atomicity is not possible if the CPU is not cooperating.

You mean if it doesn't have the bus logic to emit?  Like ia32 mostly
can't do 64 bit writes ... sure.

  OK, fair point.
 
  But are there in-use-for-Linux PARISC memory fabrics (for normal memory,
  not I/O) that do not support single-byte and double-byte stores?
  
  For aligned access, I believe that's always the case for the memory bus
  (on both 32 and 64 bit systems).  However, it only applies to machine
  instruction loads and stores of the same width.  If you mix the widths
  on the loads and stores, all bets are off.  That means you have to
  beware of the gcc penchant for coalescing loads and stores: if it sees
  two adjacent byte stores it can coalesce them into a short store
  instead ... that screws up the atomicity guarantees.
 
 Two things: I think that gcc has given up on combining adjacent writes,
 perhaps because unaligned writes on some arches are prohibitive, so
 whatever minor optimization was believed to be gained was quickly lost,
 multi-fold. (Although this might not be the reason since one would
 expect that gcc would know the alignment of the promoted store).

Um, gcc assumes architecturally correct alignment; that's why it pads
structures.  Only when accessing packed structures will it use the
lowest unit load/store.

if you have a struct { char a, char b }; and load first a then b with a
constant gcc will obligingly optimise to a 

Re: bit fields data tearing

2014-09-07 Thread James Bottomley
On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
 How many PARISC systems do we have that actually do real work on Linux?

I'd be very surprised if this problem didn't exist on all
alignment-requiring architectures, like PPC and Sparc as well.  I know it would be
very convenient if all the world were an x86 ... but it would also be
very boring as well.

The rules for coping with it are well known and a relaxation of what we
currently do in the kernel, so I don't see what the actual problem is.

In the context of this thread, PA can't do atomic bit sets (no atomic
RMW except the ldcw operation); it can do atomic writes to fundamental
sizes (byte, short, int, long) provided gcc emits the correct primitive
(there are lots of gotchas in this, but that's not an architectural
problem).  These atomicity guarantees depend on the underlying storage
and are respected for main memory but not for any other type of bus.
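
(For reference, this is why atomic RMW on parisc is built on ldcw-based
spinlocks; a rough sketch, modelled loosely on arch/parisc, with the
hashing, sizing and lock initialisation purely illustrative:)

	#define ATOMIC_HASH_SIZE 64	/* illustrative sizing */
	static arch_spinlock_t atomic_hash[ATOMIC_HASH_SIZE];

	static void emulated_set_bit(unsigned long nr,
				     volatile unsigned long *addr)
	{
		arch_spinlock_t *lock =
			&atomic_hash[((unsigned long)addr / sizeof(long)) %
				     ATOMIC_HASH_SIZE];
		unsigned long flags;

		local_irq_save(flags);
		arch_spin_lock(lock);	/* the one true atomic: ldcw */
		*addr |= 1UL << nr;
		arch_spin_unlock(lock);
		local_irq_restore(flags);
	}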

James


 On September 7, 2014 4:36:55 PM PDT, Paul E. McKenney 
 paul...@linux.vnet.ibm.com wrote:
 On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
  I'm confused why storing 0x0102 would be a problem.  I think gcc does
 that even on other cpus.
  
  More atomicity can't hurt, can it?
 
 I must defer to James for any additional details on why PARISC systems
 don't provide atomicity for partially overlapping stores.  ;-)
 
  Thanx, Paul
 
  On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney
 paul...@linux.vnet.ibm.com wrote:
  On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
   On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
 On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
  On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley
 wrote:
   Hi James,
   
   On 09/04/2014 10:11 PM, James Bottomley wrote:
On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney
 wrote:
+And there are anti-guarantees:
+
+ (*) These guarantees do not apply to bitfields, because compilers often
+     generate code to modify these using non-atomic read-modify-write
+     sequences.  Do not attempt to use bitfields to synchronize parallel
+     algorithms.
+
+ (*) Even in cases where bitfields are protected by locks, all fields
+     in a given bitfield must be protected by one lock.  If two fields
+     in a given bitfield are protected by different locks, the compiler's
+     non-atomic read-modify-write sequences can cause an update to one
+     field to corrupt the value of an adjacent field.
+
+ (*) These guarantees apply only to properly aligned and sized scalar
+     variables.  "Properly sized" currently means "int" and "long",
+     because some CPU families do not support loads and stores of
+     other sizes.  ("Some CPU families" is currently believed to
+     be only Alpha 21064.  If this is actually the case, a different
+     non-guarantee is likely to be formulated.)

This is a bit unclear.  Presumably you're talking about definiteness of
the outcome (as in what's seen after multiple stores to the same
variable).
   
   No, the last condition refers to adjacent byte stores from different
   cpu contexts (either interrupt or SMP).
   
The guarantees are only for natural width on Parisc as well,
so you would get a mess if you did byte stores to adjacent memory
locations.
   
   For a simple test like:
   
   struct x {
   long a;
   char b;
   char c;
   char d;
   char e;
   };
   
   void store_bc(struct x *p) {
   p->b = 1;
   p->c = 2;
   }
   
   on parisc, gcc generates separate byte stores
   
   void store_bc(struct x *p) {
  0:   34 1c 00 02 ldi 1,ret0
  4:   0f 5c 12 08 stb ret0,4(r26)
  8:   34 1c 00 04 ldi 2,ret0
  c:   e8 40 c0 00 bv r0(rp)
 10:   0f 5c 12 0a stb ret0,5(r26)
   
   which appears to confirm that on parisc adjacent byte data
   is safe from corruption by concurrent cpu updates; that
 is,
   
   CPU 0    | CPU 1
            |
   p->b = 1 | p->c = 2
            |
   
   will result in p->b == 1 && p->c == 2 (assume both values
   were 0 before the call to store_bc()).
  
  What Peter said.  I would ask for suggestions for better wording, but
  I would much rather be able to say that single-byte reads and writes
  are atomic and that aligned-short reads and writes are also atomic.
  
  Thus far, it looks like we lose only very