Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-24 Thread Jacek Anaszewski
On Fri, 24 Apr 2015 14:18:30 +1000
Stewart Smith stew...@linux.vnet.ibm.com wrote:

 Jacek Anaszewski j.anaszewsk...@gmail.com writes:
  This device tree comes from our firmware ... which is immutable.
 
  How is the firmware related to the kernel? These bindings are for
  the kernel, not for the firmware.
 
  DT bindings are compiled to a *.dtb file which is concatenated with
  zImage. During system boot, device drivers are matched with DT
  bindings through the 'compatible' property. A driver should have a single
  matching DT node, i.e. no other driver can probe with the same DT
  node. This implies that the node should contain only the properties
  required for configuring the related device.
 
 For OPAL firmware on POWER, the firmware hands the kernel a flattened device
 tree of the machine it's booting on. It's not added to the kernel, as the
 kernels aren't board specific - they're generic.

Is the DT node we are discussing used by any drivers other than the
LED class driver? Or is it required in this form by other components of
your platform?

 https://github.com/open-power/skiboot/ is the firmware that generates
 the device tree for booting under OPAL.
 


-- 
Best Regards,
Jacek Anaszewski
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/2] pci-phb: check for the 32-bit overflow

2015-04-24 Thread Thomas Huth
On Fri, 24 Apr 2015 09:22:33 +0530
Nikunj A Dadhania nik...@linux.vnet.ibm.com wrote:

 
 Hi Thomas,
 
 Thomas Huth th...@redhat.com writes:
  Am Wed, 22 Apr 2015 16:27:19 +0530
  schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com:
 
  With the addition of 64-bit BARs and the increase in the MMIO address
  space, the code was hitting this limit. The memory of PCI devices
  across the bridges was not accessible, which caused the drivers
  to fail.
  
  Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
  ---
   board-qemu/slof/pci-phb.fs | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
  
  diff --git a/board-qemu/slof/pci-phb.fs b/board-qemu/slof/pci-phb.fs
  index 529772f..e307d95 100644
  --- a/board-qemu/slof/pci-phb.fs
  +++ b/board-qemu/slof/pci-phb.fs
  @@ -258,7 +258,8 @@ setup-puid
   decode-64 2 / dup r\ Decode and calc size/2
   pci-next-mem @ + dup pci-max-mem !  \ and calc max mem address
 
  Could pci-max-mem overflow, too?
 
 Should not, it's only the boundary that was an issue.
 
 Qemu sends base and size; base + size can go up to uint32 max. So for
 example base was 0xC000. and size was 0x4000., we add up base +
 size and put pci-max-mmio as 0x1.., which would get programmed
 in the bridge BARs: lower limit as 0xC000 and 0x as upper
 limit. And no MMIO accesses were going across the bridge.
 
 In my testing, I have found one more issue with translate-my-address:
 it does not take care of 64-bit addresses. I have a patch working for
 SLOF, but it's breaking the guest kernel booting.
 
 
   dup pci-next-mmio ! \ which is the same as MMIO base
  -r + pci-max-mmio ! \ calc max MMIO address
  +r +  min pci-max-mmio !\ calc max MMIO address and
  +\ check the 32-bit boundary

Ok, thanks a lot for the example! I think your patch likely works in
practice, but after staring at the code for a while, I think the real
bug is slightly different. If I get the code above right, pci-max-mmio
is normally set to the first address that is _not_ part of the mmio
window anymore, right? Now have a look at pci-bridge-set-mmio-base in
pci-scan.fs:

: pci-bridge-set-mmio-base ( addr -- )
pci-next-mmio @ 10 #aligned \ read the current Value and align to 1MB boundary
dup 10 + pci-next-mmio !    \ and write back with 1MB for bridge
10 rshift                   \ mmio-base reg is only the upper 16 bits
pci-max-mmio @  and or      \ and Insert mmio Limit (set it to max)
swap 20 + rtas-config-l!    \ and write it into the bridge
;

Seems like the pci-max-mmio, i.e. the first address that is not in the
window anymore, is programmed into the memory limit register here - but
according to the pci-to-pci bridge specification, it should be the last
address of the window instead.

So I think the correct fix would be to decrement the pci-max-mmio
value in pci-bridge-set-mmio-base with 1- before programming it into the
limit register (note: pci-bridge-set-mmio-limit already contains a 1-,
so I think the same should be done in pci-bridge-set-mmio-base).
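
To illustrate the suspected off-by-one, here is a small C model. It is only a sketch: the helper names and the example window (base 0xC0000000, size 0x40000000) are assumptions picked for illustration and are not taken from SLOF. It packs the bridge's memory base/limit register (config space offset 0x20), where the low half carries address bits 31:20 of the window base and the high half carries address bits 31:20 of the last address in the window.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the 32-bit value for the bridge's
 * memory base/limit register pair (config space offset 0x20).
 *   bits 15:4  = address bits 31:20 of the window base
 *   bits 31:20 = address bits 31:20 of the window limit
 * 'limit' is truncated to 32 bits, as a 32-bit register write would.
 */
static uint32_t bridge_mem_reg(uint64_t base, uint64_t limit)
{
    uint32_t base_field, limit_field;

    assert((base & 0xFFFFFu) == 0);     /* window base is 1MB aligned */

    base_field  = ((uint32_t)base >> 16) & 0xFFF0u;
    limit_field = (uint32_t)limit & 0xFFF00000u;

    return limit_field | base_field;
}

/* End given as "first address after the window" (the suspected bug). */
static uint32_t reg_with_exclusive_end(uint64_t base, uint64_t size)
{
    return bridge_mem_reg(base, base + size);
}

/* End given as "last address inside the window" (the proposed 1- fix). */
static uint32_t reg_with_inclusive_end(uint64_t base, uint64_t size)
{
    return bridge_mem_reg(base, base + size - 1);
}
```

With the assumed window, the exclusive end wraps to 0 and yields 0x0000C000 (limit below base, so nothing is forwarded), while the inclusive end yields 0xFFF0C000, which covers the whole window.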

So if you've got some spare minutes, could you please check whether that
would fix the issue, too?

 Thomas

Re: [PATCH v2 2/2] leds/powernv: Add driver for PowerNV platform

2015-04-24 Thread Jacek Anaszewski
Hi Vasant,

On Fri, 24 Apr 2015 11:00:41 +0530
Vasant Hegde hegdevas...@linux.vnet.ibm.com wrote:

 On 04/23/2015 07:43 PM, Jacek Anaszewski wrote:
  On Thu, 23 Apr 2015 10:55:40 +0530
  Vasant Hegde hegdevas...@linux.vnet.ibm.com wrote:
  
 
 Hi Jacek,
 
 .../...
 
 
  This device tree comes from our firmware ... which is immutable.
  
  How is the firmware related to the kernel? These bindings are for
  the kernel, not for the firmware.
  
  DT bindings are compiled to a *.dtb file which is concatenated with
  zImage. During system boot, device drivers are matched with DT
  bindings through the 'compatible' property. A driver should have a single
  matching DT node, i.e. no other driver can probe with the same DT
  node. This implies that the node should contain only the properties
  required for configuring the related device.
  
 
 As Stewart mentioned, it's not a .dtb file in our case; we pass a
 flattened device tree, which is built by OPAL.

No matter what format of device tree OPAL produces, I assume that
it must compile it from some sources.

A dtb file is the compiled form of a human-readable dts file. It contains a
Flattened Device Tree - a data structure for describing the hardware
in the system.

Please refer to: http://elinux.org/Device_Tree

  We can use LED node name + led-type property for naming...which is
  what I do currently (v4.. which I haven't posted)
 
 
  1. Each LED would have one corresponding LED class device.
 
  2. Operations on attn and fault LED types:
    turn on:
        echo 255 > brightness
    turn off:
        echo 0 > brightness
    get status:
        cat brightness

  3. Operations on identify LED:
    turn on:
        echo timer > trigger
        (blink_set op would have to be implemented in the
        driver)
    turn off:
        echo 0 > brightness
    get status:
        support for this would have to be added to the LED
        subsystem core
 
  I see a few issues here.
- Overloading the same LED device with multiple operations complicates
  things, as these operations can be done independently (say the user
  is allowed to enable both identify and fault simultaneously)
  
  I agree, it would be hard to distinguish whether by executing
  `echo 0 > brightness` we want to turn off the identify or the fault
  function.
  
- point 3: IIUC after the duration value expires, the identify indicator
  reverts; we don't want it to revert until the user asks.
  
  From what you shared, blinking has hardware acceleration on OPAL
  side. At first timer trigger tries to use HW accelerated blinking by
  calling blink_set op and resorts to using software fallback only if
  the op fails or is not defined.
 
 Blinking is the physical state of the LED that represents the identify state,
 which is taken care of by hardware; the OS doesn't have control over
 this.

I am aware of it. Therefore we would probably need to add a flag
LED_BLINK_HW_ONLY to the LED subsystem core and modify led_blink_set
function to log an error and avoid setting software fallback in
case blink_set op fails and the flag is set.
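
The proposed behaviour can be sketched in plain C as follows. This is only a user-space model under stated assumptions: LED_BLINK_HW_ONLY is the hypothetical flag suggested above and does not exist in the LED core, and struct mock_led, mock_sw_blink() and mock_led_blink_set() are illustrative stand-ins, not the kernel's struct led_classdev or led_blink_set().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag proposed in this thread; not a real kernel flag. */
#define LED_BLINK_HW_ONLY (1u << 0)

/* Minimal stand-in for an LED class device, for illustration only. */
struct mock_led {
    unsigned int flags;
    /* Hardware blink hook: returns 0 on success, negative errno on failure. */
    int (*blink_set)(struct mock_led *led,
                     unsigned long *delay_on, unsigned long *delay_off);
    int sw_blink_active;    /* set when the software fallback is armed */
};

/* Stand-in for the timer-based software blink fallback. */
static void mock_sw_blink(struct mock_led *led)
{
    led->sw_blink_active = 1;
}

/*
 * Try hardware blinking first. If it fails and the LED is marked
 * hardware-only, return the error (a real driver would also log it)
 * instead of arming the software fallback.
 */
static int mock_led_blink_set(struct mock_led *led,
                              unsigned long *delay_on,
                              unsigned long *delay_off)
{
    int ret = -ENOSYS;

    assert(led != NULL);

    if (led->blink_set)
        ret = led->blink_set(led, delay_on, delay_off);
    if (ret == 0)
        return 0;                   /* hardware accepted the request */

    if (led->flags & LED_BLINK_HW_ONLY)
        return ret;                 /* no software fallback allowed */

    mock_sw_blink(led);
    return 0;
}

/* Example hardware hook that always fails, to exercise both paths. */
static int mock_failing_hw_blink(struct mock_led *led,
                                 unsigned long *delay_on,
                                 unsigned long *delay_off)
{
    (void)led; (void)delay_on; (void)delay_off;
    return -EIO;
}
```

In this model, an LED with the flag set propagates the hardware error and never arms the software path, while an LED without the flag falls back to software blinking as before.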

Nevertheless, I am leaning towards using brightness_set op for this.

 From the software point of view it's just another LED with two states (ON
 and OFF).

  
  BTW timer trigger re-sets blink after timer expires, unless
  LED_BLINK_ONESHOT flag is set by LED class device.
 
 In my case, I want to retain the state.

  
- point 3: if I use brightness for both identify/fault, how to
  disable these LEDs independently?
  
  Another sysfs attribute would be required, but it would be ugly.
 
 yeah.
 
 
- Also how to use trigger property for each LED (if at all we
  want to use them later)?
 
  
  After analyzing pros and cons I think that separate LED class
  devices for each LED type would be most suitable solution in this
  case.
 
 Agree.
 
  
  For 'identify' LED the operation would be:
  
  #echo timer > trigger //set 'identify' (blinking)
  #cat trigger          //check identify state
  #none [timer]         //'identify' is ON
  #echo 0 > brightness  //unset 'identify'
  #cat trigger
  #[none] timer         //'identify' is OFF
  
  You would have to implement blink_set op
  (see Documentation/leds/leds-class.txt and other LED class drivers
  for reference).
 
 Implementing another op should be fine.. I can try to implement it.
 
 But from user perspective identify is just another LED. Hence can we
 just use brightness property itself?

OK, let's use only brightness. Using the blinking API would require
enabling led-triggers, which would then serve only to expose the trigger
sysfs attribute. The triggers themselves would not be used, as the intention
is to use only HW accelerated blinking.

Please add a comment to the driver describing the reasons for abusing the API
semantics.
 
 
  
  For attention and fault LEDs only brightness attribute would matter.
  
 
 Sure.
 
  DT bindings would look as follows:
  
  opal-leds {
  compatible = 

Re: [PATCH 1/2] pci-phb: check for the 32-bit overflow

2015-04-24 Thread Thomas Huth
On Fri, 24 Apr 2015 12:56:57 +0200
Thomas Huth th...@redhat.com wrote:

 On Fri, 24 Apr 2015 09:22:33 +0530
 Nikunj A Dadhania nik...@linux.vnet.ibm.com wrote:
 
  
  Hi Thomas,
  
  Thomas Huth th...@redhat.com writes:
   Am Wed, 22 Apr 2015 16:27:19 +0530
   schrieb Nikunj A Dadhania nik...@linux.vnet.ibm.com:
  
   With the addition of 64-bit BARs and the increase in the MMIO address
   space, the code was hitting this limit. The memory of PCI devices
   across the bridges was not accessible, which caused the drivers
   to fail.
   
   Signed-off-by: Nikunj A Dadhania nik...@linux.vnet.ibm.com
   ---
board-qemu/slof/pci-phb.fs | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
   
   diff --git a/board-qemu/slof/pci-phb.fs b/board-qemu/slof/pci-phb.fs
   index 529772f..e307d95 100644
   --- a/board-qemu/slof/pci-phb.fs
   +++ b/board-qemu/slof/pci-phb.fs
   @@ -258,7 +258,8 @@ setup-puid
decode-64 2 / dup r\ Decode and calc size/2
pci-next-mem @ + dup pci-max-mem !  \ and calc max mem 
   address
  
   Could pci-max-mem overflow, too?
  
  Should not, it's only the boundary that was an issue.
  
  Qemu sends base and size; base + size can go up to uint32 max. So for
  example base was 0xC000. and size was 0x4000., we add up base +
  size and put pci-max-mmio as 0x1.., which would get programmed
  in the bridge BARs: lower limit as 0xC000 and 0x as upper
  limit. And no MMIO accesses were going across the bridge.
  
  In my testing, I have found one more issue with translate-my-address:
  it does not take care of 64-bit addresses. I have a patch working for
  SLOF, but it's breaking the guest kernel booting.
  
  
    dup pci-next-mmio ! \ which is the same as MMIO base
   -r + pci-max-mmio ! \ calc max MMIO address
   +r +  min pci-max-mmio !\ calc max MMIO address and
   +\ check the 32-bit boundary
 
 Ok, thanks a lot for the example! I think your patch likely works in
 practice, but after staring at the code for a while, I think the real
 bug is slightly different. If I get the code above right, pci-max-mmio
 is normally set to the first address that is _not_ part of the mmio
 window anymore, right? Now have a look at pci-bridge-set-mmio-base in
 pci-scan.fs:
 
 : pci-bridge-set-mmio-base ( addr -- )
 pci-next-mmio @ 10 #aligned \ read the current Value and align to 1MB boundary
 dup 10 + pci-next-mmio !    \ and write back with 1MB for bridge
 10 rshift                   \ mmio-base reg is only the upper 16 bits
 pci-max-mmio @  and or      \ and Insert mmio Limit (set it to max)
 swap 20 + rtas-config-l!    \ and write it into the bridge
 ;
 
 Seems like the pci-max-mmio, i.e. the first address that is not in the
 window anymore, is programmed into the memory limit register here - but
 according to the pci-to-pci bridge specification, it should be the last
 address of the window instead.
 
 So I think the correct fix would be to decrement the pci-max-mmio
 value in pci-bridge-set-mmio-base with 1- before programming it into the
 limit register (note: pci-bridge-set-mmio-limit already contains a 1-,
 so I think the same should be done in pci-bridge-set-mmio-base).
 
 So if you've got some spare minutes, could you please check whether that
 would fix the issue, too?

By the way, if I'm right, pci-bridge-set-mem-base seems to suffer from
the same problem, too.

 Thomas

Re: [PATCH] powerpc/ftrace: add powerpc timebase as a trace clock source

2015-04-24 Thread Naveen N. Rao
On 2015/04/23 09:10AM, Steven Rostedt wrote:
 On Thu, 23 Apr 2015 12:15:04 +0530
 Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote:
 
  diff --git a/arch/powerpc/include/asm/trace_clock.h b/arch/powerpc/include/asm/trace_clock.h
  new file mode 100644
  index 000..0b0d094
  --- /dev/null
  +++ b/arch/powerpc/include/asm/trace_clock.h
  @@ -0,0 +1,27 @@
  +/*
  + * This program is free software; you can redistribute it and/or modify
  + * it under the terms of the GNU General Public License, version 2, as
  + * published by the Free Software Foundation.
  + *
  + * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
  + */
  +
  +#ifndef _ASM_PPC_TRACE_CLOCK_H
  +#define _ASM_PPC_TRACE_CLOCK_H
  +
  +#include <linux/compiler.h>
  +#include <linux/types.h>
  +
  +#ifdef CONFIG_TRACE_CLOCK
 
 You don't need this #if statement. What else is using this besides
 kernel/trace/trace.c, which selects TRACE_CLOCK if it is compiled?
 
 Perhaps you were trying to match x86, which has:
 
 #ifdef CONFIG_X86_TSC
 
 where you have CONFIG_TRACE_CLOCK. We needed the #ifdef because you
 can compile the x86 kernel without TSC support, and we did not want to
 export a tsc tracing clock if one did not exist.
 
 And the only place that I see that even includes this header in ppc, is
 also only compiled if CONFIG_TRACE_CLOCK is selected.

Ah yes, agreed. I have removed it and, seeing as CONFIG_TRACE_CLOCK is
really for the generic clocks, I have made
arch/powerpc/kernel/trace_clock.o depend on CONFIG_TRACING instead, since
that is what gates kernel/trace/trace.o.

 
 I'm fine with the change, just nuke the unnecessary #ifdef.

Thanks for the review!
- Naveen
 


[PATCH v2] powerpc/ftrace: add powerpc timebase as a trace clock source

2015-04-24 Thread Naveen N. Rao
Add a new powerpc-specific trace clock using the timebase register,
similar to x86-tsc. This gives us
- a fast, monotonic, hardware clock source for trace entries, and
- a clock that can be used to correlate events across cpus as well as across
  hypervisor and guests.

Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
---
Changes since v1:
- removed unnecessary #ifdef in trace_clock.h
- changed config build dependency for trace_clock.o from TRACE_CLOCK to TRACING


 Documentation/trace/ftrace.txt |  5 +
 arch/powerpc/include/asm/Kbuild|  1 -
 arch/powerpc/include/asm/trace_clock.h | 19 +++
 arch/powerpc/kernel/Makefile   |  1 +
 arch/powerpc/kernel/trace_clock.c  | 15 +++
 5 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/trace_clock.h
 create mode 100644 arch/powerpc/kernel/trace_clock.c

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 572ca92..689f61a 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -346,6 +346,11 @@ of ftrace. Here is a list of some of the key files:
  x86-tsc: Architectures may define their own clocks. For
   example, x86 uses its own TSC cycle clock here.
 
+ ppc-tb: This uses the powerpc timebase register value.
+ This is in sync across CPUs and can also be used
+ to correlate events across hypervisor/guest if
+ tb_offset is known.
+
To set a clock, simply echo the clock name into this file.
 
  echo global > trace_clock
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 382b28e..5041c66 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -5,5 +5,4 @@ generic-y += mcs_spinlock.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += scatterlist.h
-generic-y += trace_clock.h
 generic-y += vtime.h
diff --git a/arch/powerpc/include/asm/trace_clock.h b/arch/powerpc/include/asm/trace_clock.h
new file mode 100644
index 000..cf1ee75
--- /dev/null
+++ b/arch/powerpc/include/asm/trace_clock.h
@@ -0,0 +1,19 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
+ */
+
+#ifndef _ASM_PPC_TRACE_CLOCK_H
+#define _ASM_PPC_TRACE_CLOCK_H
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+
+extern u64 notrace trace_clock_ppc_tb(void);
+
+#define ARCH_TRACE_CLOCKS { trace_clock_ppc_tb, "ppc-tb", 0 },
+
+#endif  /* _ASM_PPC_TRACE_CLOCK_H */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 502cf69..18e038e 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -118,6 +118,7 @@ obj-$(CONFIG_PPC_IO_WORKAROUNDS)+= io-workarounds.o
 obj-$(CONFIG_DYNAMIC_FTRACE)   += ftrace.o
 obj-$(CONFIG_FUNCTION_GRAPH_TRACER)+= ftrace.o
 obj-$(CONFIG_FTRACE_SYSCALLS)  += ftrace.o
+obj-$(CONFIG_TRACING)  += trace_clock.o
 
 ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y  += iomap.o
diff --git a/arch/powerpc/kernel/trace_clock.c b/arch/powerpc/kernel/trace_clock.c
new file mode 100644
index 000..4917069
--- /dev/null
+++ b/arch/powerpc/kernel/trace_clock.c
@@ -0,0 +1,15 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * Copyright (C) 2015 Naveen N. Rao, IBM Corporation
+ */
+
+#include <asm/trace_clock.h>
+#include <asm/time.h>
+
+u64 notrace trace_clock_ppc_tb(void)
+{
+   return get_tb();
+}
-- 
2.3.5
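
As a rough illustration of the hypervisor/guest correlation mentioned in the documentation hunk, the following user-space sketch maps a guest ppc-tb timestamp onto the host timeline and converts a tick delta to nanoseconds. It assumes that tb_offset is defined as guest timebase minus host timebase and that the timebase frequency is known to the caller; both function names are hypothetical and are not part of this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: tb_offset = guest timebase - host timebase, so a guest
 * timestamp is moved onto the host timeline by subtracting the offset.
 * Unsigned arithmetic keeps this well defined for negative offsets.
 */
static uint64_t guest_tb_to_host_tb(uint64_t guest_tb, int64_t tb_offset)
{
    return guest_tb - (uint64_t)tb_offset;
}

/*
 * Convert a timebase tick delta to nanoseconds for a timebase running
 * at tb_hz ticks per second. Splitting into whole seconds and remainder
 * avoids overflowing 64 bits for realistic frequencies.
 */
static uint64_t tb_delta_to_ns(uint64_t delta_ticks, uint64_t tb_hz)
{
    uint64_t secs, rem;

    assert(tb_hz != 0);

    secs = delta_ticks / tb_hz;
    rem  = delta_ticks % tb_hz;

    return secs * 1000000000ull + (rem * 1000000000ull) / tb_hz;
}
```

Once host and guest trace entries are on the same timeline, they can be merged by sorting on the converted timestamp.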


Re: [PATCH] spi: fsl-spi: fix devm_ioremap_resource() error case

2015-04-24 Thread Mark Brown
On Thu, Apr 23, 2015 at 02:11:47PM +0200, Christophe Leroy wrote:
 devm_ioremap_resource() doesn't return NULL but an ERR_PTR on error.

Applied, thanks.



[git pull] Please pull mpe/linux.git powerpc-4.1-2 tag

2015-04-24 Thread Michael Ellerman
Hi Linus,

Please pull powerpc fixes for 4.1:

The following changes since commit d19d5efd8c8840aa4f38a6dfbfe500d8cc27de46:

  Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux (2015-04-16 13:53:32 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux.git tags/powerpc-4.1-2

for you to fetch changes up to 2e826695d87c2d213def07bc344ae97d88384f62:

  powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled (2015-04-23 17:42:14 +1000)


powerpc fixes for 4.1

- Fix for mm_dec_nr_pmds() from Scott.
- Fixes for oopses seen with KVM + THP from Aneesh.
- Build fixes from Aneesh & Shreyas.


Aneesh Kumar K.V (5):
  KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer
  KVM: PPC: Remove page table walk helpers
  powerpc/mm/thp: Make page table walk safe against thp split/collapse
  powerpc/mm/thp: Return pte address if we find trans_splitting.
  powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled

Michael Ellerman (1):
  Merge branch 'master' of git://git.kernel.org/.../scottwood/linux into fixes

Scott Wood (1):
  powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()

Shreyas B. Prabhu (1):
  powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error

 arch/powerpc/include/asm/kvm_book3s_64.h | 17 +++
 arch/powerpc/include/asm/pgtable.h   | 28 +++
 arch/powerpc/kernel/eeh.c|  6 ++-
 arch/powerpc/kernel/io-workarounds.c | 10 ++--
 arch/powerpc/kvm/Kconfig |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c  | 14 +++---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c  | 86 +---
 arch/powerpc/kvm/e500_mmu_host.c | 32 
 arch/powerpc/mm/hash_utils_64.c  |  3 +-
 arch/powerpc/mm/hugetlbpage.c| 32 
 arch/powerpc/perf/callchain.c| 24 +
 11 files changed, 137 insertions(+), 117 deletions(-)





[PATCH v2 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-04-24 Thread Bogdan Purcareata
This patchset enables running KVM SMP guests with external interrupts on an
underlying RT-enabled Linux. Prior to this patchset, a guest with in-kernel MPIC
emulation could easily panic the kernel due to preemption when delivering IPIs
and external interrupts, because the openpic spinlock becomes a sleeping
mutex on PREEMPT_RT_FULL Linux.

0001: converts the openpic spinlock to a raw spinlock, in order to circumvent
this behavior. While this change is targeted at RT-enabled Linux, it has no
effect on upstream kvm-ppc, so I am sending it upstream for better future
maintenance.

0002: disables in-kernel MPIC emulation for guests running on RT, in order to
prevent a potential DoS attack due to large system latencies. This patch is
targeted at RT (due to CONFIG_PREEMPT_RT_FULL), but it can also be applied to
upstream Linux, with no effect.

- applied & compiled against vanilla 4.0
- applied & compiled against stable-rt 3.18-rt

v2:
- updated commit messages
- changed the fix for potentially large latencies from limiting the max number of
  VCPUs a guest can have to disabling the in-kernel MPIC

Bogdan Purcareata (2):
  powerpc/kvm: Convert openpic lock to raw_spinlock
  powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL

 arch/powerpc/kvm/Kconfig |  1 +
 arch/powerpc/kvm/mpic.c  | 44 ++--
 2 files changed, 23 insertions(+), 22 deletions(-)

-- 
2.1.4


[PATCH v2 1/2] powerpc/kvm: Convert openpic lock to raw_spinlock

2015-04-24 Thread Bogdan Purcareata
The lock in the KVM openpic emulation on PPC is a spinlock_t, meaning it becomes
a sleeping mutex under PREEMPT_RT_FULL. This leads to a situation where this
non-raw lock is grabbed with interrupts already disabled by hard_irq_disable():

kvmppc_prepare_to_enter()
  hard_irq_disable()
  kvmppc_core_prepare_to_enter()
kvmppc_core_check_exceptions()
  kvmppc_booke_irqprio_deliver()
kvmppc_mpic_set_epr()
  spin_lock_irqsave()
...

This happens for guest interrupts that go through this openpic emulation code.
The result is a kernel crash on guest enter (include/linux/kvm_host.h:784).

Converting the lock to a raw_spinlock fixes the issue and enables the guest to
run I/O intensive workloads in an SMP configuration. A similar fix can be found
for the i8254 PIT emulation on x86 [1].

[1] https://lkml.org/lkml/2010/1/11/289

v2:
- updated commit message

Signed-off-by: Bogdan Purcareata bogdan.purcare...@freescale.com
---
 arch/powerpc/kvm/mpic.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 6249cdc..2f70660 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -196,7 +196,7 @@ struct openpic {
int num_mmio_regions;
 
gpa_t reg_base;
-   spinlock_t lock;
+   raw_spinlock_t lock;
 
/* Behavior control */
struct fsl_mpic_info *fsl;
@@ -1103,9 +1103,9 @@ static int openpic_cpu_write_internal(void *opaque, gpa_t addr,
mpic_irq_raise(opp, dst, ILR_INTTGT_INT);
}
 
-   spin_unlock(&opp->lock);
+   raw_spin_unlock(&opp->lock);
kvm_notify_acked_irq(opp->kvm, 0, notify_eoi);
-   spin_lock(&opp->lock);
+   raw_spin_lock(&opp->lock);
 
break;
}
@@ -1180,12 +1180,12 @@ void kvmppc_mpic_set_epr(struct kvm_vcpu *vcpu)
int cpu = vcpu->arch.irq_cpu_id;
unsigned long flags;
 
-   spin_lock_irqsave(&opp->lock, flags);
+   raw_spin_lock_irqsave(&opp->lock, flags);
 
if ((opp->gcr & opp->mpic_mode_mask) == GCR_MODE_PROXY)
kvmppc_set_epr(vcpu, openpic_iack(opp, &opp->dst[cpu], cpu));
 
-   spin_unlock_irqrestore(&opp->lock, flags);
+   raw_spin_unlock_irqrestore(&opp->lock, flags);
 }
 
 static int openpic_cpu_read_internal(void *opaque, gpa_t addr,
@@ -1386,9 +1386,9 @@ static int kvm_mpic_read(struct kvm_vcpu *vcpu,
return -EINVAL;
}
 
-   spin_lock_irq(&opp->lock);
+   raw_spin_lock_irq(&opp->lock);
ret = kvm_mpic_read_internal(opp, addr - opp->reg_base, &u.val);
-   spin_unlock_irq(&opp->lock);
+   raw_spin_unlock_irq(&opp->lock);
 
/*
 * Technically only 32-bit accesses are allowed, but be nice to
@@ -1427,10 +1427,10 @@ static int kvm_mpic_write(struct kvm_vcpu *vcpu,
return -EOPNOTSUPP;
}
 
-   spin_lock_irq(&opp->lock);
+   raw_spin_lock_irq(&opp->lock);
ret = kvm_mpic_write_internal(opp, addr - opp->reg_base,
  *(const u32 *)ptr);
-   spin_unlock_irq(&opp->lock);
+   raw_spin_unlock_irq(&opp->lock);
 
pr_debug("%s: addr %llx ret %d val %x\n",
 __func__, addr, ret, *(const u32 *)ptr);
@@ -1501,14 +1501,14 @@ static int access_reg(struct openpic *opp, gpa_t addr, u32 *val, int type)
if (addr & 3)
return -ENXIO;
 
-   spin_lock_irq(&opp->lock);
+   raw_spin_lock_irq(&opp->lock);
 
if (type == ATTR_SET)
ret = kvm_mpic_write_internal(opp, addr, *val);
else
ret = kvm_mpic_read_internal(opp, addr, val);
 
-   spin_unlock_irq(&opp->lock);
+   raw_spin_unlock_irq(&opp->lock);
 
pr_debug("%s: type %d addr %llx val %x\n", __func__, type, addr, *val);
 
@@ -1545,9 +1545,9 @@ static int mpic_set_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
if (attr32 != 0 && attr32 != 1)
return -EINVAL;
 
-   spin_lock_irq(&opp->lock);
+   raw_spin_lock_irq(&opp->lock);
openpic_set_irq(opp, attr->attr, attr32);
-   spin_unlock_irq(&opp->lock);
+   raw_spin_unlock_irq(&opp->lock);
return 0;
}
 
@@ -1592,9 +1592,9 @@ static int mpic_get_attr(struct kvm_device *dev, struct kvm_device_attr *attr)
if (attr->attr > MAX_SRC)
return -EINVAL;
 
-   spin_lock_irq(&opp->lock);
+   raw_spin_lock_irq(&opp->lock);
attr32 = opp->src[attr->attr].pending;
-   spin_unlock_irq(&opp->lock);
+   raw_spin_unlock_irq(&opp->lock);
 
if (put_user(attr32, (u32 __user *)(long)attr->addr))
return -EFAULT;
@@ -1670,7 +1670,7 @@ static int mpic_create(struct kvm_device *dev, u32 type)
opp->kvm = dev->kvm;

Re: [PATCH v2] powerpc/ftrace: add powerpc timebase as a trace clock source

2015-04-24 Thread Steven Rostedt
On Fri, 24 Apr 2015 14:24:44 +0530
Naveen N. Rao naveen.n@linux.vnet.ibm.com wrote:

 Add a new powerpc-specific trace clock using the timebase register,
 similar to x86-tsc. This gives us
 - a fast, monotonic, hardware clock source for trace entries, and
 - a clock that can be used to correlate events across cpus as well as across
   hypervisor and guests.
 
 Signed-off-by: Naveen N. Rao naveen.n@linux.vnet.ibm.com
 ---
 Changes since v1:
 - removed unnecessary #ifdef in trace_clock.h
 - changed config build dependency for trace_clock.o from TRACE_CLOCK to 
 TRACING
 

Looks fine to me.

Acked-by: Steven Rostedt rost...@goodmis.org

-- Steve

 
  Documentation/trace/ftrace.txt |  5 +
  arch/powerpc/include/asm/Kbuild|  1 -
  arch/powerpc/include/asm/trace_clock.h | 19 +++
  arch/powerpc/kernel/Makefile   |  1 +
  arch/powerpc/kernel/trace_clock.c  | 15 +++
  5 files changed, 40 insertions(+), 1 deletion(-)
  create mode 100644 arch/powerpc/include/asm/trace_clock.h
  create mode 100644 arch/powerpc/kernel/trace_clock.c
 

[PATCH v2 2/2] powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL

2015-04-24 Thread Bogdan Purcareata
While converting the openpic emulation code to use a raw_spinlock_t enables
guests to run on RT, there's still a performance issue. For interrupts sent in
directed delivery mode with a multiple CPU mask, the emulated openpic will loop
through all of the VCPUs, and for each VCPU, it calls IRQ_check, which will loop
through all the pending interrupts for that VCPU. This is done while holding the
raw lock, meaning that for all this time interrupts and preemption are
disabled on the host Linux. A malicious user app can max out both of these
numbers and cause a DoS.

This temporary fix is sent for two reasons. First, so that users who want to
use the in-kernel MPIC emulation are aware of the potential latencies, and can
make sure that the hardware MPIC and their usage scenario do not involve
interrupts sent in directed delivery mode, and that the number of possible
pending interrupts is kept small. Secondly, this should incentivize the
development of a proper openpic emulation that would be better suited for RT.

Signed-off-by: Bogdan Purcareata bogdan.purcare...@freescale.com
---
 arch/powerpc/kvm/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 11850f3..415499a 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -158,6 +158,7 @@ config KVM_E500MC
 config KVM_MPIC
bool "KVM in-kernel MPIC emulation"
depends on KVM && E500
+   depends on !PREEMPT_RT_FULL
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQFD
select HAVE_KVM_IRQ_ROUTING
-- 
2.1.4


Re: [PATCH 2/2] pci: Use Qemu created PCI device nodes

2015-04-24 Thread Thomas Huth

 Hi Nikunj,

On Wed, 22 Apr 2015 16:27:20 +0530
Nikunj A Dadhania nik...@linux.vnet.ibm.com wrote:

 PCI Enumeration has been part of SLOF. Now with hotplug code addition
 in Qemu, it makes more sense to have this code a one place, i.e. Qemu.

s/Qemu/QEMU/ and s/code a one place/code in one place/ ?

 Adding routines to walk through the device nodes created by Qemu. SLOF
 will configure the device/bridges and program the BARs for
 communicating with the devices.

I wonder whether it would make more sense to also set up the BARs etc.
in QEMU instead of SLOF?

 
 diff --git a/board-qemu/slof/pci-phb.fs b/board-qemu/slof/pci-phb.fs
 index e307d95..30b7443 100644
 --- a/board-qemu/slof/pci-phb.fs
 +++ b/board-qemu/slof/pci-phb.fs
 @@ -283,6 +283,41 @@ setup-puid
 THEN
  ;
  
 +: phb-pci-walk-bridge ( -- )
 +phb-debug? IF .   Calling pci-walk-bridge  pwd cr THEN
 +
 +get-node child ?dup 0= IF EXIT THEN\ get and check if we have children
 +BEGIN
 +dup   \ Continue as long as there are children
 +WHILE

Most Forth code uses the same indentation for the code between
BEGIN...WHILE and WHILE...REPEAT ... so I think you could decrease the
indentation of the following block by one level.

 +\ Set child node as current node:
 +dup set-node

Below you are calling pci-device-setup which in turn might include some
pci-class_*.fs or pci-device_*.fs files (or even run some FCODE?). At
least pci-class_02.fs seems to use an INSTANCE VARIABLE, i.e. the
instance template should get modified in that case. Please
double-check whether you need to use extend-device here instead. (I'm
not 100% sure right now ... what happens, for example, when you run qemu
with a network device that SLOF does not provide a pci-device_*.fs for?
I guess it will try to include pci-class_02.fs and fail due to the
INSTANCE VARIABLE?)

 +my-space pci-set-slot   \ set the slot bit

pci-set-slot seems to rely on the pci-device-slots global variable.
This is normally initialized by pci-probe-bus. Now that you provide
your own implementation of that function below, I think it should
likely also set up the pci-device-slots variable, shouldn't it?

 +my-space pci-htype@ \ read HEADER-Type
 +7f and  \ Mask bit 7 - multifunction device
 +CASE
 +   0 OF my-space pci-device-setup ENDOF  \ | set up the device
 +   1 OF my-space pci-bridge-setup ENDOF  \ | set up the bridge
 +   dup OF my-space pci-htype@ pci-out ENDOF
 +   ENDCASE
 +   peer
 +REPEAT drop
 +get-parent set-node
 +;

The remaining part of the patch looks ok to me.
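
For readers less familiar with Forth, the header-type dispatch in the quoted CASE block can be sketched in C as below. The enum and function names are made up for illustration and are not part of SLOF; only the masking of bit 7 (multifunction) and the values 0 (device) and 1 (bridge) come from the quoted code.

```c
#include <stdint.h>

/* Illustrative classification of a PCI function by its Header Type byte. */
enum pci_node_kind {
    PCI_NODE_DEVICE,    /* header type 0: normal device */
    PCI_NODE_BRIDGE,    /* header type 1: PCI-to-PCI bridge */
    PCI_NODE_UNKNOWN    /* anything else: only reported, not set up */
};

/* Bit 7 of the Header Type byte flags a multifunction device. */
static int pci_is_multifunction(uint8_t header_type)
{
    return (header_type & 0x80) != 0;
}

/* Mask off bit 7 (as "7f and" does) before dispatching on the layout. */
static enum pci_node_kind pci_classify(uint8_t header_type)
{
    switch (header_type & 0x7f) {
    case 0x00:
        return PCI_NODE_DEVICE;
    case 0x01:
        return PCI_NODE_BRIDGE;
    default:
        return PCI_NODE_UNKNOWN;
    }
}
```

A caller would invoke the device or bridge setup routine depending on the returned kind, mirroring the two OF branches, and fall through to a diagnostic for unknown header types.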

 Thomas