Re: [PATCH v1 2/7] AES for PPC/SPE - aes tables

2015-02-16 Thread Markus Stockhausen
 From: linux-crypto-ow...@vger.kernel.org 
 [linux-crypto-ow...@vger.kernel.org] on behalf of "Segher 
 Boessenkool" [seg...@kernel.crashing.org]
 Sent: Monday, 16 February 2015 15:37
 To: David Laight
 Cc: Markus Stockhausen; linux-cry...@vger.kernel.org; 
 linuxppc-dev@lists.ozlabs.org
 Subject: Re: [PATCH v1 2/7] AES for PPC/SPE - aes tables
 
 On Mon, Feb 16, 2015 at 02:19:50PM +, David Laight wrote:
  From:  Markus Stockhausen
   4K AES tables for big endian
 
  I can't help feeling that you could give more information about how the
  values are generated.
 
 ... and an explanation of why this does not open you up to a timing attack?

Good points,

The tables are simply byte-reversed (big-endian) versions of those found in
crypto/aes_generic.c.

Regarding timing attacks: I understand that reducing AES table sizes, so that
processing time stays constant, is important to avoid cache timing attacks.
Hopefully the following points will mitigate the concern.

The target architecture is low-performance e500 cores without CAAM features
available. These can currently use only the aes-generic module, which depends
on 16K of T-tables: 2*4K for encryption and 2*4K for decryption. The new
module reduces the T-table size to 8K+256 bytes. Still far from a minimal
256-byte S-box, but at least an improvement.

To narrow it down further: the intended use is cheap routers, so no multiuser 
environments where a malicious process could mount complex flush+reload 
attacks. If someone gains unauthorized access, there will be plenty of other, 
simpler ways to compromise the system.

If that is sufficient for you, I will add corresponding notes in a v2 of the patch.

Markus


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC v2 03/10] powerpc/mpc85xx: Add platform support for the Freescale DPAA BMan

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: I59de17c040cdd304f86306336fcf89f130f7db2d
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 arch/powerpc/Kconfig  |  5 +
 arch/powerpc/configs/mpc85xx_defconfig|  1 +
 arch/powerpc/configs/mpc85xx_smp_defconfig|  1 +
 arch/powerpc/platforms/85xx/Kconfig   |  1 +
 arch/powerpc/platforms/85xx/corenet_generic.c | 16 
 arch/powerpc/platforms/85xx/p1023_rdb.c   | 14 ++
 6 files changed, 38 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 7258b468..6ab5ad5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -779,6 +779,11 @@ config FSL_GTM
help
  Freescale General-purpose Timers support
 
+config HAS_FSL_QBMAN
+   bool Datapath Acceleration Queue and Buffer management
+   help
+ Datapath Acceleration Queue and Buffer management
+
 # Yes MCA RS/6000s exist but Linux-PPC does not currently support any
 config MCA
bool
diff --git a/arch/powerpc/configs/mpc85xx_defconfig 
b/arch/powerpc/configs/mpc85xx_defconfig
index 8496389..aa7e41f 100644
--- a/arch/powerpc/configs/mpc85xx_defconfig
+++ b/arch/powerpc/configs/mpc85xx_defconfig
@@ -49,6 +49,7 @@ CONFIG_HIGHMEM=y
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
 CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_HAS_FSL_QBMAN=y
 CONFIG_PCI=y
 CONFIG_PCIEPORTBUS=y
 # CONFIG_PCIEASPM is not set
diff --git a/arch/powerpc/configs/mpc85xx_smp_defconfig 
b/arch/powerpc/configs/mpc85xx_smp_defconfig
index bf88caf..82feda1 100644
--- a/arch/powerpc/configs/mpc85xx_smp_defconfig
+++ b/arch/powerpc/configs/mpc85xx_smp_defconfig
@@ -50,6 +50,7 @@ CONFIG_HIGHMEM=y
 CONFIG_BINFMT_MISC=m
 CONFIG_MATH_EMULATION=y
 CONFIG_FORCE_MAX_ZONEORDER=12
+CONFIG_HAS_FSL_QBMAN=y
 CONFIG_PCI=y
 CONFIG_PCI_MSI=y
 CONFIG_RAPIDIO=y
diff --git a/arch/powerpc/platforms/85xx/Kconfig 
b/arch/powerpc/platforms/85xx/Kconfig
index dbdd5fa..51e9a7b 100644
--- a/arch/powerpc/platforms/85xx/Kconfig
+++ b/arch/powerpc/platforms/85xx/Kconfig
@@ -276,6 +276,7 @@ config CORENET_GENERIC
select GPIO_MPC8XXX
select HAS_RAPIDIO
select PPC_EPAPR_HV_PIC
+   select HAS_FSL_QBMAN
help
  This option enables support for the FSL CoreNet based boards.
  For 32bit kernel, the following boards are supported:
diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index 63bef30..74faab7 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -197,6 +197,21 @@ static int __init corenet_generic_probe(void)
return 0;
 }
 
+/* Early setup is required for large chunks of contiguous (and 
coarsely-aligned)
+ * memory. The following shoe-horns Bman init_early calls into the
+ * platform setup to let them parse their CCSR nodes early on.
+ */
+#ifdef CONFIG_FSL_BMAN_CONFIG
+void __init bman_init_early(void);
+#endif
+
+__init void corenet_ds_init_early(void)
+{
+#ifdef CONFIG_FSL_BMAN_CONFIG
+   bman_init_early();
+#endif
+}
+
 define_machine(corenet_generic) {
.name   = "CoreNet Generic",
.probe  = corenet_generic_probe,
@@ -215,6 +230,7 @@ define_machine(corenet_generic) {
 #else
.power_save = e500_idle,
 #endif
+   .init_early = corenet_ds_init_early,
 };
 
 machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);
diff --git a/arch/powerpc/platforms/85xx/p1023_rdb.c 
b/arch/powerpc/platforms/85xx/p1023_rdb.c
index d5b7509..624d3d6 100644
--- a/arch/powerpc/platforms/85xx/p1023_rdb.c
+++ b/arch/powerpc/platforms/85xx/p1023_rdb.c
@@ -103,7 +103,20 @@ static int __init p1023_rdb_probe(void)
unsigned long root = of_get_flat_dt_root();
 
return of_flat_dt_is_compatible(root, "fsl,P1023RDB");
+}
+
+/* Early setup is required for large chunks of contiguous (and 
coarsely-aligned)
+ * memory. The following shoe-horns Bman init_early calls into the
+ * platform setup to let them parse their CCSR nodes early on. */
+#ifdef CONFIG_FSL_BMAN_CONFIG
+void __init bman_init_early(void);
+#endif
 
+static __init void p1023_rdb_init_early(void)
+{
+#ifdef CONFIG_FSL_BMAN_CONFIG
+   bman_init_early();
+#endif
 }
 
 define_machine(p1023_rdb) {
@@ -119,4 +132,5 @@ define_machine(p1023_rdb) {
.pcibios_fixup_bus  = fsl_pcibios_fixup_bus,
.pcibios_fixup_phb  = fsl_pcibios_fixup_phb,
 #endif
+   .init_early = p1023_rdb_init_early,
 };
-- 
2.3.0

[RFC v2 04/10] powerpc/mpc85xx: Add platform support for the Freescale DPAA QMan

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: I59de17c040cdd304f86306336fcf89f130f7db2d
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 arch/powerpc/platforms/85xx/corenet_generic.c | 8 +++-
 arch/powerpc/platforms/85xx/p1023_rdb.c   | 8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c 
b/arch/powerpc/platforms/85xx/corenet_generic.c
index 74faab7..20b8f9a 100644
--- a/arch/powerpc/platforms/85xx/corenet_generic.c
+++ b/arch/powerpc/platforms/85xx/corenet_generic.c
@@ -198,15 +198,21 @@ static int __init corenet_generic_probe(void)
 }
 
 /* Early setup is required for large chunks of contiguous (and 
coarsely-aligned)
- * memory. The following shoe-horns Bman init_early calls into the
+ * memory. The following shoe-horns Q/Bman init_early calls into the
  * platform setup to let them parse their CCSR nodes early on.
  */
+#ifdef CONFIG_FSL_QMAN_CONFIG
+void __init qman_init_early(void);
+#endif
 #ifdef CONFIG_FSL_BMAN_CONFIG
 void __init bman_init_early(void);
 #endif
 
 __init void corenet_ds_init_early(void)
 {
+#ifdef CONFIG_FSL_QMAN_CONFIG
+   qman_init_early();
+#endif
 #ifdef CONFIG_FSL_BMAN_CONFIG
bman_init_early();
 #endif
diff --git a/arch/powerpc/platforms/85xx/p1023_rdb.c 
b/arch/powerpc/platforms/85xx/p1023_rdb.c
index 624d3d6..dc69801 100644
--- a/arch/powerpc/platforms/85xx/p1023_rdb.c
+++ b/arch/powerpc/platforms/85xx/p1023_rdb.c
@@ -106,14 +106,20 @@ static int __init p1023_rdb_probe(void)
 }
 
 /* Early setup is required for large chunks of contiguous (and 
coarsely-aligned)
- * memory. The following shoe-horns Bman init_early calls into the
+ * memory. The following shoe-horns Q/Bman init_early calls into the
  * platform setup to let them parse their CCSR nodes early on. */
+#ifdef CONFIG_FSL_QMAN_CONFIG
+void __init qman_init_early(void);
+#endif
 #ifdef CONFIG_FSL_BMAN_CONFIG
 void __init bman_init_early(void);
 #endif
 
 static __init void p1023_rdb_init_early(void)
 {
+#ifdef CONFIG_FSL_QMAN_CONFIG
+   qman_init_early();
+#endif
 #ifdef CONFIG_FSL_BMAN_CONFIG
bman_init_early();
 #endif
-- 
2.3.0

Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32

2015-02-16 Thread Daniel Borkmann

On 02/16/2015 08:13 AM, Denis Kirjanov wrote:
...

Well, I don't see significant challenges to enable eBPF on ppc64 in the future.
I'll start working on it after I get this merged


Cool, awesome!

[RFC v2 06/10] fsl_qman: Add self-tester

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: I314d36d94717cfc34053b6212899f71cb729d16c
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 drivers/soc/freescale/Kconfig  |  26 +-
 drivers/soc/freescale/Makefile |   4 +
 drivers/soc/freescale/bman_test.c  |   2 +-
 drivers/soc/freescale/{bman_test.c => qman_test.c} |  15 +-
 drivers/soc/freescale/{bman_test.c => qman_test.h} |  33 +-
 drivers/soc/freescale/qman_test_api.c  | 213 +
 drivers/soc/freescale/qman_test_stash.c| 497 +
 7 files changed, 758 insertions(+), 32 deletions(-)
 copy drivers/soc/freescale/{bman_test.c => qman_test.c} (91%)
 copy drivers/soc/freescale/{bman_test.c => qman_test.h} (81%)
 create mode 100644 drivers/soc/freescale/qman_test_api.c
 create mode 100644 drivers/soc/freescale/qman_test_stash.c

diff --git a/drivers/soc/freescale/Kconfig b/drivers/soc/freescale/Kconfig
index 6786b94..001999e 100644
--- a/drivers/soc/freescale/Kconfig
+++ b/drivers/soc/freescale/Kconfig
@@ -53,7 +53,7 @@ config FSL_BMAN_TEST
---help---
  This option compiles self-test code for BMan.
 
-config FSL_BMAN_TEST_HIGH
+config FSL_BMAN_TEST_API
bool "BMan high-level self-test"
depends on FSL_BMAN_TEST
default y
@@ -93,6 +93,30 @@ config FSL_QMAN_CONFIG
  linux image is running as a guest OS under the hypervisor, only one
  guest OS (the control plane) needs this option.
 
+config FSL_QMAN_TEST
+   tristate "QMan self-tests"
+   default n
+   ---help---
+ This option compiles self-test code for QMan.
+
+config FSL_QMAN_TEST_API
+   bool "QMan high-level self-test"
+   depends on FSL_QMAN_TEST
+   default y
+   ---help---
+ This requires the presence of cpu-affine portals, and performs
+ high-level API testing with them (whichever portal(s) are affine to
+ the cpu(s) the test executes on).
+
+config FSL_QMAN_TEST_STASH
+   bool "QMan 'hot potato' data-stashing self-test"
+   depends on FSL_QMAN_TEST
+   default y
+   ---help---
+ This performs a hot potato style test enqueuing/dequeuing a frame
+ across a series of FQs scheduled to different portals (and cpus), with
+ DQRR, data and context stashing always on.
+
 # H/w settings that can be hard-coded for now.
 config FSL_QMAN_FQD_SZ
int "size of Frame Queue Descriptor region"
diff --git a/drivers/soc/freescale/Makefile b/drivers/soc/freescale/Makefile
index 09f31b0..1f59a6b 100644
--- a/drivers/soc/freescale/Makefile
+++ b/drivers/soc/freescale/Makefile
@@ -13,3 +13,7 @@ bman_tester-$(CONFIG_FSL_BMAN_TEST_THRESH)+= 
bman_test_thresh.o
 # QMan
 obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o
 obj-$(CONFIG_FSL_QMAN_CONFIG)  += qman.o qman_portal.o
+obj-$(CONFIG_FSL_QMAN_TEST)+= qman_tester.o
+qman_tester-y   = qman_test.o
+qman_tester-$(CONFIG_FSL_QMAN_TEST_API)+= qman_test_api.o
+qman_tester-$(CONFIG_FSL_QMAN_TEST_STASH)  += qman_test_stash.o
diff --git a/drivers/soc/freescale/bman_test.c 
b/drivers/soc/freescale/bman_test.c
index 350f2f8..5d23752 100644
--- a/drivers/soc/freescale/bman_test.c
+++ b/drivers/soc/freescale/bman_test.c
@@ -37,7 +37,7 @@ MODULE_DESCRIPTION("BMan testing");
 
 static int test_init(void)
 {
-#ifdef CONFIG_FSL_BMAN_TEST_HIGH
+#ifdef CONFIG_FSL_BMAN_TEST_API
int loop = 1;
while (loop--)
bman_test_api();
diff --git a/drivers/soc/freescale/bman_test.c 
b/drivers/soc/freescale/qman_test.c
similarity index 91%
copy from drivers/soc/freescale/bman_test.c
copy to drivers/soc/freescale/qman_test.c
index 350f2f8..094d46d 100644
--- a/drivers/soc/freescale/bman_test.c
+++ b/drivers/soc/freescale/qman_test.c
@@ -29,22 +29,23 @@
  * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#include "bman_test.h"
+#include "qman_test.h"
 
 MODULE_AUTHOR("Geoff Thorpe");
 MODULE_LICENSE("Dual BSD/GPL");
-MODULE_DESCRIPTION("BMan testing");
+MODULE_DESCRIPTION("QMan testing");
 
 static int test_init(void)
 {
-#ifdef CONFIG_FSL_BMAN_TEST_HIGH
int loop = 1;
-   while (loop--)
-   bman_test_api();
+   while (loop--) {
+#ifdef CONFIG_FSL_QMAN_TEST_STASH
+   qman_test_stash();
 #endif
-#ifdef CONFIG_FSL_BMAN_TEST_THRESH
-   bman_test_thresh();
+#ifdef CONFIG_FSL_QMAN_TEST_API
+   qman_test_api();
 #endif
+   }
return 0;
 }
 
diff --git a/drivers/soc/freescale/bman_test.c 
b/drivers/soc/freescale/qman_test.h
similarity index 81%
copy from drivers/soc/freescale/bman_test.c
copy to drivers/soc/freescale/qman_test.h
index 350f2f8..4690376 100644
--- a/drivers/soc/freescale/bman_test.c
+++ b/drivers/soc/freescale/qman_test.h
@@ -29,28 +29,15 @@
  * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 

[RFC v2 00/10] Freescale DPAA B/QMan drivers

2015-02-16 Thread Emil Medve
v2: Moved out of staging into soc/freescale

Hello,


This is the second attempt to publish these drivers. They are not to be
applied yet.

These are the Freescale DPAA B/QMan drivers. At this stage, these are more or
less the drivers from the Freescale PowerPC SDK, roughly squashed and split
into a sequence of component patches. They still need some work and cleanup
before we expect to have them applied, but we appreciate early feedback.

To do:  - Add a maintainer(s) entry
        - Add module(s) support
        - Some important clean-ups

Cheers,


Geoff Thorpe (8):
  fsl_bman: Add drivers for the Freescale DPAA BMan
  fsl_qman: Add drivers for the Freescale DPAA QMan
  powerpc/mpc85xx: Add platform support for the Freescale DPAA BMan
  powerpc/mpc85xx: Add platform support for the Freescale DPAA QMan
  fsl_bman: Add self-tester
  fsl_qman: Add self-tester
  fsl_bman: Add debugfs support
  fsl_qman: Add debugfs support

Hai-Ying Wang (2):
  fsl_bman: Add HOTPLUG_CPU support
  fsl_qman: Add HOTPLUG_CPU support

 arch/powerpc/Kconfig  |5 +
 arch/powerpc/configs/mpc85xx_defconfig|1 +
 arch/powerpc/configs/mpc85xx_smp_defconfig|1 +
 arch/powerpc/platforms/85xx/Kconfig   |1 +
 arch/powerpc/platforms/85xx/corenet_generic.c |   22 +
 arch/powerpc/platforms/85xx/p1023_rdb.c   |   20 +
 drivers/soc/Kconfig   |1 +
 drivers/soc/Makefile  |1 +
 drivers/soc/freescale/Kconfig |  187 ++
 drivers/soc/freescale/Makefile|   21 +
 drivers/soc/freescale/bman.c  |  611 ++
 drivers/soc/freescale/bman.h  |  524 +
 drivers/soc/freescale/bman_api.c  | 1055 ++
 drivers/soc/freescale/bman_debugfs.c  |  119 ++
 drivers/soc/freescale/bman_portal.c   |  373 
 drivers/soc/freescale/bman_priv.h |  149 ++
 drivers/soc/freescale/bman_test.c |   56 +
 drivers/soc/freescale/bman_test.h |   44 +
 drivers/soc/freescale/bman_test_api.c |  181 ++
 drivers/soc/freescale/bman_test_thresh.c  |  196 ++
 drivers/soc/freescale/dpaa_alloc.c|  573 ++
 drivers/soc/freescale/dpaa_sys.h  |  294 +++
 drivers/soc/freescale/qbman_driver.c  |   85 +
 drivers/soc/freescale/qman.c  |  991 ++
 drivers/soc/freescale/qman.h  | 1302 
 drivers/soc/freescale/qman_api.c  | 2624 +
 drivers/soc/freescale/qman_debugfs.c  | 1326 +
 drivers/soc/freescale/qman_portal.c   |  548 ++
 drivers/soc/freescale/qman_priv.h |  283 +++
 drivers/soc/freescale/qman_test.c |   57 +
 drivers/soc/freescale/qman_test.h |   43 +
 drivers/soc/freescale/qman_test_api.c |  213 ++
 drivers/soc/freescale/qman_test_stash.c   |  497 +
 drivers/soc/freescale/qman_utils.c|  129 ++
 include/linux/fsl_bman.h  |  517 +
 include/linux/fsl_qman.h  | 1955 ++
 36 files changed, 15005 insertions(+)
 create mode 100644 drivers/soc/freescale/Kconfig
 create mode 100644 drivers/soc/freescale/Makefile
 create mode 100644 drivers/soc/freescale/bman.c
 create mode 100644 drivers/soc/freescale/bman.h
 create mode 100644 drivers/soc/freescale/bman_api.c
 create mode 100644 drivers/soc/freescale/bman_debugfs.c
 create mode 100644 drivers/soc/freescale/bman_portal.c
 create mode 100644 drivers/soc/freescale/bman_priv.h
 create mode 100644 drivers/soc/freescale/bman_test.c
 create mode 100644 drivers/soc/freescale/bman_test.h
 create mode 100644 drivers/soc/freescale/bman_test_api.c
 create mode 100644 drivers/soc/freescale/bman_test_thresh.c
 create mode 100644 drivers/soc/freescale/dpaa_alloc.c
 create mode 100644 drivers/soc/freescale/dpaa_sys.h
 create mode 100644 drivers/soc/freescale/qbman_driver.c
 create mode 100644 drivers/soc/freescale/qman.c
 create mode 100644 drivers/soc/freescale/qman.h
 create mode 100644 drivers/soc/freescale/qman_api.c
 create mode 100644 drivers/soc/freescale/qman_debugfs.c
 create mode 100644 drivers/soc/freescale/qman_portal.c
 create mode 100644 drivers/soc/freescale/qman_priv.h
 create mode 100644 drivers/soc/freescale/qman_test.c
 create mode 100644 drivers/soc/freescale/qman_test.h
 create mode 100644 drivers/soc/freescale/qman_test_api.c
 create mode 100644 drivers/soc/freescale/qman_test_stash.c
 create mode 100644 drivers/soc/freescale/qman_utils.c
 create mode 100644 include/linux/fsl_bman.h
 create mode 100644 include/linux/fsl_qman.h

-- 
2.3.0

[RFC v2 10/10] fsl_qman: Add HOTPLUG_CPU support

2015-02-16 Thread Emil Medve
From: Hai-Ying Wang haiying.w...@freescale.com

Change-Id: Ica4d1b2b0fd3c3ae5e043663febd9f4cb7c762cf
Signed-off-by: Hai-Ying Wang haiying.w...@freescale.com
---
 drivers/soc/freescale/qman_portal.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/drivers/soc/freescale/qman_portal.c 
b/drivers/soc/freescale/qman_portal.c
index 7216de1..b008b41 100644
--- a/drivers/soc/freescale/qman_portal.c
+++ b/drivers/soc/freescale/qman_portal.c
@@ -30,6 +30,9 @@
  */
 
 #include "qman_priv.h"
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/cpu.h>
+#endif
 
 /* Global variable containing revision id (even on non-control plane systems
  * where CCSR isn't available) */
@@ -381,6 +384,45 @@ static void qman_offline_cpu(unsigned int cpu)
}
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void qman_online_cpu(unsigned int cpu)
+{
+   struct qman_portal *p;
+   const struct qm_portal_config *pcfg;
+   p = (struct qman_portal *)affine_portals[cpu];
+   if (p) {
+   pcfg = qman_get_qm_portal_config(p);
+   if (pcfg) {
+   irq_set_affinity(pcfg->public_cfg.irq, cpumask_of(cpu));
+   qman_portal_update_sdest(pcfg, cpu);
+   }
+   }
+}
+
+static int __cpuinit qman_hotplug_cpu_callback(struct notifier_block *nfb,
+   unsigned long action, void *hcpu)
+{
+   unsigned int cpu = (unsigned long)hcpu;
+
+   switch (action) {
+   case CPU_ONLINE:
+   case CPU_ONLINE_FROZEN:
+   qman_online_cpu(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+   case CPU_DOWN_PREPARE_FROZEN:
+   qman_offline_cpu(cpu);
+   default:
+   break;
+   }
+   return NOTIFY_OK;
+}
+
+static struct notifier_block qman_hotplug_cpu_notifier = {
+   .notifier_call = qman_hotplug_cpu_callback,
+};
+#endif /* CONFIG_HOTPLUG_CPU */
+
 __init int qman_init(void)
 {
struct cpumask slave_cpus;
@@ -499,5 +541,8 @@ __init int qman_init(void)
cpumask_andnot(offline_cpus, cpu_possible_mask, cpu_online_mask);
for_each_cpu(cpu, offline_cpus)
qman_offline_cpu(cpu);
+#ifdef CONFIG_HOTPLUG_CPU
+   register_hotcpu_notifier(qman_hotplug_cpu_notifier);
+#endif
return 0;
 }
-- 
2.3.0

Re: [PATCH 2/5 v2] powerpc/boot/fdt: Add little-endian support to libfdt wrappers

2015-02-16 Thread Cedric Le Goater
On 02/11/2015 05:55 AM, Jeremy Kerr wrote:
 For epapr-style boot, we may be little-endian. This change implements
 the proper conversion for fdt*_to_cpu and cpu_to_fdt*. We also need the
 full cpu_to_* and *_to_cpu macros for this.
 
 Signed-off-by: Jeremy Kerr j...@ozlabs.org
 Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org


I did not work on the little endian epapr wrapper :/

To test the patchset, I used the 'le' branch of 
git://github.com/jk-ozlabs/linux 
and successfully booted a tuleta and an open-power system using the latest 
skiboot and a LE zImage.epapr. Looks good to me. 

You might want to add CONFIG_IPMI_WATCHDOG to openpower_defconfig 

Thanks.

C.



[RFC v2 05/10] fsl_bman: Add self-tester

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: If1b44bb013addc1e855c73a4e6ff74bc8b6e4829
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 drivers/soc/freescale/Kconfig|  26 
 drivers/soc/freescale/Makefile   |   8 +-
 drivers/soc/freescale/bman.c |  16 +--
 drivers/soc/freescale/bman_api.c |  14 +--
 drivers/soc/freescale/bman_portal.c  |  18 +--
 drivers/soc/freescale/bman_priv.h|   2 +-
 drivers/soc/freescale/bman_test.c|  56 +
 drivers/soc/freescale/bman_test.h|  44 +++
 drivers/soc/freescale/bman_test_api.c| 181 
 drivers/soc/freescale/bman_test_thresh.c | 196 +++
 drivers/soc/freescale/dpaa_alloc.c   |   2 +-
 drivers/soc/freescale/dpaa_sys.h |   2 +-
 drivers/soc/freescale/qman.c |  26 ++--
 drivers/soc/freescale/qman_api.c |  18 +--
 drivers/soc/freescale/qman_portal.c  |  18 +--
 drivers/soc/freescale/qman_priv.h|   2 +-
 include/linux/fsl_bman.h |  10 +-
 include/linux/fsl_qman.h |  18 +--
 18 files changed, 582 insertions(+), 75 deletions(-)
 create mode 100644 drivers/soc/freescale/bman_test.c
 create mode 100644 drivers/soc/freescale/bman_test.h
 create mode 100644 drivers/soc/freescale/bman_test_api.c
 create mode 100644 drivers/soc/freescale/bman_test_thresh.c

diff --git a/drivers/soc/freescale/Kconfig b/drivers/soc/freescale/Kconfig
index 9329c5c..6786b94 100644
--- a/drivers/soc/freescale/Kconfig
+++ b/drivers/soc/freescale/Kconfig
@@ -47,6 +47,32 @@ config FSL_BMAN_CONFIG
  linux image is running as a guest OS under the hypervisor, only one
  guest OS (the control plane) needs this option.
 
+config FSL_BMAN_TEST
+   tristate "BMan self-tests"
+   default n
+   ---help---
+ This option compiles self-test code for BMan.
+
+config FSL_BMAN_TEST_HIGH
+   bool "BMan high-level self-test"
+   depends on FSL_BMAN_TEST
+   default y
+   ---help---
+ This requires the presence of cpu-affine portals, and performs
+ high-level API testing with them (whichever portal(s) are affine to
+ the cpu(s) the test executes on).
+
+config FSL_BMAN_TEST_THRESH
+   bool "BMan threshold test"
+   depends on FSL_BMAN_TEST
+   default y
+   ---help---
+ Multi-threaded (SMP) test of BMan pool depletion. A pool is seeded
+ before multiple threads (one per cpu) create pool objects to track
+ depletion state changes. The pool is then drained to empty by a
+ "drainer" thread, and the other threads verify that they observe
+ exactly the depletion state changes that are expected.
+
 endif # FSL_BMAN
 
 config FSL_QMAN
diff --git a/drivers/soc/freescale/Makefile b/drivers/soc/freescale/Makefile
index 69be592..09f31b0 100644
--- a/drivers/soc/freescale/Makefile
+++ b/drivers/soc/freescale/Makefile
@@ -2,10 +2,14 @@
 obj-$(CONFIG_FSL_DPA)  += dpaa_alloc.o
 obj-$(CONFIG_HAS_FSL_QBMAN)+= qbman_driver.o
 
-# Bman
+# BMan
 obj-$(CONFIG_FSL_BMAN) += bman_api.o
 obj-$(CONFIG_FSL_BMAN_CONFIG)  += bman.o bman_portal.o
+obj-$(CONFIG_FSL_BMAN_TEST)+= bman_tester.o
+bman_tester-y   = bman_test.o
+bman_tester-$(CONFIG_FSL_BMAN_TEST_API)+= bman_test_api.o
+bman_tester-$(CONFIG_FSL_BMAN_TEST_THRESH) += bman_test_thresh.o
 
-# Qman
+# QMan
 obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o
 obj-$(CONFIG_FSL_QMAN_CONFIG)  += qman.o qman_portal.o
diff --git a/drivers/soc/freescale/bman.c b/drivers/soc/freescale/bman.c
index fba6ae0..66986f2 100644
--- a/drivers/soc/freescale/bman.c
+++ b/drivers/soc/freescale/bman.c
@@ -275,7 +275,7 @@ static int __init fsl_bman_init(struct device_node *node)
BUG_ON(!bm);
bm_node = node;
bm_get_version(bm, &id, &major, &minor);
-   pr_info("Bman ver:%04x,%02x,%02x\n", id, major, minor);
+   pr_info("BMan ver:%04x,%02x,%02x\n", id, major, minor);
if ((major == 1) && (minor == 0)) {
bman_ip_rev = BMAN_REV10;
bman_pool_max = 64;
@@ -286,7 +286,7 @@ static int __init fsl_bman_init(struct device_node *node)
bman_ip_rev = BMAN_REV21;
bman_pool_max = 64;
} else {
-   pr_warn("unknown Bman version, default to rev1.0\n");
+   pr_warn("unknown BMan version, default to rev1.0\n");
}
 
return 0;
@@ -330,7 +330,7 @@ static void log_edata_bits(u32 bit_count)
 {
u32 i, j, mask = 0x;
 
-   pr_warn("Bman ErrInt, EDATA:\n");
+   pr_warn("BMan ErrInt, EDATA:\n");
i = bit_count/32;
if (bit_count%32) {
i++;
@@ -351,13 +351,13 @@ static void log_additional_error_info(u32 isr_val, u32 
ecsr_val)
  

[RFC v2 08/10] fsl_qman: Add debugfs support

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: I59a75a91b289193b5ab1d30a08f60119dc4d7568
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 drivers/soc/freescale/Kconfig|7 +
 drivers/soc/freescale/Makefile   |1 +
 drivers/soc/freescale/qman_api.c |   58 ++
 drivers/soc/freescale/qman_debugfs.c | 1326 ++
 drivers/soc/freescale/qman_priv.h|8 +
 include/linux/fsl_qman.h |6 +
 6 files changed, 1406 insertions(+)
 create mode 100644 drivers/soc/freescale/qman_debugfs.c

diff --git a/drivers/soc/freescale/Kconfig b/drivers/soc/freescale/Kconfig
index f819671..5f683c8 100644
--- a/drivers/soc/freescale/Kconfig
+++ b/drivers/soc/freescale/Kconfig
@@ -124,6 +124,13 @@ config FSL_QMAN_TEST_STASH
  across a series of FQs scheduled to different portals (and cpus), with
  DQRR, data and context stashing always on.
 
+config FSL_QMAN_DEBUGFS
+   tristate "QMan debugfs interface"
+   depends on DEBUG_FS
+   default y
+   ---help---
+ This option compiles debugfs code for QMan.
+
 # H/w settings that can be hard-coded for now.
 config FSL_QMAN_FQD_SZ
int "size of Frame Queue Descriptor region"
diff --git a/drivers/soc/freescale/Makefile b/drivers/soc/freescale/Makefile
index c980dac..aac7cb2 100644
--- a/drivers/soc/freescale/Makefile
+++ b/drivers/soc/freescale/Makefile
@@ -18,3 +18,4 @@ obj-$(CONFIG_FSL_QMAN_TEST)   += qman_tester.o
 qman_tester-y   = qman_test.o
 qman_tester-$(CONFIG_FSL_QMAN_TEST_API)+= qman_test_api.o
 qman_tester-$(CONFIG_FSL_QMAN_TEST_STASH)  += qman_test_stash.o
+obj-$(CONFIG_FSL_QMAN_DEBUGFS) += qman_debugfs.o
diff --git a/drivers/soc/freescale/qman_api.c b/drivers/soc/freescale/qman_api.c
index 08dbb36..7556118 100644
--- a/drivers/soc/freescale/qman_api.c
+++ b/drivers/soc/freescale/qman_api.c
@@ -1765,6 +1765,37 @@ int qman_query_wq(u8 query_dedicated, struct 
qm_mcr_querywq *wq)
 }
 EXPORT_SYMBOL(qman_query_wq);
 
+int qman_testwrite_cgr(struct qman_cgr *cgr, u64 i_bcnt,
+   struct qm_mcr_cgrtestwrite *result)
+{
+   struct qm_mc_command *mcc;
+   struct qm_mc_result *mcr;
+   struct qman_portal *p = get_affine_portal();
+   unsigned long irqflags __maybe_unused;
+   u8 res;
+
+   PORTAL_IRQ_LOCK(p, irqflags);
+   mcc = qm_mc_start(&p->p);
+   mcc->cgrtestwrite.cgid = cgr->cgrid;
+   mcc->cgrtestwrite.i_bcnt_hi = (u8)(i_bcnt >> 32);
+   mcc->cgrtestwrite.i_bcnt_lo = (u32)i_bcnt;
+   qm_mc_commit(&p->p, QM_MCC_VERB_CGRTESTWRITE);
+   while (!(mcr = qm_mc_result(&p->p)))
+   cpu_relax();
+   DPA_ASSERT((mcr->verb & QM_MCR_VERB_MASK) == QM_MCC_VERB_CGRTESTWRITE);
+   res = mcr->result;
+   if (res == QM_MCR_RESULT_OK)
+   *result = mcr->cgrtestwrite;
+   PORTAL_IRQ_UNLOCK(p, irqflags);
+   put_affine_portal();
+   if (res != QM_MCR_RESULT_OK) {
+   pr_err("CGR TEST WRITE failed: %s\n", mcr_result_str(res));
+   return -EIO;
+   }
+   return 0;
+}
+EXPORT_SYMBOL(qman_testwrite_cgr);
+
 int qman_query_cgr(struct qman_cgr *cgr, struct qm_mcr_querycgr *cgrd)
 {
struct qm_mc_command *mcc;
@@ -1793,6 +1824,33 @@ int qman_query_cgr(struct qman_cgr *cgr, struct 
qm_mcr_querycgr *cgrd)
 }
 EXPORT_SYMBOL(qman_query_cgr);
 
+int qman_query_congestion(struct qm_mcr_querycongestion *congestion)
+{
+   struct qm_mc_result *mcr;
+   struct qman_portal *p = get_affine_portal();
+   unsigned long irqflags __maybe_unused;
+   u8 res;
+
+   PORTAL_IRQ_LOCK(p, irqflags);
+   qm_mc_start(&p->p);
+   qm_mc_commit(&p->p, QM_MCC_VERB_QUERYCONGESTION);
+   while (!(mcr = qm_mc_result(&p->p)))
+   cpu_relax();
+   DPA_ASSERT((mcr->verb & QM_MCR_VERB_MASK) ==
+   QM_MCC_VERB_QUERYCONGESTION);
+   res = mcr->result;
+   if (res == QM_MCR_RESULT_OK)
+   *congestion = mcr->querycongestion;
+   PORTAL_IRQ_UNLOCK(p, irqflags);
+   put_affine_portal();
+   if (res != QM_MCR_RESULT_OK) {
+   pr_err("QUERY_CONGESTION failed: %s\n", mcr_result_str(res));
+   return -EIO;
+   }
+   return 0;
+}
+EXPORT_SYMBOL(qman_query_congestion);
+
 /* internal function used as a wait_event() expression */
 static int set_p_vdqcr(struct qman_portal *p, struct qman_fq *fq, u32 vdqcr)
 {
diff --git a/drivers/soc/freescale/qman_debugfs.c 
b/drivers/soc/freescale/qman_debugfs.c
new file mode 100644
index 000..c09f88f
--- /dev/null
+++ b/drivers/soc/freescale/qman_debugfs.c
@@ -0,0 +1,1326 @@
+/* Copyright 2010-2011 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright

[RFC v2 09/10] fsl_bman: Add HOTPLUG_CPU support

2015-02-16 Thread Emil Medve
From: Hai-Ying Wang haiying.w...@freescale.com

Change-Id: I863d5c15c7f35f9de4ea3d985e4ff467167924b7
Signed-off-by: Hai-Ying Wang haiying.w...@freescale.com
---
 drivers/soc/freescale/bman_portal.c | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/soc/freescale/bman_portal.c 
b/drivers/soc/freescale/bman_portal.c
index a91081c..f484a91 100644
--- a/drivers/soc/freescale/bman_portal.c
+++ b/drivers/soc/freescale/bman_portal.c
@@ -30,6 +30,9 @@
  */
 
 #include "bman_priv.h"
+#ifdef CONFIG_HOTPLUG_CPU
+#include <linux/cpu.h>
+#endif
 
 /*
  * Global variables of the max portal/pool number this bman version supported
@@ -180,7 +183,7 @@ static int __init parse_bportals(char *str)
 }
 __setup("bportals=", parse_bportals);
 
-static void bman_offline_cpu(unsigned int cpu)
+static void __cold bman_offline_cpu(unsigned int cpu)
 {
struct bman_portal *p = (struct bman_portal *)affine_bportals[cpu];
const struct bm_portal_config *pcfg;
@@ -192,6 +195,42 @@ static void bman_offline_cpu(unsigned int cpu)
}
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static void __cold bman_online_cpu(unsigned int cpu)
+{
+   struct bman_portal *p = (struct bman_portal *)affine_bportals[cpu];
+   const struct bm_portal_config *pcfg;
+
+   if (p) {
+   pcfg = bman_get_bm_portal_config(p);
+   if (pcfg)
+   irq_set_affinity(pcfg->public_cfg.irq, cpumask_of(cpu));
+   }
+}
+
+static int __cold bman_hotplug_cpu_callback(struct notifier_block *nfb,
+   unsigned long action, void *hcpu)
+{
+   unsigned int cpu = (unsigned long)hcpu;
+
+   switch (action) {
+   case CPU_ONLINE:
+   case CPU_ONLINE_FROZEN:
+   bman_online_cpu(cpu);
+   break;
+   case CPU_DOWN_PREPARE:
+   case CPU_DOWN_PREPARE_FROZEN:
+   bman_offline_cpu(cpu);
+   }
+
+   return NOTIFY_OK;
+}
+
+static struct notifier_block bman_hotplug_cpu_notifier = {
+   .notifier_call = bman_hotplug_cpu_callback,
+};
+#endif /* CONFIG_HOTPLUG_CPU */
+
 /* Initialise the BMan driver. The meat of this function deals with portals. The
  * following describes the flow of portal-handling, the code steps refer to
  * this description;
@@ -326,5 +365,9 @@ int __init bman_init(void)
for_each_cpu(cpu, offline_cpus)
bman_offline_cpu(cpu);
 
+#ifdef CONFIG_HOTPLUG_CPU
+   register_hotcpu_notifier(bman_hotplug_cpu_notifier);
+#endif
+
return 0;
 }
-- 
2.3.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC v2 07/10] fsl_bman: Add debugfs support

2015-02-16 Thread Emil Medve
From: Geoff Thorpe geoff.tho...@freescale.com

Change-Id: I7eea7aea8a58ad0c28451b70801c0d101e88d263
Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com
---
 drivers/soc/freescale/Kconfig|   7 +++
 drivers/soc/freescale/Makefile   |   1 +
 drivers/soc/freescale/bman_api.c |  19 ++
 drivers/soc/freescale/bman_debugfs.c | 119 +++
 drivers/soc/freescale/dpaa_sys.h |   1 +
 include/linux/fsl_bman.h |   6 ++
 6 files changed, 153 insertions(+)
 create mode 100644 drivers/soc/freescale/bman_debugfs.c

diff --git a/drivers/soc/freescale/Kconfig b/drivers/soc/freescale/Kconfig
index 001999e..f819671 100644
--- a/drivers/soc/freescale/Kconfig
+++ b/drivers/soc/freescale/Kconfig
@@ -73,6 +73,13 @@ config FSL_BMAN_TEST_THRESH
  drainer thread, and the other threads that they observe exactly
  the depletion state changes that are expected.
 
+config FSL_BMAN_DEBUGFS
+   tristate "BMan debugfs interface"
+   depends on DEBUG_FS
+   default y
+   ---help---
+ This option compiles debugfs code for BMan.
+
 endif # FSL_BMAN
 
 config FSL_QMAN
diff --git a/drivers/soc/freescale/Makefile b/drivers/soc/freescale/Makefile
index 1f59a6b..c980dac 100644
--- a/drivers/soc/freescale/Makefile
+++ b/drivers/soc/freescale/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_FSL_BMAN_TEST) += bman_tester.o
 bman_tester-y   = bman_test.o
 bman_tester-$(CONFIG_FSL_BMAN_TEST_API)+= bman_test_api.o
 bman_tester-$(CONFIG_FSL_BMAN_TEST_THRESH) += bman_test_thresh.o
+obj-$(CONFIG_FSL_BMAN_DEBUGFS) += bman_debugfs.o
 
 # QMan
 obj-$(CONFIG_FSL_QMAN) += qman_api.o qman_utils.o
diff --git a/drivers/soc/freescale/bman_api.c b/drivers/soc/freescale/bman_api.c
index 7bb4840..20f510a 100644
--- a/drivers/soc/freescale/bman_api.c
+++ b/drivers/soc/freescale/bman_api.c
@@ -997,6 +997,25 @@ int bman_flush_stockpile(struct bman_pool *pool, u32 flags)
 }
 EXPORT_SYMBOL(bman_flush_stockpile);
 
+int bman_query_pools(struct bm_pool_state *state)
+{
+   struct bman_portal *p = get_affine_portal();
+   struct bm_mc_result *mcr;
+   __maybe_unused unsigned long irqflags;
+
+   PORTAL_IRQ_LOCK(p, irqflags);
+   bm_mc_start(&p->p);
+   bm_mc_commit(&p->p, BM_MCC_VERB_CMD_QUERY);
+   while (!(mcr = bm_mc_result(&p->p)))
+   cpu_relax();
+   DPA_ASSERT((mcr->verb & BM_MCR_VERB_CMD_MASK) == BM_MCR_VERB_CMD_QUERY);
+   *state = mcr->query;
+   PORTAL_IRQ_UNLOCK(p, irqflags);
+   put_affine_portal();
+   return 0;
+}
+EXPORT_SYMBOL(bman_query_pools);
+
 #ifdef CONFIG_FSL_BMAN_CONFIG
 u32 bman_query_free_buffers(struct bman_pool *pool)
 {
diff --git a/drivers/soc/freescale/bman_debugfs.c b/drivers/soc/freescale/bman_debugfs.c
new file mode 100644
index 000..b93e705
--- /dev/null
+++ b/drivers/soc/freescale/bman_debugfs.c
@@ -0,0 +1,119 @@
+/* Copyright 2010-2011 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *  notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *  notice, this list of conditions and the following disclaimer in the
+ *  documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *  names of its contributors may be used to endorse or promote products
+ *  derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License (GPL) as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+#include <linux/module.h>
+#include <linux/fsl_bman.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+static struct dentry *dfs_root; /* debugfs 

Re: [PATCH 1/3] perf/e6500: Make event translations available in sysfs

2015-02-16 Thread Tom Huynh
On Mon, Feb 09, 2015 at 09:40:19PM +0100, Andi Kleen wrote:
  I'll NAK any external 'download area' (and I told that Andi 
  before): tools/perf/event-tables/ or so is a good enough 
  'download area' with fast enough update cycles.
 
 The proposal was to put it on kernel.org, similar to how
 external firmware blobs are distributed. CPU event lists
 are data sheets, so are like firmware. They do not
 follow the normal kernel code licenses. They are not 
 source code. They cannot be reviewed in the normal way.
 
Could you provide more details about the license and review 
concern? How are the event list files different from hardware-
specific information (e.g. reg mapping) in header files?

  If any 'update' of event descriptions is needed it can 
  happen through the distro package mechanism, or via a 
  simple 'git pull' if it's compiled directly.
  
  Lets not overengineer this with any dependence on an 
  external site and with a separate update mechanism - lets 
  just get the tables into tools/ and see it from there...
 
 That experiment has been already done for oprofile,
 didn't work very well.

Please excuse my ignorance; could you say exactly what didn't
work well for oprofile?

Ingo's suggestion seems good to me because these event files
will be transparent to the users, and it's simply more
convenient not having to go to a website to find and download
the event file that matches the machine.
The distro package or the perf make mechanism can put these
files into the appropriate directory, so users who are not
perf developers won't need to know about these files.

- Tom

Re: [PATCH 0/4] Support registering specific reset handler

2015-02-16 Thread Gavin Shan
On Mon, Feb 16, 2015 at 11:14:27AM -0200, casca...@linux.vnet.ibm.com wrote:
On Fri, Feb 13, 2015 at 03:54:55PM +1100, Gavin Shan wrote:
 VFIO PCI infrastructure depends on pci_reset_function() to do reset on
 PCI devices so that they would be in clean state when host or guest grabs
 them. Unfortunately, the function doesn't work (or not well) on some PCI
 devices that require EEH PE reset.
 
 The patchset extends the quirk for PCI device specific reset methods to
 allow dynamic registration. With it, we can translate reset requests
 for those special PCI devices to EEH PE reset, which is only available on
 64-bit PowerPC platforms.
 

Hi, Gavin.

I like your approach overall. That allows us to confine these quirks to
the platforms where they are relevant. I would make the quirks more
specific, though, instead of doing them for all IBM and Mellanox
devices.


Yeah, we need to have more specific vendor/device IDs for PATCH[4/4]. Could
you please take a look at PATCH[4/4] and then suggest the specific devices
that require the platform-dependent reset quirk? Especially the device IDs
for IBM/Mellanox we need to add quirks for.

I wonder if we should not have some form of domain reset, where we would
reset all the devices on the same group, and use that on vfio. Grouping
the devices would then be made platform-dependent, as well as the reset
method. On powernv, we would group by IOMMU group and issue a
fundamental reset.


I'm assuming domain reset is PE reset, which is what the specific reset
handler does on the PowerNV platform. The reason why we need a platform
specific reset handler is that some adapters can't support the function level
reset methods (except pci_dev_specific_reset()) in __pci_dev_reset().

Thanks,
Gavin

Cascardo.

 Gavin Shan (4):
   PCI: Rename struct pci_dev_reset_methods
   PCI: Introduce list for device reset methods
   PCI: Allow registering reset method
   powerpc/powernv: Register PCI dev specific reset handlers
 
  arch/powerpc/platforms/powernv/pci.c |  61 +++
  drivers/pci/pci.h|   3 +-
  drivers/pci/quirks.c | 139 ++-
  include/linux/pci.h  |   9 +++
  4 files changed, 192 insertions(+), 20 deletions(-)
 
 -- 
 1.8.3.2
 


[PATCH RESEND v2 0/7] powerpc/powernv: Unified PCI slot reset and hotplug

2015-02-16 Thread Gavin Shan
The patchset was built on top of the patchset "powerpc/powernv: Simplify EEH
implementation", which can be found at:

https://patchwork.ozlabs.org/patch/439956/

The patchset corresponds to skiboot changes that manage PCI slots
in a unified way: OPAL APIs are used for slot reset, power management,
and presence status retrieval. The patchset shouldn't be merged before
the OPAL firmware counterpart is merged:

https://patchwork.ozlabs.org/patch/440463/

The kernel changes have been split into 2 parts:
(A) Use the unified PCI slot reset OPAL API, PATCH[1] and PATCH[2]
(B) powernv-php driver to support PCI hotplug for the PowerNV platform, starting
from PATCH[3]. The device tree is scanned when the driver is loaded. If
any PCI device node has the properties "ibm,slot-pluggable" and
"ibm,reset-by-firmware", it's regarded as a hotpluggable slot and the driver
creates/registers a slot for it. After that, the sysfs entries can be used
to operate the slot.

PATCH[3/4/5/6]: Necessary code changes to PPC PCI subsystem in order to
support PCI slots for PPC PowerNV platform.
PATCH[7]  : powernv-php driver to support PCI hotplug for PowerNV
platform.


Hotplug testing 
===
# cat /proc/cpuinfo | grep -i powernv
platform: PowerNV
machine : PowerNV 8286-41A

# pwd
/sys/bus/pci/slots
# ls
C10  C11  C12  C14  C15  C6  C7  C8  C9

# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# pwd
/sys/bus/pci/slots/C10
# cat address
0003:09:00
# cat cur_bus_speed
5.0 GT/s PCIe
# cat max_bus_speed
8.0 GT/s PCIe
# cat power
1
# echo 0 > power
# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# echo 1 > power
# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

Changelog
=
v1 - v2
   * Keep opal_pci_reinit(). In case the slot is reset by the kernel,
 instead of skiboot, this API should be called to restore states
 for those affected devices.
   * Reworked slot ID scheme so that old/new kernel can work with
 skiboot with or without unified PCI slot management support.
   * Code cleanup here and there.
   * Separate powernv-php driver to support PCI hotplug for
 PowerNV platform.
   * Check whether the OPAL API is supported by the firmware before
 calling into it, which is necessary for backward compatibility.
   * Separate patch for factoring 

[PATCH RESEND v2 6/7] powerpc/powernv: Functions to retrieve PCI slot status

2015-02-16 Thread Gavin Shan
The patch exports two functions, based on the corresponding OPAL
APIs, to retrieve PCI slot status. These functions are going to be
used by the PCI hotplug module in subsequent patches:

   pnv_pci_get_power_status() opal_pci_get_power_status()
   pnv_pci_get_presence_status()  opal_pci_get_presence_status()

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/opal.h|  5 +
 arch/powerpc/include/asm/pnv-pci.h |  3 +++
 arch/powerpc/platforms/powernv/opal-wrappers.S |  2 ++
 arch/powerpc/platforms/powernv/pci.c   | 24 
 4 files changed, 34 insertions(+)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index ceec756..b3fcf07 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -169,6 +169,8 @@ struct opal_sg_list {
 #define OPAL_IPMI_SEND 107
 #define OPAL_IPMI_RECV 108
 #define OPAL_I2C_REQUEST   109
+#define OPAL_PCI_GET_POWER_STATUS  110
+#define OPAL_PCI_GET_PRESENCE_STATUS   111
 
 /* Device tree flags */
 
@@ -926,6 +928,9 @@ int64_t opal_ipmi_recv(uint64_t interface, struct opal_ipmi_msg *msg,
uint64_t *msg_len);
 int64_t opal_i2c_request(uint64_t async_token, uint32_t bus_id,
 struct opal_i2c_request *oreq);
+int64_t opal_pci_get_power_status(uint64_t id, uint8_t *status);
+int64_t opal_pci_get_presence_status(uint64_t id, uint8_t *status);
+
 
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
diff --git a/arch/powerpc/include/asm/pnv-pci.h b/arch/powerpc/include/asm/pnv-pci.h
index f9b4982..6a19b50 100644
--- a/arch/powerpc/include/asm/pnv-pci.h
+++ b/arch/powerpc/include/asm/pnv-pci.h
@@ -13,6 +13,9 @@
 #include linux/pci.h
 #include misc/cxl.h
 
+extern int pnv_pci_get_power_status(uint64_t id, uint8_t *status);
+extern int pnv_pci_get_presence_status(uint64_t id, uint8_t *status);
+
 int pnv_phb_to_cxl_mode(struct pci_dev *dev, uint64_t mode);
 int pnv_cxl_ioda_msi_setup(struct pci_dev *dev, unsigned int hwirq,
   unsigned int virq);
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index 0509bca..35e513e 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -292,3 +292,5 @@ OPAL_CALL(opal_tpo_read,OPAL_READ_TPO);
 OPAL_CALL(opal_ipmi_send,  OPAL_IPMI_SEND);
 OPAL_CALL(opal_ipmi_recv,  OPAL_IPMI_RECV);
 OPAL_CALL(opal_i2c_request,OPAL_I2C_REQUEST);
+OPAL_CALL(opal_pci_get_power_status,   OPAL_PCI_GET_POWER_STATUS);
+OPAL_CALL(opal_pci_get_presence_status, OPAL_PCI_GET_PRESENCE_STATUS);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a0ffae2..bee6798 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -63,6 +63,30 @@ int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
return rval ? -EIO : 0;
 }
 
+int pnv_pci_get_power_status(uint64_t id, uint8_t *status)
+{
+   long rc;
+
+   if (!opal_check_token(OPAL_PCI_GET_POWER_STATUS))
+   return -ENXIO;
+
+   rc = opal_pci_get_power_status(id, status);
+   return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_power_status);
+
+int pnv_pci_get_presence_status(uint64_t id, uint8_t *status)
+{
+   long rc;
+
+   if (!opal_check_token(OPAL_PCI_GET_PRESENCE_STATUS))
+   return -ENXIO;
+
+   rc = opal_pci_get_presence_status(id, status);
+   return pnv_pci_poll(id, rc, status);
+}
+EXPORT_SYMBOL_GPL(pnv_pci_get_presence_status);
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RESEND v2 2/7] powerpc/powernv: Issue fundamental reset if required

2015-02-16 Thread Gavin Shan
Function pnv_pci_reset_secondary_bus() is used to reset a specified
PCI bus, which is led by a root complex or a PCI bridge. That means
the function shouldn't be called on a PCI root bus, and the patch
removes the logic for that case.

Also, some adapters may require a fundamental reset to reload their
firmware. The patch translates a hot reset into a fundamental reset
if required.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 36 +---
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 8cec57d..eeda6e1 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -811,18 +811,36 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
return (rc == OPAL_SUCCESS) ? 0 : -EIO;
 }
 
-void pnv_pci_reset_secondary_bus(struct pci_dev *dev)
+static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
 {
-   struct pci_controller *hose;
+   int *freset = data;
 
-   if (pci_is_root_bus(dev->bus)) {
-   hose = pci_bus_to_host(dev->bus);
-   pnv_eeh_phb_reset(hose, EEH_RESET_HOT);
-   pnv_eeh_phb_reset(hose, EEH_RESET_DEACTIVATE);
-   } else {
-   pnv_eeh_bridge_reset(dev, EEH_RESET_HOT);
-   pnv_eeh_bridge_reset(dev, EEH_RESET_DEACTIVATE);
+   /*
+* Stop the iteration immediately if there is any
+* one PCI device requesting fundamental reset
+*/
+   *freset |= pdev->needs_freset;
+   return *freset;
+}
+
+void pnv_pci_reset_secondary_bus(struct pci_dev *pdev)
+{
+   int option = EEH_RESET_HOT;
+   int freset = 0;
+
+   /* In case we need fundamental reset */
+   if (pdev->subordinate) {
+   pci_walk_bus(pdev->subordinate,
+pnv_pci_dev_reset_type,
+&freset);
+
+   if (freset)
+   option = EEH_RESET_FUNDAMENTAL;
}
+
+   /* Issue the requested type of reset */
+   pnv_eeh_bridge_reset(pdev, option);
+   pnv_eeh_bridge_reset(pdev, EEH_RESET_DEACTIVATE);
 }
 
 /**
-- 
1.8.3.2


[PATCH RESEND v2 1/7] powerpc/powernv: Use PCI slot reset infrastructure

2015-02-16 Thread Gavin Shan
For the PowerNV platform, running on top of skiboot, all PE level resets
should be routed to firmware if the bridge of the PE primary bus has the
device-node property "ibm,reset-by-firmware". Otherwise, the kernel
has to issue a hot reset on the PE's primary bus regardless of the requested
reset type, which is the behaviour before the firmware supports PCI
slot reset. So the code doesn't depend on the PCI slot reset capability
exposed by the firmware.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |   1 +
 arch/powerpc/include/asm/opal.h  |   9 +-
 arch/powerpc/platforms/powernv/eeh-powernv.c | 218 ++-
 3 files changed, 114 insertions(+), 114 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index 55abfd0..9de87ce 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -193,6 +193,7 @@ enum {
 #define EEH_RESET_DEACTIVATE   0   /* Deactivate the PE reset  */
 #define EEH_RESET_HOT  1   /* Hot reset*/
 #define EEH_RESET_FUNDAMENTAL  3   /* Fundamental reset*/
+#define EEH_RESET_COMPLETE 4   /* PHB complete reset   */
 #define EEH_LOG_TEMP   1   /* EEH temporary error log  */
 #define EEH_LOG_PERM   2   /* EEH permanent error log  */
 
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 9ee0a30..ceec756 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -374,11 +374,6 @@ enum OpalPciResetState {
OPAL_ASSERT_RESET = 1
 };
 
-enum OpalPciMaskAction {
-   OPAL_UNMASK_ERROR_TYPE = 0,
-   OPAL_MASK_ERROR_TYPE = 1
-};
-
 enum OpalSlotLedType {
OPAL_SLOT_LED_ID_TYPE = 0,
OPAL_SLOT_LED_FAULT_TYPE = 1
@@ -867,7 +862,7 @@ int64_t opal_pci_map_pe_dma_window(uint64_t phb_id, uint16_t pe_number, uint16_t
 int64_t opal_pci_map_pe_dma_window_real(uint64_t phb_id, uint16_t pe_number,
 uint16_t dma_window_number, uint64_t pci_start_addr,
uint64_t pci_mem_size);
-int64_t opal_pci_reset(uint64_t phb_id, uint8_t reset_scope, uint8_t assert_state);
+int64_t opal_pci_reset(uint64_t id, uint8_t reset_scope, uint8_t assert_state);
 
 int64_t opal_pci_get_hub_diag_data(uint64_t hub_id, void *diag_buffer,
   uint64_t diag_buffer_len);
@@ -883,7 +878,7 @@ int64_t opal_get_epow_status(__be64 *status);
 int64_t opal_set_system_attention_led(uint8_t led_action);
 int64_t opal_pci_next_error(uint64_t phb_id, __be64 *first_frozen_pe,
__be16 *pci_error_type, __be16 *severity);
-int64_t opal_pci_poll(uint64_t phb_id);
+int64_t opal_pci_poll(uint64_t id, uint8_t *val);
 int64_t opal_return_cpu(void);
 int64_t opal_check_token(uint64_t token);
 int64_t opal_reinit_cpus(uint64_t flags);
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c b/arch/powerpc/platforms/powernv/eeh-powernv.c
index ede6906..8cec57d 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -665,12 +665,12 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
return ret;
 }
 
-static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
+static s64 pnv_eeh_poll(uint64_t id)
 {
s64 rc = OPAL_HARDWARE;
 
while (1) {
-   rc = opal_pci_poll(phb->opal_id);
+   rc = opal_pci_poll(id, NULL);
if (rc <= 0)
break;
 
@@ -686,84 +686,38 @@ static s64 pnv_eeh_phb_poll(struct pnv_phb *phb)
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
 struct pnv_phb *phb = hose->private_data;
+   uint8_t scope;
s64 rc = OPAL_HARDWARE;
 
 pr_debug("%s: Reset PHB#%x, option=%d\n",
  __func__, hose->global_number, option);
+   switch (option) {
+   case EEH_RESET_HOT:
+   scope = OPAL_RESET_PCI_HOT;
+   break;
+   case EEH_RESET_FUNDAMENTAL:
+   scope = OPAL_RESET_PCI_FUNDAMENTAL;
+   break;
+   case EEH_RESET_COMPLETE:
+   scope = OPAL_RESET_PHB_COMPLETE;
+   break;
+   case EEH_RESET_DEACTIVATE:
+   return 0;
+   default:
+   pr_warn("%s: Unsupported option %d\n",
+   __func__, option);
+   return -EINVAL;
+}
 
-   /* Issue PHB complete reset request */
-   if (option == EEH_RESET_FUNDAMENTAL ||
-   option == EEH_RESET_HOT)
-   rc = opal_pci_reset(phb->opal_id,
-   OPAL_RESET_PHB_COMPLETE,
-   OPAL_ASSERT_RESET);
-   else if (option == EEH_RESET_DEACTIVATE)
-   rc = opal_pci_reset(phb->opal_id,
-   OPAL_RESET_PHB_COMPLETE,
-   

[PATCH RESEND v2 3/7] powerpc/pci: Move pcibios_find_pci_bus() around

2015-02-16 Thread Gavin Shan
The patch moves pcibios_find_pci_bus() to the PPC kernel directory so
that it can be reused by hotplug code for the pSeries and PowerNV
platforms at the same time.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kernel/pci-hotplug.c  | 36 ++
 arch/powerpc/platforms/pseries/pci_dlpar.c | 32 --
 2 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 5b78917..6e2b4e3 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,42 @@
 #include asm/firmware.h
 #include asm/eeh.h
 
+static struct pci_bus *find_pci_bus(struct pci_bus *bus,
+   struct device_node *dn)
+{
+   struct pci_bus *tmp, *child = NULL;
+   struct device_node *busdn;
+
+   busdn = pci_bus_to_OF_node(bus);
+   if (busdn == dn)
+   return bus;
+
+   list_for_each_entry(tmp, &bus->children, node) {
+   child = find_pci_bus(tmp, dn);
+   if (child)
+   break;
+   }
+
+   return child;
+}
+
+/**
+ * pcibios_find_pci_bus - find PCI bus according to the given device node
+ * @dn: Device node
+ *
+ * Find the corresponding PCI bus according to the given device node.
+ */
+struct pci_bus *pcibios_find_pci_bus(struct device_node *dn)
+{
+   struct pci_dn *pdn = PCI_DN(dn);
+
+   if (!pdn || !pdn->phb || !pdn->phb->bus)
+   return NULL;
+
+   return find_pci_bus(pdn->phb->bus, dn);
+}
+EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
+
 /**
  * pcibios_release_device - release PCI device
  * @dev: PCI device
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 89e2381..98c50bc 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -32,38 +32,6 @@
 #include asm/firmware.h
 #include asm/eeh.h
 
-static struct pci_bus *
-find_bus_among_children(struct pci_bus *bus,
-struct device_node *dn)
-{
-   struct pci_bus *child = NULL;
-   struct pci_bus *tmp;
-   struct device_node *busdn;
-
-   busdn = pci_bus_to_OF_node(bus);
-   if (busdn == dn)
-   return bus;
-
-   list_for_each_entry(tmp, &bus->children, node) {
-   child = find_bus_among_children(tmp, dn);
-   if (child)
-   break;
-   };
-   return child;
-}
-
-struct pci_bus *
-pcibios_find_pci_bus(struct device_node *dn)
-{
-   struct pci_dn *pdn = dn->data;
-
-   if (!pdn || !pdn->phb || !pdn->phb->bus)
-   return NULL;
-
-   return find_bus_among_children(pdn->phb->bus, dn);
-}
-EXPORT_SYMBOL_GPL(pcibios_find_pci_bus);
-
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
struct pci_controller *phb;
-- 
1.8.3.2


[PATCH RESEND v2 7/7] PCI/hotplug: PowerPC PowerNV PCI hotplug driver

2015-02-16 Thread Gavin Shan
The patch adds a standalone driver to support PCI hotplug
for the PowerPC PowerNV platform, which runs on top of skiboot firmware.
The firmware identifies hotpluggable slots and marks their device
tree nodes with the properties "ibm,slot-pluggable" and "ibm,reset-by-firmware".
The driver simply scans the device tree to create/register PCI hotplug slots
accordingly.

If the skiboot firmware doesn't support slot status retrieval, the PCI
slot device node won't have the property "ibm,reset-by-firmware". In
that case, no valid PCI slots will be detected from the device tree.
The skiboot firmware doesn't yet export the capability to access attention
LEDs; that support is TBD.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 drivers/pci/hotplug/Kconfig|  12 ++
 drivers/pci/hotplug/Makefile   |   4 +
 drivers/pci/hotplug/powernv_php.c  | 126 +++
 drivers/pci/hotplug/powernv_php.h  |  70 ++
 drivers/pci/hotplug/powernv_php_slot.c | 382 +
 5 files changed, 594 insertions(+)
 create mode 100644 drivers/pci/hotplug/powernv_php.c
 create mode 100644 drivers/pci/hotplug/powernv_php.h
 create mode 100644 drivers/pci/hotplug/powernv_php_slot.c

diff --git a/drivers/pci/hotplug/Kconfig b/drivers/pci/hotplug/Kconfig
index df8caec..ef55dae 100644
--- a/drivers/pci/hotplug/Kconfig
+++ b/drivers/pci/hotplug/Kconfig
@@ -113,6 +113,18 @@ config HOTPLUG_PCI_SHPC
 
  When in doubt, say N.
 
+config HOTPLUG_PCI_POWERNV
+   tristate "PowerPC PowerNV PCI Hotplug driver"
+   depends on PPC_POWERNV && EEH
+   help
+ Say Y here if you run a PowerPC PowerNV platform that supports
+  PCI Hotplug.
+
+ To compile this driver as a module, choose M here: the
+ module will be called powernv-php.
+
+ When in doubt, say N.
+
 config HOTPLUG_PCI_RPA
 tristate "RPA PCI Hotplug driver"
 depends on PPC_PSERIES && EEH
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 4a9aa08..a69665e 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_HOTPLUG_PCI_PCIE)+= pciehp.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_ZT5550)  += cpcihp_zt5550.o
 obj-$(CONFIG_HOTPLUG_PCI_CPCI_GENERIC) += cpcihp_generic.o
 obj-$(CONFIG_HOTPLUG_PCI_SHPC) += shpchp.o
+obj-$(CONFIG_HOTPLUG_PCI_POWERNV)  += powernv-php.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA)  += rpaphp.o
 obj-$(CONFIG_HOTPLUG_PCI_RPA_DLPAR)+= rpadlpar_io.o
 obj-$(CONFIG_HOTPLUG_PCI_SGI)  += sgi_hotplug.o
@@ -50,6 +51,9 @@ ibmphp-objs   :=  ibmphp_core.o   \
 acpiphp-objs   :=  acpiphp_core.o  \
acpiphp_glue.o
 
+powernv-php-objs   :=  powernv_php.o   \
+   powernv_php_slot.o
+
 rpaphp-objs:=  rpaphp_core.o   \
rpaphp_pci.o\
rpaphp_slot.o
diff --git a/drivers/pci/hotplug/powernv_php.c b/drivers/pci/hotplug/powernv_php.c
new file mode 100644
index 000..e36eaf1
--- /dev/null
+++ b/drivers/pci/hotplug/powernv_php.c
@@ -0,0 +1,126 @@
+/*
+ * PCI Hotplug Driver for PowerPC PowerNV platform.
+ *
+ * Copyright Gavin Shan, IBM Corporation 2015.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/sysfs.h>
+#include <linux/pci.h>
+#include <linux/pci_hotplug.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/pnv-pci.h>
+
+#include "powernv_php.h"
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR  "Gavin Shan, IBM Corporation"
+#define DRIVER_DESC    "PowerPC PowerNV PCI Hotplug Driver"
+
+static int powernv_php_register_one(struct device_node *dn)
+{
+   struct powernv_php_slot *slot;
+   const __be32 *prop32;
+   int ret;
+
+   /* Check if it's hotpluggable slot */
+   prop32 = of_get_property(dn, "ibm,slot-pluggable", NULL);
+   if (!prop32 || !of_read_number(prop32, 1))
+   return 0;
+
+   prop32 = of_get_property(dn, "ibm,reset-by-firmware", NULL);
+   if (!prop32 || !of_read_number(prop32, 1))
+   return 0;
+
+   /* Allocate slot */
+   slot = powernv_php_slot_alloc(dn);
+   if (!slot)
+   return -ENODEV;
+
+   /* Register it */
+   ret = powernv_php_slot_register(slot);
+   if (ret) {
+   powernv_php_slot_put(slot);
+   return ret;
+   }
+
+   return powernv_php_slot_enable(slot->php_slot, false);
+}
+
+int powernv_php_register(struct device_node *dn)
+{
+   struct device_node *child;
+   int ret = 0;
+
+   for_each_child_of_node(dn, child) {
+   ret = 

[PATCH RESEND v2 4/7] powerpc/pci: Don't scan empty slot

2015-02-16 Thread Gavin Shan
In the hotplug case, pcibios_add_pci_devices() is called to
rescan the specified PCI bus, which might not have any child devices.
Accessing the PCI bus's child device node would then cause a kernel
crash. The patch skips scanning PCI buses that have no child devices,
in order to avoid the crash.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/pci-hotplug.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 6e2b4e3..270a26d 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -120,7 +120,8 @@ void pcibios_add_pci_devices(struct pci_bus * bus)
if (mode == PCI_PROBE_DEVTREE) {
/* use ofdt-based probe */
of_rescan_bus(dn, bus);
-   } else if (mode == PCI_PROBE_NORMAL) {
+   } else if (mode == PCI_PROBE_NORMAL &&
+      dn->child && PCI_DN(dn->child)) {
/*
 * Use legacy probe. In the partial hotplug case, we
 * probably have grandchildren devices unplugged. So
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RESEND v2 5/7] powerpc/powernv: Introduce pnv_pci_poll()

2015-02-16 Thread Gavin Shan
We might not get some PCI slot information (e.g. power status)
immediately from the OPAL API. Instead, opal_pci_poll() needs to be
called to retrieve the required information.

The patch introduces pnv_pci_poll(), based on the original
pnv_eeh_poll(), to cover the above case.

Signed-off-by: Gavin Shan gws...@linux.vnet.ibm.com
---
 arch/powerpc/platforms/powernv/eeh-powernv.c | 28 ++--
 arch/powerpc/platforms/powernv/pci.c | 19 +++
 arch/powerpc/platforms/powernv/pci.h |  1 +
 3 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index eeda6e1..81a9cb2 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -665,24 +665,6 @@ static int pnv_eeh_get_state(struct eeh_pe *pe, int *delay)
return ret;
 }
 
-static s64 pnv_eeh_poll(uint64_t id)
-{
-   s64 rc = OPAL_HARDWARE;
-
-   while (1) {
-   rc = opal_pci_poll(id, NULL);
-   if (rc <= 0)
-   break;
-
-   if (system_state < SYSTEM_RUNNING)
-   udelay(1000 * rc);
-   else
-   msleep(rc);
-   }
-
-   return rc;
-}
-
 int pnv_eeh_phb_reset(struct pci_controller *hose, int option)
 {
	struct pnv_phb *phb = hose->private_data;
@@ -711,10 +693,7 @@ int pnv_eeh_phb_reset(struct pci_controller *hose, int 
option)
 
/* Issue reset and poll until it's completed */
	rc = opal_pci_reset(phb->opal_id, scope, OPAL_ASSERT_RESET);
-   if (rc > 0)
-       rc = pnv_eeh_poll(phb->opal_id);
-
-   return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+   return pnv_pci_poll(phb->opal_id, rc, NULL);
 }
 
 static int __pnv_eeh_bridge_reset(struct pci_dev *dev, int option)
@@ -805,10 +784,7 @@ static int pnv_eeh_bridge_reset(struct pci_dev *dev, int 
option)
	phb = hose->private_data;
	id |= (dev->bus->number << 24) | (dev->devfn << 16) | phb->opal_id;
	rc = opal_pci_reset(id, scope, OPAL_ASSERT_RESET);
-   if (rc > 0)
-       pnv_eeh_poll(id);
-
-   return (rc == OPAL_SUCCESS) ? 0 : -EIO;
+   return pnv_pci_poll(id, rc, NULL);
 }
 
 static int pnv_pci_dev_reset_type(struct pci_dev *pdev, void *data)
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index c68d508..a0ffae2 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -44,6 +44,25 @@
 #define cfg_dbg(fmt...)do { } while(0)
 //#define cfg_dbg(fmt...)  printk(fmt)
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval)
+{
+   while (rval > 0) {
+       rval = opal_pci_poll(id, pval);
+       if (rval == OPAL_SUCCESS && pval)
+           rval = opal_pci_poll(id, pval);
+
+       if (rval <= 0)
+           break;
+
+       if (system_state < SYSTEM_RUNNING)
+           udelay(1000 * rval);
+       else
+           msleep(rval);
+   }
+
+   return rval ? -EIO : 0;
+}
+
 #ifdef CONFIG_PCI_MSI
 static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 {
diff --git a/arch/powerpc/platforms/powernv/pci.h 
b/arch/powerpc/platforms/powernv/pci.h
index 18ae927..19f532b 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -194,6 +194,7 @@ struct pnv_phb {
 
 extern struct pci_ops pnv_pci_ops;
 
+int pnv_pci_poll(uint64_t id, int64_t rval, uint8_t *pval);
 void pnv_pci_dump_phb_diag_data(struct pci_controller *hose,
unsigned char *log_buff);
 int pnv_pci_cfg_read(struct device_node *dn,
-- 
1.8.3.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

powerpc: fsl,pq2-localbus is broken for ages

2015-02-16 Thread Nikita Yushchenko
Hi

arch/powerpc/sysdev/fsl_lbc.c driver claims that it is compatible with
fsl,pq2-localbus and fsl,pq2pro-localbus

The arch/powerpc/sysdev/fsl_lbc.c driver has required an irq configured
in the device tree for a very long time (commit 3ab8f2a2, from the
2.6.37 days).
None of in-tree dts files that have either fsl,pq2-localbus or
fsl,pq2pro-localbus, ever defined irq for that. As far as I understand,
that hardware did not have an irq.

This leads to problems running recent kernels on legacy Freescale boards.

I don't know if support for these legacy boards is worth fixing,
especially since nobody has cared for years. But maybe at least remove
these compatible lines from the driver (and perhaps remove the broken
support for pq2 boards entirely)?

Nikita
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V4] tick/hotplug: Handover time related duties before cpu offline

2015-02-16 Thread Michael Ellerman
On Sat, 2015-01-31 at 09:44 +0530, Preeti U Murthy wrote:
 These duties include do_timer to update jiffies and broadcast wakeups on those
 platforms which do not have an external device to handle wakeup of cpus from 
 deep
 idle states. The handover of these duties is not robust against a cpu offline
 operation today.
 
 The do_timer duty is handed over in the CPU_DYING phase today to one of the 
 online
 cpus. This relies on the fact that *all* cpus participate in stop_machine 
 phase.
 But if this design is to change in the future, i.e. if all cpus are not
 required to participate in stop_machine, the freshly nominated do_timer cpu
 could be idle at the time of handover. In that case, unless it's interrupted,
 it will not wakeup to update jiffies and timekeeping will hang.
 
 With regard to broadcast wakeups, today if the cpu handling broadcast of 
 wakeups
 goes offline, the job of broadcasting is handed over to another cpu in the 
 CPU_DEAD
 phase. The CPU_DEAD notifiers are run only after the offline cpu sets its 
 state as
 CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while 
 waiting for
 this transition by queuing a timer. This is fatal because if the cpu on which
 this kthread was running has no other work queued on it, it can re-enter deep
 idle state, since it sees that a broadcast cpu still exists. However the 
 broadcast
 wakeup will never come since the cpu which was handling it is offline, and 
 the cpu
 on which the kthread doing the hotplug operation was running never wakes up 
 to see
 this because it's in deep idle state.
 
 Fix these issues by handing over the do_timer and broadcast wakeup duties 
 just before
 the offline cpu kills itself, to the cpu performing the hotplug operation. 
 Since the
 cpu performing the hotplug operation is up and running, it becomes aware of 
 the handover
 of do_timer duty and queues the broadcast timer upon itself so as to 
 seamlessly
 continue both these operations.
 
 It fixes the bug reported here:
 http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html
 
 Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com
 ---
 Changes from V3: https://lkml.org/lkml/2015/1/20/236
 1. Move handover of broadcast duty away from CPU_DYING phase to just before
 the cpu kills itself.
 2. Club the handover of timekeeping duty along with broadcast duty to make
 timekeeping robust against hotplug.

Hi Preeti,

This bug is still causing breakage for people on Power8 machines.

Are we just waiting for Thomas to take the patch?

cheers


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC v2 01/10] fsl_bman: Add drivers for the Freescale DPAA BMan

2015-02-16 Thread Scott Wood
On Mon, 2015-02-16 at 09:46 -0600, Emil Medve wrote:
 From: Geoff Thorpe geoff.tho...@freescale.com
 
 Change-Id: I075944acf740dbaae861104c17a9ff7247dec1be
 Signed-off-by: Geoff Thorpe geoff.tho...@freescale.com

Remove Change-Id.
Provide a description of what BMan is.

 diff --git a/drivers/soc/freescale/Kconfig b/drivers/soc/freescale/Kconfig
 new file mode 100644
 index 000..63bf002
 --- /dev/null
 +++ b/drivers/soc/freescale/Kconfig
 @@ -0,0 +1,51 @@
 +menuconfig FSL_DPA
 + tristate "Freescale DPAA Buffer management"
 + depends on HAS_FSL_QBMAN
 + default y

"default y" is normally a bad idea for drivers (though it can make sense
for options within a driver).

Where is HAS_FSL_QBMAN defined (if the answer is in a later patch,
rearrange it)?

 +if FSL_DPA
 +
 +config FSL_DPA_CHECKING
 + bool "additional driver checking"

Option texts are usually capitalized.

 + default n
 + ---help---
 +   Compiles in additional checks to sanity-check the drivers and any
 +   use of it by other code. Not recommended for performance.
 +
 +config FSL_DPA_CAN_WAIT
 + bool
 + default y
 +config FSL_DPA_CAN_WAIT_SYNC
 + bool
 + default y
 +
 +config FSL_DPA_PIRQ_FAST
 + bool
 + default y

Use consistent whitespace.

Add help texts to these options to document them, even if they're not
user-visible.

 +config FSL_BMAN_CONFIG
 + bool "BMan device management"
 + default y
 + ---help---
 +   If this linux image is running natively, you need this option. If this
 +   linux image is running as a guest OS under the hypervisor, only one
 +   guest OS (the control plane) needs this option.

"the hypervisor"?  I suspect this refers to the Freescale Embedded
Hypervisor (a.k.a. Topaz), rather than KVM or something else.

It would also be nice if the help text explained what the option does,
rather than just when it's needed.

 +struct bman;

Where is this struct defined?

 +union bman_ecir {
 + u32 ecir_raw;
 + struct {
 + u32 __reserved1:4;
 + u32 portal_num:4;
 + u32 __reserved2:12;
 + u32 numb:4;
 + u32 __reserved3:2;
 + u32 pid:6;
 + } __packed info;
 +};

Get rid of __packed.  It causes GCC to generate terrible code with
bitfields.  For that matter, consider getting rid of the bitfields.
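
For illustration, a sketch of the mask/shift alternative being suggested here. The macro names are hypothetical (not from the patch); the field positions are read off the MSB-first bitfield layout quoted above (4 reserved bits, portal_num:4, 12 reserved, numb:4, 2 reserved, pid:6), which is how big-endian PowerPC lays out bitfields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask/shift accessors replacing the bman_ecir bitfields.
 * MSB-first layout: portal_num occupies bits 27..24, numb bits 11..8,
 * pid bits 5..0 of the 32-bit ECIR register value. */
#define BM_ECIR_PORTAL(ecir)	(((ecir) >> 24) & 0xf)
#define BM_ECIR_NUMB(ecir)	(((ecir) >> 8) & 0xf)
#define BM_ECIR_PID(ecir)	((ecir) & 0x3f)
```

This avoids the bitfield codegen problem entirely and makes the register layout greppable.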

 +#define BMAN_HWE_TXT(a, b) { .mask = BM_EIRQ_##a, .txt = b }
 +
 +static const struct bman_hwerr_txt bman_hwerr_txts[] = {
 + BMAN_HWE_TXT(IVCI, "Invalid Command Verb"),
 + BMAN_HWE_TXT(FLWI, "FBPR Low Watermark"),
 + BMAN_HWE_TXT(MBEI, "Multi-bit ECC Error"),
 + BMAN_HWE_TXT(SBEI, "Single-bit ECC Error"),
 + BMAN_HWE_TXT(BSCN, "Pool State Change Notification"),
 +};
 +#define BMAN_HWE_COUNT (sizeof(bman_hwerr_txts)/sizeof(struct 
 bman_hwerr_txt))

Use ARRAY_SIZE(), here and elsewhere.
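
For reference, the kernel helper being suggested reduces to a sizeof ratio; a minimal sketch (simplified, without the kernel's __must_be_array() compile-time type check):

```c
#include <assert.h>

/* Simplified form of the kernel's ARRAY_SIZE(); the real macro also
 * fails to compile if the argument is a pointer rather than an array. */
#define ARRAY_SIZE(arr)	(sizeof(arr) / sizeof((arr)[0]))

/* Stand-in for the bman_hwerr_txts table quoted above. */
static const struct { unsigned int mask; const char *txt; } txts[] = {
	{ 0x1, "Invalid Command Verb" },
	{ 0x2, "FBPR Low Watermark" },
	{ 0x4, "Multi-bit ECC Error" },
};
```

Unlike the open-coded `sizeof(...)/sizeof(struct ...)`, this stays correct if the element type is later renamed or changed.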

 +/* Add this in Kconfig */
 +#define BMAN_ERRS_TO_UNENABLE (BM_EIRQ_FLWI)

Add what to kconfig?

 +/**
 + * bm_err_isr_reg_verb - Manipulate global interrupt registers
 + * @v: for accessors that write values, this is the 32-bit value
 + *
 + * Manipulates BMAN_ERR_ISR, BMAN_ERR_IER, BMAN_ERR_ISDR, BMAN_ERR_IIR. All
 + * manipulations except bm_err_isr_[un]inhibit() use 32-bit masks composed of
 + * the BM_EIRQ_*** definitions. Note that bm_err_isr_enable_write means
 + * write the enable register rather than enable the write register!
 + */
 +#define bm_err_isr_status_read(bm)   \
 + __bm_err_isr_read(bm, bm_isr_status)
 +#define bm_err_isr_status_clear(bm, m)   \
 + __bm_err_isr_write(bm, bm_isr_status, m)
 +#define bm_err_isr_enable_read(bm)   \
 + __bm_err_isr_read(bm, bm_isr_enable)
 +#define bm_err_isr_enable_write(bm, v)   \
 + __bm_err_isr_write(bm, bm_isr_enable, v)
 +#define bm_err_isr_disable_read(bm)  \
 + __bm_err_isr_read(bm, bm_isr_disable)
 +#define bm_err_isr_disable_write(bm, v)  \
 + __bm_err_isr_write(bm, bm_isr_disable, v)
 +#define bm_err_isr_inhibit(bm)   \
 + __bm_err_isr_write(bm, bm_isr_inhibit, 1)
 +#define bm_err_isr_uninhibit(bm) \
 + __bm_err_isr_write(bm, bm_isr_inhibit, 0)

Is this layer really helpful?

 +/*
 + * TODO: unimplemented registers
 + *
 + * BMAN_POOLk_SDCNT, BMAN_POOLk_HDCNT, BMAN_FULT,
 + * BMAN_VLDPL, BMAN_EECC, BMAN_SBET, BMAN_EINJ
 + */

What does it mean for registers to be unimplemented, in a piece of
software?  If you mean accessors to those registers, why is that needed
if nothing uses them (yet)?

 +/* Encapsulate struct bman * as a cast of the register space address. */
 +
 +static struct bman *bm_create(void *regs)
 +{
 + return (struct bman *)regs;
 +}

Unnecessary cast -- and unnecessary encapsulation (especially since you
only use it once).  It's also missing __iomem -- have you run sparse on
this?

 +static u32 __generate_thresh(u32 val, int roundup)
 +{
 + u32 e = 0;  /* co-efficient, exponent */
 +   

[PATCH v1 5/7] AES for PPC/SPE - ECB/CBC/CTR/XTS modes

2015-02-16 Thread Markus Stockhausen
[PATCH v1 5/7] AES for PPC/SPE - ECB/CBC/CTR/XTS modes

The assembler block cipher module that controls the core
AES functions.

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes-spe-modes.S 
b/arch/powerpc/crypto/aes-spe-modes.S
new file mode 100644
index 000..1141841
--- /dev/null
+++ b/arch/powerpc/crypto/aes-spe-modes.S
@@ -0,0 +1,630 @@
+/*
+ * AES modes (ECB/CBC/CTR/XTS) for PPC AES implementation
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include asm/ppc_asm.h
+#include aes-spe-regs.h
+
+#ifdef __BIG_ENDIAN__  /* Macros for big endian builds */
+
+#define LOAD_DATA(reg, off) \
+   lwz  reg,off(rSP);  /* load with offset             */
+#define SAVE_DATA(reg, off) \
+   stw  reg,off(rDP);  /* save with offset             */
+#define NEXT_BLOCK \
+   addi rSP,rSP,16;    /* increment pointers per block */ \
+   addi rDP,rDP,16;
+#define LOAD_IV(reg, off) \
+   lwz  reg,off(rIP);  /* IV loading with offset       */
+#define SAVE_IV(reg, off) \
+   stw  reg,off(rIP);  /* IV saving with offset        */
+#define START_IV           /* nothing to reset          */
+#define CBC_DEC 16         /* CBC decrement per block   */
+#define CTR_DEC 1          /* CTR decrement one byte    */
+
+#else  /* Macros for little endian */
+
+#define LOAD_DATA(reg, off) \
+   lwbrx  reg,0,rSP;   /* load reversed                */ \
+   addi   rSP,rSP,4;   /* and increment pointer        */
+#define SAVE_DATA(reg, off) \
+   stwbrx reg,0,rDP;   /* save reversed                */ \
+   addi   rDP,rDP,4;   /* and increment pointer        */
+#define NEXT_BLOCK         /* nothing to do             */
+#define LOAD_IV(reg, off) \
+   lwbrx  reg,0,rIP;   /* load reversed                */ \
+   addi   rIP,rIP,4;   /* and increment pointer        */
+#define SAVE_IV(reg, off) \
+   stwbrx reg,0,rIP;   /* save reversed                */ \
+   addi   rIP,rIP,4;   /* and increment pointer        */
+#define START_IV \
+   subi   rIP,rIP,16;  /* must reset pointer           */
+#define CBC_DEC 32         /* 2 blocks because of incs  */
+#define CTR_DEC 17         /* 1 block because of incs   */
+
+#endif
+
+#define SAVE_0_REGS
+#define LOAD_0_REGS
+
+#define SAVE_4_REGS \
+   stw rI0,96(r1); /* save 32 bit registers*/ \
+   stw rI1,100(r1);   \
+   stw rI2,104(r1);   \
+   stw rI3,108(r1);
+
+#define LOAD_4_REGS \
+   lwz rI0,96(r1); /* restore 32 bit registers */ \
+   lwz rI1,100(r1);   \
+   lwz rI2,104(r1);   \
+   lwz rI3,108(r1);
+
+#define SAVE_8_REGS \
+   SAVE_4_REGS\
+   stw rG0,112(r1);/* save 32 bit registers*/ \
+   stw rG1,116(r1);   \
+   stw rG2,120(r1);   \
+   stw rG3,124(r1);
+
+#define LOAD_8_REGS \
+   LOAD_4_REGS\
+   lwz rG0,112(r1);/* restore 32 bit registers */ \
+   lwz rG1,116(r1);   \
+   lwz rG2,120(r1);   \
+   lwz rG3,124(r1);
+
+#define INITIALIZE_CRYPT(tab,nr32bitregs) \
+   mflr   r0; \
+   stwu   r1,-160(r1); /* create stack frame           */ \
+   lis    rT0,tab@h;   /* en-/decryption table pointer */ \
+   stw    r0,8(r1);    /* save link register           */ \
+   ori    rT0,rT0,tab@l; \
+   evstdw r14,16(r1); \
+   mr     rKS,rKP; \
+   evstdw r15,24(r1);  /* We must save non volatile    */ \
+   evstdw r16,32(r1);  /* registers. Take the chance   */ \
+   evstdw  r17,40(r1); /* 
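
The big/little endian LOAD_DATA macros at the top of this patch can be modeled in C for illustration (the function name is mine, not from the patch): on big endian a plain word load suffices, while on little endian lwbrx performs a byte-reversed load, so the AES state is always handled in big-endian order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* C model of LOAD_DATA: fetch a 32-bit word in big-endian byte order
 * regardless of the host's native endianness. */
static uint32_t load_be32(const uint8_t *p)
{
	uint32_t w;

	memcpy(&w, p, sizeof(w));
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	w = __builtin_bswap32(w);	/* what lwbrx does in hardware */
#endif
	return w;
}
```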

[PATCH v1 4/7] AES for PPC/SPE - key handling

2015-02-16 Thread Markus Stockhausen
[PATCH v1 4/7] AES for PPC/SPE - key handling

Key generation for big endian core routines.

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes-spe-keys.S 
b/arch/powerpc/crypto/aes-spe-keys.S
new file mode 100644
index 000..55b258c
--- /dev/null
+++ b/arch/powerpc/crypto/aes-spe-keys.S
@@ -0,0 +1,283 @@
+/*
+ * Key handling functions for PPC AES implementation
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include asm/ppc_asm.h
+
+#ifdef __BIG_ENDIAN__
+#define LOAD_KEY(d, s, off) \
+   lwz d,off(s);
+#else
+#define LOAD_KEY(d, s, off) \
+   li  r0,off; \
+   lwbrx   d,s,r0;
+#endif
+
+#define INITIALIZE_KEY \
+   stwu r1,-32(r1);    /* create stack frame   */ \
+   stw r14,8(r1);  /* save registers   */ \
+   stw r15,12(r1);\
+   stw r16,16(r1);
+
+#define FINALIZE_KEY \
+   lwz r14,8(r1);  /* restore registers*/ \
+   lwz r15,12(r1);\
+   lwz r16,16(r1);\
+   xor r5,r5,r5;   /* clear sensitive data */ \
+   xor r6,r6,r6;  \
+   xor r7,r7,r7;  \
+   xor r8,r8,r8;  \
+   xor r9,r9,r9;  \
+   xor r10,r10,r10;   \
+   xor r11,r11,r11;   \
+   xor r12,r12,r12;   \
+   addi r1,r1,32;      /* cleanup stack        */
+
+#define LS_BOX(r, t1, t2) \
+   lis t2,PPC_AES_4K_ENCTAB@h;\
+   ori t2,t2,PPC_AES_4K_ENCTAB@l; \
+   rlwimi  t2,r,4,20,27;  \
+   lbz t1,8(t2);  \
+   rlwimi  r,t1,0,24,31;  \
+   rlwimi  t2,r,28,20,27; \
+   lbz t1,8(t2);  \
+   rlwimi  r,t1,8,16,23;  \
+   rlwimi  t2,r,20,20,27; \
+   lbz t1,8(t2);  \
+   rlwimi  r,t1,16,8,15;  \
+   rlwimi  t2,r,12,20,27; \
+   lbz t1,8(t2);  \
+   rlwimi  r,t1,24,0,7;
+
+#define GF8_MUL(out, in, t1, t2) \
+   lis t1,0x8080;  /* multiplication in GF8*/ \
+   ori t1,t1,0x8080;  \
+   and t1,t1,in;  \
+   srwi t1,t1,7;  \
+   mulli t1,t1,0x1b;  \
+   lis t2,0x7f7f; \
+   ori t2,t2,0x7f7f;  \
+   and t2,t2,in;  \
+   slwi t2,t2,1;  \
+   xor out,t1,t2;
+
+/*
+ * ppc_expand_key_128(u32 *key_enc, const u8 *key)
+ *
+ * Expand 128 bit key into 176 bytes encryption key. It consists of
+ * key itself plus 10 rounds with 16 bytes each
+ *
+ */
+_GLOBAL(ppc_expand_key_128)
+   INITIALIZE_KEY
+   LOAD_KEY(r5,r4,0)
+   LOAD_KEY(r6,r4,4)
+   LOAD_KEY(r7,r4,8)
+   LOAD_KEY(r8,r4,12)
+   stw r5,0(r3)/* key[0..3] = input data   */
+   stw r6,4(r3)
+   stw r7,8(r3)
+   stw r8,12(r3)
+   li  r16,10  /* 10 expansion rounds  */
+   lis r0,0x0100   /* RCO(1)   */
+ppc_expand_128_loop:
+   addi r3,r3,16
+   mr  r14,r8  /* apply LS_BOX to 4th temp */
+   rotlwi  r14,r14,8
+   LS_BOX(r14, r15, r4)
+   xor r14,r14,r0
+   xor r5,r5,r14   /* xor next 4 
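
The GF8_MUL macro above can be modeled in C, as an illustration of what the four-instruction mask/shift sequence computes (the function name is mine, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* C model of GF8_MUL: multiply four GF(2^8) bytes, packed into one
 * 32-bit word, by 2 (the AES "xtime" operation). Bytes with the high
 * bit set are reduced modulo the AES polynomial x^8+x^4+x^3+x+1
 * (0x11b), which is where the multiply by 0x1b comes from. */
static uint32_t gf8_mul2(uint32_t in)
{
	uint32_t carry = ((in & 0x80808080u) >> 7) * 0x1b;
	uint32_t shifted = (in & 0x7f7f7f7fu) << 1;

	return carry ^ shifted;
}
```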

RE: [PATCH v1 2/7] AES for PPC/SPE - aes tables

2015-02-16 Thread David Laight
From:  Markus Stockhausen
 4K AES tables for big endian

I can't help feeling that you could give more information about how the
values are generated.

...
 + * These big endian AES encryption/decryption tables are designed to be 
 simply
 + * accessed by a combination of rlwimi/lwz instructions with a minimum
 + * of table registers (usually only one required). Thus they are aligned to
 + * 4K. The locality of rotated values is derived from the reduced offsets 
 that
 + * are available in the SPE load instructions. E.g. evldw, evlwwsplat, ...
 + *
 + */
 +.data
 +.align 12
 +.globl PPC_AES_4K_ENCTAB
 +PPC_AES_4K_ENCTAB:
 + .long 0xc66363a5,0xa5c66363,0x63a5c663,0x6363a5c6

These seem to be byte rotates (all down the table).
If so then use a CPP define to generate the rotated values.

I'd like to see a reference to where the values themselves come from.
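
A sketch of the CPP approach being suggested (the macro names are hypothetical): each table row quoted above is one canonical T-table word followed by its three right-rotations by 8 bits, so the rotated copies can be generated rather than hand-written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator for one table row: the canonical word plus
 * its three byte-rotations, matching the pattern in the .long rows. */
#define ROTR32(x, n)	((uint32_t)(((x) >> (n)) | ((x) << (32 - (n)))))
#define AES_TE_ROW(w)	(w), ROTR32(w, 8), ROTR32(w, 16), ROTR32(w, 24)

/* First row of PPC_AES_4K_ENCTAB as quoted in the patch. */
static const uint32_t row0[] = { AES_TE_ROW(0xc66363a5u) };
```

With this only the 256 canonical words need to appear in the source, and the relationship between the four columns is explicit.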

 + .long 0xf87c7c84,0x84f87c7c,0x7c84f87c,0x7c7c84f8
...
 + .long 0x6dd6,0xd66d,0xbbd66dbb,0xd66d
 + .long 0x2c16163a,0x3a2c1616,0x163a2c16,0x16163a2c
 +.globl PPC_AES_4K_DECTAB
 +PPC_AES_4K_DECTAB:
 + .long 0x51f4a750,0x5051f4a7,0xa75051f4,0xf4a75051
...
 + .long 0xd0b85742,0x42d0b857,0x5742d0b8,0xb85742d0

Some explanation of this third dataset is also needed.

 + .byte 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38
...
 + .byte 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d

David

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/5] PCI Hotplug Driver for PowerPC PowerNV

2015-02-16 Thread Gavin Shan
On Thu, Dec 04, 2014 at 04:54:43PM +1100, Gavin Shan wrote:

Please ignore this one. I'll rebase the repost new revision later.

Thanks,
Gavin

The series of patches depends on the OPAL firmware changes. If the firmware
doesn't have the changes, PCI hotplug slots won't be populated properly.
Other than that, no more problems found.

A new driver powernv-php.ko is introduced by the patchset to support
PCI hotplug for PowerNV platform. The device tree is scanned when the
driver is loaded. If any PCI device node is equipped with the properties
"ibm,slot-pluggable" and "ibm,reset-by-firmware", it's regarded as a
hotpluggable slot and the driver creates/registers a slot for it. After
that, the sysfs entries can be used to operate the slot.

PATCH[1-4]: Necessary code changes to PPC PCI subsystem in order to
support PCI slots for PPC PowerNV platform.
PATCH[5]  : powernv-php driver to support PCI hotplug for PowerNV
platform.

Testing
===
# cat /proc/cpuinfo | grep -i powernv
platform: PowerNV
machine : PowerNV 8286-41A

# pwd
/sys/bus/pci/slots
# ls
C10  C11  C12  C14  C15  C6  C7  C8  C9

# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 
xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family 
[ConnectX-3]
# pwd
/sys/bus/pci/slots/C10
# cat address
0003:09:00
# cat cur_bus_speed
5.0 GT/s PCIe
# cat max_bus_speed
8.0 GT/s PCIe
# cat power
1
# echo 0 > power
# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 
xHCI Host Controller (rev 02)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family 
[ConnectX-3]
# echo 1 > power
# lspci -s 0003::.
0003:00:00.0 PCI bridge: IBM Device 03dc
0003:01:00.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:01.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:08.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:09.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:10.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:02:11.0 PCI bridge: PLX Technology, Inc. Device 8748 (rev ca)
0003:03:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 
xHCI Host Controller (rev 02)
0003:09:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.2 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:09:00.3 Ethernet controller: Broadcom Corporation NetXtreme BCM5719 
Gigabit Ethernet PCIe (rev 01)
0003:0f:00.0 Network controller: Mellanox Technologies MT27500 Family 
[ConnectX-3]

Changelog
=
v1 -> v2:
* Separate powernv-php driver to support PCI hotplug for the
  PowerNV platform.
* Check if the OPAL API is supported by firmware before calling
  into it, which is necessary for backward compatibility.
* Separate patch for factoring out pnv_pci_poll().

Gavin Shan (5):
  powerpc/pci: Move pcibios_find_pci_bus() around
  powerpc/pci: Don't scan empty slot
  powerpc/powernv: Introduce pnv_pci_poll()
  powerpc/powernv: Functions to retrieve PCI slot status
  PCI/hotplug: PowerPC PowerNV PCI hotplug driver

 arch/powerpc/include/asm/opal.h|   4 +
 arch/powerpc/include/asm/pnv-pci.h |   3 +
 arch/powerpc/kernel/pci-hotplug.c  |  39 ++-
 arch/powerpc/platforms/powernv/eeh-ioda.c  |  28 +-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
 arch/powerpc/platforms/powernv/pci.c   |  43 +++

Re: [PATCH V4] tick/hotplug: Handover time related duties before cpu offline

2015-02-16 Thread Preeti U Murthy
On 02/17/2015 07:28 AM, Michael Ellerman wrote:
 On Sat, 2015-01-31 at 09:44 +0530, Preeti U Murthy wrote:
 These duties include do_timer to update jiffies and broadcast wakeups on 
 those
 platforms which do not have an external device to handle wakeup of cpus from 
 deep
 idle states. The handover of these duties is not robust against a cpu offline
 operation today.

 The do_timer duty is handed over in the CPU_DYING phase today to one of the 
 online
 cpus. This relies on the fact that *all* cpus participate in stop_machine 
 phase.
 But if this design is to change in the future, i.e. if all cpus are not
 required to participate in stop_machine, the freshly nominated do_timer cpu
 could be idle at the time of handover. In that case, unless it's interrupted,
 it will not wakeup to update jiffies and timekeeping will hang.

 With regard to broadcast wakeups, today if the cpu handling broadcast of 
 wakeups
 goes offline, the job of broadcasting is handed over to another cpu in the 
 CPU_DEAD
 phase. The CPU_DEAD notifiers are run only after the offline cpu sets its 
 state as
 CPU_DEAD. Meanwhile, the kthread doing the offline is scheduled out while 
 waiting for
 this transition by queuing a timer. This is fatal because if the cpu on which
 this kthread was running has no other work queued on it, it can re-enter deep
 idle state, since it sees that a broadcast cpu still exists. However the 
 broadcast
 wakeup will never come since the cpu which was handling it is offline, and 
 the cpu
 on which the kthread doing the hotplug operation was running never wakes up 
 to see
 this because it's in deep idle state.

 Fix these issues by handing over the do_timer and broadcast wakeup duties 
 just before
 the offline cpu kills itself, to the cpu performing the hotplug operation. 
 Since the
 cpu performing the hotplug operation is up and running, it becomes aware of 
 the handover
 of do_timer duty and queues the broadcast timer upon itself so as to 
 seamlessly
 continue both these operations.

 It fixes the bug reported here:
 http://linuxppc.10917.n7.nabble.com/offlining-cpus-breakage-td88619.html

 Signed-off-by: Preeti U Murthy pre...@linux.vnet.ibm.com
 ---
 Changes from V3: https://lkml.org/lkml/2015/1/20/236
 1. Move handover of broadcast duty away from CPU_DYING phase to just before
 the cpu kills itself.
 2. Club the handover of timekeeping duty along with broadcast duty to make
 timekeeping robust against hotplug.
 
 Hi Preeti,
 
 This bug is still causing breakage for people on Power8 machines.
 
 Are we just waiting for Thomas to take the patch?

Hi mpe,

Thomas has included the patch for fixing this issue in a recent patchset
that he posted for cleaning up tick/clockevents related code.

https://lkml.org/lkml/2015/2/16/213. I think it will go into this
merge-window. There are a couple of issues there, once that is fixed I
will remind him to mark it for stable.

Regards
Preeti U Murthy

 
 cheers
 
 
 ___
 Linuxppc-dev mailing list
 Linuxppc-dev@lists.ozlabs.org
 https://lists.ozlabs.org/listinfo/linuxppc-dev
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/3] powerpc/pci: PCI slot unified reset

2015-02-16 Thread Gavin Shan
On Thu, Dec 04, 2014 at 04:50:50PM +1100, Gavin Shan wrote:

Please ignore this one. I'll rebase and send out a new revision
shortly.

Thanks,
Gavin

The patchset corresponds to skiboot changes, which manage PCI slots
in a unified way: OPAL APIs are used for slot reset, power management
and presence status retrieval. The patchset shouldn't be merged before
the OPAL firmware counterpart is merged.

The kernel changes have been split into 2 parts: (A) Use the unified
PCI slot reset OPAL API - this patchset; (B) powernv-php driver to
support PCI hotplug for PowerNV platform, which will be sent separately.

This patchset affects the EEH and PCI reset logic, which is used mainly
by VFIO. A couple of cases have been tested out on P7/P8 boxes and look
good: EEH on PowerNV, EEH support for guests, PCI passthrough.

Changelog
=
v1 - v2
* Keep opal_pci_reinit(). In case the slot is reset by the kernel
  instead of skiboot, this API should be called to restore state
  for the affected devices.
* Reworked slot ID scheme so that old/new kernel can work with
  skiboot with or without unified PCI slot management support.
* Code cleanup here and there.

Gavin Shan (3):
  powerpc/powernv: Use PCI slot reset infrastructure
  powerpc/powernv: Refactor ioda_eeh_reset()
  powerpc/powernv: Issue fundamental reset if required

 arch/powerpc/include/asm/eeh.h|   1 +
 arch/powerpc/include/asm/opal.h   |   9 +-
 arch/powerpc/platforms/powernv/eeh-ioda.c | 240 --
 3 files changed, 132 insertions(+), 118 deletions(-)

Thanks,
Gavin



Re: [PATCH V4] tick/hotplug: Handover time related duties before cpu offline

2015-02-16 Thread Michael Ellerman
On Tue, 2015-02-17 at 09:38 +0530, Preeti U Murthy wrote:
 On 02/17/2015 07:28 AM, Michael Ellerman wrote:
  On Sat, 2015-01-31 at 09:44 +0530, Preeti U Murthy wrote:
  These duties include do_timer to update jiffies and broadcast wakeups on 
  those
  platforms which do not have an external device to handle wakeup of cpus 
  from deep
  idle states. The handover of these duties is not robust against a cpu 
  offline
  operation today.
  
  Hi Preeti,
  
  This bug is still causing breakage for people on Power8 machines.
  
  Are we just waiting for Thomas to take the patch?
 
 Hi mpe,
 
 Thomas has included the patch for fixing this issue in a recent patchset
 that he posted for cleaning up tick/clockevents related code.

 https://lkml.org/lkml/2015/2/16/213. I think it will go into this
 merge-window. There are a couple of issues there, once that is fixed I
 will remind him to mark it for stable.

Ah thanks. I missed it when searching LKML because it was posted by Peter.

cheers



[PATCH v1 2/7] AES for PPC/SPE - aes tables

2015-02-16 Thread Markus Stockhausen
[PATCH v1 2/7] AES for PPC/SPE - aes tables

4K AES tables for big endian

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes-tab-4k.S b/arch/powerpc/crypto/aes-tab-4k.S
new file mode 100644
index 000..6bc1755
--- /dev/null
+++ b/arch/powerpc/crypto/aes-tab-4k.S
@@ -0,0 +1,570 @@
+/*
+ * 4K AES tables for PPC AES implementation
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+/*
+ * These big endian AES encryption/decryption tables are designed to be simply
+ * accessed by a combination of rlwimi/lwz instructions with a minimum
+ * of table registers (usually only one required). Thus they are aligned to
+ * 4K. The locality of rotated values is derived from the reduced offsets that
+ * are available in the SPE load instructions. E.g. evldw, evlwwsplat, ...
+ *
+ */
+.data
+.align 12
+.globl PPC_AES_4K_ENCTAB
+PPC_AES_4K_ENCTAB:
+   .long 0xc66363a5,0xa5c66363,0x63a5c663,0x6363a5c6
+   .long 0xf87c7c84,0x84f87c7c,0x7c84f87c,0x7c7c84f8
+   .long 0xee777799,0x99ee7777,0x7799ee77,0x777799ee
+   .long 0xf67b7b8d,0x8df67b7b,0x7b8df67b,0x7b7b8df6
+   .long 0xfff2f20d,0x0dfff2f2,0xf20dfff2,0xf2f20dff
+   .long 0xd66b6bbd,0xbdd66b6b,0x6bbdd66b,0x6b6bbdd6
+   .long 0xde6f6fb1,0xb1de6f6f,0x6fb1de6f,0x6f6fb1de
+   .long 0x91c5c554,0x5491c5c5,0xc55491c5,0xc5c55491
+   .long 0x60303050,0x50603030,0x30506030,0x30305060
+   .long 0x02010103,0x03020101,0x01030201,0x01010302
+   .long 0xce6767a9,0xa9ce6767,0x67a9ce67,0x6767a9ce
+   .long 0x562b2b7d,0x7d562b2b,0x2b7d562b,0x2b2b7d56
+   .long 0xe7fefe19,0x19e7fefe,0xfe19e7fe,0xfefe19e7
+   .long 0xb5d7d762,0x62b5d7d7,0xd762b5d7,0xd7d762b5
+   .long 0x4dababe6,0xe64dabab,0xabe64dab,0xababe64d
+   .long 0xec76769a,0x9aec7676,0x769aec76,0x76769aec
+   .long 0x8fcaca45,0x458fcaca,0xca458fca,0xcaca458f
+   .long 0x1f82829d,0x9d1f8282,0x829d1f82,0x82829d1f
+   .long 0x89c9c940,0x4089c9c9,0xc94089c9,0xc9c94089
+   .long 0xfa7d7d87,0x87fa7d7d,0x7d87fa7d,0x7d7d87fa
+   .long 0xeffafa15,0x15effafa,0xfa15effa,0xfafa15ef
+   .long 0xb25959eb,0xebb25959,0x59ebb259,0x5959ebb2
+   .long 0x8e4747c9,0xc98e4747,0x47c98e47,0x4747c98e
+   .long 0xfbf0f00b,0x0bfbf0f0,0xf00bfbf0,0xf0f00bfb
+   .long 0x41adadec,0xec41adad,0xadec41ad,0xadadec41
+   .long 0xb3d4d467,0x67b3d4d4,0xd467b3d4,0xd4d467b3
+   .long 0x5fa2a2fd,0xfd5fa2a2,0xa2fd5fa2,0xa2a2fd5f
+   .long 0x45afafea,0xea45afaf,0xafea45af,0xafafea45
+   .long 0x239c9cbf,0xbf239c9c,0x9cbf239c,0x9c9cbf23
+   .long 0x53a4a4f7,0xf753a4a4,0xa4f753a4,0xa4a4f753
+   .long 0xe4727296,0x96e47272,0x7296e472,0x727296e4
+   .long 0x9bc0c05b,0x5b9bc0c0,0xc05b9bc0,0xc0c05b9b
+   .long 0x75b7b7c2,0xc275b7b7,0xb7c275b7,0xb7b7c275
+   .long 0xe1fdfd1c,0x1ce1fdfd,0xfd1ce1fd,0xfdfd1ce1
+   .long 0x3d9393ae,0xae3d9393,0x93ae3d93,0x9393ae3d
+   .long 0x4c26266a,0x6a4c2626,0x266a4c26,0x26266a4c
+   .long 0x6c36365a,0x5a6c3636,0x365a6c36,0x36365a6c
+   .long 0x7e3f3f41,0x417e3f3f,0x3f417e3f,0x3f3f417e
+   .long 0xf5f7f702,0x02f5f7f7,0xf702f5f7,0xf7f702f5
+   .long 0x83cccc4f,0x4f83cccc,0xcc4f83cc,0xcccc4f83
+   .long 0x6834345c,0x5c683434,0x345c6834,0x34345c68
+   .long 0x51a5a5f4,0xf451a5a5,0xa5f451a5,0xa5a5f451
+   .long 0xd1e5e534,0x34d1e5e5,0xe534d1e5,0xe5e534d1
+   .long 0xf9f1f108,0x08f9f1f1,0xf108f9f1,0xf1f108f9
+   .long 0xe2717193,0x93e27171,0x7193e271,0x717193e2
+   .long 0xabd8d873,0x73abd8d8,0xd873abd8,0xd8d873ab
+   .long 0x62313153,0x53623131,0x31536231,0x31315362
+   .long 0x2a15153f,0x3f2a1515,0x153f2a15,0x15153f2a
+   .long 0x0804040c,0x0c080404,0x040c0804,0x04040c08
+   .long 0x95c7c752,0x5295c7c7,0xc75295c7,0xc7c75295
+   .long 0x46232365,0x65462323,0x23654623,0x23236546
+   .long 0x9dc3c35e,0x5e9dc3c3,0xc35e9dc3,0xc3c35e9d
+   .long 0x30181828,0x28301818,0x18283018,0x18182830
+   .long 0x379696a1,0xa1379696,0x96a13796,0x9696a137
+   .long 0x0a05050f,0x0f0a0505,0x050f0a05,0x05050f0a
+   .long 0x2f9a9ab5,0xb52f9a9a,0x9ab52f9a,0x9a9ab52f
+   .long 0x0e070709,0x090e0707,0x07090e07,0x0707090e
+   .long 0x24121236,0x36241212,0x12362412,0x12123624
+   .long 0x1b80809b,0x9b1b8080,0x809b1b80,0x80809b1b
+   .long 0xdfe2e23d,0x3ddfe2e2,0xe23ddfe2,0xe2e23ddf
+   .long 0xcdebeb26,0x26cdebeb,0xeb26cdeb,0xebeb26cd
+   .long 0x4e272769,0x694e2727,0x27694e27,0x2727694e
+   .long 0x7fb2b2cd,0xcd7fb2b2,0xb2cd7fb2,0xb2b2cd7f
+   .long 0xea75759f,0x9fea7575,0x759fea75,0x75759fea
+   .long 0x1209091b,0x1b120909,0x091b1209,0x09091b12
+   .long 

[PATCH v1 6/7] AES for PPC/SPE - glue code

2015-02-16 Thread Markus Stockhausen
[PATCH v1 6/7] AES for PPC/SPE - glue code

Integrate the assembler modules into the kernel crypto
framework. Take care to avoid long intervals of disabled
preemption.

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes_spe_glue.c 
b/arch/powerpc/crypto/aes_spe_glue.c
new file mode 100644
index 000..bd5e63f
--- /dev/null
+++ b/arch/powerpc/crypto/aes_spe_glue.c
@@ -0,0 +1,512 @@
+/*
+ * Glue code for AES implementation for SPE instructions (PPC)
+ *
+ * Based on the generic implementation. The assembler module takes care
+ * of the SPE registers so it can run from interrupt context.
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <crypto/aes.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/errno.h>
+#include <linux/crypto.h>
+#include <asm/byteorder.h>
+#include <asm/switch_to.h>
+#include <crypto/algapi.h>
+
+/*
+ * MAX_BYTES defines the number of bytes that are allowed to be processed
+ * between preempt_disable() and preempt_enable(). e500 cores can issue two
+ * instructions per clock cycle using one 32/64 bit unit (SU1) and one 32
+ * bit unit (SU2). One of these can be a memory access that is executed via
+ * a single load and store unit (LSU). XTS-AES-256 takes ~780 operations per
+ * 16 byte block or 25 cycles per byte. Thus 768 bytes of input data
+ * will need an estimated maximum of 20,000 cycles. Headroom for cache misses
+ * included. Even with the low end model clocked at 667 MHz this equates to a
+ * critical time window of less than 30us. The value has been chosen to
+ * process a 512 byte disk block in one run or a large 1400 byte IPsec network
+ * packet in two runs.
+ *
+ */
+#define MAX_BYTES 768
+
+struct ppc_aes_ctx {
+   u32 key_enc[AES_MAX_KEYLENGTH_U32];
+   u32 key_dec[AES_MAX_KEYLENGTH_U32];
+   u32 rounds;
+};
+
+struct ppc_xts_ctx {
+   u32 key_enc[AES_MAX_KEYLENGTH_U32];
+   u32 key_dec[AES_MAX_KEYLENGTH_U32];
+   u32 key_twk[AES_MAX_KEYLENGTH_U32];
+   u32 rounds;
+};
+
+extern void ppc_encrypt_aes(u8 *out, const u8 *in, u32 *key_enc, u32 rounds);
+extern void ppc_decrypt_aes(u8 *out, const u8 *in, u32 *key_dec, u32 rounds);
+extern void ppc_encrypt_ecb(u8 *out, const u8 *in, u32 *key_enc, u32 rounds,
+   u32 bytes);
+extern void ppc_decrypt_ecb(u8 *out, const u8 *in, u32 *key_dec, u32 rounds,
+   u32 bytes);
+extern void ppc_encrypt_cbc(u8 *out, const u8 *in, u32 *key_enc, u32 rounds,
+   u32 bytes, u8 *iv);
+extern void ppc_decrypt_cbc(u8 *out, const u8 *in, u32 *key_dec, u32 rounds,
+   u32 bytes, u8 *iv);
+extern void ppc_crypt_ctr  (u8 *out, const u8 *in, u32 *key_enc, u32 rounds,
+   u32 bytes, u8 *iv);
+extern void ppc_encrypt_xts(u8 *out, const u8 *in, u32 *key_enc, u32 rounds,
+   u32 bytes, u8 *iv, u32 *key_twk);
+extern void ppc_decrypt_xts(u8 *out, const u8 *in, u32 *key_dec, u32 rounds,
+   u32 bytes, u8 *iv, u32 *key_twk);
+
+extern void ppc_expand_key_128(u32 *key_enc, const u8 *key);
+extern void ppc_expand_key_192(u32 *key_enc, const u8 *key);
+extern void ppc_expand_key_256(u32 *key_enc, const u8 *key);
+
+extern void ppc_generate_decrypt_key(u32 *key_dec, u32 *key_enc,
+unsigned int key_len);
+
+static void spe_begin(void)
+{
+   /* disable preemption and save users SPE registers if required */
+   preempt_disable();
+   enable_kernel_spe();
+}
+
+static void spe_end(void)
+{
+   /* reenable preemption */
+   preempt_enable();
+}
+
+static int ppc_aes_setkey(struct crypto_tfm *tfm, const u8 *in_key,
+   unsigned int key_len)
+{
+   struct ppc_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+   if (key_len != AES_KEYSIZE_128 &&
+   key_len != AES_KEYSIZE_192 &&
+   key_len != AES_KEYSIZE_256) {
+   tfm->crt_flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+   return -EINVAL;
+   }
+
+   switch (key_len) {
+   case AES_KEYSIZE_128:
+   ctx->rounds = 4;
+   ppc_expand_key_128(ctx->key_enc, in_key);
+   break;
+   case AES_KEYSIZE_192:
+   ctx->rounds = 5;
+   ppc_expand_key_192(ctx->key_enc, in_key);
+   break;
+   case AES_KEYSIZE_256:
+   ctx->rounds = 6;
+   ppc_expand_key_256(ctx->key_enc, in_key);
+   break;
+   }
+
+   ppc_generate_decrypt_key(ctx->key_dec, ctx->key_enc, key_len);
+
+   return 0;
+}
+
+static int ppc_xts_setkey(struct crypto_tfm *tfm, const u8 

[PATCH v1 7/7] AES for PPC/SPE - kernel config

2015-02-16 Thread Markus Stockhausen
[PATCH v1 7/7] AES for PPC/SPE - kernel config

Integrate the module into the kernel configuration

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index a07e763..1698fb9 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -4,8 +4,10 @@
 # Arch-specific CryptoAPI modules.
 #
 
+obj-$(CONFIG_CRYPTO_AES_PPC_SPE) += aes-ppc-spe.o
 obj-$(CONFIG_CRYPTO_SHA1_PPC) += sha1-powerpc.o
 obj-$(CONFIG_CRYPTO_SHA256_PPC_SPE) += sha256-ppc-spe.o
 
+aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes_spe_glue.o
 sha1-powerpc-y := sha1-powerpc-asm.o sha1.o
 sha256-ppc-spe-y := sha256-spe-asm.o sha256_spe_glue.o
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 86d35be..87dc274 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -857,6 +857,13 @@ config CRYPTO_AES_ARM_BS
  This implementation does not rely on any lookup tables so it is
  believed to be invulnerable to cache timing attacks.
 
+config CRYPTO_AES_PPC_SPE
+   tristate "AES cipher algorithms (PPC SPE)"
+   depends on PPC && SPE
+   help
+ AES cipher algorithms (FIPS-197). Additionally the acceleration
+ for popular block cipher modes ECB, CBC, CTR and XTS is supported.
+
 config CRYPTO_ANUBIS
	tristate "Anubis cipher algorithm"
select CRYPTO_ALGAPI

Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte
Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail
irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und
vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte
Weitergabe dieser Mail ist nicht gestattet.

Über das Internet versandte E-Mails können unter fremden Namen erstellt oder
manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine
rechtsverbindliche Willenserklärung.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

Vorstand:
Kadir Akin
Dr. Michael Höhnerbach

Vorsitzender des Aufsichtsrates:
Hans Kristian Langva

Registergericht: Amtsgericht Köln
Registernummer: HRB 52 497

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.

e-mails sent over the internet may have been written under a wrong name or
been manipulated. That is why this message sent as an e-mail is not a
legally binding declaration of intention.

Collogia
Unternehmensberatung AG
Ubierring 11
D-50678 Köln

executive board:
Kadir Akin
Dr. Michael Höhnerbach

President of the supervisory board:
Hans Kristian Langva

Registry office: district court Cologne
Register number: HRB 52 497



[PATCH v1 3/7] AES for PPC/SPE - assembler core

2015-02-16 Thread Markus Stockhausen
[PATCH v1 3/7] AES for PPC/SPE - assembler core

The assembler AES encryption and decryption core routines.
Implemented & optimized for big endian. Nevertheless they
work on little endian too.

For most efficient reuse in (higher level) block cipher 
routines they are implemented as fast call modules without 
any stack handling or register saving. The caller must 
take care of that part. 

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes-spe-core.S 
b/arch/powerpc/crypto/aes-spe-core.S
new file mode 100644
index 000..5dc6bce
--- /dev/null
+++ b/arch/powerpc/crypto/aes-spe-core.S
@@ -0,0 +1,351 @@
+/*
+ * Fast AES implementation for SPE instruction set (PPC)
+ *
+ * This code makes use of the SPE SIMD instruction set as defined in
+ * http://cache.freescale.com/files/32bit/doc/ref_manual/SPEPIM.pdf
+ * Implementation is based on optimization guide notes from
+ * http://cache.freescale.com/files/32bit/doc/app_note/AN2665.pdf
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <asm/ppc_asm.h>
+#include "aes-spe-regs.h"
+
+#define EAD(in, bpos) \
+   rlwimi  rT0,in,28-((bpos+3)%4)*8,20,27;
+
+#define DAD(in, bpos) \
+   rlwimi  rT1,in,24-((bpos+3)%4)*8,24,31;
+
+#define LWH(out, off) \
+   evlwwsplat  out,off(rT0);   /* load word high   */
+
+#define LWL(out, off) \
+   lwz out,off(rT0);   /* load word low*/
+
+#define LBZ(out, tab, off) \
+   lbz out,off(tab);   /* load byte*/
+
+#define LAH(out, in, bpos, off) \
+   EAD(in, bpos)   /* calc addr + load word high   */ \
+   LWH(out, off)
+
+#define LAL(out, in, bpos, off) \
+   EAD(in, bpos)   /* calc addr + load word low*/ \
+   LWL(out, off)
+
+#define LAE(out, in, bpos) \
+   EAD(in, bpos)   /* calc addr + load enc byte*/ \
+   LBZ(out, rT0, 8)
+
+#define LBE(out) \
+   LBZ(out, rT0, 8)/* load enc byte*/
+
+#define LAD(out, in, bpos) \
+   DAD(in, bpos)   /* calc addr + load dec byte*/ \
+   LBZ(out, rT1, 0)
+
+#define LBD(out) \
+   LBZ(out, rT1, 0)
+
+/*
+ * ppc_encrypt_block: The central encryption function for a single 16 byte
+ * block. It does no stack handling or register saving to support fast calls
+ * via bl/blr. It expects that the caller has pre-xored the input data with the
+ * first 4 words of the encryption key into rD0-rD3. Pointer/counter registers
+ * must also have been set up beforehand (rT0, rKP, CTR). Output is stored in
+ * rD0-rD3 and rW0-rW3 and the caller must execute a final xor on the output
+ * registers. All working registers rD0-rD3 & rW0-rW7 are overwritten during
+ * processing.
+ *
+ */
+_GLOBAL(ppc_encrypt_block)
+   LAH(rW4, rD1, 2, 4)
+   LAH(rW6, rD0, 3, 0)
+   LAH(rW3, rD0, 1, 8)
+ppc_encrypt_block_loop:
+   LAH(rW0, rD3, 0, 12)
+   LAL(rW0, rD0, 0, 12)
+   LAH(rW1, rD1, 0, 12)
+   LAH(rW2, rD2, 1, 8)
+   LAL(rW2, rD3, 1, 8)
+   LAL(rW3, rD1, 1, 8)
+   LAL(rW4, rD2, 2, 4)
+   LAL(rW6, rD1, 3, 0)
+   LAH(rW5, rD3, 2, 4)
+   LAL(rW5, rD0, 2, 4)
+   LAH(rW7, rD2, 3, 0)
+   evldw   rD1,16(rKP)
+   EAD(rD3, 3)
+   evxor   rW2,rW2,rW4
+   LWL(rW7, 0)
+   evxor   rW2,rW2,rW6
+   EAD(rD2, 0)
+   evxor   rD1,rD1,rW2
+   LWL(rW1, 12)
+   evxor   rD1,rD1,rW0
+   evldw   rD3,24(rKP)
+   evmergehi   rD0,rD0,rD1
+   EAD(rD1, 2)
+   evxor   rW3,rW3,rW5
+   LWH(rW4, 4)
+   evxor   rW3,rW3,rW7
+   EAD(rD0, 3)
+   evxor   rD3,rD3,rW3
+   LWH(rW6, 0)
+   evxor   rD3,rD3,rW1
+   EAD(rD0, 1)
+   evmergehi   rD2,rD2,rD3
+   LWH(rW3, 8)
+   LAH(rW0, rD3, 0, 12)
+   LAL(rW0, rD0, 0, 12)
+   LAH(rW1, rD1, 0, 12)
+   LAH(rW2, rD2, 1, 8)
+   LAL(rW2, rD3, 1, 8)
+   LAL(rW3, rD1, 1, 8)
+   LAL(rW4, rD2, 2, 4)
+   LAL(rW6, rD1, 3, 0)
+   LAH(rW5, rD3, 2, 4)
+   LAL(rW5, rD0, 2, 4)
+   LAH(rW7, rD2, 3, 0)
+   evldw   rD1,32(rKP)
+   EAD(rD3, 3)
+   evxor   rW2,rW2,rW4
+   LWL(rW7, 0)
+   evxor   rW2,rW2,rW6
+   EAD(rD2, 0)
+   evxor   rD1,rD1,rW2
+   LWL(rW1, 12)
+   evxor   rD1,rD1,rW0
+   evldw   rD3,40(rKP)
+   evmergehi   rD0,rD0,rD1
+   EAD(rD1, 2)
+   evxor   rW3,rW3,rW5
+   LWH(rW4, 4)
+   evxor   rW3,rW3,rW7
+   EAD(rD0, 3)
+   evxor   rD3,rD3,rW3
+   

[PATCH v1 0/7] AES for PPC/SPE

2015-02-16 Thread Markus Stockhausen
[PATCH v1 0/7] AES for PPC/SPE

The following patches add support for 64bit accelerated AES
calculation on PPC processors with SPE instruction set. Besides
the AES core module it implements ECB/CBC/CTR/XTS as block
ciphers. The implementation takes care of the following 
constraints:

- save SPE registers for interrupt context compatibility
- disable preemption only for short intervals
- endian independent

Module passes tcrypt mode=10 tests. Synthetic AES speedup
factors from 'insmod tcrypt sec=3 mode=200' taken on an e500v2
800 MHz (TP Link WDR4900) compared with the generic kernel
module.

key  bytes  ecb   ecb   cbc   cbc   ctr   ctr   xts   xts
            enc   dec   enc   dec   enc   dec   enc   dec
---  -----  ----  ----  ----  ----  ----  ----  ----  ----
128     16  1.14  1.14  1.20  1.28  1.20  1.19  1.20  1.21
128     64  1.35  1.36  1.48  1.51  1.50  1.50  1.41  1.41
128    256  1.49  1.49  1.66  1.65  1.69  1.69  1.58  1.57
128   1024  1.51  1.51  1.69  1.68  1.72  1.72  1.61  1.60
128   8192  1.52  1.52  1.70  1.68  1.73  1.73  1.62  1.61
192     16  1.14  1.15  1.22  1.28  1.21  1.21  1.22  1.23
192     64  1.36  1.37  1.48  1.49  1.49  1.50  1.41  1.41
192    256  1.48  1.48  1.63  1.63  1.65  1.65  1.56  1.55
192   1024  1.50  1.50  1.65  1.64  1.68  1.68  1.59  1.58
192   8192  1.52  1.52  1.67  1.66  1.68  1.68  1.60  1.59
256     16  1.17  1.18  1.24  1.30  1.23  1.22  1.24  1.25
256     64  1.37  1.37  1.47  1.50  1.49  1.49  1.42  1.41
256    256  1.48  1.47  1.60  1.60  1.63  1.63  1.54  1.53
256   1024  1.50  1.49  1.62  1.61  1.65  1.65  1.57  1.56
256   8192  1.50  1.49  1.63  1.62  1.66  1.66  1.58  1.57

Additionally, numbers from an iperf transfer benchmark. They
include the AES optimized and the SHA256 optimized modules.

- Server : Xeon X3470 2.93GHz
- Client : Core I5 2.4GHz Windows (Shrew VPN client)
- Gateway: e500v2 800 MHz (TP Link WDR4900)

AES256 generic / SHA256 generic modules:
 iperf.exe -c a.b.c.d -t 60 -i 10

Client connecting to a.b.c.d, TCP port 5001
TCP window size: 63.0 KByte (default)

[  3] local u.v.w.x port 50730 connected with a.b.c.d port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  51.1 MBytes  42.9 Mbits/sec
[  3] 10.0-20.0 sec  51.9 MBytes  43.5 Mbits/sec
[  3] 20.0-30.0 sec  51.5 MBytes  43.2 Mbits/sec
[  3] 30.0-40.0 sec  51.5 MBytes  43.2 Mbits/sec
[  3] 40.0-50.0 sec  51.2 MBytes  43.0 Mbits/sec
[  3] 50.0-60.0 sec  50.6 MBytes  42.5 Mbits/sec
[  3]  0.0-60.0 sec   308 MBytes  43.0 Mbits/sec

AES256 (this patch) / SHA256 (my last patch)
 iperf.exe -c a.b.c.d -t 60 -i 10

Client connecting to a.b.c.d, TCP port 5001
TCP window size: 63.0 KByte (default)

[  3] local u.v.w.x port 50730 connected with a.b.c.d port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  69.6 MBytes  58.4 Mbits/sec
[  3] 10.0-20.0 sec  69.1 MBytes  58.0 Mbits/sec
[  3] 20.0-30.0 sec  69.2 MBytes  58.1 Mbits/sec
[  3] 30.0-40.0 sec  67.1 MBytes  56.3 Mbits/sec
[  3] 40.0-50.0 sec  67.6 MBytes  56.7 Mbits/sec
[  3] 50.0-60.0 sec  65.9 MBytes  55.3 Mbits/sec
[  3]  0.0-60.0 sec   409 MBytes  57.1 Mbits/sec



[PATCH v1 1/7] AES for PPC/SPE - register defines

2015-02-16 Thread Markus Stockhausen
[PATCH v1 1/7] AES for PPC/SPE - register defines

Define some register aliases for better readability.

Signed-off-by: Markus Stockhausen stockhau...@collogia.de

diff --git a/arch/powerpc/crypto/aes-spe-regs.h 
b/arch/powerpc/crypto/aes-spe-regs.h
new file mode 100644
index 000..30d217b
--- /dev/null
+++ b/arch/powerpc/crypto/aes-spe-regs.h
@@ -0,0 +1,42 @@
+/*
+ * Common registers for PPC AES implementation
+ *
+ * Copyright (c) 2015 Markus Stockhausen stockhau...@collogia.de
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#define rKS r0 /* copy of en-/decryption key pointer   */
+#define rDP r3 /* destination pointer  */
+#define rSP r4 /* source pointer   */
+#define rKP r5 /* pointer to en-/decryption key*/
+#define rRR r6 /* en-/decryption rounds*/
+#define rLN r7 /* length of data to be processed   */
+#define rIP r8 /* pointer to IV (CBC/CTR/XTS modes)*/
+#define rKT r9 /* pointer to tweak key (XTS mode)  */
+#define rT0 r11/* pointers to en-/decryption tables*/
+#define rT1 r10
+#define rD0 r9 /* data */
+#define rD1 r14
+#define rD2 r12
+#define rD3 r15
+#define rW0 r16/* working registers*/
+#define rW1 r17
+#define rW2 r18
+#define rW3 r19
+#define rW4 r20
+#define rW5 r21
+#define rW6 r22
+#define rW7 r23
+#define rI0 r24/* IV   */
+#define rI1 r25
+#define rI2 r26
+#define rI3 r27
+#define rG0 r28/* endian reversed tweak (XTS mode) */
+#define rG1 r29
+#define rG2 r30




Re: [PATCH 0/4] Support registering specific reset handler

2015-02-16 Thread cascardo
On Fri, Feb 13, 2015 at 03:54:55PM +1100, Gavin Shan wrote:
 VFIO PCI infrastructure depends on pci_reset_function() to do reset on
 PCI devices so that they would be in clean state when host or guest grabs
 them. Unfortunately, the function doesn't work (or not well) on some PCI
 devices that require EEH PE reset.
 
 The patchset extends the quirk for PCI device specific reset methods to
 allow dynamic registration. With it, we can translate reset requests
 for those special PCI devices to EEH PE reset, which is only available on
 64-bit PowerPC platforms.
 

Hi, Gavin.

I like your approach overall. That allows us to confine these quirks to
the platforms where they are relevant. I would make the quirks more
specific, though, instead of doing them for all IBM and Mellanox
devices.

I wonder if we should not have some form of domain reset, where we would
reset all the devices on the same group, and use that on vfio. Grouping
the devices would then be made platform-dependent, as well as the reset
method. On powernv, we would group by IOMMU group and issue a
fundamental reset.

Cascardo.

 Gavin Shan (4):
   PCI: Rename struct pci_dev_reset_methods
   PCI: Introduce list for device reset methods
   PCI: Allow registering reset method
   powerpc/powernv: Register PCI dev specific reset handlers
 
  arch/powerpc/platforms/powernv/pci.c |  61 +++
  drivers/pci/pci.h|   3 +-
  drivers/pci/quirks.c | 139 
 ++-
  include/linux/pci.h  |   9 +++
  4 files changed, 192 insertions(+), 20 deletions(-)
 
 -- 
 1.8.3.2
 


bpf: Enable BPF JIT on ppc32

2015-02-16 Thread Denis Kirjanov
This patch series enables BPF JIT on ppc32. There are relatively
few changes in the code to make it work.

All test_bpf tests passed both on 7447a and P2041-based machines.

Changelog:
v1 -> v2: Reordered Kconfig patch in the series

Denis Kirjanov (6):
  ppc: bpf: add required compatibility macros for jit
  ppc: bpf: add required opcodes for ppc32
  ppc: bpf: update jit to use compatibility macros
  ppc: bpf: rename bpf_jit_64.S to bpf_jit_asm.S
  ppc: bpf: Add SKF_AD_CPU for ppc32
  ppc: Kconfig: Enable BPF JIT on ppc32

 arch/powerpc/include/asm/asm-compat.h |   4 +
 arch/powerpc/include/asm/ppc-opcode.h |   2 +
 arch/powerpc/net/Makefile |   2 +-
 arch/powerpc/net/bpf_jit.h|  64 +-
 arch/powerpc/net/bpf_jit_64.S | 229 --
 arch/powerpc/net/bpf_jit_asm.S| 229 ++
 arch/powerpc/net/bpf_jit_comp.c   |  46 +++
 7 files changed, 317 insertions(+), 259 deletions(-)



[PATCH net-next v2 1/6] ppc: bpf: add required compatibility macros for jit

2015-02-16 Thread Denis Kirjanov
Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/include/asm/asm-compat.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/include/asm/asm-compat.h 
b/arch/powerpc/include/asm/asm-compat.h
index 21be8ae..dc85dcb 100644
--- a/arch/powerpc/include/asm/asm-compat.h
+++ b/arch/powerpc/include/asm/asm-compat.h
@@ -23,6 +23,8 @@
 #define PPC_STL    stringify_in_c(std)
 #define PPC_STLU   stringify_in_c(stdu)
 #define PPC_LCMPI  stringify_in_c(cmpdi)
+#define PPC_LCMPLI stringify_in_c(cmpldi)
+#define PPC_LCMP   stringify_in_c(cmpd)
 #define PPC_LONG   stringify_in_c(.llong)
 #define PPC_LONG_ALIGN stringify_in_c(.balign 8)
 #define PPC_TLNEI  stringify_in_c(tdnei)
@@ -52,6 +54,8 @@
 #define PPC_STL    stringify_in_c(stw)
 #define PPC_STLU   stringify_in_c(stwu)
 #define PPC_LCMPI  stringify_in_c(cmpwi)
+#define PPC_LCMPLI stringify_in_c(cmplwi)
+#define PPC_LCMP   stringify_in_c(cmpw)
 #define PPC_LONG   stringify_in_c(.long)
 #define PPC_LONG_ALIGN stringify_in_c(.balign 4)
 #define PPC_TLNEI  stringify_in_c(twnei)
-- 
2.1.3


[PATCH net-next v2 2/6] ppc: bpf: add required opcodes for ppc32

2015-02-16 Thread Denis Kirjanov
Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 03cd858..2eadde0 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -212,6 +212,8 @@
 #define PPC_INST_LWZ   0x8000
 #define PPC_INST_STD   0xf800
 #define PPC_INST_STDU  0xf801
+#define PPC_INST_STW   0x9000
+#define PPC_INST_STWU  0x9400
 #define PPC_INST_MFLR  0x7c0802a6
 #define PPC_INST_MTLR  0x7c0803a6
 #define PPC_INST_CMPWI 0x2c00
-- 
2.1.3


[PATCH net-next v2 3/6] ppc: bpf: update jit to use compatibility macros

2015-02-16 Thread Denis Kirjanov
Use helpers from asm-compat.h to wrap up assembly mnemonics.

Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/net/bpf_jit.h  | 47 ++-
 arch/powerpc/net/bpf_jit_64.S   | 70 -
 arch/powerpc/net/bpf_jit_comp.c | 32 ++-
 3 files changed, 98 insertions(+), 51 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index c406aa9..2d5e715 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -10,12 +10,25 @@
 #ifndef _BPF_JIT_H
 #define _BPF_JIT_H
 
+#ifdef CONFIG_PPC64
+#define BPF_PPC_STACK_R3_OFF   48
 #define BPF_PPC_STACK_LOCALS   32
 #define BPF_PPC_STACK_BASIC	(48+64)
 #define BPF_PPC_STACK_SAVE (18*8)
 #define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
 BPF_PPC_STACK_SAVE)
 #define BPF_PPC_SLOWPATH_FRAME (48+64)
+#else
+#define BPF_PPC_STACK_R3_OFF   24
+#define BPF_PPC_STACK_LOCALS   16
+#define BPF_PPC_STACK_BASIC	(24+32)
+#define BPF_PPC_STACK_SAVE (18*4)
+#define BPF_PPC_STACKFRAME (BPF_PPC_STACK_BASIC+BPF_PPC_STACK_LOCALS+ \
+BPF_PPC_STACK_SAVE)
+#define BPF_PPC_SLOWPATH_FRAME (24+32)
+#endif
+
+#define REG_SZ (BITS_PER_LONG/8)
 
 /*
  * Generated code register usage:
@@ -57,7 +70,11 @@ DECLARE_LOAD_FUNC(sk_load_half);
 DECLARE_LOAD_FUNC(sk_load_byte);
 DECLARE_LOAD_FUNC(sk_load_byte_msh);
 
+#ifdef CONFIG_PPC64
 #define FUNCTION_DESCR_SIZE24
+#else
+#define FUNCTION_DESCR_SIZE0
+#endif
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -86,7 +103,12 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_LIS(r, i)  PPC_ADDIS(r, 0, i)
 #define PPC_STD(r, base, i)	EMIT(PPC_INST_STD | ___PPC_RS(r) |	      \
				     ___PPC_RA(base) | ((i) & 0xfffc))
-
+#define PPC_STDU(r, base, i)	EMIT(PPC_INST_STDU | ___PPC_RS(r) |	      \
+				     ___PPC_RA(base) | ((i) & 0xfffc))
+#define PPC_STW(r, base, i)	EMIT(PPC_INST_STW | ___PPC_RS(r) |	      \
+				     ___PPC_RA(base) | ((i) & 0xfffc))
+#define PPC_STWU(r, base, i)	EMIT(PPC_INST_STWU | ___PPC_RS(r) |	      \
+				     ___PPC_RA(base) | ((i) & 0xfffc))
 
 #define PPC_LBZ(r, base, i)EMIT(PPC_INST_LBZ | ___PPC_RT(r) |\
 ___PPC_RA(base) | IMM_L(i))
@@ -98,6 +120,17 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 ___PPC_RA(base) | IMM_L(i))
 #define PPC_LHBRX(r, base, b)  EMIT(PPC_INST_LHBRX | ___PPC_RT(r) |  \
 ___PPC_RA(base) | ___PPC_RB(b))
+
+#ifdef CONFIG_PPC64
+#define PPC_BPF_LL(r, base, i) do { PPC_LD(r, base, i); } while(0)
+#define PPC_BPF_STL(r, base, i) do { PPC_STD(r, base, i); } while(0)
+#define PPC_BPF_STLU(r, base, i) do { PPC_STDU(r, base, i); } while(0)
+#else
+#define PPC_BPF_LL(r, base, i) do { PPC_LWZ(r, base, i); } while(0)
+#define PPC_BPF_STL(r, base, i) do { PPC_STW(r, base, i); } while(0)
+#define PPC_BPF_STLU(r, base, i) do { PPC_STWU(r, base, i); } while(0)
+#endif
+
 /* Convenience helpers for the above with 'far' offsets: */
 #define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, i);   \
else {  PPC_ADDIS(r, base, IMM_HA(i));\
@@ -115,6 +148,12 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
else {  PPC_ADDIS(r, base, IMM_HA(i));\
PPC_LHZ(r, r, IMM_L(i)); } } while(0)
 
+#ifdef CONFIG_PPC64
+#define PPC_LL_OFFS(r, base, i) do { PPC_LD_OFFS(r, base, i); } while(0)
+#else
+#define PPC_LL_OFFS(r, base, i) do { PPC_LWZ_OFFS(r, base, i); } while(0)
+#endif
+
 #define PPC_CMPWI(a, i)	EMIT(PPC_INST_CMPWI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPDI(a, i)	EMIT(PPC_INST_CMPDI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPLWI(a, i)	EMIT(PPC_INST_CMPLWI | ___PPC_RA(a) | IMM_L(i))
@@ -196,6 +235,12 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
		PPC_ORI(d, d, (uintptr_t)(i) & 0xffff);	      \
} } while (0);
 
+#ifdef CONFIG_PPC64
+#define PPC_FUNC_ADDR(d,i) do { PPC_LI64(d, i); } while(0)
+#else
+#define PPC_FUNC_ADDR(d,i) do { PPC_LI32(d, i); } while(0)
+#endif
+
 #define PPC_LHBRX_OFFS(r, base, i) \
do { PPC_LI32(r, i); PPC_LHBRX(r, r, base); } while(0)
 #ifdef __LITTLE_ENDIAN__
diff --git a/arch/powerpc/net/bpf_jit_64.S b/arch/powerpc/net/bpf_jit_64.S
index 8f87d92..8ff5a3b 100644
--- a/arch/powerpc/net/bpf_jit_64.S
+++ b/arch/powerpc/net/bpf_jit_64.S
@@ -34,13 +34,13 @@
  */
.globl  sk_load_word
 sk_load_word:
-   cmpdi   r_addr, 0
+   PPC_LCMPI   r_addr, 0
blt bpf_slow_path_word_neg
.globl  sk_load_word_positive_offset
 

[PATCH net-next v2 5/6] ppc: bpf: Add SKF_AD_CPU for ppc32

2015-02-16 Thread Denis Kirjanov
Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/net/bpf_jit.h  | 17 +
 arch/powerpc/net/bpf_jit_comp.c | 14 +-
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 2d5e715..889fd19 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -154,6 +154,23 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh);
 #define PPC_LL_OFFS(r, base, i) do { PPC_LWZ_OFFS(r, base, i); } while(0)
 #endif
 
+#ifdef CONFIG_SMP
+#ifdef CONFIG_PPC64
+#define PPC_BPF_LOAD_CPU(r)	\
+	do { BUILD_BUG_ON(FIELD_SIZEOF(struct paca_struct, paca_index) != 2);	\
+	PPC_LHZ_OFFS(r, 13, offsetof(struct paca_struct, paca_index));		\
+	} while (0)
+#else
+#define PPC_BPF_LOAD_CPU(r)	\
+	do { BUILD_BUG_ON(FIELD_SIZEOF(struct thread_info, cpu) != 4);		\
+	PPC_LHZ_OFFS(r, (1 & ~(THREAD_SIZE - 1)),				\
+		offsetof(struct thread_info, cpu));				\
+	} while(0)
+#endif
+#else
+#define PPC_BPF_LOAD_CPU(r) do { PPC_LI(r, 0); } while(0)
+#endif
+
 #define PPC_CMPWI(a, i)	EMIT(PPC_INST_CMPWI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPDI(a, i)	EMIT(PPC_INST_CMPDI | ___PPC_RA(a) | IMM_L(i))
 #define PPC_CMPLWI(a, i)	EMIT(PPC_INST_CMPLWI | ___PPC_RA(a) | IMM_L(i))
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 8b29268..17cea18 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -411,20 +411,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
PPC_SRWI(r_A, r_A, 5);
break;
case BPF_ANC | SKF_AD_CPU:
-#ifdef CONFIG_SMP
-   /*
-* PACA ptr is r13:
-	 * raw_smp_processor_id() = local_paca->paca_index
-*/
-   BUILD_BUG_ON(FIELD_SIZEOF(struct paca_struct,
- paca_index) != 2);
-   PPC_LHZ_OFFS(r_A, 13,
-offsetof(struct paca_struct, paca_index));
-#else
-   PPC_LI(r_A, 0);
-#endif
+   PPC_BPF_LOAD_CPU(r_A);
break;
-
/*** Absolute loads from packet header/data ***/
case BPF_LD | BPF_W | BPF_ABS:
func = CHOOSE_LOAD_FUNC(K, sk_load_word);
-- 
2.1.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH net-next v2 6/6] ppc: Kconfig: Enable BPF JIT on ppc32

2015-02-16 Thread Denis Kirjanov
Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 22b0940..5084bdc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -126,7 +126,7 @@ config PPC
select IRQ_FORCED_THREADING
select HAVE_RCU_TABLE_FREE if SMP
select HAVE_SYSCALL_TRACEPOINTS
-   select HAVE_BPF_JIT if PPC64
+   select HAVE_BPF_JIT
select HAVE_ARCH_JUMP_LABEL
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select ARCH_HAS_GCOV_PROFILE_ALL
-- 
2.1.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH net-next v2 4/6] ppc: bpf: rename bpf_jit_64.S to bpf_jit_asm.S

2015-02-16 Thread Denis Kirjanov
Signed-off-by: Denis Kirjanov k...@linux-powerpc.org
---
 arch/powerpc/net/Makefile| 2 +-
 arch/powerpc/net/{bpf_jit_64.S => bpf_jit_asm.S} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename arch/powerpc/net/{bpf_jit_64.S => bpf_jit_asm.S} (100%)

diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index 266b395..1306a58 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -1,4 +1,4 @@
 #
 # Arch-specific network modules
 #
-obj-$(CONFIG_BPF_JIT) += bpf_jit_64.o bpf_jit_comp.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_asm.o bpf_jit_comp.o
diff --git a/arch/powerpc/net/bpf_jit_64.S b/arch/powerpc/net/bpf_jit_asm.S
similarity index 100%
rename from arch/powerpc/net/bpf_jit_64.S
rename to arch/powerpc/net/bpf_jit_asm.S
-- 
2.1.3

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH net-next 0/6] bpf: Enable BPF JIT on ppc32

2015-02-16 Thread Alexei Starovoitov
On Mon, Feb 16, 2015 at 2:13 AM, Denis Kirjanov k...@linux-powerpc.org wrote:
 On 2/15/15, Daniel Borkmann dan...@iogearbox.net wrote:
 On 02/15/2015 07:06 PM, Denis Kirjanov wrote:
 This patch series enables BPF JIT on ppc32. There are relatively
 few changes in the code to make it work.

 All test_bpf tests passed both on 7447a and P2041-based machines.

 I'm just wondering, next to the feedback that has already been
 provided, would opening this up for ppc32 make it significantly
 more difficult in future to migrate from classic BPF JIT to eBPF
 JIT eventually (which is what we want long-term)? Being curious,
 is there any ongoing effort from ppc people?


 Well, I don't see significant challenges to enable eBPF on ppc64 in the 
 future.
 I'll start working on it after I get this merged

sounds great. looking forward to it :)
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v1 2/7] AES for PPC/SPE - aes tables

2015-02-16 Thread Segher Boessenkool
On Mon, Feb 16, 2015 at 02:19:50PM +, David Laight wrote:
 From:  Markus Stockhausen
  4K AES tables for big endian
 
 I can't help feeling that you could give more information about how the
 values are generated.

... and an explanation of why this does not open you up to a timing attack?


Segher
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 27/28] vfio: powerpc/spapr: Register memory

2015-02-16 Thread Alexey Kardashevskiy
The existing implementation accounts the whole DMA window in
the locked_vm counter, which is going to get even worse with multiple
containers and huge DMA windows.

This introduces 2 ioctls to register/unregister DMA memory which
receive user space address and size of the memory region which
needs to be pinned/unpinned and counted in locked_vm.

If any memory region was registered, all subsequent DMA map requests
should address already pinned memory. If no memory was registered,
then the amount of memory required for a single default memory window will be
accounted when the container is enabled and every map/unmap will pin/unpin
a page.

Dynamic DMA window and in-kernel acceleration will require memory to
be registered in order to work.

The accounting is done per VFIO container. When the support of
multiple groups per container is added, we will have accurate locked_vm
accounting.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 222 
 1 file changed, 148 insertions(+), 74 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 4ff8289..ee91d51 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -91,10 +91,16 @@ static void decrement_locked_vm(long npages)
  */
 struct tce_container {
struct mutex lock;
-   struct iommu_group *grp;
bool enabled;
unsigned long locked_pages;
struct list_head mem_list;
+   struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+   struct list_head group_list;
+};
+
+struct tce_iommu_group {
+   struct list_head next;
+   struct iommu_group *grp;
 };
 
 struct tce_memory {
@@ -300,19 +306,20 @@ static bool tce_page_is_contained(struct page *page, unsigned page_shift)
 	return false;
 }
 
+static inline bool tce_groups_attached(struct tce_container *container)
+{
+	return !list_empty(&container->group_list);
+}
+
 static struct iommu_table *spapr_tce_find_table(
 		struct tce_container *container,
 		phys_addr_t ioba)
 {
 	long i;
 	struct iommu_table *ret = NULL;
-	struct powerpc_iommu *iommu = iommu_group_get_iommudata(container->grp);
-
-	if (!iommu)
-		return NULL;
 
 	for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
-		struct iommu_table *tbl = &iommu->tables[i];
+		struct iommu_table *tbl = &container->tables[i];
 		unsigned long entry = ioba >> tbl->it_page_shift;
 		unsigned long start = tbl->it_offset;
 		unsigned long end = start + tbl->it_size;
@@ -330,11 +337,8 @@ static int tce_iommu_enable(struct tce_container *container)
 {
 	int ret = 0;
 	unsigned long locked;
-	struct iommu_table *tbl;
 	struct powerpc_iommu *iommu;
-
-	if (!container->grp)
-		return -ENXIO;
+	struct tce_iommu_group *tcegrp;
 
 	if (!current->mm)
 		return -ESRCH; /* process exited */
@@ -368,12 +372,24 @@ static int tce_iommu_enable(struct tce_container *container)
 	 * KVM agnostic.
 	 */
 	if (!tce_preregistered(container)) {
-		iommu = iommu_group_get_iommudata(container->grp);
+		if (!tce_groups_attached(container))
+			return -ENODEV;
+
+		tcegrp = list_first_entry(&container->group_list,
+				struct tce_iommu_group, next);
+		iommu = iommu_group_get_iommudata(tcegrp->grp);
 		if (!iommu)
 			return -ENODEV;
 
-		tbl = &iommu->tables[0];
-		locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
+		/*
+		 * We do not allow enabling a group if no DMA-able memory was
+		 * registered as there is no way to know how much we should
+		 * increment the locked_vm counter.
+		 */
+		if (!iommu->tce32_size)
+			return -EPERM;
+
+		locked = iommu->tce32_size >> PAGE_SHIFT;
 		ret = try_increment_locked_vm(locked);
 		if (ret)
 			return ret;
@@ -386,6 +402,10 @@ static int tce_iommu_enable(struct tce_container *container)
 	return ret;
 }
 
+static int tce_iommu_clear(struct tce_container *container,
+		struct iommu_table *tbl,
+		unsigned long entry, unsigned long pages);
+
 static void tce_iommu_disable(struct tce_container *container)
 {
 	if (!container->enabled)
@@ -414,6 +434,7 @@ static void *tce_iommu_open(unsigned long arg)
 

[PATCH v4 17/28] powerpc/pseries/lpar: Enable VFIO

2015-02-16 Thread Alexey Kardashevskiy
The previous patch introduced iommu_table_ops::exchange() callback
which effectively disabled VFIO on pseries. This implements exchange()
for pseries/lpar so VFIO can work in nested guests.

Since exchange() callback returns an old TCE, it has to call H_GET_TCE
for every TCE being put to the table so VFIO performance in guests
running under PR KVM is expected to be slower than in guests running under
HV KVM or bare metal hosts.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v5:
* added global lock for xchg operations
* added missing be64_to_cpu(oldtce)
---
 arch/powerpc/platforms/pseries/iommu.c | 44 --
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index f537e6e..a903a27 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -137,14 +137,25 @@ static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
 
 static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
 			       long npages, unsigned long uaddr,
+			       unsigned long *old_tces,
 			       enum dma_data_direction direction,
 			       struct dma_attrs *attrs)
 {
 	u64 rc = 0;
 	u64 proto_tce, tce;
 	u64 rpn;
-	int ret = 0;
+	int ret = 0, i = 0;
 	long tcenum_start = tcenum, npages_start = npages;
+	static spinlock_t get_tces_lock;
+	static bool get_tces_lock_initialized;
+
+	if (old_tces) {
+		if (!get_tces_lock_initialized) {
+			spin_lock_init(&get_tces_lock);
+			get_tces_lock_initialized = true;
+		}
+		spin_lock(&get_tces_lock);
+	}
 
 	rpn = __pa(uaddr) >> TCE_SHIFT;
proto_tce = TCE_PCI_READ;
@@ -153,6 +164,14 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
 
 	while (npages--) {
 		tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
+		if (old_tces) {
+			unsigned long oldtce = 0;
+
+			plpar_tce_get((u64)tbl->it_index, (u64)tcenum << 12,
+					&oldtce);
+			old_tces[i] = be64_to_cpu(oldtce);
+			i++;
+		}
 		rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
 
if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
@@ -173,13 +192,18 @@ static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		tcenum++;
 		rpn++;
 	}
+
+	if (old_tces)
+		spin_unlock(&get_tces_lock);
+
 	return ret;
 }
 
 static DEFINE_PER_CPU(__be64 *, tce_page);
 
-static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+static int tce_xchg_pSeriesLP(struct iommu_table *tbl, long tcenum,
 long npages, unsigned long uaddr,
+unsigned long *old_tces,
 enum dma_data_direction direction,
 struct dma_attrs *attrs)
 {
@@ -194,6 +218,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 
 	if ((npages == 1) || !firmware_has_feature(FW_FEATURE_MULTITCE)) {
 		return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+					   old_tces,
 					   direction, attrs);
 	}
 
@@ -210,6 +235,7 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	if (!tcep) {
 		local_irq_restore(flags);
 		return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
+				old_tces,
 				direction, attrs);
 	}
 	__this_cpu_write(tce_page, tcep);
@@ -231,6 +257,10 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 		for (l = 0; l < limit; l++) {
 			tcep[l] = cpu_to_be64(proto_tce |
 					(rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT);
 			rpn++;
+			if (old_tces)
+				plpar_tce_get((u64)tbl->it_index,
+						(u64)(tcenum + l) << 12,
+						&old_tces[tcenum + l]);
 		}
 
 		rc = plpar_tce_put_indirect((u64)tbl->it_index,
@@ -261,6 +291,15 @@ static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
 	return ret;
 }
 
+static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
+long npages, unsigned long uaddr,
+enum 

[PATCH v4 22/28] powerpc/powernv: Implement multilevel TCE tables

2015-02-16 Thread Alexey Kardashevskiy
This adds multi-level TCE tables support to pnv_pci_ioda2_create_table()
and pnv_pci_ioda2_free_table() callbacks.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h  |   4 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 125 +++---
 arch/powerpc/platforms/powernv/pci.c  |  19 +
 3 files changed, 122 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index cc26eca..283f70f 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -85,6 +85,8 @@ struct iommu_pool {
 struct iommu_table {
unsigned long  it_busno; /* Bus number this table belongs to */
unsigned long  it_size;  /* Size of iommu table in entries */
+   unsigned long  it_indirect_levels;
+   unsigned long  it_level_size;
unsigned long  it_offset;/* Offset into global table */
unsigned long  it_base;  /* mapped address of tce table */
unsigned long  it_index; /* which iommu table this is */
@@ -133,6 +135,8 @@ extern struct iommu_table *iommu_init_table(struct 
iommu_table * tbl,
 
 #define POWERPC_IOMMU_MAX_TABLES   1
 
+#define POWERPC_IOMMU_DEFAULT_LEVELS   1
+
 struct powerpc_iommu;
 
 struct powerpc_iommu_ops {
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 1f725d4..f542819 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1295,16 +1295,79 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
 
+static void pnv_free_tce_table(unsigned long addr, unsigned size,
+		unsigned level)
+{
+	addr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+
+	if (level) {
+		long i;
+		u64 *tmp = (u64 *) addr;
+
+		for (i = 0; i < size; ++i) {
+			unsigned long hpa = be64_to_cpu(tmp[i]);
+
+			if (!(hpa & (TCE_PCI_READ | TCE_PCI_WRITE)))
+				continue;
+
+			pnv_free_tce_table((unsigned long) __va(hpa),
+					size, level - 1);
+		}
+	}
+
+	free_pages(addr, get_order(size << 3));
+}
+
+static __be64 *pnv_alloc_tce_table(int nid,
+		unsigned shift, unsigned levels, unsigned long *left)
+{
+	struct page *tce_mem = NULL;
+	__be64 *addr, *tmp;
+	unsigned order = max_t(unsigned, shift, PAGE_SHIFT) - PAGE_SHIFT;
+	unsigned long chunk = 1UL << shift, i;
+
+	tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
+	if (!tce_mem) {
+		pr_err("Failed to allocate a TCE memory\n");
+		return NULL;
+	}
+
+	if (!*left)
+		return NULL;
+
+	addr = page_address(tce_mem);
+	memset(addr, 0, chunk);
+
+	--levels;
+	if (!levels) {
+		/* This is last level, actual TCEs */
+		*left -= min(*left, chunk);
+		return addr;
+	}
+
+	for (i = 0; i < (chunk >> 3); ++i) {
+		/* We allocated required TCEs, mark the rest page fault */
+		if (!*left) {
+			addr[i] = cpu_to_be64(0);
+			continue;
+		}
+
+		tmp = pnv_alloc_tce_table(nid, shift, levels, left);
+		addr[i] = cpu_to_be64(__pa(tmp) |
+				TCE_PCI_READ | TCE_PCI_WRITE);
+	}
+
+	return addr;
+}
+
 static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
-		__u32 page_shift, __u32 window_shift,
+		__u32 page_shift, __u32 window_shift, __u32 levels,
 		struct iommu_table *tbl)
 {
 	int nid = pe->phb->hose->node;
-	struct page *tce_mem = NULL;
 	void *addr;
-	unsigned long tce_table_size;
-	int64_t rc;
-	unsigned order;
+	unsigned long tce_table_size, left;
+	unsigned shift;
 
 	if ((page_shift != 12) && (page_shift != 16) && (page_shift != 24))
 		return -EINVAL;
@@ -1312,20 +1375,27 @@ static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
 	if ((1ULL << window_shift) > memory_hotplug_max())
 		return -EINVAL;
 
+	if (!levels || (levels > 5))
+		return -EINVAL;
+
 	tce_table_size = (1ULL << (window_shift - page_shift)) * 8;
 	tce_table_size = max(0x1000UL, tce_table_size);
 
 	/* Allocate TCE table */
-	order = get_order(tce_table_size);
+#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))
+	shift = ROUND_UP(window_shift - page_shift, levels) / levels;
+	shift += 3;
+	shift = max_t(unsigned, shift, IOMMU_PAGE_SHIFT_4K);
+	pr_info("Creating TCE table %08llx, %d levels, TCE table size = %lx\n",
+			1ULL 

[PATCH v4 21/28] powerpc/iommu: Split iommu_free_table into 2 helpers

2015-02-16 Thread Alexey Kardashevskiy
The iommu_free_table helper releases the memory it is using (the TCE table and
@it_map) and releases the iommu_table struct as well. We might not want
that very last step, as we store iommu_table in parent structures.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h |  1 +
 arch/powerpc/kernel/iommu.c  | 57 
 2 files changed, 35 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index bf26d47..cc26eca 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -122,6 +122,7 @@ static inline void *get_iommu_table_base(struct device *dev)
 
 extern struct iommu_table *iommu_table_alloc(int node);
 /* Frees table for an individual device node */
+extern void iommu_reset_table(struct iommu_table *tbl, const char *node_name);
 extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
 
 /* Initializes an iommu_table based in values set in the passed-in
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index e4b89bf..c0d05ec 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -721,24 +721,46 @@ struct iommu_table *iommu_table_alloc(int node)
 	return &iommu->tables[0];
 }
 
+void iommu_reset_table(struct iommu_table *tbl, const char *node_name)
+{
+	if (!tbl)
+		return;
+
+	if (tbl->it_map) {
+		unsigned long bitmap_sz;
+		unsigned int order;
+
+		/*
+		 * In case we have reserved the first bit, we should not emit
+		 * the warning below.
+		 */
+		if (tbl->it_offset == 0)
+			clear_bit(0, tbl->it_map);
+
+		/* verify that table contains no entries */
+		if (!bitmap_empty(tbl->it_map, tbl->it_size))
+			pr_warn("%s: Unexpected TCEs for %s\n", __func__,
+					node_name);
+
+		/* calculate bitmap size in bytes */
+		bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
+
+		/* free bitmap */
+		order = get_order(bitmap_sz);
+		free_pages((unsigned long) tbl->it_map, order);
+	}
+
+	memset(tbl, 0, sizeof(*tbl));
+}
+
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
-	unsigned long bitmap_sz;
-	unsigned int order;
 	struct powerpc_iommu *iommu = tbl->it_iommu;
 
-	if (!tbl || !tbl->it_map) {
-		printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
-				node_name);
+	if (!tbl)
 		return;
-	}
 
-	/*
-	 * In case we have reserved the first bit, we should not emit
-	 * the warning below.
-	 */
-	if (tbl->it_offset == 0)
-		clear_bit(0, tbl->it_map);
+	iommu_reset_table(tbl, node_name);
 
 #ifdef CONFIG_IOMMU_API
 	if (iommu->group) {
@@ -747,17 +769,6 @@ void iommu_free_table(struct iommu_table *tbl, const char *node_name)
}
 #endif
 
-	/* verify that table contains no entries */
-	if (!bitmap_empty(tbl->it_map, tbl->it_size))
-		pr_warn("%s: Unexpected TCEs for %s\n", __func__, node_name);
-
-	/* calculate bitmap size in bytes */
-	bitmap_sz = BITS_TO_LONGS(tbl->it_size) * sizeof(unsigned long);
-
-	/* free bitmap */
-	order = get_order(bitmap_sz);
-	free_pages((unsigned long) tbl->it_map, order);
-
/* free table */
kfree(iommu);
 }
-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v4 18/28] powerpc/powernv/ioda2: Rework iommu_table creation

2015-02-16 Thread Alexey Kardashevskiy
This moves iommu_table creation to the beginning. This is a mechanical
patch.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 31 +--
 drivers/vfio/vfio_iommu_spapr_tce.c   |  4 +++-
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 6d279d5..ebfea0a 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1393,27 +1393,31 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	addr = page_address(tce_mem);
 	memset(addr, 0, tce_table_size);
 
+	/* Setup iommu */
+	pe->iommu.tables[0].it_iommu = &pe->iommu;
+
+	/* Setup linux iommu table */
+	tbl = &pe->iommu.tables[0];
+	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
+			IOMMU_PAGE_SHIFT_4K);
+
+	tbl->it_ops = &pnv_ioda2_iommu_ops;
+	iommu_init_table(tbl, phb->hose->node);
+	pe->iommu.ops = &pnv_pci_ioda2_ops;
+
 	/*
 	 * Map TCE table through TVT. The TVE index is the PE number
 	 * shifted by 1 bit for 32-bits DMA space.
 	 */
 	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-			pe->pe_number << 1, 1, __pa(addr),
-			tce_table_size, 0x1000);
+			pe->pe_number << 1, 1, __pa(tbl->it_base),
+			tbl->it_size << 3, 1ULL << tbl->it_page_shift);
 	if (rc) {
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 			" err %ld\n", rc);
 		goto fail;
 	}
 
-	/* Setup iommu */
-	pe->iommu.tables[0].it_iommu = &pe->iommu;
-
-	/* Setup linux iommu table */
-	tbl = &pe->iommu.tables[0];
-	pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-			IOMMU_PAGE_SHIFT_4K);
-
 	/* OPAL variant of PHB3 invalidated TCEs */
 	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
 	if (swinvp) {
@@ -1427,14 +1431,13 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 			8);
 		tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
 	}
-	tbl->it_ops = &pnv_ioda2_iommu_ops;
-	iommu_init_table(tbl, phb->hose->node);
-	pe->iommu.ops = &pnv_pci_ioda2_ops;
+
 	iommu_register_group(&pe->iommu, phb->hose->global_number,
 			pe->pe_number);
 
 	if (pe->pdev)
-		set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
+		set_iommu_table_base_and_group(&pe->pdev->dev,
+				&pe->iommu.tables[0]);
 	else
 		pnv_ioda_setup_bus_dma(pe, pe->pbus, true);
 
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index badb648..b5134b7 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -539,6 +539,7 @@ static long tce_iommu_build(struct tce_container *container,
 	struct page *page;
 	unsigned long hva, oldtce;
 	enum dma_data_direction direction = tce_iommu_direction(tce);
+	bool do_put = false;
 
 	for (i = 0; i < pages; ++i) {
 		if (tce_preregistered(container))
@@ -565,7 +566,8 @@ static long tce_iommu_build(struct tce_container *container,
 		oldtce = 0;
 		ret = iommu_tce_xchg(tbl, entry + i, &hva, &oldtce, direction);
 		if (ret) {
-			tce_iommu_unuse_page(container, hva);
+			if (do_put)
+				tce_iommu_unuse_page(container, hva);
 			pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
 					__func__, entry << tbl->it_page_shift,
 					tce, ret);
-- 
2.0.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 0/4] powerpc: trivial unused functions cleanup

2015-02-16 Thread Arseny Solokha
This series removes unused functions from the powerpc tree that I've been
able to discover.

Arseny Solokha (4):
  powerpc/boot: drop planetcore_set_serial_speed
  kvm/ppc/mpic: drop unused IRQ_testbit
  powerpc/qe: drop unused ucc_slow_poll_transmitter_now
  powerpc/mpic: remove unused functions

 arch/powerpc/boot/planetcore.c| 33 -
 arch/powerpc/boot/planetcore.h|  3 ---
 arch/powerpc/include/asm/mpic.h   | 16 
 arch/powerpc/include/asm/ucc_slow.h   | 13 -
 arch/powerpc/kvm/mpic.c   |  5 -
 arch/powerpc/sysdev/mpic.c| 35 ---
 arch/powerpc/sysdev/qe_lib/ucc_slow.c |  5 -
 7 files changed, 110 deletions(-)

-- 
2.3.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/4] powerpc/boot: drop planetcore_set_serial_speed

2015-02-16 Thread Arseny Solokha
Drop planetcore_set_serial_speed() which had no users since its inception.

Signed-off-by: Arseny Solokha asolo...@kb.kras.ru
---
 arch/powerpc/boot/planetcore.c | 33 -
 arch/powerpc/boot/planetcore.h |  3 ---
 2 files changed, 36 deletions(-)

diff --git a/arch/powerpc/boot/planetcore.c b/arch/powerpc/boot/planetcore.c
index 0d8558a..75117e6 100644
--- a/arch/powerpc/boot/planetcore.c
+++ b/arch/powerpc/boot/planetcore.c
@@ -131,36 +131,3 @@ void planetcore_set_stdout_path(const char *table)
 
 	setprop_str(chosen, "linux,stdout-path", path);
 }
-
-void planetcore_set_serial_speed(const char *table)
-{
-	void *chosen, *stdout;
-	u64 baud;
-	u32 baud32;
-	int len;
-
-	chosen = finddevice("/chosen");
-	if (!chosen)
-		return;
-
-	len = getprop(chosen, "linux,stdout-path", prop_buf, MAX_PROP_LEN);
-	if (len <= 0)
-		return;
-
-	stdout = finddevice(prop_buf);
-	if (!stdout) {
-		printf("planetcore_set_serial_speed: "
-		       "Bad /chosen/linux,stdout-path.\r\n");
-
-		return;
-	}
-
-	if (!planetcore_get_decimal(table, PLANETCORE_KEY_SERIAL_BAUD,
-				    &baud)) {
-		printf("planetcore_set_serial_speed: No SB tag.\r\n");
-		return;
-	}
-
-	baud32 = baud;
-	setprop(stdout, "current-speed", &baud32, 4);
-}
diff --git a/arch/powerpc/boot/planetcore.h b/arch/powerpc/boot/planetcore.h
index 0d4094f..d53c733 100644
--- a/arch/powerpc/boot/planetcore.h
+++ b/arch/powerpc/boot/planetcore.h
@@ -43,7 +43,4 @@ void planetcore_set_mac_addrs(const char *table);
  */
 void planetcore_set_stdout_path(const char *table);
 
-/* Sets the current-speed property in the serial node. */
-void planetcore_set_serial_speed(const char *table);
-
 #endif
-- 
2.3.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 2/4] kvm/ppc/mpic: drop unused IRQ_testbit

2015-02-16 Thread Arseny Solokha
Drop an unused static function which has no callers within its
translation unit.

Signed-off-by: Arseny Solokha asolo...@kb.kras.ru
Cc: Alexander Graf ag...@suse.de
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
---
 arch/powerpc/kvm/mpic.c | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/powerpc/kvm/mpic.c b/arch/powerpc/kvm/mpic.c
index 39b3a8f..a480d99 100644
--- a/arch/powerpc/kvm/mpic.c
+++ b/arch/powerpc/kvm/mpic.c
@@ -289,11 +289,6 @@ static inline void IRQ_resetbit(struct irq_queue *q, int n_IRQ)
 	clear_bit(n_IRQ, q->queue);
 }
 
-static inline int IRQ_testbit(struct irq_queue *q, int n_IRQ)
-{
-   return test_bit(n_IRQ, q-queue);
-}
-
 static void IRQ_check(struct openpic *opp, struct irq_queue *q)
 {
int irq = -1;
-- 
2.3.0


[PATCH 3/4] powerpc/qe: drop unused ucc_slow_poll_transmitter_now

2015-02-16 Thread Arseny Solokha
Drop ucc_slow_poll_transmitter_now() which has no users.

Signed-off-by: Arseny Solokha asolo...@kb.kras.ru
---
 arch/powerpc/include/asm/ucc_slow.h   | 13 -
 arch/powerpc/sysdev/qe_lib/ucc_slow.c |  5 -
 2 files changed, 18 deletions(-)

diff --git a/arch/powerpc/include/asm/ucc_slow.h 
b/arch/powerpc/include/asm/ucc_slow.h
index c44131e..233ef5f 100644
--- a/arch/powerpc/include/asm/ucc_slow.h
+++ b/arch/powerpc/include/asm/ucc_slow.h
@@ -251,19 +251,6 @@ void ucc_slow_enable(struct ucc_slow_private * uccs, enum 
comm_dir mode);
  */
 void ucc_slow_disable(struct ucc_slow_private * uccs, enum comm_dir mode);
 
-/* ucc_slow_poll_transmitter_now
- * Immediately forces a poll of the transmitter for data to be sent.
- * Typically, the hardware performs a periodic poll for data that the
- * transmit routine has set up to be transmitted. In cases where
- * this polling cycle is not soon enough, this optional routine can
- * be invoked to force a poll right away, instead. Proper use for
- * each transmission for which this functionality is desired is to
- * call the transmit routine and then this routine right after.
- *
- * uccs - (In) pointer to the slow UCC structure.
- */
-void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs);
-
 /* ucc_slow_graceful_stop_tx
  * Smoothly stops transmission on a specified slow UCC.
  *
diff --git a/arch/powerpc/sysdev/qe_lib/ucc_slow.c 
b/arch/powerpc/sysdev/qe_lib/ucc_slow.c
index befaf11..5f91628 100644
--- a/arch/powerpc/sysdev/qe_lib/ucc_slow.c
+++ b/arch/powerpc/sysdev/qe_lib/ucc_slow.c
@@ -43,11 +43,6 @@ u32 ucc_slow_get_qe_cr_subblock(int uccs_num)
 }
 EXPORT_SYMBOL(ucc_slow_get_qe_cr_subblock);
 
-void ucc_slow_poll_transmitter_now(struct ucc_slow_private * uccs)
-{
-	out_be16(&uccs->us_regs->utodr, UCC_SLOW_TOD);
-}
-
 void ucc_slow_graceful_stop_tx(struct ucc_slow_private * uccs)
 {
 	struct ucc_slow_info *us_info = uccs->us_info;
-- 
2.3.0


[PATCH 4/4] powerpc/mpic: remove unused functions

2015-02-16 Thread Arseny Solokha
Drop unused fsl_mpic_primary_get_version(), mpic_set_clk_ratio(),
mpic_set_serial_int().

Signed-off-by: Arseny Solokha asolo...@kb.kras.ru
---
 arch/powerpc/include/asm/mpic.h | 16 
 arch/powerpc/sysdev/mpic.c  | 35 ---
 2 files changed, 51 deletions(-)

diff --git a/arch/powerpc/include/asm/mpic.h b/arch/powerpc/include/asm/mpic.h
index 754f93d..3a2ab60 100644
--- a/arch/powerpc/include/asm/mpic.h
+++ b/arch/powerpc/include/asm/mpic.h
@@ -395,16 +395,6 @@ extern struct bus_type mpic_subsys;
 #define	MPIC_REGSET_STANDARD	MPIC_REGSET(0)	/* Original MPIC */
 #define	MPIC_REGSET_TSI108	MPIC_REGSET(1)	/* Tsi108/109 PIC */
 
-/* Get the version of primary MPIC */
-#ifdef CONFIG_MPIC
-extern u32 fsl_mpic_primary_get_version(void);
-#else
-static inline u32 fsl_mpic_primary_get_version(void)
-{
-   return 0;
-}
-#endif
-
 /* Allocate the controller structure and setup the linux irq descs
  * for the range of interrupts passed in. No HW initialization is
  * actually performed.
@@ -496,11 +486,5 @@ extern unsigned int mpic_get_coreint_irq(void);
 /* Fetch Machine Check interrupt from primary mpic */
 extern unsigned int mpic_get_mcirq(void);
 
-/* Set the EPIC clock ratio */
-void mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio);
-
-/* Enable/Disable EPIC serial interrupt mode */
-void mpic_set_serial_int(struct mpic *mpic, int enable);
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_MPIC_H */
diff --git a/arch/powerpc/sysdev/mpic.c b/arch/powerpc/sysdev/mpic.c
index bbfbbf2..f72b592 100644
--- a/arch/powerpc/sysdev/mpic.c
+++ b/arch/powerpc/sysdev/mpic.c
@@ -1219,16 +1219,6 @@ static u32 fsl_mpic_get_version(struct mpic *mpic)
  * Exported functions
  */
 
-u32 fsl_mpic_primary_get_version(void)
-{
-   struct mpic *mpic = mpic_primary;
-
-   if (mpic)
-   return fsl_mpic_get_version(mpic);
-
-   return 0;
-}
-
 struct mpic * __init mpic_alloc(struct device_node *node,
phys_addr_t phys_addr,
unsigned int flags,
@@ -1676,31 +1666,6 @@ void __init mpic_init(struct mpic *mpic)
mpic_err_int_init(mpic, MPIC_FSL_ERR_INT);
 }
 
-void __init mpic_set_clk_ratio(struct mpic *mpic, u32 clock_ratio)
-{
-	u32 v;
-
-	v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1);
-	v &= ~MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO_MASK;
-	v |= MPIC_GREG_GLOBAL_CONF_1_CLK_RATIO(clock_ratio);
-	mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v);
-}
-
-void __init mpic_set_serial_int(struct mpic *mpic, int enable)
-{
-	unsigned long flags;
-	u32 v;
-
-	raw_spin_lock_irqsave(&mpic_lock, flags);
-	v = mpic_read(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1);
-	if (enable)
-		v |= MPIC_GREG_GLOBAL_CONF_1_SIE;
-	else
-		v &= ~MPIC_GREG_GLOBAL_CONF_1_SIE;
-	mpic_write(mpic->gregs, MPIC_GREG_GLOBAL_CONF_1, v);
-	raw_spin_unlock_irqrestore(&mpic_lock, flags);
-}
-
 void mpic_irq_set_priority(unsigned int irq, unsigned int pri)
 {
struct mpic *mpic = mpic_find(irq);
-- 
2.3.0


[PATCH v4 00/28] powerpc/iommu/vfio: Enable Dynamic DMA windows

2015-02-16 Thread Alexey Kardashevskiy

This enables the PAPR-defined feature called Dynamic DMA windows (DDW).

Each Partitionable Endpoint (IOMMU group) has a separate DMA window on
a PCI bus where devices are allowed to perform DMA. By default a 1GB or
2GB window is allocated at host boot time, and these windows are used
when an IOMMU group is passed to the userspace (guest). These windows
are mapped at zero offset on a PCI bus.

High-speed devices may suffer from the limited size of this window. On
the host side, a TCE bypass mode is enabled on the POWER8 CPU which
implements direct mapping of the host memory to a PCI bus at 1<<59.

For the guest, PAPR defines a DDW RTAS API which allows the pseries guest
to query the hypervisor whether it supports DDW and what the parameters
of possible windows are.

Currently POWER8 supports 2 DMA windows per PE: the already mentioned
small 32-bit window, and a 64-bit window which can only start at 1<<59
and can support various page sizes.

This patchset reworks the PPC IOMMU code and adds the necessary
structures to extend it to support big windows.

When the guest detects the feature and the PE is capable of 64-bit DMA,
it:
1. queries the hypervisor for the number of available windows and page masks;
2. creates a window with the biggest possible page size (current guests can do
64K or 16MB TCEs);
3. maps the entire guest RAM via H_PUT_TCE* hypercalls;
4. switches dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is not called anymore and the guest gets
maximum performance.

Changes:
v4:
* moved patches around to have VFIO and PPC patches separated as much as
possible; once I get Ack from any PPC maintainer about the whole approach,
I'll start posting these in small chunks per maintainer
* now works with the existing upstream QEMU

v3:
* redesigned the whole thing
* multiple IOMMU groups per PHB - one PHB is needed for VFIO in the guest -
no problems with locked_vm counting; also we save memory on actual tables
* guest RAM preregistration is required for DDW
* PEs (IOMMU groups) are passed to VFIO with no DMA windows at all so
we do not bother with iommu_table::it_map anymore
* added multilevel TCE tables support to support really huge guests

v2:
* added missing __pa() in powerpc/powernv: Release replaced TCE
* reposted to make some noise




Alexey Kardashevskiy (28):
  vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU
driver
  vfio: powerpc/spapr: Do cleanup when releasing the group
  vfio: powerpc/spapr: Check that TCE page size is equal to it_page_size
  vfio: powerpc/spapr: Use it_page_size
  vfio: powerpc/spapr: Move locked_vm accounting to helpers
  vfio: powerpc/spapr: Disable DMA mappings on disabled container
  vfio: powerpc/spapr: Moving pinning/unpinning to helpers
  vfio: powerpc/spapr: Register memory
  powerpc/powernv: Do not set read flag if direction==DMA_NONE
  powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table
  powerpc/iommu: Introduce iommu_table_alloc() helper
  powerpc/spapr: vfio: Switch from iommu_table to new powerpc_iommu
  powerpc/iommu: Fix IOMMU ownership control functions
  vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership
control
  powerpc/powernv/ioda/ioda2: Rework tce_build()/tce_free()
  powerpc/iommu/powernv: Release replaced TCE
  powerpc/pseries/lpar: Enable VFIO
  powerpc/powernv/ioda2: Rework iommu_table creation
  powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_create_table
  powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window
  powerpc/iommu: Split iommu_free_table into 2 helpers
  powerpc/powernv: Implement multilevel TCE tables
  powerpc/powernv: Change prototypes to receive iommu
  powerpc/powernv/ioda: Define and implement DMA table/window management
callbacks
  vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership
  vfio: powerpc/spapr: Rework an IOMMU group attach/detach
  vfio: powerpc/spapr: Register memory
  vfio: powerpc/spapr: Support Dynamic DMA windows

 Documentation/vfio.txt  |  25 +
 arch/powerpc/include/asm/iommu.h| 109 +++-
 arch/powerpc/include/asm/machdep.h  |  25 -
 arch/powerpc/kernel/eeh.c   |   2 +-
 arch/powerpc/kernel/iommu.c | 322 +-
 arch/powerpc/kernel/vio.c   |   5 +
 arch/powerpc/platforms/cell/iommu.c |   8 +-
 arch/powerpc/platforms/pasemi/iommu.c   |   7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   | 473 +++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  21 +-
 arch/powerpc/platforms/powernv/pci.c| 130 ++--
 arch/powerpc/platforms/powernv/pci.h|  14 +-
 arch/powerpc/platforms/pseries/iommu.c  |  99 ++-
 arch/powerpc/sysdev/dart_iommu.c|  12 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 944 +---
 include/uapi/linux/vfio.h   |  49 +-
 16 files changed, 1756 insertions(+), 489 deletions(-)

-- 
2.0.0


[PATCH v4 02/28] vfio: powerpc/spapr: Do cleanup when releasing the group

2015-02-16 Thread Alexey Kardashevskiy
This clears the TCE table when a container is being closed, as it is
good practice to leave the table clean before passing ownership
back to the host kernel.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 1ef46c3..daf2e2c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -134,16 +134,24 @@ static void *tce_iommu_open(unsigned long arg)
return container;
 }
 
+static int tce_iommu_clear(struct tce_container *container,
+   struct iommu_table *tbl,
+   unsigned long entry, unsigned long pages);
+
 static void tce_iommu_release(void *iommu_data)
 {
 	struct tce_container *container = iommu_data;
+	struct iommu_table *tbl = container->tbl;
 
-	WARN_ON(container->tbl && !container->tbl->it_group);
+	WARN_ON(tbl && !tbl->it_group);
 	tce_iommu_disable(container);
 
-	if (container->tbl && container->tbl->it_group)
-		tce_iommu_detach_group(iommu_data, container->tbl->it_group);
+	if (tbl) {
+		tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
+
+		if (tbl->it_group)
+			tce_iommu_detach_group(iommu_data, tbl->it_group);
+	}
 	mutex_destroy(&container->lock);
 
kfree(container);
-- 
2.0.0


[PATCH v4 23/28] powerpc/powernv: Change prototypes to receive iommu

2015-02-16 Thread Alexey Kardashevskiy
This changes a few functions to receive a powerpc_iommu pointer
rather than a PE, as they are going to become part of the upcoming
powerpc_iommu_ops callback set.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index f542819..29bd7a4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1360,10 +1360,12 @@ static __be64 *pnv_alloc_tce_table(int nid,
return addr;
 }
 
-static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
+static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
__u32 page_shift, __u32 window_shift, __u32 levels,
struct iommu_table *tbl)
 {
+	struct pnv_ioda_pe *pe = container_of(iommu, struct pnv_ioda_pe,
+			iommu);
 	int nid = pe->phb->hose->node;
void *addr;
unsigned long tce_table_size, left;
@@ -1419,9 +1421,11 @@ static void pnv_pci_ioda2_free_table(struct iommu_table 
*tbl)
 	iommu_reset_table(tbl, "ioda2");
 }
 
-static long pnv_pci_ioda2_set_window(struct pnv_ioda_pe *pe,
+static long pnv_pci_ioda2_set_window(struct powerpc_iommu *iommu,
struct iommu_table *tbl)
 {
+	struct pnv_ioda_pe *pe = container_of(iommu, struct pnv_ioda_pe,
+			iommu);
 	struct pnv_phb *phb = pe->phb;
const __be64 *swinvp;
int64_t rc;
@@ -1554,12 +1558,11 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
 
/* The PE will reserve all possible 32-bits space */
pe-tce32_seg = 0;
-
 	end = (1 << ilog2(phb->ioda.m32_pci_base));
 	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
 		end);
 
-	rc = pnv_pci_ioda2_create_table(pe, IOMMU_PAGE_SHIFT_4K,
+	rc = pnv_pci_ioda2_create_table(&pe->iommu, IOMMU_PAGE_SHIFT_4K,
 			ilog2(phb->ioda.m32_pci_base),
 			POWERPC_IOMMU_DEFAULT_LEVELS, tbl);
if (rc) {
@@ -1571,7 +1574,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
 	pe->iommu.tables[0].it_iommu = &pe->iommu;
 	pe->iommu.ops = &pnv_pci_ioda2_ops;
 
-	rc = pnv_pci_ioda2_set_window(pe, tbl);
+	rc = pnv_pci_ioda2_set_window(&pe->iommu, tbl);
 	if (rc) {
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 				" err %ld\n", rc);
-- 
2.0.0


[PATCH v4 06/28] vfio: powerpc/spapr: Disable DMA mappings on disabled container

2015-02-16 Thread Alexey Kardashevskiy
At the moment DMA map/unmap requests are handled irrespective of
the container's state. This allows userspace to pin memory which
it might not be allowed to pin.

This adds checks to MAP/UNMAP that the container is enabled; otherwise
-EPERM is returned.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 2beeae5..67ea392 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -345,6 +345,9 @@ static long tce_iommu_ioctl(void *iommu_data,
 	struct iommu_table *tbl = container->tbl;
 	unsigned long tce;
 
+	if (!container->enabled)
+		return -EPERM;
+
if (!tbl)
return -ENXIO;
 
@@ -389,6 +392,9 @@ static long tce_iommu_ioctl(void *iommu_data,
struct vfio_iommu_type1_dma_unmap param;
 	struct iommu_table *tbl = container->tbl;
 
+	if (!container->enabled)
+		return -EPERM;
+
if (WARN_ON(!tbl))
return -ENXIO;
 
-- 
2.0.0


[PATCH v4 20/28] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window

2015-02-16 Thread Alexey Kardashevskiy
This is a part of moving DMA window programming to an iommu_ops
callback.

This is a mechanical patch.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 84 ---
 1 file changed, 56 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 95d9119..1f725d4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1351,6 +1351,57 @@ static void pnv_pci_ioda2_free_table(struct iommu_table 
*tbl)
memset(tbl, 0, sizeof(struct iommu_table));
 }
 
+static long pnv_pci_ioda2_set_window(struct pnv_ioda_pe *pe,
+		struct iommu_table *tbl)
+{
+	struct pnv_phb *phb = pe->phb;
+	const __be64 *swinvp;
+	int64_t rc;
+	const __u64 start_addr = tbl->it_offset << tbl->it_page_shift;
+	const __u64 win_size = tbl->it_size << tbl->it_page_shift;
+
+	pe_info(pe, "Setting up window at %llx..%llx pagesize=0x%x tablesize=0x%lx\n",
+			start_addr, start_addr + win_size - 1,
+			1UL << tbl->it_page_shift, tbl->it_size << 3);
+
+	pe->iommu.tables[0] = *tbl;
+	tbl = &pe->iommu.tables[0];
+	tbl->it_iommu = &pe->iommu;
+
+	/*
+	 * Map TCE table through TVT. The TVE index is the PE number
+	 * shifted by 1 bit for 32-bits DMA space.
+	 */
+	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
+			pe->pe_number << 1, 1, __pa(tbl->it_base),
+			tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+	if (rc) {
+		pe_err(pe, "Failed to configure TCE table, err %ld\n", rc);
+		goto fail;
+	}
+
+	/* OPAL variant of PHB3 invalidated TCEs */
+	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
+	if (swinvp) {
+		/* We need a couple more fields -- an address and a data
+		 * to or.  Since the bus is only printed out on table free
+		 * errors, and on the first pass the data will be a relative
+		 * bus number, print that out instead.
+		 */
+		pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
+		tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
+				8);
+		tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
+	}
+
+	return 0;
+fail:
+	if (pe->tce32_seg >= 0)
+		pe->tce32_seg = -1;
+
+	return rc;
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
 	uint16_t window_id = (pe->pe_number << 1 ) + 1;
@@ -1421,7 +1472,6 @@ static struct powerpc_iommu_ops pnv_pci_ioda2_ops = {
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   struct pnv_ioda_pe *pe)
 {
-	const __be64 *swinvp;
 	unsigned int end;
 	struct iommu_table *tbl = &pe->iommu.tables[0];
 	int64_t rc;
@@ -1448,31 +1498,14 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	pe->iommu.tables[0].it_iommu = &pe->iommu;
 	pe->iommu.ops = &pnv_pci_ioda2_ops;
 
-	/*
-	 * Map TCE table through TVT. The TVE index is the PE number
-	 * shifted by 1 bit for 32-bits DMA space.
-	 */
-	rc = opal_pci_map_pe_dma_window(phb->opal_id, pe->pe_number,
-			pe->pe_number << 1, 1, __pa(tbl->it_base),
-			tbl->it_size << 3, 1ULL << tbl->it_page_shift);
+	rc = pnv_pci_ioda2_set_window(pe, tbl);
 	if (rc) {
 		pe_err(pe, "Failed to configure 32-bit TCE table,"
 				" err %ld\n", rc);
-		goto fail;
-	}
-
-	/* OPAL variant of PHB3 invalidated TCEs */
-	swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
-	if (swinvp) {
-		/* We need a couple more fields -- an address and a data
-		 * to or.  Since the bus is only printed out on table free
-		 * errors, and on the first pass the data will be a relative
-		 * bus number, print that out instead.
-		 */
-		pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
-		tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
-				8);
-		tbl->it_type |= (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE);
+		pnv_pci_ioda2_free_table(tbl);
+		if (pe->tce32_seg >= 0)
+			pe->tce32_seg = -1;
+		return;
 	}
 
 	iommu_register_group(&pe->iommu, phb->hose->global_number,
@@ -1486,11 +1519,6 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb 
*phb,
 
/* Also create a bypass window */
pnv_pci_ioda2_setup_bypass_pe(phb, pe);
-   return;
-fail:
-	if (pe->tce32_seg >= 0)
-		pe->tce32_seg = -1;
-   

[PATCH v4 05/28] vfio: powerpc/spapr: Move locked_vm accounting to helpers

2015-02-16 Thread Alexey Kardashevskiy
This moves the locked pages accounting to helpers.
Later they will be reused for Dynamic DMA windows (DDW).

This reworks debug messages to show the current value and the limit.

This stores the locked pages number in the container so when unlocking
the iommu table pointer won't be needed. This does not have an effect
now but it will with multiple tables per container, as then we will
allow attaching/detaching groups on the fly and may end up having
a container with no group attached but with the counter incremented.

While we are here, update the comment explaining why RLIMIT_MEMLOCK
might be required to be bigger than the guest RAM. This also prints
pid of the current process in pr_warn/pr_debug.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* new helpers do nothing if @npages == 0
* tce_iommu_disable() now can decrement the counter if the group was
detached (not possible now but will be in the future)
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 82 -
 1 file changed, 63 insertions(+), 19 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index c2ca38f..2beeae5 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -31,6 +31,51 @@
 static void tce_iommu_detach_group(void *iommu_data,
struct iommu_group *iommu_group);
 
+static long try_increment_locked_vm(long npages)
+{
+	long ret = 0, locked, lock_limit;
+
+	if (!current || !current->mm)
+		return -ESRCH; /* process exited */
+
+	if (!npages)
+		return 0;
+
+	down_write(&current->mm->mmap_sem);
+	locked = current->mm->locked_vm + npages;
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK))
+		ret = -ENOMEM;
+	else
+		current->mm->locked_vm += npages;
+
+	pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
+			npages << PAGE_SHIFT,
+			current->mm->locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK),
+			ret ? " - exceeded" : "");
+
+	up_write(&current->mm->mmap_sem);
+
+	return ret;
+}
+
+static void decrement_locked_vm(long npages)
+{
+	if (!current || !current->mm || !npages)
+		return; /* process exited */
+
+	down_write(&current->mm->mmap_sem);
+	if (npages > current->mm->locked_vm)
+		npages = current->mm->locked_vm;
+	current->mm->locked_vm -= npages;
+	pr_debug("[%d] RLIMIT_MEMLOCK -%ld %ld/%ld\n", current->pid,
+			npages << PAGE_SHIFT,
+			current->mm->locked_vm << PAGE_SHIFT,
+			rlimit(RLIMIT_MEMLOCK));
+	up_write(&current->mm->mmap_sem);
+}
+
 /*
  * VFIO IOMMU fd for SPAPR_TCE IOMMU implementation
  *
@@ -47,6 +92,7 @@ struct tce_container {
struct mutex lock;
struct iommu_table *tbl;
bool enabled;
+   unsigned long locked_pages;
 };
 
 static bool tce_page_is_contained(struct page *page, unsigned page_shift)
@@ -68,7 +114,7 @@ static bool tce_page_is_contained(struct page *page, 
unsigned page_shift)
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
-   unsigned long locked, lock_limit, npages;
+   unsigned long locked;
 	struct iommu_table *tbl = container->tbl;
 
 	if (!container->tbl)
@@ -97,21 +143,22 @@ static int tce_iommu_enable(struct tce_container 
*container)
 * Also we don't have a nice way to fail on H_PUT_TCE due to ulimits,
 * that would effectively kill the guest at random points, much better
 * enforcing the limit based on the max that the guest can map.
+*
+* Unfortunately at the moment it counts whole tables, no matter how
+* much memory the guest has. I.e. for 4GB guest and 4 IOMMU groups
+* each with 2GB DMA window, 8GB will be counted here. The reason for
+* this is that we cannot tell here the amount of RAM used by the guest
+* as this information is only available from KVM and VFIO is
+* KVM agnostic.
 */
-	down_write(&current->mm->mmap_sem);
-	npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
-	locked = current->mm->locked_vm + npages;
-	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
-		pr_warn("RLIMIT_MEMLOCK (%ld) exceeded\n",
-				rlimit(RLIMIT_MEMLOCK));
-		ret = -ENOMEM;
-	} else {
+	locked = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
+	ret = try_increment_locked_vm(locked);
+	if (ret)
+		return ret;
 
-		current->mm->locked_vm += npages;
-		container->enabled = true;
-	}
-	up_write(&current->mm->mmap_sem);
+	container->locked_pages = locked;
+
+ 

[PATCH v4 13/28] powerpc/iommu: Fix IOMMU ownership control functions

2015-02-16 Thread Alexey Kardashevskiy
This adds missing locks in iommu_take_ownership()/
iommu_release_ownership().

This marks all pages busy in iommu_table::it_map in order to catch
errors if there is an attempt to use this table while ownership over it
is taken.

This only clears TCE content if there is no page marked busy in it_map.
Clearing must be done outside of the table locks as iommu_clear_tce()
called from iommu_clear_tces_and_put_pages() does this.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Note: we might want to get rid of it as this patchset removes it_map
from tables passed to VFIO.

Changes:
v5:
* do not store bit#0 value, it has to be set for zero-based table
anyway
* removed test_and_clear_bit
* only disable bypass if succeeded
---
 arch/powerpc/kernel/iommu.c | 31 +--
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 952939f..407d0d6 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1024,33 +1024,48 @@ EXPORT_SYMBOL_GPL(iommu_tce_build);
 
 int iommu_take_ownership(struct iommu_table *tbl)
 {
-	unsigned long sz = (tbl->it_size + 7) >> 3;
+	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+	int ret = 0;
+
+	spin_lock_irqsave(&tbl->large_pool.lock, flags);
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_lock(&tbl->pools[i].lock);
 
 	if (tbl->it_offset == 0)
 		clear_bit(0, tbl->it_map);
 
 	if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
 		pr_err("iommu_tce: it_map is not empty");
-		return -EBUSY;
+		ret = -EBUSY;
+		if (tbl->it_offset == 0)
+			set_bit(0, tbl->it_map);
+	} else {
+		memset(tbl->it_map, 0xff, sz);
 	}
 
-	memset(tbl->it_map, 0xff, sz);
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_unlock(&tbl->pools[i].lock);
+	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 
 	/*
 	 * Disable iommu bypass, otherwise the user can DMA to all of
 	 * our physical memory via the bypass window instead of just
 	 * the pages that has been explicitly mapped into the iommu
 	 */
-	if (tbl->set_bypass)
+	if (!ret && tbl->set_bypass)
 		tbl->set_bypass(tbl, false);
 
-	return 0;
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
 void iommu_release_ownership(struct iommu_table *tbl)
 {
-	unsigned long sz = (tbl->it_size + 7) >> 3;
+	unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+
+	spin_lock_irqsave(&tbl->large_pool.lock, flags);
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_lock(&tbl->pools[i].lock);
 
 	memset(tbl->it_map, 0, sz);
 
@@ -1058,6 +1073,10 @@ void iommu_release_ownership(struct iommu_table *tbl)
 	if (tbl->it_offset == 0)
 		set_bit(0, tbl->it_map);
 
+	for (i = 0; i < tbl->nr_pools; i++)
+		spin_unlock(&tbl->pools[i].lock);
+	spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
+
 	/* The kernel owns the device now, we can restore the iommu bypass */
 	if (tbl->set_bypass)
 		tbl->set_bypass(tbl, true);
-- 
2.0.0


[PATCH v4 11/28] powerpc/iommu: Introduce iommu_table_alloc() helper

2015-02-16 Thread Alexey Kardashevskiy
This replaces multiple calls of kzalloc_node() with a new
iommu_table_alloc() helper. Right now it simply calls kzalloc_node();
later it will be modified to allocate a powerpc_iommu struct which
embeds the iommu table(s).

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h   |  1 +
 arch/powerpc/kernel/iommu.c|  9 +
 arch/powerpc/platforms/powernv/pci.c   |  2 +-
 arch/powerpc/platforms/pseries/iommu.c | 12 
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index eb5822d..335e3d4 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -117,6 +117,7 @@ static inline void *get_iommu_table_base(struct device *dev)
 	return dev->archdata.dma_data.iommu_table_base;
 }
 
+extern struct iommu_table *iommu_table_alloc(int node);
 /* Frees table for an individual device node */
 extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
 
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index c51ad3e..2f7e92b 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -710,6 +710,15 @@ struct iommu_table *iommu_init_table(struct iommu_table 
*tbl, int nid)
return tbl;
 }
 
+struct iommu_table *iommu_table_alloc(int node)
+{
+   struct iommu_table *tbl;
+
+   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
+
+   return tbl;
+}
+
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
unsigned long bitmap_sz;
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index c4782b1..bbe529b 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -693,7 +693,7 @@ static struct iommu_table *pnv_pci_setup_bml_iommu(struct 
pci_controller *hose)
 			   hose->dn->full_name);
return NULL;
}
-	tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, hose->node);
+	tbl = iommu_table_alloc(hose->node);
if (WARN_ON(!tbl))
return NULL;
pnv_pci_setup_iommu_table(tbl, __va(be64_to_cpup(basep)),
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 1aa1815..bc14299 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -617,8 +617,7 @@ static void pci_dma_bus_setup_pSeries(struct pci_bus *bus)
 	pci->phb->dma_window_size = 0x8000000ul;
 	pci->phb->dma_window_base_cur = 0x8000000ul;
 
-	tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-			   pci->phb->node);
+	tbl = iommu_table_alloc(pci->phb->node);
 
 	iommu_table_setparms(pci->phb, dn, tbl);
 	tbl->it_ops = &iommu_table_pseries_ops;
@@ -669,8 +668,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
 		 pdn->full_name, ppci->iommu_table);
 
 	if (!ppci->iommu_table) {
-		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-				   ppci->phb->node);
+		tbl = iommu_table_alloc(ppci->phb->node);
 		iommu_table_setparms_lpar(ppci->phb, pdn, tbl, dma_window);
 		tbl->it_ops = &iommu_table_lpar_multi_ops;
 		ppci->iommu_table = iommu_init_table(tbl, ppci->phb->node);
@@ -697,8 +695,7 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		struct pci_controller *phb = PCI_DN(dn)->phb;
 
 		pr_debug(" --> first child, no bridge. Allocating iommu table.\n");
-		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-				   phb->node);
+		tbl = iommu_table_alloc(phb->node);
 		iommu_table_setparms(phb, dn, tbl);
 		tbl->it_ops = &iommu_table_pseries_ops;
 		PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
@@ -1120,8 +1117,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev 
*dev)
 
 	pci = PCI_DN(pdn);
 	if (!pci->iommu_table) {
-		tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL,
-				   pci->phb->node);
+		tbl = iommu_table_alloc(pci->phb->node);
 		iommu_table_setparms_lpar(pci->phb, pdn, tbl, dma_window);
 		tbl->it_ops = &iommu_table_lpar_multi_ops;
 		pci->iommu_table = iommu_init_table(tbl, pci->phb->node);
-- 
2.0.0


[PATCH v4 15/28] powerpc/powernv/ioda/ioda2: Rework tce_build()/tce_free()

2015-02-16 Thread Alexey Kardashevskiy
The pnv_pci_ioda_tce_invalidate() helper invalidates the TCE cache. It is
supposed to be called on IODA1/2 and not on p5ioc2. It receives
start and end host addresses of the TCE table. This approach makes it possible
for pnv_pci_ioda_tce_invalidate() to be called unintentionally on p5ioc2.
Another issue is that IODA2 needs PCI addresses to invalidate the cache;
those can be calculated from host addresses, but since we are going
to implement multi-level TCE tables, calculating a PCI address from
a host address would get either tricky or ugly, as the TCE table remains flat
on the PCI bus but not in RAM.

This defines separate iommu_table_ops callbacks for p5ioc2 and IODA1/2
PHBs. They all call common pnv_tce_build/pnv_tce_free/pnv_tce_get helpers
but call PHB specific TCE invalidation helper (when needed).

This changes pnv_pci_ioda2_tce_invalidate() to receive a TCE index and
a number of pages, which are PCI addresses shifted by the IOMMU page shift.

The patch is pretty mechanical and behaviour is not expected to change.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c   | 92 ++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  8 ++-
 arch/powerpc/platforms/powernv/pci.c| 76 +---
 arch/powerpc/platforms/powernv/pci.h|  7 ++-
 4 files changed, 110 insertions(+), 73 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index a33a116..dfc56fc 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1041,18 +1041,20 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe 
*pe,
}
 }
 
-static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
-struct iommu_table *tbl,
-__be64 *startp, __be64 *endp, bool rm)
+static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
+   unsigned long index, unsigned long npages, bool rm)
 {
+   struct pnv_ioda_pe *pe = container_of(tbl->it_iommu,
+   struct pnv_ioda_pe, iommu);
	__be64 __iomem *invalidate = rm ?
		(__be64 __iomem *)pe->tce_inval_reg_phys :
		(__be64 __iomem *)tbl->it_index;
	unsigned long start, end, inc;
	const unsigned shift = tbl->it_page_shift;
 
-   start = __pa(startp);
-   end = __pa(endp);
+   start = __pa((__be64 *)tbl->it_base + index - tbl->it_offset);
+   end = __pa((__be64 *)tbl->it_base + index - tbl->it_offset +
+   npages - 1);
 
	/* BML uses this case for p6/p7/galaxy2: Shift addr and put in node */
	if (tbl->it_busno) {
@@ -1088,10 +1090,40 @@ static void pnv_pci_ioda1_tce_invalidate(struct 
pnv_ioda_pe *pe,
 */
 }
 
-static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
-struct iommu_table *tbl,
-__be64 *startp, __be64 *endp, bool rm)
+static int pnv_ioda1_tce_build_vm(struct iommu_table *tbl, long index,
+   long npages, unsigned long uaddr,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs)
 {
+   long ret = pnv_tce_build(tbl, index, npages, uaddr, direction,
+   attrs);
+
+   if (!ret && (tbl->it_type & TCE_PCI_SWINV_CREATE))
+   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+
+   return ret;
+}
+
+static void pnv_ioda1_tce_free_vm(struct iommu_table *tbl, long index,
+   long npages)
+{
+   pnv_tce_free(tbl, index, npages);
+
+   if (tbl->it_type & TCE_PCI_SWINV_FREE)
+   pnv_pci_ioda1_tce_invalidate(tbl, index, npages, false);
+}
+
+struct iommu_table_ops pnv_ioda1_iommu_ops = {
+   .set = pnv_ioda1_tce_build_vm,
+   .clear = pnv_ioda1_tce_free_vm,
+   .get = pnv_tce_get,
+};
+
+static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
+   unsigned long index, unsigned long npages, bool rm)
+{
+   struct pnv_ioda_pe *pe = container_of(tbl->it_iommu,
+   struct pnv_ioda_pe, iommu);
	unsigned long start, end, inc;
	__be64 __iomem *invalidate = rm ?
		(__be64 __iomem *)pe->tce_inval_reg_phys :
@@ -1104,9 +1136,9 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
	end = start;
 
	/* Figure out the start, end and step */
-   inc = tbl->it_offset + (((u64)startp - tbl->it_base) / sizeof(u64));
+   inc = tbl->it_offset + index / sizeof(u64);
	start |= (inc << shift);
-   inc = tbl->it_offset + (((u64)endp - tbl->it_base) / sizeof(u64));
+   inc = tbl->it_offset + (index + npages - 1) / sizeof(u64);
	end |= (inc << shift);
	inc = (0x1ull << shift);
	mb();
mb();
@@ -1120,19 +1152,35 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
pnv_ioda_pe *pe,

[PATCH v4 01/28] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver

2015-02-16 Thread Alexey Kardashevskiy
This moves the page pinning (get_user_pages_fast()/put_page()) code out of
the platform IOMMU code and puts it into the VFIO IOMMU driver where it
belongs, as the platform code does not deal with page pinning.

This makes iommu_take_ownership()/iommu_release_ownership() deal with
the IOMMU table bitmap only.

This removes page unpinning from iommu_take_ownership() as the actual
TCE table might contain garbage and doing put_page() on it is undefined
behaviour.

Besides the last part, the rest of the patch is mechanical.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* s/iommu_tce_build(tbl, entry + 1/iommu_tce_build(tbl, entry + i/
---
 arch/powerpc/include/asm/iommu.h|  6 ---
 arch/powerpc/kernel/iommu.c | 68 
 drivers/vfio/vfio_iommu_spapr_tce.c | 90 +++--
 3 files changed, 77 insertions(+), 87 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 9cfa370..45b07f6 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -191,16 +191,10 @@ extern int iommu_tce_build(struct iommu_table *tbl, 
unsigned long entry,
unsigned long hwaddr, enum dma_data_direction direction);
 extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
unsigned long entry);
-extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
-   unsigned long entry, unsigned long pages);
-extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
-   unsigned long entry, unsigned long tce);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
 extern int iommu_take_ownership(struct iommu_table *tbl);
 extern void iommu_release_ownership(struct iommu_table *tbl);
 
-extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
-
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 5d3968c..456acb1 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -903,19 +903,6 @@ void iommu_register_group(struct iommu_table *tbl,
kfree(name);
 }
 
-enum dma_data_direction iommu_tce_direction(unsigned long tce)
-{
-   if ((tce & TCE_PCI_READ) && (tce & TCE_PCI_WRITE))
-   return DMA_BIDIRECTIONAL;
-   else if (tce & TCE_PCI_READ)
-   return DMA_TO_DEVICE;
-   else if (tce & TCE_PCI_WRITE)
-   return DMA_FROM_DEVICE;
-   else
-   return DMA_NONE;
-}
-EXPORT_SYMBOL_GPL(iommu_tce_direction);
-
 void iommu_flush_tce(struct iommu_table *tbl)
 {
/* Flush/invalidate TLB caches if necessary */
@@ -991,30 +978,6 @@ unsigned long iommu_clear_tce(struct iommu_table *tbl, 
unsigned long entry)
 }
 EXPORT_SYMBOL_GPL(iommu_clear_tce);
 
-int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
-   unsigned long entry, unsigned long pages)
-{
-   unsigned long oldtce;
-   struct page *page;
-
-   for ( ; pages; --pages, ++entry) {
-   oldtce = iommu_clear_tce(tbl, entry);
-   if (!oldtce)
-   continue;
-
-   page = pfn_to_page(oldtce >> PAGE_SHIFT);
-   WARN_ON(!page);
-   if (page) {
-   if (oldtce & TCE_PCI_WRITE)
-   SetPageDirty(page);
-   put_page(page);
-   }
-   }
-
-   return 0;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages);
-
 /*
  * hwaddr is a kernel virtual address here (0xc... bazillion),
  * tce_build converts it to a physical address.
@@ -1044,35 +1007,6 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned 
long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_build);
 
-int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
-   unsigned long tce)
-{
-   int ret;
-   struct page *page = NULL;
-   unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
-   enum dma_data_direction direction = iommu_tce_direction(tce);
-
-   ret = get_user_pages_fast(tce & PAGE_MASK, 1,
-   direction != DMA_TO_DEVICE, &page);
-   if (unlikely(ret != 1)) {
-   /* pr_err("iommu_tce: get_user_pages_fast failed tce=%lx
-   ioba=%lx ret=%d\n",
-   tce, entry << tbl->it_page_shift, ret); */
-   return -EFAULT;
-   }
-   hwaddr = (unsigned long) page_address(page) + offset;
-
-   ret = iommu_tce_build(tbl, entry, hwaddr, direction);
-   if (ret)
-   put_page(page);
-
-   if (ret < 0)
-   pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%d\n",
-   __func__, entry << tbl->it_page_shift, tce, ret);
-
-   return ret;
-}
-EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode);
-
 int iommu_take_ownership(struct iommu_table *tbl)
 {
	unsigned long sz = (tbl->it_size + 7) >> 3;
@@ -1086,7 +1020,6 @@ 

[PATCH v4 03/28] vfio: powerpc/spapr: Check that TCE page size is equal to it_page_size

2015-02-16 Thread Alexey Kardashevskiy
This checks that the TCE table page size is not bigger than the size of
a page we just pinned and are going to put the physical address of into the table.

Otherwise the hardware gets unwanted access to physical memory between
the end of the actual page and the end of the aligned up TCE page.

Since compound_order() and compound_head() work correctly on non-huge
pages, there is no need for additional check whether the page is huge.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* s/tce_check_page_size/tce_page_is_contained/
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index daf2e2c..ce896244 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -49,6 +49,22 @@ struct tce_container {
bool enabled;
 };
 
+static bool tce_page_is_contained(struct page *page, unsigned page_shift)
+{
+   unsigned shift;
+
+   /*
+* Check that the TCE table granularity is not bigger than the size of
+* a page we just found. Otherwise the hardware can get access to
+* a bigger memory chunk than it should.
+*/
+   shift = PAGE_SHIFT + compound_order(compound_head(page));
+   if (shift >= page_shift)
+   return true;
+
+   return false;
+}
+
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
@@ -209,6 +225,12 @@ static long tce_iommu_build(struct tce_container 
*container,
ret = -EFAULT;
break;
}
+
+   if (!tce_page_is_contained(page, tbl->it_page_shift)) {
+   ret = -EPERM;
+   break;
+   }
+
	hva = (unsigned long) page_address(page) +
		(tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK);
 
-- 
2.0.0


[PATCH v4 04/28] vfio: powerpc/spapr: Use it_page_size

2015-02-16 Thread Alexey Kardashevskiy
This makes use of it_page_size from the iommu_table struct,
as the page size can differ.

This replaces the missing IOMMU_PAGE_SHIFT macro in commented debug code,
as the recently introduced IOMMU_PAGE_XXX macros do not include
IOMMU_PAGE_SHIFT.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index ce896244..c2ca38f 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -99,7 +99,7 @@ static int tce_iommu_enable(struct tce_container *container)
 * enforcing the limit based on the max that the guest can map.
 */
	down_write(&current->mm->mmap_sem);
-   npages = (tbl->it_size << IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   npages = (tbl->it_size << tbl->it_page_shift) >> PAGE_SHIFT;
	locked = current->mm->locked_vm + npages;
	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
@@ -128,7 +128,7 @@ static void tce_iommu_disable(struct tce_container 
*container)
 
	down_write(&current->mm->mmap_sem);
	current->mm->locked_vm -= (container->tbl->it_size <<
-   IOMMU_PAGE_SHIFT_4K) >> PAGE_SHIFT;
+   container->tbl->it_page_shift) >> PAGE_SHIFT;
	up_write(&current->mm->mmap_sem);
 }
 
@@ -242,7 +242,7 @@ static long tce_iommu_build(struct tce_container *container,
tce, ret);
break;
}
-   tce += IOMMU_PAGE_SIZE_4K;
+   tce += IOMMU_PAGE_SIZE(tbl);
}
 
if (ret)
@@ -287,8 +287,8 @@ static long tce_iommu_ioctl(void *iommu_data,
if (info.argsz  minsz)
return -EINVAL;
 
-   info.dma32_window_start = tbl->it_offset << IOMMU_PAGE_SHIFT_4K;
-   info.dma32_window_size = tbl->it_size << IOMMU_PAGE_SHIFT_4K;
+   info.dma32_window_start = tbl->it_offset << tbl->it_page_shift;
+   info.dma32_window_size = tbl->it_size << tbl->it_page_shift;
	info.flags = 0;
 
	if (copy_to_user((void __user *)arg, &info, minsz))
@@ -318,8 +318,8 @@ static long tce_iommu_ioctl(void *iommu_data,
	VFIO_DMA_MAP_FLAG_WRITE))
		return -EINVAL;
 
-   if ((param.size & ~IOMMU_PAGE_MASK_4K) ||
-   (param.vaddr & ~IOMMU_PAGE_MASK_4K))
+   if ((param.size & ~IOMMU_PAGE_MASK(tbl)) ||
+   (param.vaddr & ~IOMMU_PAGE_MASK(tbl)))
		return -EINVAL;
 
	/* iova is checked by the IOMMU API */
@@ -334,8 +334,8 @@ static long tce_iommu_ioctl(void *iommu_data,
		return ret;
 
	ret = tce_iommu_build(container, tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K,
-   tce, param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.iova >> tbl->it_page_shift,
+   tce, param.size >> tbl->it_page_shift);
 
iommu_flush_tce(tbl);
 
@@ -361,17 +361,17 @@ static long tce_iommu_ioctl(void *iommu_data,
if (param.flags)
return -EINVAL;
 
-   if (param.size & ~IOMMU_PAGE_MASK_4K)
+   if (param.size & ~IOMMU_PAGE_MASK(tbl))
		return -EINVAL;
 
	ret = iommu_tce_clear_param_check(tbl, param.iova, 0,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.size >> tbl->it_page_shift);
	if (ret)
		return ret;
 
	ret = tce_iommu_clear(container, tbl,
-   param.iova >> IOMMU_PAGE_SHIFT_4K,
-   param.size >> IOMMU_PAGE_SHIFT_4K);
+   param.iova >> tbl->it_page_shift,
+   param.size >> tbl->it_page_shift);
	iommu_flush_tce(tbl);
iommu_flush_tce(tbl);
 
return ret;
-- 
2.0.0


[PATCH v4 19/28] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_create_table

2015-02-16 Thread Alexey Kardashevskiy
This is a part of moving TCE table allocation into an iommu_ops
callback to support multiple IOMMU groups per one VFIO container.

This is a mechanical patch.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 88 +++
 1 file changed, 65 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index ebfea0a..95d9119 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1295,6 +1295,62 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
*phb,
__free_pages(tce_mem, get_order(TCE32_TABLE_SIZE * segs));
 }
 
+static long pnv_pci_ioda2_create_table(struct pnv_ioda_pe *pe,
+   __u32 page_shift, __u32 window_shift,
+   struct iommu_table *tbl)
+{
+   int nid = pe->phb->hose->node;
+   struct page *tce_mem = NULL;
+   void *addr;
+   unsigned long tce_table_size;
+   int64_t rc;
+   unsigned order;
+
+   if ((page_shift != 12) && (page_shift != 16) && (page_shift != 24))
+   return -EINVAL;
+
+   if ((1ULL << window_shift) > memory_hotplug_max())
+   return -EINVAL;
+
+   tce_table_size = (1ULL << (window_shift - page_shift)) * 8;
+   tce_table_size = max(0x1000UL, tce_table_size);
+
+   /* Allocate TCE table */
+   order = get_order(tce_table_size);
+
+   tce_mem = alloc_pages_node(nid, GFP_KERNEL, order);
+   if (!tce_mem) {
+   pr_err("Failed to allocate a TCE memory, order=%d\n", order);
+   rc = -ENOMEM;
+   goto fail;
+   }
+   addr = page_address(tce_mem);
+   memset(addr, 0, tce_table_size);
+
+   /* Setup linux iommu table */
+   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
+   page_shift);
+
+   tbl->it_ops = &pnv_ioda2_iommu_ops;
+   iommu_init_table(tbl, nid);
+
+   return 0;
+fail:
+   if (tce_mem)
+   __free_pages(tce_mem, get_order(tce_table_size));
+
+   return rc;
+}
+
+static void pnv_pci_ioda2_free_table(struct iommu_table *tbl)
+{
+   if (!tbl->it_size)
+   return;
+
+   free_pages(tbl->it_base, get_order(tbl->it_size << 3));
+   memset(tbl, 0, sizeof(struct iommu_table));
+}
+
 static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable)
 {
	uint16_t window_id = (pe->pe_number << 1 ) + 1;
@@ -1365,11 +1421,9 @@ static struct powerpc_iommu_ops pnv_pci_ioda2_ops = {
 static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
   struct pnv_ioda_pe *pe)
 {
-   struct page *tce_mem = NULL;
-   void *addr;
const __be64 *swinvp;
-   struct iommu_table *tbl;
-   unsigned int tce_table_size, end;
+   unsigned int end;
+   struct iommu_table *tbl = &pe->iommu.tables[0];
	int64_t rc;
 
	/* We shouldn't already have a 32-bit DMA associated */
@@ -1378,31 +1432,20 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 
	/* The PE will reserve all possible 32-bits space */
	pe->tce32_seg = 0;
+
	end = (1 << ilog2(phb->ioda.m32_pci_base));
-   tce_table_size = (end / 0x1000) * 8;
	pe_info(pe, "Setting up 32-bit TCE table at 0..%08x\n",
		end);
 
-   /* Allocate TCE table */
-   tce_mem = alloc_pages_node(phb->hose->node, GFP_KERNEL,
-  get_order(tce_table_size));
-   if (!tce_mem) {
-   pe_err(pe, "Failed to allocate a 32-bit TCE memory\n");
-   goto fail;
+   rc = pnv_pci_ioda2_create_table(pe, IOMMU_PAGE_SHIFT_4K,
+   ilog2(phb->ioda.m32_pci_base), tbl);
+   if (rc) {
+   pe_err(pe, "Failed to create 32-bit TCE table, err %ld", rc);
+   return;
	}
-   addr = page_address(tce_mem);
-   memset(addr, 0, tce_table_size);
 
	/* Setup iommu */
	pe->iommu.tables[0].it_iommu = &pe->iommu;
-
-   /* Setup linux iommu table */
-   tbl = &pe->iommu.tables[0];
-   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-   IOMMU_PAGE_SHIFT_4K);
-
-   tbl->it_ops = &pnv_ioda2_iommu_ops;
-   iommu_init_table(tbl, phb->hose->node);
	pe->iommu.ops = &pnv_pci_ioda2_ops;
 
	/*
@@ -1447,8 +1490,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 fail:
	if (pe->tce32_seg >= 0)
		pe->tce32_seg = -1;
-   if (tce_mem)
-   __free_pages(tce_mem, get_order(tce_table_size));
+   pnv_pci_ioda2_free_table(tbl);
 }
 
 static void pnv_ioda_setup_dma(struct pnv_phb *phb)
-- 
2.0.0


[PATCH v4 08/28] vfio: powerpc/spapr: Register memory

2015-02-16 Thread Alexey Kardashevskiy
The existing implementation accounts the whole DMA window in
the locked_vm counter which is going to be even worse with multiple
containers and huge DMA windows.

This introduces 2 ioctls to register/unregister DMA memory which
receive user space address and size of a memory region which
needs to be pinned/unpinned and counted in locked_vm.

If any memory region was registered, all subsequent DMA map requests
should address already pinned memory. If no memory was registered,
then the amount of memory required for the single default DMA window will be
accounted when the container is enabled, and every map/unmap will pin/unpin
a page (with degraded performance).

Dynamic DMA window and in-kernel acceleration will require memory to
be preregistered in order to work.

The accounting is done per VFIO container. When the support of
multiple groups per container is added, we will have accurate locked_vm
accounting.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* updated docs
* s/kzmalloc/vzalloc/
* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
replaced offset with index
* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
and removed duplicating vfio_iommu_spapr_register_memory
---
 Documentation/vfio.txt  |  19 +++
 drivers/vfio/vfio_iommu_spapr_tce.c | 274 +++-
 include/uapi/linux/vfio.h   |  25 
 3 files changed, 312 insertions(+), 6 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..791e85c 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -427,6 +427,25 @@ The code flow from the example above should be slightly 
changed:
 

 
+5) PPC64 paravirtualized guests may generate a lot of map/unmap requests,
+and the handling of those includes pinning/unpinning pages and updating
+mm::locked_vm counter to make sure we do not exceed the rlimit. Handling these
+in real mode is quite expensive and may fail. In order to simplify in-kernel
+acceleration of map/unmap requests, two ioctls have been added to pre-register
+and unregister guest RAM pages where DMA may possibly happen. With these
+calls, the userspace and in-kernel handlers do not have to take care of
+pinning or accounting.
+
+The ioctls are VFIO_IOMMU_SPAPR_REGISTER_MEMORY and
+VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY.
+These receive a user space address and size of the block to be pinned.
+Bisecting is not supported and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY is expected
+to be called with the exact address and size used for registering
+the memory block.
+
+The user space is not expected to call these often and the block descriptors
+are stored in a linked list in the kernel.
+
 ---
 
 [1] VFIO was originally an acronym for Virtual Function I/O in its
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 7fd60f9..9b884e0 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -21,6 +21,7 @@
 #include linux/uaccess.h
 #include linux/err.h
 #include linux/vfio.h
+#include linux/vmalloc.h
 #include asm/iommu.h
 #include asm/tce.h
 
@@ -93,8 +94,196 @@ struct tce_container {
struct iommu_table *tbl;
bool enabled;
unsigned long locked_pages;
+   struct list_head mem_list;
 };
 
+struct tce_memory {
+   struct list_head next;
+   struct rcu_head rcu;
+   __u64 vaddr;
+   __u64 size;
+   __u64 hpas[];
+};
+
+static inline bool tce_preregistered(struct tce_container *container)
+{
+   return !list_empty(&container->mem_list);
+}
+
+static struct tce_memory *tce_mem_alloc(struct tce_container *container,
+   __u64 vaddr, __u64 size)
+{
+   struct tce_memory *mem;
+   long ret;
+
+   ret = try_increment_locked_vm(size >> PAGE_SHIFT);
+   if (ret)
+   return NULL;
+
+   mem = vzalloc(sizeof(*mem) + (size >> (PAGE_SHIFT - 3)));
+   if (!mem) {
+   decrement_locked_vm(size >> PAGE_SHIFT);
+   return NULL;
+   }
+
+   mem->vaddr = vaddr;
+   mem->size = size;
+
+   list_add_rcu(&mem->next, &container->mem_list);
+
+   return mem;
+}
+
+static void release_tce_memory(struct rcu_head *head)
+{
+   struct tce_memory *mem = container_of(head, struct tce_memory, rcu);
+
+   vfree(mem);
+}
+
+static void tce_mem_free(struct tce_memory *mem)
+{
+   decrement_locked_vm(mem->size);
+   list_del_rcu(&mem->next);
+   call_rcu(&mem->rcu, release_tce_memory);
+}
+
+static struct tce_memory *tce_pinned_desc(struct tce_container *container,
+   __u64 vaddr, __u64 size)
+{
+   struct tce_memory *mem, *ret = NULL;
+
+   rcu_read_lock();
+   vaddr &= ~(TCE_PCI_READ | TCE_PCI_WRITE);
+   list_for_each_entry_rcu(mem, &container->mem_list, next) {
+   if ((mem->vaddr <= vaddr) &&
+

[PATCH v4 28/28] vfio: powerpc/spapr: Support Dynamic DMA windows

2015-02-16 Thread Alexey Kardashevskiy
This adds create/remove window ioctls to create and remove DMA windows.
sPAPR defines a Dynamic DMA windows capability which allows
para-virtualized guests to create additional DMA windows on a PCI bus.
Existing Linux kernels use this new window to map the entire guest
memory and switch to direct DMA operations, saving time on map/unmap
requests which would otherwise happen in large numbers.

This adds 2 ioctl handlers - VFIO_IOMMU_SPAPR_TCE_CREATE and
VFIO_IOMMU_SPAPR_TCE_REMOVE - to create and remove windows.
Up to 2 windows are supported now by the hardware and by this driver.

This changes the VFIO_IOMMU_SPAPR_TCE_GET_INFO handler to return additional
information such as the number of supported windows and the maximum
number of TCE table levels.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
Changes:
v4:
* moved code to tce_iommu_create_window()/tce_iommu_remove_window()
helpers
* added docs
---
 Documentation/vfio.txt  |   6 ++
 arch/powerpc/include/asm/iommu.h|   2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c | 156 +++-
 include/uapi/linux/vfio.h   |  24 +-
 4 files changed, 185 insertions(+), 3 deletions(-)

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 791e85c..11628f1 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -446,6 +446,12 @@ the memory block.
 The user space is not expected to call these often and the block descriptors
 are stored in a linked list in the kernel.
 
+6) The sPAPR specification allows guests to have additional DMA window(s) on
+a PCI bus with a variable page size. Two ioctls have been added to support
+this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
+The platform has to support the functionality or an error will be returned to
+the userspace.
+
 ---
 
 [1] VFIO was originally an acronym for Virtual Function I/O in its
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 8393822..6f34b82 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -133,7 +133,7 @@ extern void iommu_free_table(struct iommu_table *tbl, const 
char *node_name);
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
 
-#define POWERPC_IOMMU_MAX_TABLES   1
+#define POWERPC_IOMMU_MAX_TABLES   2
 
 #define POWERPC_IOMMU_DEFAULT_LEVELS   1
 
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index ee91d51..d5de7c6 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -333,6 +333,20 @@ static struct iommu_table *spapr_tce_find_table(
return ret;
 }
 
+static int spapr_tce_find_free_table(struct tce_container *container)
+{
+   int i;
+
+   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
+   struct iommu_table *tbl = &container->tables[i];
+
+   if (!tbl->it_size)
+   return i;
+   }
+
+   return -1;
+}
+
 static int tce_iommu_enable(struct tce_container *container)
 {
int ret = 0;
@@ -620,11 +634,85 @@ static long tce_iommu_build(struct tce_container 
*container,
return ret;
 }
 
+static long tce_iommu_create_window(struct tce_container *container,
+   __u32 page_shift, __u32 window_shift, __u32 levels,
+   __u64 *start_addr)
+{
+   struct powerpc_iommu *iommu;
+   struct tce_iommu_group *tcegrp;
+   int num;
+   long ret;
+
+   num = spapr_tce_find_free_table(container);
+   if (num < 0)
+   return -ENOSYS;
+
+   tcegrp = list_first_entry(&container->group_list,
+   struct tce_iommu_group, next);
+   iommu = iommu_group_get_iommudata(tcegrp->grp);
+
+   ret = iommu->ops->create_table(iommu, num,
+   page_shift, window_shift, levels,
+   &container->tables[num]);
+   if (ret)
+   return ret;
+
+   list_for_each_entry(tcegrp, &container->group_list, next) {
+   struct powerpc_iommu *iommutmp =
+   iommu_group_get_iommudata(tcegrp->grp);
+
+   if (WARN_ON_ONCE(iommutmp->ops != iommu->ops))
+   return -EFAULT;
+
+   ret = iommu->ops->set_window(iommutmp, num,
+   &container->tables[num]);
+   if (ret)
+   return ret;
+   }
+
+   *start_addr = container->tables[num].it_offset <<
+   container->tables[num].it_page_shift;
+
+   return 0;
+}
+
+static long tce_iommu_remove_window(struct tce_container *container,
+   __u64 start_addr)
+{
+   struct powerpc_iommu *iommu = NULL;
+   struct iommu_table *tbl;
+   struct tce_iommu_group *tcegrp;
+   int num;
+
+   tbl = spapr_tce_find_table(container, start_addr);
+  

[PATCH v4 07/28] vfio: powerpc/spapr: Moving pinning/unpinning to helpers

2015-02-16 Thread Alexey Kardashevskiy
This is a pretty mechanical patch to make next patches simpler.

The new tce_iommu_unuse_page() helper does put_page() now, but it might skip
that after the memory registering patch is applied.

As we are here, this removes unnecessary checks for a value returned
by pfn_to_page() as it cannot possibly return NULL.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 59 +++--
 1 file changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index 67ea392..7fd60f9 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -217,25 +217,34 @@ static void tce_iommu_release(void *iommu_data)
kfree(container);
 }
 
+static void tce_iommu_unuse_page(struct tce_container *container,
+   unsigned long oldtce)
+{
+   struct page *page;
+
+   if (!(oldtce & (TCE_PCI_READ | TCE_PCI_WRITE)))
+   return;
+
+   page = pfn_to_page(__pa(oldtce) >> PAGE_SHIFT);
+
+   if (oldtce & TCE_PCI_WRITE)
+   SetPageDirty(page);
+
+   put_page(page);
+}
+
 static int tce_iommu_clear(struct tce_container *container,
struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
 {
unsigned long oldtce;
-   struct page *page;
 
for ( ; pages; --pages, ++entry) {
oldtce = iommu_clear_tce(tbl, entry);
if (!oldtce)
continue;
 
-   page = pfn_to_page(oldtce >> PAGE_SHIFT);
-   WARN_ON(!page);
-   if (page) {
-   if (oldtce & TCE_PCI_WRITE)
-   SetPageDirty(page);
-   put_page(page);
-   }
+   tce_iommu_unuse_page(container, (unsigned long) __va(oldtce));
}
 
return 0;
@@ -253,34 +262,54 @@ static enum dma_data_direction 
tce_iommu_direction(unsigned long tce)
return DMA_NONE;
 }
 
+static unsigned long tce_get_hva(struct tce_container *container,
+   unsigned page_shift, unsigned long tce)
+{
+   long ret;
+   struct page *page = NULL;
+   unsigned long hva;
+   enum dma_data_direction direction = tce_iommu_direction(tce);
+
+   ret = get_user_pages_fast(tce & PAGE_MASK, 1,
+   direction != DMA_TO_DEVICE, &page);
+   if (unlikely(ret != 1))
+   return -1;
+
+   hva = (unsigned long) page_address(page);
+
+   return hva;
+}
+
 static long tce_iommu_build(struct tce_container *container,
struct iommu_table *tbl,
unsigned long entry, unsigned long tce, unsigned long pages)
 {
long i, ret = 0;
-   struct page *page = NULL;
+   struct page *page;
unsigned long hva;
enum dma_data_direction direction = tce_iommu_direction(tce);
 
	for (i = 0; i < pages; ++i) {
-   ret = get_user_pages_fast(tce & PAGE_MASK, 1,
-   direction != DMA_TO_DEVICE, &page);
-   if (unlikely(ret != 1)) {
+   hva = tce_get_hva(container, tbl->it_page_shift, tce);
+   if (hva == -1) {
ret = -EFAULT;
break;
}
 
+   page = pfn_to_page(__pa(hva) >> PAGE_SHIFT);
	if (!tce_page_is_contained(page, tbl->it_page_shift)) {
ret = -EPERM;
break;
}
 
-   hva = (unsigned long) page_address(page) +
-   (tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK);
+   /* Preserve offset within IOMMU page */
+   hva |= tce & IOMMU_PAGE_MASK(tbl) & ~PAGE_MASK;
+   /* Preserve permission bits */
+   hva |= tce & (TCE_PCI_READ | TCE_PCI_WRITE);
 
ret = iommu_tce_build(tbl, entry + i, hva, direction);
if (ret) {
-   put_page(page);
+   tce_iommu_unuse_page(container, hva);
			pr_err("iommu_tce: %s failed ioba=%lx, tce=%lx, ret=%ld\n",
				__func__, entry << tbl->it_page_shift,
tce, ret);
-- 
2.0.0


[PATCH v4 09/28] powerpc/powernv: Do not set read flag if direction==DMA_NONE

2015-02-16 Thread Alexey Kardashevskiy
Normally a bitmap from the iommu_table is used to track which TCE entry
is in use. Since we are going to use the iommu_table without its locks and
do xchg() instead, it becomes essential not to set bits which are not
implied by the direction flag.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
Reviewed-by: David Gibson da...@gibson.dropbear.id.au
---
 arch/powerpc/platforms/powernv/pci.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 4945e87..9ec7d68 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -589,19 +589,27 @@ struct pci_ops pnv_pci_ops = {
.write = pnv_pci_write_config,
 };
 
+static unsigned long pnv_dmadir_to_flags(enum dma_data_direction direction)
+{
+   switch (direction) {
+   case DMA_BIDIRECTIONAL:
+   case DMA_FROM_DEVICE:
+   return TCE_PCI_READ | TCE_PCI_WRITE;
+   case DMA_TO_DEVICE:
+   return TCE_PCI_READ;
+   default:
+   return 0;
+   }
+}
+
 static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
 unsigned long uaddr, enum dma_data_direction direction,
 struct dma_attrs *attrs, bool rm)
 {
-   u64 proto_tce;
+   u64 proto_tce = pnv_dmadir_to_flags(direction);
__be64 *tcep, *tces;
u64 rpn;
 
-   proto_tce = TCE_PCI_READ; // Read allowed
-
-   if (direction != DMA_TO_DEVICE)
-   proto_tce |= TCE_PCI_WRITE;
-
tces = tcep = ((__be64 *)tbl->it_base) + index - tbl->it_offset;
rpn = __pa(uaddr) >> tbl->it_page_shift;
 
-- 
2.0.0


[PATCH v4 10/28] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table

2015-02-16 Thread Alexey Kardashevskiy
This adds an iommu_table_ops struct and puts a pointer to it into
the iommu_table struct. This moves the tce_build/tce_free/tce_get/tce_flush
callbacks from ppc_md to the new struct where they really belong.

This adds the requirement for @it_ops to be initialized before calling
iommu_init_table() to make sure that we do not leave any IOMMU table
with iommu_table_ops uninitialized. This is not a parameter of
iommu_init_table() though as there will be cases when iommu_init_table()
will not be called on TCE tables used by VFIO.

This does s/tce_build/set/ and s/tce_free/clear/ and removes the
redundant tce_ prefixes.

This removes tce_xxx_rm handlers from ppc_md but does not add
them to iommu_table_ops as this will be done later if we decide to
support TCE hypercalls in real mode.

For pSeries, this always uses tce_buildmulti_pSeriesLP/
tce_freemulti_pSeriesLP and changes the multi callbacks to fall back to
tce_build_pSeriesLP/tce_free_pSeriesLP if FW_FEATURE_MULTITCE is not
present. The reason for this is that we still have to support the
multitce=off boot parameter in disable_multitce() and we do not want to
walk through all IOMMU tables in the system replacing multi callbacks with
single ones.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h| 17 +++
 arch/powerpc/include/asm/machdep.h  | 25 
 arch/powerpc/kernel/iommu.c | 46 +++--
 arch/powerpc/kernel/vio.c   |  5 
 arch/powerpc/platforms/cell/iommu.c |  8 +++--
 arch/powerpc/platforms/pasemi/iommu.c   |  7 +++--
 arch/powerpc/platforms/powernv/pci-ioda.c   |  2 ++
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  1 +
 arch/powerpc/platforms/powernv/pci.c| 23 ---
 arch/powerpc/platforms/powernv/pci.h|  1 +
 arch/powerpc/platforms/pseries/iommu.c  | 34 +++--
 arch/powerpc/sysdev/dart_iommu.c| 12 
 12 files changed, 93 insertions(+), 88 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 45b07f6..eb5822d 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -43,6 +43,22 @@
 extern int iommu_is_off;
 extern int iommu_force_on;
 
+struct iommu_table_ops {
+   int (*set)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
+   void (*clear)(struct iommu_table *tbl,
+   long index, long npages);
+   unsigned long (*get)(struct iommu_table *tbl, long index);
+   void (*flush)(struct iommu_table *tbl);
+};
+
+/* These are used by VIO */
+extern struct iommu_table_ops iommu_table_lpar_multi_ops;
+extern struct iommu_table_ops iommu_table_pseries_ops;
+
 /*
  * IOMAP_MAX_ORDER defines the largest contiguous block
  * of dma space we can get.  IOMAP_MAX_ORDER = 13
@@ -77,6 +93,7 @@ struct iommu_table {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
 #endif
+   struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index c8175a3..2abe744 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -65,31 +65,6 @@ struct machdep_calls {
 * destroyed as well */
void(*hpte_clear_all)(void);
 
-   int (*tce_build)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-struct dma_attrs *attrs);
-   void(*tce_free)(struct iommu_table *tbl,
-   long index,
-   long npages);
-   unsigned long   (*tce_get)(struct iommu_table *tbl,
-   long index);
-   void(*tce_flush)(struct iommu_table *tbl);
-
-   /* _rm versions are for real mode use only */
-   int (*tce_build_rm)(struct iommu_table *tbl,
-long index,
-long npages,
-unsigned long uaddr,
-enum dma_data_direction direction,
-struct dma_attrs *attrs);
-   void(*tce_free_rm)(struct iommu_table *tbl,
-   long index,
-   long npages);
-   void(*tce_flush_rm)(struct iommu_table *tbl);
-
void __iomem *  (*ioremap)(phys_addr_t addr, unsigned long size,
  

[PATCH v4 24/28] powerpc/powernv/ioda: Define and implement DMA table/window management callbacks

2015-02-16 Thread Alexey Kardashevskiy
This extends powerpc_iommu_ops by a set of callbacks to support dynamic
DMA windows management.

query() returns IOMMU capabilities such as default DMA window address and
supported number of DMA windows and TCE table levels.

create_table() creates a TCE table with specific parameters. For now
it receives a powerpc_iommu so it knows the node id and can allocate the
TCE table memory closer to the PHB. The exact format of the allocated
multi-level table might also be specific to the PHB model (not the case
now though).

set_window() sets the window at specified TVT index on PHB.

unset_window() unsets the window from specified TVT.

free_table() frees the memory occupied by a table.

The purpose of this separation is that we need to be able to create
one table and assign it to a set of PHBs. This way we can support multiple
IOMMU groups in one VFIO container and bring VFIO on SPAPR closer
to the way it works on x86.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h  | 31 +
 arch/powerpc/platforms/powernv/pci-ioda.c | 75 +--
 2 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 283f70f..8393822 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -147,12 +147,43 @@ struct powerpc_iommu_ops {
 */
void (*set_ownership)(struct powerpc_iommu *iommu,
bool enable);
+
+   long (*create_table)(struct powerpc_iommu *iommu,
+   int num,
+   __u32 page_shift,
+   __u32 window_shift,
+   __u32 levels,
+   struct iommu_table *tbl);
+   long (*set_window)(struct powerpc_iommu *iommu,
+   int num,
+   struct iommu_table *tblnew);
+   long (*unset_window)(struct powerpc_iommu *iommu,
+   int num);
+   void (*free_table)(struct iommu_table *tbl);
 };
 
+/* Page size flags for ibm,query-pe-dma-window */
+#define DDW_PGSIZE_4K   0x01
+#define DDW_PGSIZE_64K  0x02
+#define DDW_PGSIZE_16M  0x04
+#define DDW_PGSIZE_32M  0x08
+#define DDW_PGSIZE_64M  0x10
+#define DDW_PGSIZE_128M 0x20
+#define DDW_PGSIZE_256M 0x40
+#define DDW_PGSIZE_16G  0x80
+#define DDW_PGSIZE_MASK 0xFF
+
 struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *group;
 #endif
+   /* Some key properties of IOMMU */
+   __u32 tce32_start;
+   __u32 tce32_size;
+   __u32 windows_supported;
+   __u32 levels;
+   __u32 flags;
+
struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
struct powerpc_iommu_ops *ops;
 };
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 29bd7a4..16ddaba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1360,7 +1360,7 @@ static __be64 *pnv_alloc_tce_table(int nid,
return addr;
 }
 
-static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu,
+static long pnv_pci_ioda2_create_table(struct powerpc_iommu *iommu, int num,
__u32 page_shift, __u32 window_shift, __u32 levels,
struct iommu_table *tbl)
 {
@@ -1388,8 +1388,8 @@ static long pnv_pci_ioda2_create_table(struct 
powerpc_iommu *iommu,
shift = ROUND_UP(window_shift - page_shift, levels) / levels;
shift += 3;
shift = max_t(unsigned, shift, IOMMU_PAGE_SHIFT_4K);
-   pr_info("Creating TCE table %08llx, %d levels, TCE table size = %lx\n",
-   1ULL << window_shift, levels, 1UL << shift);
+   pr_info("Creating TCE table #%d %08llx, %d levels, TCE table size = %lx\n",
+   num, 1ULL << window_shift, levels, 1UL << shift);
 
tbl->it_level_size = 1ULL << (shift - 3);
left = tce_table_size;
@@ -1400,11 +1400,10 @@ static long pnv_pci_ioda2_create_table(struct 
powerpc_iommu *iommu,
tbl->it_indirect_levels = levels - 1;
 
/* Setup linux iommu table */
-   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size, 0,
-   page_shift);
+   pnv_pci_setup_iommu_table(tbl, addr, tce_table_size,
+   num ? pe->tce_bypass_base : 0, page_shift);
 
tbl->it_ops = &pnv_ioda2_iommu_ops;
-   iommu_init_table(tbl, nid);
 
return 0;
 }
@@ -1421,8 +1420,21 @@ static void pnv_pci_ioda2_free_table(struct iommu_table 
*tbl)
iommu_reset_table(tbl, "ioda2");
 }
 
+static inline void pnv_pci_ioda2_tvt_invalidate(unsigned int pe_number,
+   unsigned long it_index)
+{
+   __be64 __iomem *invalidate = (__be64 __iomem *)it_index;
+   /* 01xb - invalidate TCEs that match the specified PE# */
unsigned long addr = (0x4ull << 60) | (pe_number & 0xFF);
+
+   if 

[PATCH v4 16/28] powerpc/iommu/powernv: Release replaced TCE

2015-02-16 Thread Alexey Kardashevskiy
At the moment writing a new TCE value to the IOMMU table fails with EBUSY
if there is a valid entry already. However the PAPR specification allows
the guest to write a new TCE value without clearing the old one first.

Another problem this patch is addressing is the use of pool locks by
external IOMMU users such as VFIO. The pool locks protect the
DMA page allocator rather than the entries, and since the host kernel does
not control which pages are in use, there is no point in taking pool locks;
exchange()+put_page(oldtce) is sufficient to avoid possible races.

This adds an exchange() callback to iommu_table_ops which does the same
thing as set() plus it returns replaced TCE(s) so the caller can release
the pages afterwards.

The returned old TCE value is a virtual address, as is the new TCE value.
This is different from tce_clear() which returns a physical address.

This implements exchange() for IODA2 only. This adds a requirement
for a platform to have exchange() implemented so from now on IODA2 is
the only supported PHB for VFIO-SPAPR.

This replaces iommu_tce_build() and iommu_clear_tce() with
a single iommu_tce_xchg().

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h  | 13 +---
 arch/powerpc/kernel/iommu.c   | 52 ++-
 arch/powerpc/platforms/powernv/pci-ioda.c | 16 ++
 arch/powerpc/platforms/powernv/pci.c  | 22 +
 arch/powerpc/platforms/powernv/pci.h  |  4 +++
 drivers/vfio/vfio_iommu_spapr_tce.c   | 17 +++---
 6 files changed, 87 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index ba16aa0..bf26d47 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -49,6 +49,12 @@ struct iommu_table_ops {
unsigned long uaddr,
enum dma_data_direction direction,
struct dma_attrs *attrs);
+   int (*exchange)(struct iommu_table *tbl,
+   long index, long npages,
+   unsigned long uaddr,
+   unsigned long *old_tces,
+   enum dma_data_direction direction,
+   struct dma_attrs *attrs);
void (*clear)(struct iommu_table *tbl,
long index, long npages);
unsigned long (*get)(struct iommu_table *tbl, long index);
@@ -225,10 +231,9 @@ extern int iommu_tce_clear_param_check(struct iommu_table 
*tbl,
unsigned long npages);
 extern int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce);
-extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
-   unsigned long hwaddr, enum dma_data_direction direction);
-extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
-   unsigned long entry);
+extern long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+   unsigned long hwaddr, unsigned long *oldtce,
+   enum dma_data_direction direction);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
 extern int iommu_take_ownership(struct powerpc_iommu *iommu);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 9d06425..e4b89bf 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -974,44 +974,30 @@ int iommu_tce_put_param_check(struct iommu_table *tbl,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);
 
-unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
+static void iommu_tce_mk_dirty(unsigned long tce)
 {
-   unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
+   if (tce & TCE_PCI_WRITE) {
+   struct page *pg = pfn_to_page(__pa(tce) >> PAGE_SHIFT);
 
-   spin_lock(&(pool->lock));
-
-   oldtce = tbl->it_ops->get(tbl, entry);
-   if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
-   tbl->it_ops->clear(tbl, entry, 1);
-   else
-   oldtce = 0;
-
-   spin_unlock(&(pool->lock));
-
-   return oldtce;
+   SetPageDirty(pg);
+   }
 }
-EXPORT_SYMBOL_GPL(iommu_clear_tce);
 
 /*
  * hwaddr is a kernel virtual address here (0xc... bazillion),
  * tce_build converts it to a physical address.
  */
-int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
-   unsigned long hwaddr, enum dma_data_direction direction)
+long iommu_tce_xchg(struct iommu_table *tbl, unsigned long entry,
+   unsigned long hwaddr, unsigned long *oldtce,
+   enum dma_data_direction direction)
 {
-   int ret = -EBUSY;
-   unsigned long oldtce;
-   struct iommu_pool *pool = get_pool(tbl, entry);
+   long ret;
 
-   spin_lock(&(pool->lock));
+   ret = tbl->it_ops->exchange(tbl, entry, 1, hwaddr, oldtce,
+   direction, NULL);
 
-   oldtce = tbl->it_ops->get(tbl, entry);

[PATCH v4 25/28] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework ownership

2015-02-16 Thread Alexey Kardashevskiy
This uses the new helpers to remove the default TCE table when ownership
is being taken, and to create it when ownership is returned. So once an
external user (such as VFIO) has obtained ownership over a group, it does
not have any DMA windows, neither the default 32bit window nor the bypass
window. The external user is expected to unprogram the DMA windows on the
PHBs before returning ownership back to the kernel.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 ++
 drivers/vfio/vfio_iommu_spapr_tce.c   |  8 
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 16ddaba..79a8149 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1570,11 +1570,33 @@ static void pnv_ioda2_set_ownership(struct 
powerpc_iommu *iommu,
 {
struct pnv_ioda_pe *pe = container_of(iommu, struct pnv_ioda_pe,
iommu);
-   if (enable)
-   iommu_take_ownership(iommu);
-   else
-   iommu_release_ownership(iommu);
+   if (enable) {
+   pnv_pci_ioda2_unset_window(&pe->iommu, 0);
+   pnv_pci_ioda2_free_table(&pe->iommu.tables[0]);
+   } else {
+   struct iommu_table *tbl = &pe->iommu.tables[0];
+   int64_t rc;
 
+   rc = pnv_pci_ioda2_create_table(&pe->iommu, 0,
+   IOMMU_PAGE_SHIFT_4K,
+   ilog2(pe->phb->ioda.m32_pci_base),
+   POWERPC_IOMMU_DEFAULT_LEVELS, tbl);
+   if (rc) {
+   pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
+   rc);
+   return;
+   }
+
+   iommu_init_table(tbl, pe->phb->hose->node);
+
+   rc = pnv_pci_ioda2_set_window(&pe->iommu, 0, tbl);
+   if (rc) {
+   pe_err(pe, "Failed to configure 32-bit TCE table, err %ld\n",
+   rc);
+   pnv_pci_ioda2_free_table(tbl);
+   return;
+   }
+   }
pnv_pci_ioda2_set_bypass(pe, !enable);
 }
 
diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index b5134b7..fdcc04c 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -820,8 +820,16 @@ static int tce_iommu_attach_group(void *iommu_data,
 * our physical memory via the bypass window instead of just
 * the pages that has been explicitly mapped into the iommu
 */
+   struct iommu_table tbltmp = { 0 }, *tbl = &tbltmp;
+
 iommu->ops->set_ownership(iommu, true);
 container->grp = iommu_group;
+
+   ret = iommu->ops->create_table(iommu, 0,
+   IOMMU_PAGE_SHIFT_4K,
+   ilog2(iommu->tce32_size), 1, tbl);
+   if (!ret)
+   ret = iommu->ops->set_window(iommu, 0, tbl);
} else {
ret = -ENODEV;
}
-- 
2.0.0


[PATCH v4 14/28] vfio: powerpc/spapr: powerpc/powernv/ioda2: Rework IOMMU ownership control

2015-02-16 Thread Alexey Kardashevskiy
At the moment the iommu_table struct has a set_bypass() callback which
enables/disables DMA bypass on the IODA2 PHB. This is exposed to the
POWERPC IOMMU code which calls this callback when external IOMMU users
such as VFIO are about to take control over a PHB.

The set_bypass() callback is not really an iommu_table function but an
IOMMU/PE function. This introduces a powerpc_iommu_ops struct and
adds a set_ownership() callback to it which is called when an external
user takes control over the IOMMU.

This renames set_bypass() to set_ownership() as it does not necessarily
just enable bypassing, it can be something else/more, so let's give it
a more generic name. The bool parameter is inverted.

The callback is implemented for IODA2 only.

This replaces iommu_take_ownership()/iommu_release_ownership() calls
with the callback calls and it is up to the platform code to call
iommu_take_ownership()/iommu_release_ownership() if needed. Next patches
will remove these calls from IODA2 code.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h  | 18 +--
 arch/powerpc/kernel/iommu.c   | 53 +++
 arch/powerpc/platforms/powernv/pci-ioda.c | 30 -
 drivers/vfio/vfio_iommu_spapr_tce.c   | 23 ++
 4 files changed, 92 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 4fe..ba16aa0 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -92,7 +92,6 @@ struct iommu_table {
unsigned long  it_page_shift;/* table iommu page size */
struct powerpc_iommu *it_iommu;
struct iommu_table_ops *it_ops;
-   void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
 
 /* Pure 2^n version of get_order */
@@ -127,11 +126,24 @@ extern struct iommu_table *iommu_init_table(struct 
iommu_table * tbl,
 
 #define POWERPC_IOMMU_MAX_TABLES   1
 
+struct powerpc_iommu;
+
+struct powerpc_iommu_ops {
+   /*
+* Switches ownership from the kernel itself to an external
+* user. While ownership is enabled, the kernel cannot use IOMMU
+* for itself.
+*/
+   void (*set_ownership)(struct powerpc_iommu *iommu,
+   bool enable);
+};
+
 struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
struct iommu_group *group;
 #endif
struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+   struct powerpc_iommu_ops *ops;
 };
 
 #ifdef CONFIG_IOMMU_API
@@ -219,8 +231,8 @@ extern unsigned long iommu_clear_tce(struct iommu_table 
*tbl,
unsigned long entry);
 
 extern void iommu_flush_tce(struct iommu_table *tbl);
-extern int iommu_take_ownership(struct iommu_table *tbl);
-extern void iommu_release_ownership(struct iommu_table *tbl);
+extern int iommu_take_ownership(struct powerpc_iommu *iommu);
+extern void iommu_release_ownership(struct powerpc_iommu *iommu);
 
 #endif /* __KERNEL__ */
 #endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 407d0d6..9d06425 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1022,7 +1022,7 @@ int iommu_tce_build(struct iommu_table *tbl, unsigned 
long entry,
 }
 EXPORT_SYMBOL_GPL(iommu_tce_build);
 
-int iommu_take_ownership(struct iommu_table *tbl)
+static int iommu_table_take_ownership(struct iommu_table *tbl)
 {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
int ret = 0;
@@ -1047,19 +1047,36 @@ int iommu_take_ownership(struct iommu_table *tbl)
spin_unlock(&tbl->pools[i].lock);
spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
 
-   /*
-* Disable iommu bypass, otherwise the user can DMA to all of
-* our physical memory via the bypass window instead of just
-* the pages that has been explicitly mapped into the iommu
-*/
-   if (!ret && tbl->set_bypass)
-   tbl->set_bypass(tbl, false);
-
-   return ret;
+   return 0;
+}
+
+static void iommu_table_release_ownership(struct iommu_table *tbl);
+
+int iommu_take_ownership(struct powerpc_iommu *iommu)
+{
+   int i, j, rc = 0;
+
+   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i) {
+   struct iommu_table *tbl = &iommu->tables[i];
+
+   if (!tbl->it_map)
+   continue;
+
+   rc = iommu_table_take_ownership(tbl);
+   if (rc) {
+   for (j = 0; j < i; ++j)
+   iommu_table_release_ownership(
+   &iommu->tables[j]);
+
+   return rc;
+   }
+   }
+
+   return 0;
 }
 EXPORT_SYMBOL_GPL(iommu_take_ownership);
 
-void iommu_release_ownership(struct iommu_table *tbl)
+static void iommu_table_release_ownership(struct iommu_table *tbl)
 {
unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
 
@@ -1076,10 +1093,18 @@ void 

[PATCH v4 12/28] powerpc/spapr: vfio: Switch from iommu_table to new powerpc_iommu

2015-02-16 Thread Alexey Kardashevskiy
Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a powerpc_iommu container
for TCE tables. Right now just one table is supported.

Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 arch/powerpc/include/asm/iommu.h|  18 +++--
 arch/powerpc/kernel/eeh.c   |   2 +-
 arch/powerpc/kernel/iommu.c |  34 
 arch/powerpc/platforms/powernv/pci-ioda.c   |  37 +
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  16 ++--
 arch/powerpc/platforms/powernv/pci.c|   2 +-
 arch/powerpc/platforms/powernv/pci.h|   4 +-
 arch/powerpc/platforms/pseries/iommu.c  |   9 ++-
 drivers/vfio/vfio_iommu_spapr_tce.c | 117 +++-
 9 files changed, 156 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 335e3d4..4fe 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -90,9 +90,7 @@ struct iommu_table {
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
-#ifdef CONFIG_IOMMU_API
-   struct iommu_group *it_group;
-#endif
+   struct powerpc_iommu *it_iommu;
struct iommu_table_ops *it_ops;
void (*set_bypass)(struct iommu_table *tbl, bool enable);
 };
@@ -126,13 +124,23 @@ extern void iommu_free_table(struct iommu_table *tbl, 
const char *node_name);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
int nid);
+
+#define POWERPC_IOMMU_MAX_TABLES   1
+
+struct powerpc_iommu {
 #ifdef CONFIG_IOMMU_API
-extern void iommu_register_group(struct iommu_table *tbl,
+   struct iommu_group *group;
+#endif
+   struct iommu_table tables[POWERPC_IOMMU_MAX_TABLES];
+};
+
+#ifdef CONFIG_IOMMU_API
+extern void iommu_register_group(struct powerpc_iommu *iommu,
 int pci_domain_number, unsigned long pe_num);
 extern int iommu_add_device(struct device *dev);
 extern void iommu_del_device(struct device *dev);
 #else
-static inline void iommu_register_group(struct iommu_table *tbl,
+static inline void iommu_register_group(struct powerpc_iommu *iommu,
int pci_domain_number,
unsigned long pe_num)
 {
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index e1b6d8e..319eae3 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -1360,7 +1360,7 @@ static int dev_has_iommu_table(struct device *dev, void 
*data)
return 0;
 
tbl = get_iommu_table_base(dev);
-   if (tbl && tbl->it_group) {
+   if (tbl && tbl->it_iommu) {
*ppdev = pdev;
return 1;
}
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 2f7e92b..952939f 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -712,17 +712,20 @@ struct iommu_table *iommu_init_table(struct iommu_table 
*tbl, int nid)
 
 struct iommu_table *iommu_table_alloc(int node)
 {
-   struct iommu_table *tbl;
+   struct powerpc_iommu *iommu;
 
-   tbl = kzalloc_node(sizeof(struct iommu_table), GFP_KERNEL, node);
+   iommu = kzalloc_node(sizeof(struct powerpc_iommu), GFP_KERNEL,
+  node);
+   iommu->tables[0].it_iommu = iommu;
 
-   return tbl;
+   return &iommu->tables[0];
 }
 
 void iommu_free_table(struct iommu_table *tbl, const char *node_name)
 {
unsigned long bitmap_sz;
unsigned int order;
struct powerpc_iommu *iommu = tbl->it_iommu;
 
if (!tbl || !tbl->it_map) {
printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
@@ -738,9 +741,9 @@ void iommu_free_table(struct iommu_table *tbl, const char 
*node_name)
clear_bit(0, tbl->it_map);
 
 #ifdef CONFIG_IOMMU_API
-   if (tbl->it_group) {
-   iommu_group_put(tbl->it_group);
-   BUG_ON(tbl->it_group);
+   if (iommu->group) {
+   iommu_group_put(iommu->group);
+   BUG_ON(iommu->group);
}
 #endif
 
@@ -756,7 +759,7 @@ void iommu_free_table(struct iommu_table *tbl, const char 
*node_name)
free_pages((unsigned long) tbl->it_map, order);
 
/* free table */
-   kfree(tbl);
+   kfree(iommu);
 }
 
 /* Creates TCEs for a user provided buffer.  The user buffer must be
@@ -888,11 +891,12 @@ void iommu_free_coherent(struct iommu_table *tbl, size_t 
size,
  */
 static void group_release(void *iommu_data)
 {
-   struct iommu_table *tbl = iommu_data;
-   tbl->it_group = NULL;
+   struct powerpc_iommu *iommu = iommu_data;
+
+   iommu->group = NULL;
 }
 
-void iommu_register_group(struct iommu_table *tbl,
+void 

[PATCH v4 26/28] vfio: powerpc/spapr: Rework an IOMMU group attach/detach

2015-02-16 Thread Alexey Kardashevskiy
Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 62 +++--
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
b/drivers/vfio/vfio_iommu_spapr_tce.c
index fdcc04c..4ff8289 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -435,7 +435,7 @@ static void tce_iommu_release(void *iommu_data)
iommu = iommu_group_get_iommudata(container->grp);
tbl = &iommu->tables[0];
tce_iommu_clear(container, tbl, tbl->it_offset, tbl->it_size);
-
+   iommu->ops->free_table(tbl);
tce_iommu_detach_group(iommu_data, container->grp);
}
 
@@ -796,6 +796,7 @@ static int tce_iommu_attach_group(void *iommu_data,
int ret = 0;
struct tce_container *container = iommu_data;
struct powerpc_iommu *iommu;
+   struct iommu_table tbltmp = { 0 }, *tbl = &tbltmp;
 
mutex_lock(&container->lock);
 
@@ -806,35 +807,44 @@ static int tce_iommu_attach_group(void *iommu_data,
iommu_group_id(container->grp),
iommu_group_id(iommu_group));
ret = -EBUSY;
-   } else if (container->enabled) {
+   goto unlock_exit;
+   }
+
+   if (container->enabled) {
pr_err("tce_vfio: attaching group #%u to enabled container\n",
iommu_group_id(iommu_group));
ret = -EBUSY;
+   goto unlock_exit;
+   }
+
+   iommu = iommu_group_get_iommudata(iommu_group);
+   if (WARN_ON_ONCE(!iommu)) {
+   ret = -ENXIO;
+   goto unlock_exit;
+   }
+
+   /*
+* Disable iommu bypass, otherwise the user can DMA to all of
+* our physical memory via the bypass window instead of just
+* the pages that has been explicitly mapped into the iommu
+*/
+   if (iommu->ops && iommu->ops->set_ownership) {
+   iommu->ops->set_ownership(iommu, true);
} else {
-   iommu = iommu_group_get_iommudata(iommu_group);
-   if (WARN_ON_ONCE(!iommu)) {
-   ret = -ENXIO;
-   } else if (iommu->ops && iommu->ops->set_ownership) {
-   /*
-* Disable iommu bypass, otherwise the user can DMA to all of
-* our physical memory via the bypass window instead of just
-* the pages that has been explicitly mapped into the iommu
-*/
-   struct iommu_table tbltmp = { 0 }, *tbl = &tbltmp;
-
-   iommu->ops->set_ownership(iommu, true);
-   container->grp = iommu_group;
-
-   ret = iommu->ops->create_table(iommu, 0,
-   IOMMU_PAGE_SHIFT_4K,
-   ilog2(iommu->tce32_size), 1, tbl);
-   if (!ret)
-   ret = iommu->ops->set_window(iommu, 0, tbl);
-   } else {
-   ret = -ENODEV;
-   }
+   ret = -ENODEV;
+   goto unlock_exit;
}
 
+   container->grp = iommu_group;
+
+   /* Create the default window as only now we know the parameters */
+   ret = iommu->ops->create_table(iommu, 0,
+   IOMMU_PAGE_SHIFT_4K,
+   ilog2(iommu->tce32_size), 1, tbl);
+   if (!ret)
+   ret = iommu->ops->set_window(iommu, 0, tbl);
+
+unlock_exit:
mutex_unlock(&container->lock);
 
return ret;
@@ -845,6 +855,7 @@ static void tce_iommu_detach_group(void *iommu_data,
 {
struct tce_container *container = iommu_data;
struct powerpc_iommu *iommu;
+   long i;
 
mutex_lock(&container->lock);
if (iommu_group != container->grp) {
@@ -865,6 +876,9 @@ static void tce_iommu_detach_group(void *iommu_data,
iommu = iommu_group_get_iommudata(iommu_group);
BUG_ON(!iommu);
 
+   for (i = 0; i < POWERPC_IOMMU_MAX_TABLES; ++i)
+   iommu->ops->unset_window(iommu, i);
+
/* Kernel owns the device now, we can restore bypass */
if (iommu->ops && iommu->ops->set_ownership)
iommu->ops->set_ownership(iommu, false);
-- 
2.0.0
