date:20150302

[PATCH v2] seccomp: switch to using asm-generic for seccomp.h

2015-03-02 Thread Kees Cook

Most architectures don't need to do anything special for the strict
seccomp syscall entries. Remove the redundant headers and reduce the
others.
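
A minimal sketch of the fallback pattern this relies on, assuming the generic header only defines a strict-mode name when the architecture has not already done so. The syscall numbers below are placeholders for illustration, `strict_mode_allows()` is a hypothetical helper, and the real contents of asm-generic/seccomp.h are not shown in this message.

```c
#include <assert.h>

/* Placeholder syscall numbers; real values come from <linux/unistd.h>. */
#define __NR_exit          1
#define __NR_read          3
#define __NR_write         4
#define __NR_sigreturn     119
#define __NR_rt_sigreturn  173

/* An architecture with a special case defines its override first... */
#define __NR_seccomp_sigreturn __NR_rt_sigreturn

/* ...and the generic part fills in whatever is still undefined. */
#ifndef __NR_seccomp_read
#define __NR_seccomp_read __NR_read
#endif
#ifndef __NR_seccomp_write
#define __NR_seccomp_write __NR_write
#endif
#ifndef __NR_seccomp_exit
#define __NR_seccomp_exit __NR_exit
#endif
#ifndef __NR_seccomp_sigreturn
#define __NR_seccomp_sigreturn __NR_sigreturn
#endif

/* The override survives; the defaults come from the generic block. */
static_assert(__NR_seccomp_sigreturn == __NR_rt_sigreturn, "arch override kept");
static_assert(__NR_seccomp_read == __NR_read, "generic default used");

/* Hypothetical helper: is syscall number nr allowed in strict mode? */
static int strict_mode_allows(int nr)
{
	static const int allowed[] = {
		__NR_seccomp_read, __NR_seccomp_write,
		__NR_seccomp_exit, __NR_seccomp_sigreturn,
	};

	for (unsigned int i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++)
		if (allowed[i] == nr)
			return 1;
	return 0;
}
```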

Signed-off-by: Kees Cook keesc...@chromium.org
---
v2:
- use Kbuild generic-y instead of explicit #include lines (sfr)
---
 arch/arm/include/asm/Kbuild |  1 +
 arch/arm/include/asm/seccomp.h  | 11 ---
 arch/microblaze/include/asm/Kbuild  |  1 +
 arch/microblaze/include/asm/seccomp.h   | 16 
 arch/mips/include/asm/seccomp.h |  7 ++-
 arch/parisc/include/asm/Kbuild  |  1 +
 arch/parisc/include/asm/seccomp.h   | 16 
 arch/powerpc/include/asm/Kbuild |  1 +
 arch/powerpc/include/uapi/asm/Kbuild|  1 -
 arch/powerpc/include/uapi/asm/seccomp.h | 16 
 arch/s390/include/asm/Kbuild|  1 +
 arch/s390/include/asm/seccomp.h | 16 
 arch/sh/include/asm/Kbuild  |  1 +
 arch/sh/include/asm/seccomp.h   | 10 --
 arch/sparc/include/asm/Kbuild   |  1 +
 arch/sparc/include/asm/seccomp.h| 15 ---
 arch/x86/include/asm/seccomp.h  | 21 ++---
 arch/x86/include/asm/seccomp_32.h   | 11 ---
 arch/x86/include/asm/seccomp_64.h   | 17 -
 19 files changed, 27 insertions(+), 137 deletions(-)
 delete mode 100644 arch/arm/include/asm/seccomp.h
 delete mode 100644 arch/microblaze/include/asm/seccomp.h
 delete mode 100644 arch/parisc/include/asm/seccomp.h
 delete mode 100644 arch/powerpc/include/uapi/asm/seccomp.h
 delete mode 100644 arch/s390/include/asm/seccomp.h
 delete mode 100644 arch/sh/include/asm/seccomp.h
 delete mode 100644 arch/sparc/include/asm/seccomp.h
 delete mode 100644 arch/x86/include/asm/seccomp_32.h
 delete mode 100644 arch/x86/include/asm/seccomp_64.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index fe74c0d1e485..d7be5a9fd171 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += preempt.h
 generic-y += resource.h
 generic-y += rwsem.h
 generic-y += scatterlist.h
+generic-y += seccomp.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += sembuf.h
diff --git a/arch/arm/include/asm/seccomp.h b/arch/arm/include/asm/seccomp.h
deleted file mode 100644
index 52b156b341f5..
--- a/arch/arm/include/asm/seccomp.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef _ASM_ARM_SECCOMP_H
-#define _ASM_ARM_SECCOMP_H
-
-#include <linux/unistd.h>
-
-#define __NR_seccomp_read __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit __NR_exit
-#define __NR_seccomp_sigreturn __NR_rt_sigreturn
-
-#endif /* _ASM_ARM_SECCOMP_H */
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index ab564a6db5c3..877e2f610655 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,5 +8,6 @@ generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
 generic-y += scatterlist.h
+generic-y += seccomp.h
 generic-y += syscalls.h
 generic-y += trace_clock.h
diff --git a/arch/microblaze/include/asm/seccomp.h b/arch/microblaze/include/asm/seccomp.h
deleted file mode 100644
index 0d912758a0d7..
--- a/arch/microblaze/include/asm/seccomp.h
+++ /dev/null
@@ -1,16 +0,0 @@
-#ifndef _ASM_MICROBLAZE_SECCOMP_H
-#define _ASM_MICROBLAZE_SECCOMP_H
-
-#include <linux/unistd.h>
-
-#define __NR_seccomp_read  __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit  __NR_exit
-#define __NR_seccomp_sigreturn __NR_sigreturn
-
-#define __NR_seccomp_read_32   __NR_read
-#define __NR_seccomp_write_32  __NR_write
-#define __NR_seccomp_exit_32   __NR_exit
-#define __NR_seccomp_sigreturn_32  __NR_sigreturn
-
-#endif /* _ASM_MICROBLAZE_SECCOMP_H */
diff --git a/arch/mips/include/asm/seccomp.h b/arch/mips/include/asm/seccomp.h
index f29c75cf83c6..1d8a2e2c75c1 100644
--- a/arch/mips/include/asm/seccomp.h
+++ b/arch/mips/include/asm/seccomp.h
@@ -2,11 +2,6 @@
 
 #include <linux/unistd.h>
 
-#define __NR_seccomp_read __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit __NR_exit
-#define __NR_seccomp_sigreturn __NR_rt_sigreturn
-
 /*
  * Kludge alert:
  *
@@ -29,4 +24,6 @@
 
 #endif /* CONFIG_MIPS32_O32 */
 
+#include <asm-generic/seccomp.h>
+
 #endif /* __ASM_SECCOMP_H */
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 8686237a3c3c..12b341d04f88 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += param.h
 generic-y += percpu.h
 generic-y += poll.h
 generic-y += preempt.h
+generic-y += seccomp.h
 generic-y += segment.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/parisc/include/asm/seccomp.h b/arch/parisc/include/asm/seccomp.h
deleted file mode 100644
index

Re: [PATCH 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-02 Thread Andrew Morton

On Thu, 26 Feb 2015 19:07:09 -0800 Kees Cook keesc...@chromium.org wrote:

> This separates ET_DYN ASLR from mmap ASLR, as already done on s390. The
> various architectures that are already randomizing mmap (arm, arm64, mips,
> powerpc, s390, and x86), have their various forms of arch_mmap_rnd()
> made available via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these
> architectures, arch_randomize_brk() is collapsed as well.
>
> This is an alternative to the solutions in:
> https://lkml.org/lkml/2015/2/23/442

504 Gateway Time-out

Hector's original patch had very useful descriptions of the bug, why it
occurred, how it was exploited and how the patch fixes it.

Your changelogs contain none of this and can be summarized as randomly
churn code around for no apparent reason.

Wanna try again?  I guess the [0/5] and [4/5] changelogs are the ones
to fix.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-02 Thread Kees Cook

On Mon, Mar 2, 2015 at 1:26 PM, Andrew Morton a...@linux-foundation.org wrote:
> On Thu, 26 Feb 2015 19:07:09 -0800 Kees Cook keesc...@chromium.org wrote:
>
>> This separates ET_DYN ASLR from mmap ASLR, as already done on s390. The
>> various architectures that are already randomizing mmap (arm, arm64, mips,
>> powerpc, s390, and x86), have their various forms of arch_mmap_rnd()
>> made available via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these
>> architectures, arch_randomize_brk() is collapsed as well.
>>
>> This is an alternative to the solutions in:
>> https://lkml.org/lkml/2015/2/23/442
>
> 504 Gateway Time-out
>
> Hector's original patch had very useful descriptions of the bug, why it
> occurred, how it was exploited and how the patch fixes it.
>
> Your changelogs contain none of this and can be summarized as randomly
> churn code around for no apparent reason.
>
> Wanna try again?  I guess the [0/5] and [4/5] changelogs are the ones
> to fix.

Ah, yes, absolutely. I will resend.

-Kees

-- 
Kees Cook
Chrome OS Security

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Martin Hicks


On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:
 
> If crypto API allows to encrypt more sectors in one run
> (handling IV internally) dmcrypt can be modified of course.
>
> But do not forget we can use another IV (not only sequential number)
> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people
> are using it).

Interesting, I'd not considered using XTS with an IV other than plain/64.
The talitos hardware would not support aes/xts in any mode other than
plain/plain64 I don't think...Although perhaps you could push in an 8-byte
IV and the hardware would interpret it as the sector #.

> Maybe the following question would be if the dmcrypt sector IV algorithms
> should be moved into crypto API as well.
> (But because I misused dmcrypt IVs hooks for some additional operations
> for loopAES and old Truecrypt CBC mode, it is not so simple...)

Speaking again with talitos in mind, there would be no advantage for this
hardware.  Although larger requests are possible only a single IV can be
provided per request, so for algorithms like AES-CBC and dm-crypt 512byte IOs
are the only option (short of switching to 4kB block size).
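
For reference, the plain64 IV mentioned above is simply the sector number stored as a 64-bit little-endian value and zero-padded to the cipher's IV size. A minimal userspace sketch of that encoding follows; the function name is made up for illustration and this is not the dm-crypt code itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XTS_IV_SIZE 16	/* AES block size in bytes */

/*
 * Fill iv with the plain64 encoding of a sector number:
 * 64-bit little-endian sector, remaining bytes zero.
 */
static void plain64_iv(uint8_t iv[XTS_IV_SIZE], uint64_t sector)
{
	assert(iv != NULL);

	memset(iv, 0, XTS_IV_SIZE);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}
```

Because one request carries exactly one such IV, an engine that cannot advance the sector number itself can only be handed one sector's worth of data at a time.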

mh

-- 
Martin Hicks P.Eng.|  m...@bork.org
Bork Consulting Inc.   |  +1 (613) 266-2296

Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration

2015-03-02 Thread Tyrel Datwyler

On 03/01/2015 08:19 PM, Cyril Bur wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> During suspend/migration operation we must wait for the VASI state reported
>> by the hypervisor to become Suspending prior to making the ibm,suspend-me
>> RTAS call. Calling routines to rtas_ibm_suspend_me() pass a vasi_state variable
>> that exposes the VASI state to the caller. This is unnecessary as the caller
>> only really cares about the following three conditions; if there is an error
>> we should bail out, success indicating we have suspended and woken back up so
>> proceed to device tree update, or we are not suspendable yet so try calling
>> rtas_ibm_suspend_me again shortly.
>>
>> This patch removes the extraneous vasi_state variable and simply uses the
>> return code to communicate how to proceed. We either succeed, fail, or get
>> -EAGAIN in which case we sleep for a second before trying to call
>> rtas_ibm_suspend_me again.
>>
>> Signed-off-by: Tyrel Datwyler tyr...@linux.vnet.ibm.com
>> ---
>>  arch/powerpc/include/asm/rtas.h   |  2 +-
>>  arch/powerpc/kernel/rtas.c | 15 +++
>>  arch/powerpc/platforms/pseries/mobility.c |  8 +++-
>>  3 files changed, 11 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
>> index 2e23e92..fc85eb0 100644
>> --- a/arch/powerpc/include/asm/rtas.h
>> +++ b/arch/powerpc/include/asm/rtas.h
>> @@ -327,7 +327,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data);
>>  extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data);
>>  extern int rtas_online_cpus_mask(cpumask_var_t cpus);
>>  extern int rtas_offline_cpus_mask(cpumask_var_t cpus);
>> -extern int rtas_ibm_suspend_me(u64 handle, int *vasi_return);
>> +extern int rtas_ibm_suspend_me(u64 handle);
>>
> I like ditching vasi_return, I was never happy with myself for doing
> that!
>
>>  struct rtc_time;
>>  extern unsigned long rtas_get_boot_time(void);
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 21c45a2..603b928 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -897,7 +897,7 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus)
>>  }
>>  EXPORT_SYMBOL(rtas_offline_cpus_mask);
>>
>> -int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
>> +int rtas_ibm_suspend_me(u64 handle)
>
> That definition is actually in an #ifdef CONFIG_PPC_PSERIES, you'll need
> to change the definition for !CONFIG_PPC_PSERIES

Good catch. I'll fix it there too.

>>  {
>>  long state;
>>  long rc;
>> @@ -919,13 +919,11 @@ int rtas_ibm_suspend_me(u64 handle, int *vasi_return)
>>  printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned %ld\n",rc);
>>  return rc;
>>  } else if (state == H_VASI_ENABLED) {
>> -*vasi_return = RTAS_NOT_SUSPENDABLE;
>> -return 0;
>> +return -EAGAIN;
>>  } else if (state != H_VASI_SUSPENDING) {
>>  printk(KERN_ERR "rtas_ibm_suspend_me: vasi_state returned state %ld\n",
>>  state);
>> -*vasi_return = -1;
>> -return 0;
>> +return -EIO;
>
> I've had a look as to how these return values get passed back up the
> stack and admittedly we're dealing with a confusing mess, I've compared
> back to before my patch (which wasn't perfect either it seems).
> Both the state == H_VASI_ENABLED and state == H_VASI_SUSPENDING cause
> ppc_rtas to go to the copy_return and return 0 (albeit with an error
> code in args.rets[0]), because rtas_ppc goes back out to userland, I
> hesitate to change any of that.

Agreed, that this is a bit of a mess. The problem is we have two call
paths into rtas_ibm_suspend_me(). The one from migrate_store() and one
from ppc_rtas(). I'll address each with your other comments below.
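
For the migrate_store() style of caller, the return-code contract from the patch description (success, failure, or -EAGAIN followed by a short sleep and a retry) can be modelled in userspace roughly as below. Everything here is a stand-in: `fake_suspend_me()` only simulates rtas_ibm_suspend_me(), and `sleep_fn` is a hypothetical hook where the kernel code would sleep for a second.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simulated VASI state: not suspendable for the first few polls. */
struct fake_vasi {
	int polls_until_suspending;
};

/* Stand-in for rtas_ibm_suspend_me(): -EAGAIN until "Suspending". */
static int fake_suspend_me(struct fake_vasi *v)
{
	assert(v != NULL);

	if (v->polls_until_suspending > 0) {
		v->polls_until_suspending--;
		return -EAGAIN;
	}
	return 0;
}

/*
 * Caller-side loop: retry only on -EAGAIN, stop on success or on any
 * other error. *attempts reports how many calls were made.
 */
static int suspend_with_retry(struct fake_vasi *v, void (*sleep_fn)(void),
			      int *attempts)
{
	int rc;

	*attempts = 0;
	do {
		rc = fake_suspend_me(v);
		(*attempts)++;
		if (rc == -EAGAIN && sleep_fn)
			sleep_fn();
	} while (rc == -EAGAIN);

	return rc;
}
```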

>>  }
>>
>>  if (!alloc_cpumask_var(&offline_mask, GFP_TEMPORARY))
>> @@ -1060,9 +1058,10 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
>>  int vasi_rc = 0;
>
> This generates unused variable warning.

Sloppy on my part. Will remove.

 
>>  u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
>>  | be32_to_cpu(args.args[1]);
>> -rc = rtas_ibm_suspend_me(handle, &vasi_rc);
>> -args.rets[0] = cpu_to_be32(vasi_rc);
>> -if (rc)
>> +rc = rtas_ibm_suspend_me(handle);
>> +if (rc == -EAGAIN)
>> +args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
>
> (continuing on...) so perhaps here have
>   rc = 0;
> else if (rc == -EIO)
>   args.rets[0] = cpu_to_be32(-1);
>   rc = 0;
> Which should keep the original behaviour, the last thing we want to do
> is break BE.
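
The mapping suggested in the review above can be written out as a small userspace sketch. `map_suspend_rc()` is a hypothetical helper, the numeric value of RTAS_NOT_SUSPENDABLE is assumed for illustration, and the status is kept in host byte order here where the kernel would store it with cpu_to_be32().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RTAS_NOT_SUSPENDABLE (-9004)	/* assumed value, for illustration */

/*
 * Translate the new rtas_ibm_suspend_me() return code into the pair
 * userland used to see: a status word for args.rets[0] plus the value
 * returned from the syscall itself.
 */
static int map_suspend_rc(int rc, int32_t *ret0)
{
	assert(ret0 != NULL);

	switch (rc) {
	case 0:
		*ret0 = 0;			/* suspended and resumed */
		return 0;
	case -EAGAIN:
		*ret0 = RTAS_NOT_SUSPENDABLE;	/* not suspendable yet */
		return 0;
	case -EIO:
		*ret0 = -1;			/* unexpected VASI state */
		return 0;
	default:
		return rc;			/* real error, passed through */
	}
}
```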

The biggest problem here is we are making what basically equates to a
fake rtas call from drmgr which we intercept in ppc_rtas(). From there
we make this special call to rtas_ibm_suspend_me() to check VASI state
and do a bunch of other specialized work that needs to be setup prior to
making the actual ibm,suspend-me rtas call. Since, we are

Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update

2015-03-02 Thread Tyrel Datwyler

On 03/01/2015 09:20 PM, Cyril Bur wrote:
> On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
>> We currently use the device tree update code in the kernel after resuming
>> from a suspend operation to re-sync the kernel's view of the device tree with
>> that of the hypervisor. The code as it stands is not endian safe as it relies
>> on parsing buffers returned by RTAS calls that thusly contain data in big
>> endian format.
>>
>> This patch annotates variables and structure members with __be types as well
>> as performing necessary byte swaps to cpu endian for data that needs to be
>> parsed.
>>
>> Signed-off-by: Tyrel Datwyler tyr...@linux.vnet.ibm.com
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 36 ---
>>  1 file changed, 19 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index 29e4f04..0b1f70e 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -25,10 +25,10 @@
>>  static struct kobject *mobility_kobj;
>>
>>  struct update_props_workarea {
>> -u32 phandle;
>> -u32 state;
>> -u64 reserved;
>> -u32 nprops;
>> +__be32 phandle;
>> +__be32 state;
>> +__be64 reserved;
>> +__be32 nprops;
>>  } __packed;
>>
>>  #define NODE_ACTION_MASK 0xff00
>> @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop,
>>  return 0;
>>  }
>>
>> -static int update_dt_node(u32 phandle, s32 scope)
>> +static int update_dt_node(__be32 phandle, s32 scope)
>>  {
>
> On line 153 of this function:
>    dn = of_find_node_by_phandle(phandle);
>
> You're passing a __be32 to device tree code, if we can treat the phandle
> as an opaque value returned to us from the rtas call and pass it around
> like that then all good.

Yes, of_find_node_by_phandle directly compares the phandle passed in against
the handle stored in each device_node when searching for a matching
node. Since the device tree is big endian, it follows that the big
endian phandle received in the rtas buffer needs no conversion.

Further, we need to pass the phandle to ibm,update-properties in the
work area which is also required to be big endian. So, again it seemed
that converting to cpu endian was a waste of effort just to convert it
back to big endian.

> It's also hard to be sure if these need to be BE and have always been
> that way because we've always run BE so they've never actually wanted
> CPU endian, it's just that CPU endian has always been BE (I think I
> started rambling...)
>
> Just want to check that *not* converting them is done on purpose.

Yes, I explicitly did not convert them on purpose. As mentioned above we
need phandle in BE for the ibm,update-properties rtas work area.
Similarly, drc_index needs to be in BE for the ibm,configure-connector
rtas work area. Outside of that we do no other manipulation of those
values.
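
A userspace sketch of the distinction being drawn here: fields the code must interpret (such as nprops) are decoded from big endian, while the phandle is carried around as opaque bytes. The offsets follow the packed struct update_props_workarea quoted above; the helper names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within the packed, big-endian work area. */
enum {
	UPWA_PHANDLE  = 0,	/* __be32 */
	UPWA_STATE    = 4,	/* __be32 */
	UPWA_RESERVED = 8,	/* __be64 */
	UPWA_NPROPS   = 16,	/* __be32 */
	UPWA_SIZE     = 20,
};

/* Decode a big-endian 32-bit value, independent of host endianness. */
static uint32_t get_be32(const uint8_t *p)
{
	assert(p != NULL);

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* nprops is used as a loop bound, so it must be converted. */
static uint32_t upwa_nprops(const uint8_t *buf)
{
	return get_be32(buf + UPWA_NPROPS);
}

/* The phandle is only passed along, so it is copied without conversion. */
static void upwa_copy_phandle(const uint8_t *buf, uint8_t out[4])
{
	assert(buf != NULL);
	memcpy(out, buf + UPWA_PHANDLE, 4);
}
```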

 
> And having read on, I'm assuming the answer is yes since this
> observation is true for your changes which affect:
>   delete_dt_node()
>   update_dt_node()
>   add_dt_node()
> Worth noting that you didn't change the definition of delete_dt_node()

You are correct. Oversight. I will fix that as it should generate a
sparse complaint.

-Tyrel

 
> I'll have a look once you address the non compiling in patch 1/3 (I'm
> getting blocked by the unused var because somehow Werror is on, odd it
> didn't trip you up) but I also suspect this will have sparse go a bit
> nuts.
> I wonder if there is a nice way of shutting sparse up.
>
>>  struct update_props_workarea *upwa;
>>  struct device_node *dn;
>> @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  char *prop_data;
>>  char *rtas_buf;
>>  int update_properties_token;
>> +u32 nprops;
>>  u32 vd;
>>
>>  update_properties_token = rtas_token("ibm,update-properties");
>> @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope)
>>  break;
>>
>>  prop_data = rtas_buf + sizeof(*upwa);
>> +nprops = be32_to_cpu(upwa->nprops);
>>
>>  /* On the first call to ibm,update-properties for a node the
>>   * the first property value descriptor contains an empty
>> @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope)
>>   */
>>  if (*prop_data == 0) {
>>  prop_data++;
>> -vd = *(u32 *)prop_data;
>> +vd = be32_to_cpu(*(__be32 *)prop_data);
>>  prop_data += vd + sizeof(vd);
>> -upwa->nprops--;
>> +nprops--;
>>  }
>>
>> -for (i = 0; i < upwa->nprops; i++) {
>> +for (i = 0; i < nprops; i++) {
>>  char *prop_name;
>>
>>  prop_name = prop_data;
>>  prop_data += strlen(prop_name) + 1;
>> -vd = *(u32 *)prop_data;
>> +vd = be32_to_cpu(*(__be32 *)prop_data);

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Martin Hicks

On Mon, Mar 02, 2015 at 04:44:19PM -0500, Martin Hicks wrote:
 
>                        Write (MB/s)    Read (MB/s)
> Unencrypted            140             176
> aes-xts-plain64 512b   113             115
> aes-xts-plain64 4kB    71              56

I got the two AES lines backwards.  Sorry about that.

mh

-- 
Martin Hicks P.Eng.|  m...@bork.org
Bork Consulting Inc.   |  +1 (613) 266-2296

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Martin Hicks

On Mon, Mar 02, 2015 at 03:25:56PM +0200, Horia Geantă wrote:
> On 2/20/2015 7:00 PM, Martin Hicks wrote:
> > This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2.
> >
> > One of the nice things about this hardware is that it knows how to deal
> > with encrypt/decrypt requests that are larger than sector size, but that
> > also requires that the sector size be passed into the crypto engine
> > as an XTS cipher context parameter.
> >
> > When a request is larger than the sector size the sector number is
> > incremented by the talitos engine and the tweak key is re-calculated
> > for the new sector.
> >
> > I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit
> > and 256bit) to ensure interoperability with the software AES-XTS
> > implementation.  All testing was done using dm-crypt/LUKS with
> > aes-xts-plain64.
> >
> > Is there a better solution than just hard coding the sector size to
> > (1<<SECTOR_SHIFT)?  Maybe dm-crypt should be modified to pass the
> > sector size along with the plain/plain64 IV to an XTS algorithm?
>
> AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not
> aware of a sector size (data unit size in IEEE P1619 terminology):
> There's a hidden assumption that all the data sent to xts in one request
> belongs to a single sector. Even more, it's supposed that the first
> 16-byte block in the request is block 0 in the sector. These can be
> seen from the way the tweak (T) value is computed.
> (Side note: there's no support of ciphertext stealing in crypto/xts.c -
> i.e. sector sizes must be a multiple of underlying block cipher size -
> that is 16B.)
>
> If dm-crypt would be modified to pass sector size somehow, all in-kernel
> xts implementations would have to be made aware of the change.
> I have nothing against this, but let's see what crypto maintainers have
> to say...

Right.  Additionally, there may be some requirement for the encryption
implementation to broadcast the maximum size that can be handled in a single
request.  For example Talitos could handle XTS encrypt/decrypt requests of
up to 64kB (regardless of the block device's sector size).
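
The per-block tweak progression behind that hidden assumption can be sketched as follows. In XTS the initial tweak is the encrypted sector number, and each following 16-byte block uses the previous tweak multiplied by alpha in GF(2^128). Only that multiplication is shown; the function name is made up, and this is an illustration rather than the crypto/xts.c or talitos code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define XTS_BLOCK 16	/* AES block size in bytes */

/*
 * Advance the tweak from block j to block j+1 of the same data unit:
 * multiply by alpha (x) in GF(2^128), bytes in little-endian order,
 * reducing with the polynomial x^128 + x^7 + x^2 + x + 1 (0x87).
 */
static void xts_next_tweak(uint8_t t[XTS_BLOCK])
{
	uint8_t carry = 0;

	assert(t != NULL);

	for (int i = 0; i < XTS_BLOCK; i++) {
		uint8_t next = (uint8_t)(t[i] >> 7);

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= 0x87;
}
```

An implementation that always starts this chain at block 0 of the request cannot tell where a sector boundary falls inside a larger request, which is why the sector size would have to be passed in explicitly.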

> BTW, there were some discussions back in 2013 wrt. being able to
> configure / increase sector size, smth. crypto engines would benefit from:
> http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html
> (experimental patch)
> http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html
>
> The experimental patch sends sector size as the req->nbytes - hidden
> assumption: data size sent in an xts crypto request equals a sector.

I found this last week, and used it as a starting point for some testing.  I
modified it to keep the underlying sector size of the dm-crypt mapping as
512byte, but allowed the code to combine requests in IOs up to 4kB.  Doing
greater request sizes would require allocating additional pages...I plan to
implement that to see how much extra performance can be squeezed out.

patch below...

With regards to performance, with my low-powered Freescale P1022 board, I see
performance numbers like this on ext4, as measured by bonnie++.

                       Write (MB/s)    Read (MB/s)
Unencrypted            140             176
aes-xts-plain64 512b   113             115
aes-xts-plain64 4kB    71              56

The more detailed bonnie++ output is here:
http://www.bork.org/~mort/dm-crypt-enc-blksize.html

The larger IO size is a huge win for this board.

The patch I'm using to send IOs up to 4kB to talitos follows.

Thanks,
mh


diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 08981be..88e95b5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -42,6 +42,7 @@ struct convert_context {
struct bvec_iter iter_out;
sector_t cc_sector;
atomic_t cc_pending;
+   unsigned int block_size;
struct ablkcipher_request *req;
 };
 
@@ -142,6 +143,8 @@ struct crypt_config {
sector_t iv_offset;
unsigned int iv_size;
 
+   unsigned int block_size;
+
/* ESSIV: struct crypto_cipher *essiv_tfm */
void *iv_private;
struct crypto_ablkcipher **tfms;
@@ -801,10 +804,17 @@ static void crypt_convert_init(struct crypt_config *cc,
 {
   ctx->bio_in = bio_in;
   ctx->bio_out = bio_out;
-   if (bio_in)
+   ctx->block_size = 0;
+   if (bio_in) {
ctx->iter_in = bio_in->bi_iter;
-   if (bio_out)
+   ctx->block_size = max(ctx->block_size, bio_cur_bytes(bio_in));
+   }
+   if (bio_out) {
ctx->iter_out = bio_out->bi_iter;
+   ctx->block_size = max(ctx->block_size, bio_cur_bytes(bio_out));
+   }
+   if (ctx->block_size > cc->block_size)
+   ctx->block_size = cc->block_size;
   ctx->cc_sector = sector + cc->iv_offset;
   init_completion(&ctx->restart);
 }
@@ -844,15 +854,15 @@ static int crypt_convert_block(struct crypt_config *cc,
   dmreq->iv_sector = ctx->cc_sector;
   dmreq->ctx = ctx;
   sg_init_table(dmreq->sg_in, 1);
-

[PATCH 4/5] mm: split ET_DYN ASLR from mmap ASLR

2015-03-02 Thread Kees Cook

This fixes the offset2lib weakness in ASLR for arm, arm64, mips,
powerpc, and x86. The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of shared library offset as
well (mmap), and vice versa. Further details and a PoC of this attack
are available here:
http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE linked executable (ET_DYN) has its own ASLR region:

$ ./show_mmaps_pie
54859ccd6000-54859ccd7000 r-xp  ...  /tmp/show_mmaps_pie
54859ced6000-54859ced7000 r--p  ...  /tmp/show_mmaps_pie
54859ced7000-54859ced8000 rw-p  ...  /tmp/show_mmaps_pie
7f75be764000-7f75be91f000 r-xp  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75be91f000-7f75beb1f000 ---p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb1f000-7f75beb23000 r--p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb23000-7f75beb25000 rw-p  ...  /lib/x86_64-linux-gnu/libc.so.6
7f75beb25000-7f75beb2a000 rw-p  ...
7f75beb2a000-7f75beb4d000 r-xp  ...  /lib64/ld-linux-x86-64.so.2
7f75bed45000-7f75bed46000 rw-p  ...
7f75bed46000-7f75bed47000 r-xp  ...
7f75bed47000-7f75bed4c000 rw-p  ...
7f75bed4c000-7f75bed4d000 r--p  ...  /lib64/ld-linux-x86-64.so.2
7f75bed4d000-7f75bed4e000 rw-p  ...  /lib64/ld-linux-x86-64.so.2
7f75bed4e000-7f75bed4f000 rw-p  ...
7fffb3741000-7fffb3762000 rw-p  ...  [stack]
7fffb377b000-7fffb377d000 r--p  ...  [vvar]
7fffb377d000-7fffb377f000 r-xp  ...  [vdso]

The change is to add a call to the newly created arch_mmap_rnd() into the
ELF loader for handling ET_DYN ASLR in a separate region from mmap ASLR,
as already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE, which
is no longer needed.
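
A rough userspace model of that change, assuming the loader starts from a fixed ELF_ET_DYN_BASE and adds arch_mmap_rnd() only when randomization is enabled. The fs/binfmt_elf.c hunk is not visible in this excerpt, so the base address, page size, and helper names below are illustrative and not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096ULL
#define SKETCH_ET_DYN_BASE  0x555555554000ULL	/* hypothetical fixed base */

static_assert((SKETCH_ET_DYN_BASE & (SKETCH_PAGE_SIZE - 1)) == 0,
	      "base must be page aligned");

/* Round an address down to its page boundary. */
static uint64_t page_start(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Pick the load address for an ET_DYN image: fixed base, plus an
 * independent random offset (rnd stands in for arch_mmap_rnd()) when
 * randomization is enabled. The mmap base is randomized separately,
 * so leaking one address no longer reveals the other.
 */
static uint64_t et_dyn_load_bias(int randomize, uint64_t rnd)
{
	uint64_t bias = SKETCH_ET_DYN_BASE;

	if (randomize)
		bias += rnd;

	return page_start(bias);
}
```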

Reported-by: Hector Marco-Gisbert hecma...@upv.es
Signed-off-by: Kees Cook keesc...@chromium.org
---
 arch/arm/Kconfig|  1 -
 arch/arm64/Kconfig  |  1 -
 arch/mips/Kconfig   |  1 -
 arch/powerpc/Kconfig|  1 -
 arch/s390/include/asm/elf.h |  4 ++--
 arch/x86/Kconfig|  1 -
 fs/Kconfig.binfmt   |  3 ---
 fs/binfmt_elf.c | 17 ++---
 8 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 248d99cabaa8..e2f0ef9c6ee3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1,7 +1,6 @@
 config ARM
bool
default y
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5f469095e0e2..07e0fc7adc88 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,5 @@
 config ARM64
def_bool y
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 72ce5cece768..557c5f1772c1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -23,7 +23,6 @@ config MIPS
select HAVE_KRETPROBES
select HAVE_DEBUG_KMEMLEAK
select HAVE_SYSCALL_TRACEPOINTS
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ELF_RANDOMIZE
   select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
select RTC_LIB if !MACH_LOONGSON
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 14fe1c411489..910fa4f9ad1e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -88,7 +88,6 @@ config PPC
select ARCH_MIGHT_HAVE_PC_PARPORT
select ARCH_MIGHT_HAVE_PC_SERIO
select BINFMT_ELF
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ELF_RANDOMIZE
select OF
select OF_EARLY_FLATTREE
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index 9ed68e7ee856..617f7fabdb0a 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -163,9 +163,9 @@ extern unsigned int vdso_enabled;
the loader.  We need to make sure that it is out of the way of the program
that it will exec, and that there is sufficient room for the brk. 64-bit
tasks are aligned to 4GB. */
-#define ELF_ET_DYN_BASE (arch_mmap_rnd() + (is_32bit_task() ? \
+#define ELF_ET_DYN_BASE (is_32bit_task() ? \
(STACK_TOP / 3 * 2) : \
-   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
+   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9aa91727fbf8..328be0fab910 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,7 +87,6 @@ config X86
select HAVE_ARCH_KMEMCHECK
   select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP
select HAVE_USER_RETURN_NOTIFIER
-   select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select

[PATCH 3/5] mm: move randomize_et_dyn into ELF_ET_DYN_BASE

2015-03-02 Thread Kees Cook

In preparation for moving ET_DYN randomization into the ELF loader
(which requires a static ELF_ET_DYN_BASE), this redefines s390's existing
ET_DYN randomization away from a separate function (randomize_et_dyn)
and into ELF_ET_DYN_BASE and a call to arch_mmap_rnd(). This refactoring
results in the same ET_DYN randomization on s390. Additionally removes
a copy/pasted unused arm64 extern.
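
The base calculation being moved into the macro can be checked in isolation. This is a sketch that takes STACK_TOP and the 32-bit flag as plain parameters (in the kernel they come from STACK_TOP and is_32bit_task()); the random offset from arch_mmap_rnd() is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two thirds of the way up the address space; 64-bit tasks are then
 * aligned down to a 4GB boundary, as in the s390 macro.
 */
static uint64_t et_dyn_base(uint64_t stack_top, int is_32bit)
{
	uint64_t base = stack_top / 3 * 2;

	assert(stack_top != 0);

	if (!is_32bit)
		base &= ~((UINT64_C(1) << 32) - 1);	/* align to 4GB */

	return base;
}
```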

Signed-off-by: Kees Cook keesc...@chromium.org
---
 arch/arm64/include/asm/elf.h |  1 -
 arch/s390/include/asm/elf.h  |  9 +
 arch/s390/mm/mmap.c  | 11 ---
 3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 1f65be393139..f724db00b235 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -125,7 +125,6 @@ typedef struct user_fpsimd_state elf_fpregset_t;
  * the loader.  We need to make sure that it is out of the way of the program
  * that it will exec, and that there is sufficient room for the brk.
  */
-extern unsigned long randomize_et_dyn(unsigned long base);
 #define ELF_ET_DYN_BASE (2 * TASK_SIZE_64 / 3)
 
 /*
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index c9df40b5c0ac..9ed68e7ee856 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,10 +161,11 @@ extern unsigned int vdso_enabled;
 /* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
   use of this is to invoke "./ld.so someprog" to test out a new version of
the loader.  We need to make sure that it is out of the way of the program
-   that it will exec, and that there is sufficient room for the brk.  */
-
-extern unsigned long randomize_et_dyn(void);
-#define ELF_ET_DYN_BASE randomize_et_dyn()
+   that it will exec, and that there is sufficient room for the brk. 64-bit
+   tasks are aligned to 4GB. */
+#define ELF_ET_DYN_BASE (arch_mmap_rnd() + (is_32bit_task() ? \
+   (STACK_TOP / 3 * 2) : \
+   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index 77759e35671b..ec4c20448aef 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -179,17 +179,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
 }
 
-unsigned long randomize_et_dyn(void)
-{
-   unsigned long base;
-
-   base = STACK_TOP / 3 * 2;
-   if (!is_32bit_task())
-   /* Align to 4GB */
-   base &= ~((1UL << 32) - 1);
-   return base + arch_mmap_rnd();
-}
-
 #ifndef CONFIG_64BIT
 
 /*
-- 
1.9.1


[PATCH 1/5] arm: factor out mmap ASLR into mmap_rnd

2015-03-02 Thread Kees Cook

In preparation for exporting per-arch mmap randomization functions,
this moves the ASLR calculations for mmap on ARM into a separate routine.
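
The arithmetic that ends up in the new routine is small enough to model in userspace. In this sketch `random_value` stands in for get_random_int(), `randomize` for the PF_RANDOMIZE / ADDR_NO_RANDOMIZE check, and 4 KiB pages are assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* 8 random bits at page granularity span 20 address space bits. */
static_assert(((1UL << 8) << SKETCH_PAGE_SHIFT) == (1UL << 20),
	      "8 bits of randomness in 20 address space bits");

/* Keep 8 bits of the random value and scale them to whole pages. */
static unsigned long mmap_rnd_sketch(unsigned int random_value, int randomize)
{
	unsigned long rnd = 0UL;

	if (randomize)
		rnd = (unsigned long)(random_value % (1U << 8)) << SKETCH_PAGE_SHIFT;

	return rnd;
}
```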

Signed-off-by: Kees Cook keesc...@chromium.org
---
 arch/arm/mm/mmap.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 5e85ed371364..0f8bc158f2c6 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,14 +169,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
 }
 
-void arch_pick_mmap_layout(struct mm_struct *mm)
+static unsigned long mmap_rnd(void)
 {
-   unsigned long random_factor = 0UL;
+   unsigned long rnd = 0UL;
 
/* 8 bits of randomness in 20 address space bits */
   if ((current->flags & PF_RANDOMIZE) &&
!(current->personality & ADDR_NO_RANDOMIZE))
-   random_factor = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+   rnd = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+
+   return rnd;
+}
+
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+   unsigned long random_factor = mmap_rnd();
 
if (mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
-- 
1.9.1


[PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-02 Thread Kees Cook

To address the offset2lib ASLR weakness[1], this separates ET_DYN
ASLR from mmap ASLR, as already done on s390. The architectures
that are already randomizing mmap (arm, arm64, mips, powerpc, s390,
and x86), have their various forms of arch_mmap_rnd() made available
via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
arch_randomize_brk() is collapsed as well.

This is an alternative to the solutions in:
https://lkml.org/lkml/2015/2/23/442

Thanks!

-Kees

[1] http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

---
v2:
- verbosified the commit logs, especially 4/5 (akpm)
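
To illustrate why the split matters, here is a minimal model, assuming a simplified layout in which a PIE executable placed through the mmap base sits at a fixed distance from the first library. All names, addresses, and the 0x200000 gap are hypothetical and chosen only for the demonstration.

```c
#include <stdint.h>

struct layout {
    uint64_t exe_base; /* load address of the ET_DYN executable */
    uint64_t lib_base; /* load address of the first mmap'd library */
};

/* Coupled: one random value moves both mappings together. */
static struct layout layout_coupled(uint64_t mmap_top, uint64_t exe_size,
                                    uint64_t rnd)
{
    struct layout l;

    l.exe_base = mmap_top - rnd - exe_size;
    l.lib_base = l.exe_base - 0x200000; /* illustrative fixed gap */
    return l;
}

/* Split: the executable and the mmap area get independent offsets. */
static struct layout layout_split(uint64_t mmap_top, uint64_t et_dyn_base,
                                  uint64_t mmap_rnd, uint64_t et_dyn_rnd)
{
    struct layout l;

    l.lib_base = mmap_top - mmap_rnd;
    l.exe_base = et_dyn_base + et_dyn_rnd;
    return l;
}
```

In the coupled model the distance between executable and library is the same on every run, so leaking one address reveals the other; in the split model that distance changes with the second random value.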


[PATCH 2/5] mm: expose arch_mmap_rnd when available

2015-03-02 Thread Kees Cook

When an architecture fully supports randomizing the ELF load location, a
per-arch mmap_rnd() function is used to find a randomized mmap base. In
preparation for randomizing the location of ET_DYN binaries separately
from mmap, this renames and exports these functions as arch_mmap_rnd().
It additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing
this feature on architectures that support it (which is a superset of
ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already does this without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook keesc...@chromium.org
---
 arch/Kconfig  |  7 +++
 arch/arm/Kconfig  |  1 +
 arch/arm/mm/mmap.c|  4 ++--
 arch/arm64/Kconfig|  1 +
 arch/arm64/mm/mmap.c  |  4 ++--
 arch/mips/Kconfig |  1 +
 arch/mips/mm/mmap.c   |  9 ++---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/mm/mmap.c|  4 ++--
 arch/s390/Kconfig |  1 +
 arch/s390/mm/mmap.c   |  8 
 arch/x86/Kconfig  |  1 +
 arch/x86/mm/mmap.c|  6 +++---
 fs/binfmt_elf.c   |  1 +
 include/linux/elf-randomize.h | 10 ++
 15 files changed, 43 insertions(+), 16 deletions(-)
 create mode 100644 include/linux/elf-randomize.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a458d5..e315cc79ebe7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -484,6 +484,13 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK
  This spares a stack switch and improves cache usage on softirq
  processing.
 
+config ARCH_HAS_ELF_RANDOMIZE
+   bool
+   help
+ An architecture supports choosing randomized locations for
+ stack, mmap, brk, and ET_DYN. Defined functions:
+ - arch_mmap_rnd(), must respect (current->flags & PF_RANDOMIZE)
+
 #
 # ABI hall of shame
 #
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9f1f09a2bc9b..248d99cabaa8 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
default y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+   select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 0f8bc158f2c6..3c1fedb034bb 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,7 +169,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
return addr;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
unsigned long rnd = 0UL;
 
@@ -183,7 +183,7 @@ static unsigned long mmap_rnd(void)
 
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
-   unsigned long random_factor = mmap_rnd();
+   unsigned long random_factor = arch_mmap_rnd();
 
if (mmap_is_legacy()) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..5f469095e0e2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
def_bool y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+   select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_GCOV_PROFILE_ALL
select ARCH_HAS_SG_CHAIN
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 54922d1275b8..b7117cb4bc07 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -47,7 +47,7 @@ static int mmap_is_legacy(void)
return sysctl_legacy_va_layout;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
unsigned long rnd = 0;
 
@@ -66,7 +66,7 @@ static unsigned long mmap_base(void)
else if (gap > MAX_GAP)
gap = MAX_GAP;
 
-   return PAGE_ALIGN(STACK_TOP - gap - mmap_rnd());
+   return PAGE_ALIGN(STACK_TOP - gap - arch_mmap_rnd());
 }
 
 /*
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c7a16904cd03..72ce5cece768 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -24,6 +24,7 @@ config MIPS
select HAVE_DEBUG_KMEMLEAK
select HAVE_SYSCALL_TRACEPOINTS
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
+   select ARCH_HAS_ELF_RANDOMIZE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
select RTC_LIB if !MACH_LOONGSON
select GENERIC_ATOMIC64 if !64BIT
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index f1baadd56e82..d32490d99671 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -164,9 +164,12 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
}
 }
 
-static inline unsigned long brk_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
-   unsigned long rnd = get_random_int();
+   unsigned long rnd = 0;
+
+   if

[PATCH 5/5] mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE

2015-03-02 Thread Kees Cook

The arch_randomize_brk() function is used on several architectures,
even those that don't support ET_DYN ASLR. To avoid bulky extern/#define
tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for
the architectures that support it, while still handling CONFIG_COMPAT_BRK.

Signed-off-by: Kees Cook keesc...@chromium.org
---
 arch/Kconfig   |  1 +
 arch/arm/include/asm/elf.h |  4 
 arch/arm64/include/asm/elf.h   |  4 
 arch/mips/include/asm/elf.h|  4 
 arch/powerpc/include/asm/elf.h |  4 
 arch/s390/include/asm/elf.h|  3 ---
 arch/x86/include/asm/elf.h |  3 ---
 fs/binfmt_elf.c|  4 +---
 include/linux/elf-randomize.h  | 12 
 9 files changed, 14 insertions(+), 25 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index e315cc79ebe7..1c7e98f137db 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -490,6 +490,7 @@ config ARCH_HAS_ELF_RANDOMIZE
  An architecture supports choosing randomized locations for
  stack, mmap, brk, and ET_DYN. Defined functions:
  - arch_mmap_rnd(), must respect (current->flags & PF_RANDOMIZE)
+ - arch_randomize_brk()
 
 #
 # ABI hall of shame
diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index afb9cafd3786..c1ff8ab12914 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -125,10 +125,6 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs);
 extern void elf_set_personality(const struct elf32_hdr *);
 #define SET_PERSONALITY(ex)elf_set_personality((ex))
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 #ifdef CONFIG_MMU
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 struct linux_binprm;
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index f724db00b235..faad6df49e5b 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -156,10 +156,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 #define STACK_RND_MASK (0x3  (PAGE_SHIFT - 12))
 #endif
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 #ifdef CONFIG_COMPAT
 
 #ifdef __AARCH64EB__
diff --git a/arch/mips/include/asm/elf.h b/arch/mips/include/asm/elf.h
index 535f196ffe02..31d747d46a23 100644
--- a/arch/mips/include/asm/elf.h
+++ b/arch/mips/include/asm/elf.h
@@ -410,10 +410,6 @@ struct linux_binprm;
 extern int arch_setup_additional_pages(struct linux_binprm *bprm,
   int uses_interp);
 
-struct mm_struct;
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 struct arch_elf_state {
int fp_abi;
int interp_fp_abi;
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 57d289acb803..ee46ffef608e 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -128,10 +128,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
(0x7ff  (PAGE_SHIFT - 12)) : \
(0x3  (PAGE_SHIFT - 12)))
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
-
 #ifdef CONFIG_SPU_BASE
 /* Notes used in ET_CORE. Note name is SPU/fd/filename. */
 #define NT_SPU 1
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index 617f7fabdb0a..7cc271003ff6 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -226,9 +226,6 @@ struct linux_binprm;
 #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1
 int arch_setup_additional_pages(struct linux_binprm *, int);
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 void *fill_cpu_elf_notes(void *ptr, struct save_area *sa, __vector128 *vxrs);
 
 #endif
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index ca3347a9dab5..bbdace22daf8 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -338,9 +338,6 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm,
  int uses_interp);
 #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages
 
-extern unsigned long arch_randomize_brk(struct mm_struct *mm);
-#define arch_randomize_brk arch_randomize_brk
-
 /*
  * True on X86_32 or when emulating IA32 on X86_64
  */
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 203c2e6f9a25..96459c18d1eb 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1041,15 +1041,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
current->mm->end_data = end_data;
current->mm->start_stack = bprm->p;
 
-#ifdef arch_randomize_brk
if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) {

[PATCH 3/4 RFC] fsl/msi: Add MSI bank allocation for kernel owned devices

2015-03-02 Thread Bharat Bhushan

With this patch, a context can allocate an MSI bank and use the
allocated MSI bank for the devices in that context.

The kernel/host context is NULL, so all devices owned by the kernel
will share an MSI bank allocated with context = NULL.

This patch is a step towards having separate MSI banks for the kernel
context and userspace/VM contexts. We do not want two software
contexts (kernel and VMs) to share an MSI bank, so that interrupts are
safe and reliable with full isolation. A follow-up patch will add an
interface to allocate an MSI bank for a userspace/VM context.

NOTE: This RFC patch allows only one MSI bank to be allocated for the
kernel context, which seems sufficient to me. If this turns out to
limit some real use-case scenario, the limitation can be removed.

One issue that still needs to be addressed is when to free the MSI bank
allocated for the kernel context. Say all MSI-capable devices are
assigned to VM/userspace; then there is no need to keep any MSI bank
reserved for the kernel context.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/sysdev/fsl_msi.c | 88 ++-
 arch/powerpc/sysdev/fsl_msi.h |  4 ++
 2 files changed, 83 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 32ba1e3..027aeeb 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -142,6 +142,79 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
return;
 }
 
+/*
+ * Allocate a MSI Bank for the requested context.
+ * NULL context means that this request is to allocate
+ * MSI bank for kernel owned devices. And currently we
+ * assume that one MSI bank is sufficient for kernel.
+ */
+static struct fsl_msi *fsl_msi_allocate_msi_bank(void *context)
+{
+   struct fsl_msi *msi_data;
+
+   /* Kernel context (NULL) can reserve only one msi bank */
+   if (!context) {
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if ((msi_data->reserved == MSI_RESERVED) &&
+   (msi_data->context == NULL))
+   return NULL;
+   }
+   }
+
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if (msi_data->reserved == MSI_FREE) {
+   msi_data->reserved = MSI_RESERVED;
+   msi_data->context = context;
+   return msi_data;
+   }
+   }
+
+   return NULL;
+}
+
+/* FIXME: Assumption that host kernel will allocate only one MSI bank */
+ __attribute__ ((unused)) static int fsl_msi_free_msi_bank(void *context)
+{
+   struct fsl_msi *msi_data;
+
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if ((msi_data->reserved == MSI_RESERVED) &&
+(msi_data->context == context)) {
+   msi_data->reserved = MSI_FREE;
+   msi_data->context = NULL;
+   return 0;
+   }
+   }
+   return -ENODEV;
+}
+
+/*  This API returns the allocated MSI bank of context
+ *  to which pdev device belongs.
+ *  All kernel owned devices have NULL context. All devices
+ *  in same context will share the allocated MSI bank.
+ *
+ *  Note: If no MSI bank allocated to kernel context then
+ *  we allocate a MSI bank here.
+ */
+static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev)
+{
+   struct fsl_msi *msi_data = NULL;
+   void *context = NULL;
+
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if ((msi_data->reserved == MSI_RESERVED) &&
+   (msi_data->context == context))
+   return msi_data;
+   }
+
+   /* If no MSI bank allocated for kernel owned device, allocate one */
+   msi_data = fsl_msi_allocate_msi_bank(NULL);
+   if (msi_data)
+   return msi_data;
+
+   return NULL;
+}
+
 static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
struct msi_msg *msg,
struct fsl_msi *fsl_msi_data)
@@ -174,7 +247,7 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
struct device_node *np;
phandle phandle = 0;
-   int rc, hwirq = -ENOMEM;
+   int rc = -ENODEV, hwirq = -ENOMEM;
unsigned int virq;
struct msi_desc *entry;
struct msi_msg msg;
@@ -231,15 +304,12 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
if (specific_msi_bank) {
hwirq = msi_bitmap_alloc_hwirqs(msi_data->bitmap, 1);
} else {
-   /*
-* Loop over all the MSI devices until we find one that has an
-* available interrupt.
-*/
-   list_for_each_entry(msi_data, &msi_head, list) {
-
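
The allocation rules described in this patch (one bank at most for the NULL kernel context, first free bank otherwise) can be modelled in user space. This is a minimal sketch under stated assumptions: an array replaces the kernel's linked list, and struct bank, bank_alloc(), and bank_free() are hypothetical names, not the driver's functions.

```c
#include <errno.h>
#include <stddef.h>

enum { BANK_FREE = 0, BANK_RESERVED = 1 };

struct bank {
    int reserved;  /* BANK_FREE or BANK_RESERVED */
    void *context; /* owner; NULL means the kernel context */
};

/* Reserve the first free bank; the NULL context may hold only one. */
static struct bank *bank_alloc(struct bank *banks, size_t n, void *context)
{
    size_t i;

    if (!context) {
        for (i = 0; i < n; i++)
            if (banks[i].reserved == BANK_RESERVED &&
                banks[i].context == NULL)
                return NULL;
    }

    for (i = 0; i < n; i++) {
        if (banks[i].reserved == BANK_FREE) {
            banks[i].reserved = BANK_RESERVED;
            banks[i].context = context;
            return &banks[i];
        }
    }
    return NULL;
}

/* Release the bank held by a context; -ENODEV if it holds none. */
static int bank_free(struct bank *banks, size_t n, void *context)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (banks[i].reserved == BANK_RESERVED &&
            banks[i].context == context) {
            banks[i].reserved = BANK_FREE;
            banks[i].context = NULL;
            return 0;
        }
    }
    return -ENODEV;
}
```

The sketch does no locking; whether the real list needs protection against concurrent reservation is not something this excerpt settles.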

Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR

2015-03-02 Thread Ingo Molnar


* Kees Cook keesc...@chromium.org wrote:

> To address the offset2lib ASLR weakness[1], this separates ET_DYN
> ASLR from mmap ASLR, as already done on s390. The architectures
> that are already randomizing mmap (arm, arm64, mips, powerpc, s390,
> and x86), have their various forms of arch_mmap_rnd() made available
> via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
> arch_randomize_brk() is collapsed as well.
>
> This is an alternative to the solutions in:
> https://lkml.org/lkml/2015/2/23/442

Looks good so far:

Reviewed-by: Ingo Molnar mi...@kernel.org

While reviewing this series I also noticed that the following code 
could be factored out from architecture mmap code as well:

  - arch_pick_mmap_layout() uses very similar patterns across the 
platforms, with only few variations. Many architectures use 
the same duplicated mmap_is_legacy() helper as well. There's 
usually just trivial differences between mmap_legacy_base() 
approaches as well.

  - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly
exposed to the arch routine - the arch routine should only 
concentrate on arch details, not generic flags like
PF_RANDOMIZE.

In theory the mmap layout could be fully parametrized as well: i.e. no 
callback functions to architectures by default at all: just 
declarations of bits of randomization desired (or, available address 
space bits), and perhaps an arch helper to allow 32-bit vs. 64-bit 
address space distinctions.

'Weird' architectures could provide special routines, but only by 
overriding the default behavior, which should be generic, safe and 
robust.

Thanks,

Ingo
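
One possible reading of the parametrization suggested above, as a minimal sketch: the architecture only declares how many random bits it wants, and a generic routine owns the randomize check. struct arch_rnd_params and generic_mmap_rnd() are hypothetical names invented for this illustration; nothing like them is proposed in the thread.

```c
#include <stdint.h>

/* What an architecture would declare instead of a callback. */
struct arch_rnd_params {
    unsigned int rnd_bits;   /* bits of randomness, assumed < 64 */
    unsigned int page_shift; /* log2 of the page size */
};

/*
 * Generic code decides whether to randomize at all; the arch data
 * only shapes the result. "raw" is a caller-supplied random value.
 */
static uint64_t generic_mmap_rnd(const struct arch_rnd_params *p,
                                 int randomize, uint64_t raw)
{
    uint64_t mask;

    if (!randomize)
        return 0;
    mask = (UINT64_C(1) << p->rnd_bits) - 1;
    return (raw & mask) << p->page_shift;
}
```

An architecture with unusual needs could still override the generic routine, which matches the "special routines only by overriding the default" idea in the mail.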

[PATCH 4/4 RFC] fsl/msi: Add interface to reserve/free msi bank

2015-03-02 Thread Bharat Bhushan

This patch allows a context (other than the kernel context)
to reserve an MSI bank for itself. The devices in that
context will then share the MSI bank.

The VFIO meta driver is a typical user of these APIs. It will
reserve an MSI bank for MSI interrupt support of PCI devices
directly assigned to a guest. Patches for that will follow this patch.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/device.h  |   2 +
 arch/powerpc/include/asm/fsl_msi.h |  26 ++
 arch/powerpc/sysdev/fsl_msi.c  | 169 +++--
 arch/powerpc/sysdev/fsl_msi.h  |   1 +
 4 files changed, 173 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/fsl_msi.h

diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h
index 38faede..1c2bfd7 100644
--- a/arch/powerpc/include/asm/device.h
+++ b/arch/powerpc/include/asm/device.h
@@ -40,6 +40,8 @@ struct dev_archdata {
 #ifdef CONFIG_FAIL_IOMMU
int fail_iommu;
 #endif
+
+   void *context;
 };
 
 struct pdev_archdata {
diff --git a/arch/powerpc/include/asm/fsl_msi.h b/arch/powerpc/include/asm/fsl_msi.h
new file mode 100644
index 000..e9041c2
--- /dev/null
+++ b/arch/powerpc/include/asm/fsl_msi.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (C) 2014 Freescale Semiconductor, Inc. All rights reserved.
+ *
+ * Author: Bharat Bhushan bharat.bhus...@freescale.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2 of the
+ * License.
+ *
+ */
+
+#ifndef _POWERPC_FSL_MSI_H
+#define _POWERPC_FSL_MSI_H
+
+extern int fsl_msi_set_msi_bank_region(struct iommu_domain *domain,
+  void *context, int win,
+  dma_addr_t iova, int prot);
+extern int fsl_msi_clear_msi_bank_region(struct iommu_domain *domain,
+struct iommu_group *iommu_group,
+int win, dma_addr_t iova);
+extern struct fsl_msi *fsl_msi_reserve_msi_bank(void *context);
+extern int fsl_msi_unreserve_msi_bank(void *context);
+extern int fsl_msi_set_msi_bank_in_dev(struct device *dev, void *data);
+
+#endif /* _POWERPC_FSL_MSI_H */
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 027aeeb..75cd196 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -25,6 +25,7 @@
 #include <asm/ppc-pci.h>
 #include <asm/mpic.h>
 #include <asm/fsl_hcalls.h>
+#include <linux/iommu.h>
 
 #include "fsl_msi.h"
 #include "fsl_pci.h"
@@ -172,22 +173,6 @@ static struct fsl_msi *fsl_msi_allocate_msi_bank(void *context)
return NULL;
 }
 
-/* FIXME: Assumption that host kernel will allocate only one MSI bank */
- __attribute__ ((unused)) static int fsl_msi_free_msi_bank(void *context)
-{
-   struct fsl_msi *msi_data;
-
-   list_for_each_entry(msi_data, &msi_head, list) {
-   if ((msi_data->reserved == MSI_RESERVED) &&
-(msi_data->context == context)) {
-   msi_data->reserved = MSI_FREE;
-   msi_data->context = NULL;
-   return 0;
-   }
-   }
-   return -ENODEV;
-}
-
 /*  This API returns the allocated MSI bank of context
  *  to which pdev device belongs.
  *  All kernel owned devices have NULL context. All devices
@@ -200,6 +185,12 @@ static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev)
 {
struct fsl_msi *msi_data = NULL;
void *context = NULL;
+   struct device *dev = &pdev->dev;
+
+   /* Device assigned to userspace if there is valid context */
+   if (dev->archdata.context) {
+   context = dev->archdata.context;
+   }
 
list_for_each_entry(msi_data, &msi_head, list) {
if ((msi_data->reserved == MSI_RESERVED) &&
@@ -208,13 +199,133 @@ static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev)
}
 
/* If no MSI bank allocated for kernel owned device, allocate one */
-   msi_data = fsl_msi_allocate_msi_bank(NULL);
-   if (msi_data)
-   return msi_data;
+   if (!context) {
+   msi_data = fsl_msi_allocate_msi_bank(NULL);
+   if (msi_data)
+   return msi_data;
+   }
 
return NULL;
 }
 
+/* API to set context to which the device belongs */
+int fsl_msi_set_msi_bank_in_dev(struct device *dev, void *data)
+{
+   dev->archdata.context = data;
+   return 0;
+}
+
+/*  This API Allows a MSI bank to be reserved for a context.
+ *  All devices in same context will share the allocated
+ *  MSI bank.
+ *  Typically this function will be called from meta
+ *  driver like VFIO with a valid context.
+ */
+struct fsl_msi *fsl_msi_reserve_msi_bank(void *context)
+{
+   struct fsl_msi *msi_data;
+
+

Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration

2015-03-02 Thread Michael Ellerman

On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote:
> On 03/01/2015 08:19 PM, Cyril Bur wrote:
> > On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> > > During suspend/migration operation we must wait for the VASI state reported
> > > by the hypervisor to become Suspending prior to making the ibm,suspend-me
> > > RTAS call. Calling routines to rtas_ibm_suspend_me() pass a vasi_state variable
> > > that exposes the VASI state to the caller. This is unnecessary as the caller
> > > only really cares about the following three conditions; if there is an error
> > > we should bailout, success indicating we have suspended and woken back up so
> > > proceed to device tree update, or we are not suspendable yet so try calling
> > > rtas_ibm_suspend_me again shortly.
> > >
> > > This patch removes the extraneous vasi_state variable and simply uses the
> > > return code to communicate how to proceed. We either succeed, fail, or get
> > > -EAGAIN in which case we sleep for a second before trying to call
> > > rtas_ibm_suspend_me again.
> > >
> > >  u64 handle = ((u64)be32_to_cpu(args.args[0]) << 32)
> > >    | be32_to_cpu(args.args[1]);
> > > -  rc = rtas_ibm_suspend_me(handle, &vasi_rc);
> > > -  args.rets[0] = cpu_to_be32(vasi_rc);
> > > -  if (rc)
> > > +  rc = rtas_ibm_suspend_me(handle);
> > > +  if (rc == -EAGAIN)
> > > +  args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE);
> >
> > (continuing on...) so perhaps here have
> > rc = 0;
> > else if (rc == -EIO)
> > args.rets[0] = cpu_to_be32(-1);
> > rc = 0;
> > Which should keep the original behaviour, the last thing we want to do
> > is break BE.
>
> The biggest problem here is we are making what basically equates to a
> fake rtas call from drmgr which we intercept in ppc_rtas(). From there
> we make this special call to rtas_ibm_suspend_me() to check VASI state
> and do a bunch of other specialized work that needs to be setup prior to
> making the actual ibm,suspend-me rtas call. Since, we are cheating PAPR
> here I guess we can really handle it however we want. I chose to simply
> fail the rtas call in the case where rtas_ibm_suspend_me() fails with
> something other than -EAGAIN. In user space librtas will log errno for
> the failure and return RTAS_IO_ASSERT to drmgr which in turn will log
> that error and fail.

We don't want to change the return values of the syscall unless we absolutely
have to. And I don't think that's the case here.

Sure we think drmgr is the only thing that uses this crap, but we don't know
for sure.

cheers
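
The quoted commit message describes a three-way result: success, failure, or -EAGAIN followed by a short wait and another attempt. A minimal sketch of such a caller loop, assuming a bounded retry count (the thread does not state one); suspend_with_retry(), the callback types, and the fake_suspend() stand-in are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*suspend_fn)(void *state);

/*
 * Call try_suspend until it returns something other than -EAGAIN or
 * the attempt budget runs out. wait_fn, if given, runs between tries
 * (the commit message mentions sleeping for a second).
 */
static int suspend_with_retry(suspend_fn try_suspend, void *state,
                              int max_tries, void (*wait_fn)(void))
{
    int rc = -EAGAIN;
    int i;

    for (i = 0; i < max_tries; i++) {
        rc = try_suspend(state);
        if (rc != -EAGAIN)
            break;
        if (wait_fn)
            wait_fn();
    }
    return rc;
}

/* Stand-in for the real call: "not suspendable yet" *state times. */
static int fake_suspend(void *state)
{
    int *not_ready = state;

    if (*not_ready > 0) {
        (*not_ready)--;
        return -EAGAIN;
    }
    return 0;
}
```

How the result is then reported to user space is the open question in the mail above; the sketch only covers the retry decision.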



Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr

2015-03-02 Thread Michael Ellerman

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> Traditionally after a migration operation drmgr has coordinated the device tree
> update with the kernel in userspace via the ugly /proc/ppc64/ofdt interface. This
> can be better done fully in the kernel where support already exists. Currently,
> drmgr makes a faux ibm,suspend-me RTAS call which we intercept in the kernel so
> that we can check VASI state for suspendability. After the LPAR resumes and
> returns to drmgr that is followed by the necessary update-nodes and
> update-properties RTAS calls which are parsed and communicated back to the kernel
> through /proc/ppc64/ofdt for the device tree update. The drmgr tool should
> instead initiate the migration using the already existing
> /sysfs/kernel/mobility/migration entry that performs all this work in the kernel.
>
> This patch adds a show function to the sysfs migration attribute that returns
> 1 to indicate the kernel will perform the device tree update after a migration
> operation and that drmgr should initiate the migration through the sysfs
> migration attribute.

I don't understand why we need this?

Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so it
knows it should use it and that the kernel will handle the whole procedure?

cheers
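
The existence check suggested above could look roughly like this in a user-space tool. A minimal sketch: kernel_handles_migration() is a hypothetical name, and the attribute path is passed in by the caller rather than hard-coded, since the exact location is not confirmed here.

```c
#include <unistd.h>

/*
 * Return 1 if the given attribute path exists, 0 otherwise.
 * access() with F_OK only tests for existence, not permissions
 * to read or write the file.
 */
static int kernel_handles_migration(const char *attr_path)
{
    return access(attr_path, F_OK) == 0;
}
```

A tool would call this once at startup and fall back to the older user-space procedure when it returns 0.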



[PATCH 1/2 v4] cpufreq: qoriq: Make the driver usable on all QorIQ platforms

2015-03-02 Thread Yuantian.Tang

From: Tang Yuantian yuantian.t...@freescale.com

Freescale introduced new ARM core-based SoCs which support the dynamic
frequency switch (DFS) feature. DFS on the new SoCs is compatible with the
current PowerPC CoreNet platforms. In order to support those new platforms,
this driver needs to be updated. The main changes include:

1. Changed the names of functions in the driver.
2. Added two new functions, get_cpu_physical_id() and get_bus_freq().
3. Used a new way to get the mask of CPUs that share a clock wire.

Signed-off-by: Tang Yuantian yuantian.t...@freescale.com
Acked-by: Viresh Kumar viresh.ku...@linaro.org
---
v4:
- resolve unmet direct dependencies warning
v3:
- put the menu entries into Kconfig
v2:
- split the name change into a separate patch
- use policy->driver_data instead of per_cpu variable

 drivers/cpufreq/Kconfig   |   8 ++
 drivers/cpufreq/Kconfig.powerpc   |   9 --
 drivers/cpufreq/ppc-corenet-cpufreq.c | 160 +-
 3 files changed, 107 insertions(+), 70 deletions(-)

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index a171fef..659879a 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -293,5 +293,13 @@ config SH_CPU_FREQ
  If unsure, say N.
 endif
 
+config QORIQ_CPUFREQ
+   tristate "CPU frequency scaling driver for Freescale QorIQ SoCs"
+   depends on OF && COMMON_CLK && (PPC_E500MC || ARM)
+   select CLK_QORIQ
+   help
+ This adds the CPUFreq driver support for Freescale QorIQ SoCs
+ which are capable of changing the CPU's frequency dynamically.
+
 endif
 endmenu
diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc
index 7ea2441..3a0595b 100644
--- a/drivers/cpufreq/Kconfig.powerpc
+++ b/drivers/cpufreq/Kconfig.powerpc
@@ -23,15 +23,6 @@ config CPU_FREQ_MAPLE
  This adds support for frequency switching on Maple 970FX
  Evaluation Board and compatible boards (IBM JS2x blades).
 
-config PPC_CORENET_CPUFREQ
-   tristate "CPU frequency scaling driver for Freescale E500MC SoCs"
-   depends on PPC_E500MC && OF && COMMON_CLK
-   select CLK_QORIQ
-   help
- This adds the CPUFreq driver support for Freescale e500mc,
- e5500 and e6500 series SoCs which are capable of changing
- the CPU's frequency dynamically.
-
 config CPU_FREQ_PMAC
bool "Support for Apple PowerBooks"
depends on ADB_PMU && PPC32
diff --git a/drivers/cpufreq/ppc-corenet-cpufreq.c b/drivers/cpufreq/ppc-corenet-cpufreq.c
index bee5df7..949d992 100644
--- a/drivers/cpufreq/ppc-corenet-cpufreq.c
+++ b/drivers/cpufreq/ppc-corenet-cpufreq.c
@@ -1,7 +1,7 @@
 /*
  * Copyright 2013 Freescale Semiconductor, Inc.
  *
- * CPU Frequency Scaling driver for Freescale PowerPC corenet SoCs.
+ * CPU Frequency Scaling driver for Freescale QorIQ SoCs.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -20,10 +20,9 @@
 #include <linux/of.h>
 #include <linux/slab.h>
 #include <linux/smp.h>
-#include <sysdev/fsl_soc.h>
 
 /**
- * struct cpu_data - per CPU data struct
+ * struct cpu_data
  * @parent: the parent node of cpu clock
  * @table: frequency table
  */
@@ -67,17 +66,78 @@ static const struct soc_data sdata[] = {
 static u32 min_cpufreq;
 static const u32 *fmask;
 
-static DEFINE_PER_CPU(struct cpu_data *, cpu_data);
+#if defined(CONFIG_ARM)
+static int get_cpu_physical_id(int cpu)
+{
+   return topology_core_id(cpu);
+}
+#else
+static int get_cpu_physical_id(int cpu)
+{
+   return get_hard_smp_processor_id(cpu);
+}
+#endif
 
-/* cpumask in a cluster */
-static DEFINE_PER_CPU(cpumask_var_t, cpu_mask);
+static u32 get_bus_freq(void)
+{
+   struct device_node *soc;
+   u32 sysfreq;
+
+   soc = of_find_node_by_type(NULL, "soc");
+   if (!soc)
+   return 0;
+
+   if (of_property_read_u32(soc, "bus-frequency", &sysfreq))
+   sysfreq = 0;
+
+   of_node_put(soc);
+
+   return sysfreq;
+}
 
-#ifndef CONFIG_SMP
-static inline const struct cpumask *cpu_core_mask(int cpu)
+static struct device_node *cpu_to_clk_node(int cpu)
 {
-   return cpumask_of(0);
+   struct device_node *np, *clk_np;
+
+   if (!cpu_present(cpu))
+   return NULL;
+
+   np = of_get_cpu_node(cpu, NULL);
+   if (!np)
+   return NULL;
+
+   clk_np = of_parse_phandle(np, "clocks", 0);
+   if (!clk_np)
+   return NULL;
+
+   of_node_put(np);
+
+   return clk_np;
+}
+
+/* traverse cpu nodes to get cpu mask of sharing clock wire */
+static void set_affected_cpus(struct cpufreq_policy *policy)
+{
+   struct device_node *np, *clk_np;
+   struct cpumask *dstp = policy->cpus;
+   int i;
+
+   np = cpu_to_clk_node(policy->cpu);
+   if (!np)
+   return;
+
+   for_each_present_cpu(i) {
+   clk_np = cpu_to_clk_node(i);
+
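
The message is cut off inside set_affected_cpus(), but the stated idea is to collect every CPU whose clock node matches that of the policy CPU. A minimal sketch of that grouping, assuming integer clock ids in place of device-tree nodes and a 32-bit mask in place of struct cpumask; cpus_sharing_clock() is a hypothetical name.

```c
#include <stdint.h>

/*
 * clk_of_cpu[i] identifies the clock feeding CPU i. Returns a bitmask
 * of all CPUs fed by the same clock as "cpu", or 0 on bad input.
 */
static uint32_t cpus_sharing_clock(const int *clk_of_cpu, int ncpus, int cpu)
{
    uint32_t mask = 0;
    int i;

    if (ncpus > 32 || cpu < 0 || cpu >= ncpus)
        return 0;

    for (i = 0; i < ncpus; i++)
        if (clk_of_cpu[i] == clk_of_cpu[cpu])
            mask |= UINT32_C(1) << i;
    return mask;
}
```

The policy CPU always appears in its own mask, so a non-zero result doubles as a validity check.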

[PATCH 2/4 RFC] fsl/msi: Move fsl, msi mode specific MSI device search out of main loop

2015-03-02 Thread Bharat Bhushan

Move the specific MSI device search out of the main loop. The
specific MSI device search now sits with the other fsl,msi specific
code in the same function.
This is in preparation for MSI bank partitioning.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/sysdev/fsl_msi.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index ec3161b..32ba1e3 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -178,7 +178,8 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
unsigned int virq;
struct msi_desc *entry;
struct msi_msg msg;
-   struct fsl_msi *msi_data;
+   struct fsl_msi *msi_data = NULL;
+   bool specific_msi_bank = false;
 
if (type == PCI_CAP_ID_MSIX)
pr_debug("fslmsi: MSI-X untested, trying anyway.\n");
@@ -199,12 +200,9 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
hose->dn->full_name, np->phandle);
return -EINVAL;
}
-   }
-
-   list_for_each_entry(entry, &pdev->msi_list, list) {
/*
-* Loop over all the MSI devices until we find one that has an
-* available interrupt.
+* Loop over all the MSI devices till we find
+* specific MSI device.
 */
list_for_each_entry(msi_data, &msi_head, list) {
/*
@@ -215,12 +213,33 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type)
 * has the additional benefit of skipping over MSI
 * nodes that are not mapped in the PAMU.
 */
-   if (phandle && (phandle != msi_data->phandle))
-   continue;
+   if (phandle == msi_data->phandle) {
+   specific_msi_bank = true;
+   break;
+   }
+   }
 
+   if (!specific_msi_bank) {
+   dev_err(&pdev->dev,
+   "No specific MSI device found for node %s\n",
+   hose->dn->full_name);
+   return -EINVAL;
+   }
+   }
+
+   list_for_each_entry(entry, &pdev->msi_list, list) {
+   if (specific_msi_bank) {
hwirq = msi_bitmap_alloc_hwirqs(msi_data->bitmap, 1);
-   if (hwirq >= 0)
-   break;
+   } else {
+   /*
+* Loop over all the MSI devices until we find one that has an
+* available interrupt.
+*/
+   list_for_each_entry(msi_data, &msi_head, list) {
+   hwirq = msi_bitmap_alloc_hwirqs(msi_data->bitmap, 1);
+   if (hwirq >= 0)
+   break;
+   }
}
 
if (hwirq < 0) {
-- 
1.9.3
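
The bank selection in the patch above can be modelled in user space. This is only a sketch under stated assumptions: struct msi_bank, bank_alloc() and alloc_msi() are hypothetical stand-ins for struct fsl_msi, msi_bitmap_alloc_hwirqs() and the loop in fsl_setup_msi_irqs(), and a plain counter replaces the hwirq bitmap.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsl_msi: one MSI bank. */
struct msi_bank {
	unsigned int phandle;	/* device-tree phandle of the bank */
	int free_irqs;		/* simplified replacement for the hwirq bitmap */
};

/* Simplified allocator: 0 on success, -1 when the bank is exhausted. */
static int bank_alloc(struct msi_bank *bank)
{
	if (bank->free_irqs <= 0)
		return -1;
	bank->free_irqs--;
	return 0;
}

/*
 * Returns the index of the bank that served the request, or -1.
 * A non-zero phandle pins the request to one bank (the fsl,msi case);
 * a zero phandle scans every bank until one has a free interrupt.
 */
static int alloc_msi(struct msi_bank *banks, size_t n, unsigned int phandle)
{
	size_t i;

	if (phandle) {
		for (i = 0; i < n; i++) {
			if (banks[i].phandle == phandle)
				return bank_alloc(&banks[i]) == 0 ? (int)i : -1;
		}
		return -1;	/* no such bank; the patch returns -EINVAL here */
	}

	for (i = 0; i < n; i++) {
		if (bank_alloc(&banks[i]) == 0)
			return (int)i;
	}
	return -1;
}
```

The point of the split is that a pinned request never falls back to another bank, which is what bank partitioning needs.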

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH 1/4 RFC] fsl/msi: have msiir register address absolute rather than offset

2015-03-02 Thread Bharat Bhushan

Having an absolute address simplifies the code and also removes the
confusion between feature->msiir_offset and msi_data->msiir_offset.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/sysdev/fsl_msi.c | 9 +++--
 arch/powerpc/sysdev/fsl_msi.h | 2 +-
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 4bbb4b8..ec3161b 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -157,7 +157,7 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
 	if (reg && (len == sizeof(u64)))
 		address = be64_to_cpup(reg);
 	else
-		address = fsl_pci_immrbar_base(hose) + msi_data->msiir_offset;
+		address = msi_data->msiir;
 
 	msg->address_lo = lower_32_bits(address);
 	msg->address_hi = upper_32_bits(address);
@@ -430,18 +430,15 @@ static int fsl_of_msi_probe(struct platform_device *dev)
 				dev->dev.of_node->full_name);
 			goto error_out;
 		}
-		msi->msiir_offset =
-			features->msiir_offset + (res.start & 0xf);
 
 		/*
 		 * First read the MSIIR/MSIIR1 offset from dts
 		 * On failure use the hardcode MSIIR offset
 		 */
 		if (of_address_to_resource(dev->dev.of_node, 1, &msiir))
-			msi->msiir_offset = features->msiir_offset +
-					    (res.start & MSIIR_OFFSET_MASK);
+			msi->msiir = res.start + features->msiir_offset;
 		else
-			msi->msiir_offset = msiir.start & MSIIR_OFFSET_MASK;
+			msi->msiir = msiir.start;
 	}
 
 	msi->feature = features->fsl_pic_ip;
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index 420cfcb..9b0ab84 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -34,7 +34,7 @@ struct fsl_msi {
 
unsigned long cascade_irq;
 
-   u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
+   phys_addr_t msiir; /* MSIIR Address in CCSR */
u32 ibs_shift; /* Shift of interrupt bit select */
u32 srs_shift; /* Shift of the shared interrupt register select */
void __iomem *msi_regs;
-- 
1.9.3
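
As a small illustration of the address handling in the patch above: the probe code picks the absolute MSIIR address once, and the message composer only splits it into two 32-bit halves. The helper names and the example addresses used with them are hypothetical; only the selection rule is taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Probe-time choice from the patch: if the device tree supplies a second
 * reg entry for MSIIR, use its start address directly; otherwise fall back
 * to the start of the MSI register block plus the per-chip MSIIR offset.
 */
static uint64_t msiir_address(uint64_t res_start, uint32_t feature_offset,
			      bool have_msiir_res, uint64_t msiir_start)
{
	return have_msiir_res ? msiir_start : res_start + feature_offset;
}

/* User-space equivalents of the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t addr_lo(uint64_t address)
{
	return (uint32_t)(address & 0xffffffffu);
}

static uint32_t addr_hi(uint64_t address)
{
	return (uint32_t)(address >> 32);
}
```

With the absolute address stored, composing the MSI message no longer needs fsl_pci_immrbar_base() at all.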


[PATCH v6] dmaengine: Driver support for FSL RaidEngine device.

2015-03-02 Thread xuelin.shi

From: Xuelin Shi xuelin@freescale.com

The RaidEngine is a new FSL hardware block used for RAID5/6 acceleration.
This patch enables the RaidEngine functionality and provides
hardware offloading capability for memcpy, xor and pq computation.
It works with async_tx.
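
For readers unfamiliar with the "xor and pq computation" mentioned above, here is a minimal software sketch of what is being offloaded. It assumes the same P/Q definition as the Linux software RAID6 library (GF(2^8) with polynomial 0x11d and generator 2). It says nothing about how the RaidEngine hardware computes the result, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reduction polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/*
 * Compute the P (plain XOR) and Q (Reed-Solomon) syndromes over ndisks
 * data buffers of len bytes each. Q is evaluated with Horner's rule:
 * Q = D0 ^ 2*(D1 ^ 2*(D2 ^ ...)).
 */
static void raid_pq(const uint8_t *const *data, size_t ndisks, size_t len,
		    uint8_t *p, uint8_t *q)
{
	for (size_t off = 0; off < len; off++) {
		uint8_t pv = 0, qv = 0;

		for (size_t d = ndisks; d-- > 0; ) {
			pv ^= data[d][off];
			qv = (uint8_t)(gf_mul2(qv) ^ data[d][off]);
		}
		p[off] = pv;
		q[off] = qv;
	}
}
```

RAID5 needs only P; RAID6 needs both, which is why the driver advertises xor and pq as separate capabilities.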

Signed-off-by: Harninder Rai harninder@freescale.com
Signed-off-by: Xuelin Shi xuelin@freescale.com
---
 changes for v6:
   - use dev_err() instead of pr_err()
   - avoid BUG_ON with if   

 changes for v5:
   - align symbol to fsl_re_xxx to avoid namespace issue.
   - switch back to tasklet
   - add xor/pq continuation in to support more than 16 srcs.

 drivers/dma/Kconfig|  11 +
 drivers/dma/Makefile   |   1 +
 drivers/dma/fsl_raid.c | 904 +
 drivers/dma/fsl_raid.h | 306 +
 4 files changed, 1222 insertions(+)
 create mode 100644 drivers/dma/fsl_raid.c
 create mode 100644 drivers/dma/fsl_raid.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index f2b2c4e..37397cd 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -125,6 +125,17 @@ config FSL_DMA
  EloPlus is on mpc85xx and mpc86xx and Pxxx parts, and the Elo3 is on
  some Txxx and Bxxx parts.
 
+config FSL_RAID
+	tristate "Freescale RAID engine Support"
+	depends on FSL_SOC && !ASYNC_TX_ENABLE_CHANNEL_SWITCH
+	select DMA_ENGINE
+	select DMA_ENGINE_RAID
+	---help---
+	  Enable support for Freescale RAID Engine. RAID Engine is
+	  available on some QorIQ SoCs (like P5020/P5040). It has
+	  the capability to offload memcpy, xor and pq computation
+	  for raid5/6.
+
 config MPC512X_DMA
 	tristate "Freescale MPC512x built-in DMA engine support"
 	depends on PPC_MPC512x || PPC_MPC831x
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 2022b54..b3f8d9e 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o
 obj-$(CONFIG_TI_CPPI41) += cppi41.o
 obj-$(CONFIG_K3_DMA) += k3dma.o
 obj-$(CONFIG_MOXART_DMA) += moxart-dma.o
+obj-$(CONFIG_FSL_RAID) += fsl_raid.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
 obj-$(CONFIG_QCOM_BAM_DMA) += qcom_bam_dma.o
 obj-y += xilinx/
diff --git a/drivers/dma/fsl_raid.c b/drivers/dma/fsl_raid.c
new file mode 100644
index 000..12778bd
--- /dev/null
+++ b/drivers/dma/fsl_raid.c
@@ -0,0 +1,904 @@
+/*
+ * drivers/dma/fsl_raid.c
+ *
+ * Freescale RAID Engine device driver
+ *
+ * Author:
+ * Harninder Rai harninder@freescale.com
+ * Naveen Burmi naveenbu...@freescale.com
+ *
+ * Rewrite:
+ * Xuelin Shi xuelin@freescale.com
+ *
+ * Copyright (c) 2010-2014 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License (GPL) as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Theory of operation:
+ *
+ * General capabilities:
+ * RAID Engine (RE) block is capable of offloading XOR, memcpy and P/Q
+ * calculations required in RAID5 and RAID6 operations. RE driver
+ * registers with Linux's ASYNC layer as dma driver. RE hardware
+ * maintains strict ordering of the requests through chained
+ * command queueing.
+ *
+ * Data flow:
+ * Software RAID layer of Linux (MD layer) maintains RAID partitions,
+ * strips, stripes etc. It sends

[PATCH 2/2 v4] cpufreq: qoriq: rename the driver

2015-03-02 Thread Yuantian.Tang

From: Tang Yuantian yuantian.t...@freescale.com

This driver works on all QorIQ platforms, which include both
ARM-based and PPC-based cores.
Rename it to better reflect this.

Signed-off-by: Tang Yuantian yuantian.t...@freescale.com
Acked-by: Viresh Kumar viresh.ku...@linaro.org
---
v3, v4
- none
v2:
- use -C -M options when format-patch

 drivers/cpufreq/{ppc-corenet-cpufreq.c => qoriq-cpufreq.c} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename drivers/cpufreq/{ppc-corenet-cpufreq.c => qoriq-cpufreq.c} (100%)

diff --git a/drivers/cpufreq/ppc-corenet-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c
similarity index 100%
rename from drivers/cpufreq/ppc-corenet-cpufreq.c
rename to drivers/cpufreq/qoriq-cpufreq.c
-- 
2.1.0.27.g96db324


[PATCH 0/4 RFC] fsl/msi: Add support for MSI bank partitioning

2015-03-02 Thread Bharat Bhushan

This patchset adds MSI bank partitioning support, which is required
for direct device assignment of MSI-capable PCI devices. One MSI bank
is allocated for the kernel context, and VFIO can allocate one MSI
bank per context. All devices in a context share that context's
MSI bank.

There is only a limited number of MSI banks (2-4), so supporting a
large number of contexts will require sharing MSI banks. This
patchset does not support sharing of MSI banks yet; that will be
added once this patchset takes shape.

These changes were tested with both kernel-owned PCI devices and
devices directly assigned to a guest using VFIO.

Bharat Bhushan (4):
  fsl/msi: have msiir register address absolute rather than offset
  fsl/msi: Move fsl,msi mode specific MSI device search out of main loop
  fsl/msi: Add MSI bank allocation for kernel owned devices
  fsl/msi: Add interface to reserve/free msi bank

 arch/powerpc/include/asm/device.h  |   2 +
 arch/powerpc/include/asm/fsl_msi.h |  26 
 arch/powerpc/sysdev/fsl_msi.c  | 249 +
 arch/powerpc/sysdev/fsl_msi.h  |   7 +-
 4 files changed, 261 insertions(+), 23 deletions(-)
 create mode 100644 arch/powerpc/include/asm/fsl_msi.h

-- 
1.9.3


Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code

2015-03-02 Thread Michael Ellerman

On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote:
> This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an
> extraneous function parameter, fixes device tree updating on little endian
> platforms, and adds a mechanism for informing drmgr that the kernel is
> capable of performing the whole migration including device tree update itself.
>
> Tyrel Datwyler (3):
>   powerpc/pseries: Simplify check for suspendability during
>     suspend/migration
>   powerpc/pseries: Little endian fixes for post mobility device tree
>     update
>   powerpc/pseries: Expose post-migration in kernel device tree update
>     to drmgr

Hi Tyrel,

Firstly let me say how much I hate this code, so thanks for working on it :)

But I need you to split this series, into 1) fixes for 4.0 (and stable?), and
2) the rest.

I *think* that would be patch 2, and then patches 1 & 3, but I don't want to
guess. So please resend.

cheers





Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Milan Broz

On 03/02/2015 02:25 PM, Horia Geantă wrote:
> On 2/20/2015 7:00 PM, Martin Hicks wrote:
>> This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2.
>>
>> One of the nice things about this hardware is that it knows how to deal
>> with encrypt/decrypt requests that are larger than sector size, but that
>> also requires that the sector size be passed into the crypto engine
>> as an XTS cipher context parameter.
>>
>> When a request is larger than the sector size the sector number is
>> incremented by the talitos engine and the tweak key is re-calculated
>> for the new sector.
>>
>> I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit
>> and 256bit) to ensure interoperability with the software AES-XTS
>> implementation.  All testing was done using dm-crypt/LUKS with
>> aes-xts-plain64.
>>
>> Is there a better solution than just hard coding the sector size to
>> (1<<SECTOR_SHIFT)?  Maybe dm-crypt should be modified to pass the
>> sector size along with the plain/plain64 IV to an XTS algorithm?
>
> AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not
> aware of a sector size (data unit size in IEEE P1619 terminology):
> There's a hidden assumption that all the data sent to xts in one request
> belongs to a single sector. Even more, it's supposed that the first
> 16-byte block in the request is block 0 in the sector. These can be
> seen from the way the tweak (T) value is computed.
> (Side note: there's no support of ciphertext stealing in crypto/xts.c -
> i.e. sector sizes must be a multiple of underlying block cipher size -
> that is 16B.)
>
> If dm-crypt would be modified to pass sector size somehow, all in-kernel
> xts implementations would have to be made aware of the change.
> I have nothing against this, but let's see what crypto maintainers have
> to say...
>
> BTW, there were some discussions back in 2013 wrt. being able to
> configure / increase sector size, smth. crypto engines would benefit from:
> http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html
> (experimental patch)
> http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html
>
> The experimental patch sends sector size as the req->nbytes - hidden
> assumption: data size sent in an xts crypto request equals a sector.

There was no follow-up but the idea is not yet abandoned :-)

Dmcrypt will always use a sector as the minimal unit
(and I believe sectors will always be a multiple of the block size, so
there is no need for ciphertext stealing in XTS.)

For now, dmcrypt always uses a 512-byte sector size.

If crypto API allows to encrypt more sectors in one run
(handling IV internally) dmcrypt can be modified of course.

But do not forget we can use another IV (not only sequential number)
e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people
are using it).
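
To make the "sequential number" IV concrete, here is a minimal sketch of the two plain IV generators as described in the dm-crypt documentation: the sector number in little-endian form, padded with zeros to the IV size. The function names are hypothetical and this is not the dm-crypt source.

```c
#include <stdint.h>
#include <string.h>

#define XTS_IV_SIZE 16

/* "plain64": 64-bit little-endian sector number, zero padded. */
static void plain64_iv(uint8_t iv[XTS_IV_SIZE], uint64_t sector)
{
	memset(iv, 0, XTS_IV_SIZE);
	for (int i = 0; i < 8; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}

/* "plain": same idea, but only the low 32 bits of the sector number. */
static void plain_iv(uint8_t iv[XTS_IV_SIZE], uint64_t sector)
{
	memset(iv, 0, XTS_IV_SIZE);
	for (int i = 0; i < 4; i++)
		iv[i] = (uint8_t)(sector >> (8 * i));
}
```

An engine that handles several sectors per request could derive each following IV by incrementing the sector number, which is exactly what does not work for non-sequential generators such as ESSIV.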

Maybe the follow-up question would be whether the dmcrypt sector IV algorithms
should be moved into the crypto API as well.
(But because I misused dmcrypt IVs hooks for some additional operations
for loopAES and old Truecrypt CBC mode, it is not so simple...)

Milan

Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode

2015-03-02 Thread Horia Geantă

On 2/20/2015 7:00 PM, Martin Hicks wrote:
> This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2.
>
> One of the nice things about this hardware is that it knows how to deal
> with encrypt/decrypt requests that are larger than sector size, but that
> also requires that the sector size be passed into the crypto engine
> as an XTS cipher context parameter.
>
> When a request is larger than the sector size the sector number is
> incremented by the talitos engine and the tweak key is re-calculated
> for the new sector.
>
> I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit
> and 256bit) to ensure interoperability with the software AES-XTS
> implementation.  All testing was done using dm-crypt/LUKS with
> aes-xts-plain64.
>
> Is there a better solution than just hard coding the sector size to
> (1<<SECTOR_SHIFT)?  Maybe dm-crypt should be modified to pass the
> sector size along with the plain/plain64 IV to an XTS algorithm?

AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not
aware of a sector size (data unit size in IEEE P1619 terminology):
There's a hidden assumption that all the data sent to xts in one request
belongs to a single sector. Even more, it's supposed that the first
16-byte block in the request is block 0 in the sector. These can be
seen from the way the tweak (T) value is computed.
(Side note: there's no support of ciphertext stealing in crypto/xts.c -
i.e. sector sizes must be a multiple of underlying block cipher size -
that is 16B.)
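
A small sketch may help show why the sector size matters. In XTS the tweak for block j of a data unit is the encrypted sector number multiplied by alpha^j in GF(2^128), so an implementation must know where each sector starts in order to restart j at 0. The code below is not crypto/xts.c; xts_locate() and xts_mul_x() are hypothetical helper names, and the AES step is left out entirely.

```c
#include <stdint.h>

#define XTS_BLOCK_SIZE 16

struct xts_pos {
	uint64_t sector;	/* data unit (sector) number */
	uint32_t block;		/* 16-byte block index inside that sector */
};

/*
 * Map a byte offset inside a multi-sector request to the sector number
 * and block index a sector-aware engine would use for the tweak.
 */
static struct xts_pos xts_locate(uint64_t first_sector, uint32_t sector_size,
				 uint64_t byte_offset)
{
	struct xts_pos pos;

	pos.sector = first_sector + byte_offset / sector_size;
	pos.block = (uint32_t)((byte_offset % sector_size) / XTS_BLOCK_SIZE);
	return pos;
}

/*
 * Multiply a 128-bit tweak by alpha (x) in GF(2^128), little-endian byte
 * order as in IEEE P1619: shift left by one bit and fold the carry back
 * in with 0x87.
 */
static void xts_mul_x(uint8_t t[XTS_BLOCK_SIZE])
{
	uint8_t carry = 0;

	for (int i = 0; i < XTS_BLOCK_SIZE; i++) {
		uint8_t next = (uint8_t)(t[i] >> 7);

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next;
	}
	if (carry)
		t[0] ^= 0x87;
}
```

A sector-unaware implementation effectively treats the whole request as one data unit, so the block at byte offset 512 would get block index 32 of the first sector instead of block 0 of the next one.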

If dm-crypt would be modified to pass sector size somehow, all in-kernel
xts implementations would have to be made aware of the change.
I have nothing against this, but let's see what crypto maintainers have
to say...

BTW, there were some discussions back in 2013 wrt. being able to
configure / increase sector size, smth. crypto engines would benefit from:
http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html
(experimental patch)
http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html

The experimental patch sends sector size as the req->nbytes - hidden
assumption: data size sent in an xts crypto request equals a sector.

I am not sure if there was a follow-up though...
Adding Milan - maybe he could shed some light.

Thanks,
Horia

 
> Martin Hicks (2):
>   crypto: talitos: Clean ups and comment fixes for ablkcipher commands
>   crypto: talitos: Add AES-XTS Support
>
>  drivers/crypto/talitos.c |   45 +
>  drivers/crypto/talitos.h |1 +
>  2 files changed, 38 insertions(+), 8 deletions(-)
 



Re: [v1,1/3] crypto: powerpc/sha1 - assembler

2015-03-02 Thread Herbert Xu

On Tue, Feb 24, 2015 at 08:36:40PM +0100, Markus Stockhausen wrote:
> This is the assembler code for SHA1 implementation with
> the SIMD SPE instruction set. With the enhanced instruction
> set we can operate on two 32-bit words in parallel. That helps
> reducing the time to calculate W16-W79. For increasing
> performance even more the assembler function can compute
> hashes for more than one 64 byte input block.
>
> The state of the used SPE registers is preserved via the
> stack so we can run from interrupt context. There might
> be the case that we interrupt ourselves and push sensitive
> data from another context onto our stack. Clear this area
> in the stack afterwards to avoid information leakage.
>
> The code is endian independent.
>
> Signed-off-by: Markus Stockhausen stockhau...@collogia.de

All applied.  Thanks!
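
For reference, the W16-W79 calculation mentioned in the quoted description is the standard SHA-1 message schedule. A plain C sketch of it follows. This is the textbook scalar form, not the SPE assembler from the patch, and the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, unsigned int n)
{
	return (v << n) | (v >> (32 - n));
}

/*
 * Expand the 16 input words w[0..15] of one 64-byte block into the
 * 80-word schedule used by the SHA-1 rounds.
 */
static void sha1_expand(uint32_t w[80])
{
	for (int t = 16; t < 80; t++)
		w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}
```

Each new word depends only on words at least three positions back, which is what makes it possible to compute two of them side by side in one SIMD register.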
-- 
Email: Herbert Xu herb...@gondor.apana.org.au
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[PATCH] fsl: mpc85xx: call k(un)map_atomic other than k(un)map

2015-03-02 Thread yanjiang.jin

From: Yanjiang Jin yanjiang@windriver.com

map_and_flush() may be called in atomic context, and kmap()/kunmap()
may sleep, so use kmap_atomic()/kunmap_atomic() there instead.
Otherwise the following warning is triggered during kdump:

BUG: sleeping function called from invalid context at include/linux/highmem.h:58
in_atomic(): 1, irqs_disabled(): 1, pid: 736, name: sh
INFO: lockdep is turned off.
irq event stamp: 0
hardirqs last  enabled at (0): [  (null)]   (null)
hardirqs last disabled at (0): [c0066d1c] .copy_process.part.44+0x50c/0x1360
softirqs last  enabled at (0): [c0066d1c] .copy_process.part.44+0x50c/0x1360
softirqs last disabled at (0): [  (null)]   (null)
CPU: 1 PID: 736 Comm: sh Tainted: G  D W3.10.62-ltsi-WR6.0.0.0_standard #2
Call Trace:
[c000f47cf120] [c000b150] .show_stack+0x170/0x290 (unreliable)
[c000f47cf210] [c0b71334] .dump_stack+0x28/0x3c
[c000f47cf280] [c00bb5d8] .__might_sleep+0x1a8/0x270
[c000f47cf310] [c00440cc] .map_and_flush+0x4c/0xc0
[c000f47cf390] [c00441cc] .mpc85xx_smp_machine_kexec+0x8c/0xec0
[c000f47cf420] [c002ae00] .machine_kexec+0x60/0x90
[c000f47cf4b0] [c010957c] .crash_kexec+0x8c/0x100
[c000f47cf6a0] [c0015df8] .die+0x348/0x450
[c000f47cf740] [c002f3a0] .bad_page_fault+0xe0/0x130
[c000f47cf7c0] [c001f3e4] storage_fault_common+0x40/0x44

Signed-off-by: Yanjiang Jin yanjiang@windriver.com
---
 arch/powerpc/platforms/85xx/smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index d7c1e69..8631ac5 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -360,10 +360,10 @@ static void mpc85xx_smp_kexec_down(void *arg)
 static void map_and_flush(unsigned long paddr)
 {
 	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
-	unsigned long kaddr  = (unsigned long)kmap(page);
+	unsigned long kaddr  = (unsigned long)kmap_atomic(page);
 
 	flush_dcache_range(kaddr, kaddr + PAGE_SIZE);
-	kunmap(page);
+	kunmap_atomic((void *)kaddr);
 }
 
 /**
-- 
1.9.1


Re: [PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)

2015-03-02 Thread Emil Medve

Hello Scott,


On 03/02/2015 09:32 AM, Emil Medve wrote:
> From: Igal Liberman igal.liber...@freescale.com
>
> Describe the PHY topology for all configurations supported by each board
>
> Based on prior work by Andy Fleming aflem...@gmail.com
>
> Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f

Bah, I'll remove this...

> Signed-off-by: Igal Liberman igal.liber...@freescale.com
> Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com
> Signed-off-by: Emil Medve emilian.me...@freescale.com
> ---
>  arch/powerpc/boot/dts/b4860qds.dts|  60 -
>  arch/powerpc/boot/dts/b4qds.dtsi  |  51 -
>  arch/powerpc/boot/dts/p1023rdb.dts|  24 +-
>  arch/powerpc/boot/dts/p2041rdb.dts|  92 +++-
>  arch/powerpc/boot/dts/p3041ds.dts | 112 +-
>  arch/powerpc/boot/dts/p4080ds.dts | 184 +++-
>  arch/powerpc/boot/dts/p5020ds.dts | 112 +-
>  arch/powerpc/boot/dts/p5040ds.dts | 234 +++-
>  arch/powerpc/boot/dts/t1040rdb.dts|  32 ++-
>  arch/powerpc/boot/dts/t1042rdb.dts|  30 ++-
>  arch/powerpc/boot/dts/t1042rdb_pi.dts |  18 +-
>  arch/powerpc/boot/dts/t104xqds.dtsi   | 178 ++-
>  arch/powerpc/boot/dts/t104xrdb.dtsi   |  33 ++-
>  arch/powerpc/boot/dts/t2080qds.dts| 158 +-
>  arch/powerpc/boot/dts/t2080rdb.dts|  67 +-
>  arch/powerpc/boot/dts/t2081qds.dts| 221 ++-
>  arch/powerpc/boot/dts/t4240qds.dts| 400 +-
>  arch/powerpc/boot/dts/t4240rdb.dts| 149 -
>  18 files changed, 2135 insertions(+), 20 deletions(-)


Cheers,

[PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)

2015-03-02 Thread Emil Medve

From: Igal Liberman igal.liber...@freescale.com

Describe the PHY topology for all configurations supported by each board

Based on prior work by Andy Fleming aflem...@gmail.com

Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f
Signed-off-by: Igal Liberman igal.liber...@freescale.com
Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com
Signed-off-by: Emil Medve emilian.me...@freescale.com
---
 arch/powerpc/boot/dts/b4860qds.dts|  60 -
 arch/powerpc/boot/dts/b4qds.dtsi  |  51 -
 arch/powerpc/boot/dts/p1023rdb.dts|  24 +-
 arch/powerpc/boot/dts/p2041rdb.dts|  92 +++-
 arch/powerpc/boot/dts/p3041ds.dts | 112 +-
 arch/powerpc/boot/dts/p4080ds.dts | 184 +++-
 arch/powerpc/boot/dts/p5020ds.dts | 112 +-
 arch/powerpc/boot/dts/p5040ds.dts | 234 +++-
 arch/powerpc/boot/dts/t1040rdb.dts|  32 ++-
 arch/powerpc/boot/dts/t1042rdb.dts|  30 ++-
 arch/powerpc/boot/dts/t1042rdb_pi.dts |  18 +-
 arch/powerpc/boot/dts/t104xqds.dtsi   | 178 ++-
 arch/powerpc/boot/dts/t104xrdb.dtsi   |  33 ++-
 arch/powerpc/boot/dts/t2080qds.dts| 158 +-
 arch/powerpc/boot/dts/t2080rdb.dts|  67 +-
 arch/powerpc/boot/dts/t2081qds.dts| 221 ++-
 arch/powerpc/boot/dts/t4240qds.dts| 400 +-
 arch/powerpc/boot/dts/t4240rdb.dts| 149 -
 18 files changed, 2135 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/boot/dts/b4860qds.dts b/arch/powerpc/boot/dts/b4860qds.dts
index 6bb3707..98b1ef4 100644
--- a/arch/powerpc/boot/dts/b4860qds.dts
+++ b/arch/powerpc/boot/dts/b4860qds.dts
@@ -1,7 +1,7 @@
 /*
  * B4860DS Device Tree Source
  *
- * Copyright 2012 Freescale Semiconductor Inc.
+ * Copyright 2012 - 2015 Freescale Semiconductor Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
@@ -39,12 +39,69 @@
 	model = "fsl,B4860QDS";
 	compatible = "fsl,B4860QDS";
 
+	aliases {
+		phy_sgmii_1e = &phy_sgmii_1e;
+		phy_sgmii_1f = &phy_sgmii_1f;
+		phy_xaui_slot1 = &phy_xaui_slot1;
+		phy_xaui_slot2 = &phy_xaui_slot2;
+	};
+
 	ifc: localbus@ffe124000 {
 		board-control@3,0 {
 			compatible = "fsl,b4860qds-fpga", "fsl,fpga-qixis";
 		};
 	};
 
+	soc@ffe00 {
+		fman@40 {
+			ethernet@e8000 {
+				phy-handle = <&phy_sgmii_1e>;
+				phy-connection-type = "sgmii";
+			};
+
+			ethernet@ea000 {
+				phy-handle = <&phy_sgmii_1f>;
+				phy-connection-type = "sgmii";
+			};
+
+			ethernet@f {
+				phy-handle = <&phy_xaui_slot1>;
+				phy-connection-type = "xgmii";
+			};
+
+			ethernet@f2000 {
+				phy-handle = <&phy_xaui_slot2>;
+				phy-connection-type = "xgmii";
+			};
+
+			mdio@fc000 {
+				phy_sgmii_1e: ethernet-phy@1e {
+					reg = <0x1e>;
+					status = "disabled";
+				};
+
+				phy_sgmii_1f: ethernet-phy@1f {
+					reg = <0x1f>;
+					status = "disabled";
+				};
+			};
+
+			mdio@fd000 {
+				phy_xaui_slot1: xaui-phy@slot1 {
+					compatible = "ethernet-phy-ieee802.3-c45";
+					reg = <0x7>;
+					status = "disabled";
+				};
+
+				phy_xaui_slot2: xaui-phy@slot2 {
+					compatible = "ethernet-phy-ieee802.3-c45";
+					reg = <0x6>;
+					status = "disabled";
+				};
+			};
+		};
+	};
+
 	rio: rapidio@ffe0c {
 		reg = <0xf 0xfe0c 0 0x11000>;
 
@@ -55,7 +112,6 @@
 			ranges = <0 0 0xc 0x3000 0 0x1000>;
 		};
 	};
-
 };
 
 /include/ "fsl/b4860si-post.dtsi"
diff --git a/arch/powerpc/boot/dts/b4qds.dtsi b/arch/powerpc/boot/dts/b4qds.dtsi
index 559d006..6ef6933 100644
--- a/arch/powerpc/boot/dts/b4qds.dtsi
+++ b/arch/powerpc/boot/dts/b4qds.dtsi
@@ -1,7 +1,7 @@
 /*
  * B4420DS Device Tree Source
  *
- * Copyright 2012 - 2014 Freescale Semiconductor, Inc.
+ *

Re: [PATCH] sata-fsl: Apply link speed limits

2015-03-02 Thread Tejun Heo

On Thu, Feb 19, 2015 at 03:05:47PM -0500, Martin Hicks wrote:
 
 
> The driver was ignoring limits requested by libata.force.  The output
> would look like:
>
> fsl-sata ffe18000.sata: Sata FSL Platform/CSB Driver init
> ata1: FORCE: PHY spd limit set to 1.5Gbps
> ata1: SATA max UDMA/133 irq 74
> ata1: Signature Update detected @ 0 msecs
> ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 310)
>
> Signed-off-by: Martin Hicks m...@bork.org

Applied to libata/for-4.0-fixes.

Thanks.

-- 
tejun

Re: [PATCH V4] powerpc, powernv: Add OPAL platform event driver

2015-03-02 Thread Vipin K Parashar


Hi Stewart,

I tried to fake ACPI via acpi_bus_generate_netlink_event and found that
it needs other files which are arch-specific and use x86 assembly.
Regards,
Vipin


On 02/24/2015 03:14 PM, Vipin K Parashar wrote:

Hi Stewart,

I looked into ACPI and found details about it. But before we go into
discussing more details of it, I would like to share a brief overview of
the OPAL platform events (EPOW/DPO) work and the original design proposed.

As of now the OPAL platform events work supports two events:
EPOW (Early Power Off Warning) and DPO (Delayed Power Off).

On FSP based systems, the FSP notifies OPAL about EPOW and DPO events via
the mbox mechanism. Subsequently OPAL sends notifications for these events
to the pkvm kernel. The original design is to have a kernel driver maintain
a queue and add these events to the queue upon arrival. The pkvm driver
also provides a character device for the host to consume these events. A
daemon is proposed for the pkvm host to poll/read these events from the
char device. This daemon would process these events and take action to log
them and shut down the host. Apart from this it would also send the event
info to VMs, where it is handled by the OSes running on the VMs. Linux on
VMs already has code in place to handle these events, as it expects this
info to reach it in PAPR format under the EPOW (Environmental and Power
Warnings) category.

EPOW mbox msgs are received for the events below:
1. UPS events - UPS Battery Low, UPS Bypassed, UPS Utility Failure, UPS On
2. SPCN events - Configuration Change, Log SPCN Fault, Impending Power
   Failure, Power Incomplete
3. Temperature events - Over ambient temperature, Over internal temperature.

Now ACPI:

I looked into ACPI and tried to figure out how the ACPI userspace/kernel
framework can be helpful for our work.

ACPI user space consists of the components below.
acpid - ACPI daemon to receive events from the kernel.
acpid provides events and actions files in the /etc/acpi dir to configure
actions for various events.

acpi, acpi_listen, acpitool - Commands to query and set various ACPI
supported parameters. These tools work with various sysfs files to
show/set various parameter values.

As of today acpid and the other tools don't exist for POWER, so they would
need to be ported. acpid is useful for our work, but the other tools might
not be helpful as they look into various sysfs files created by various
ACPI kernel drivers which we won't have.

Also we would need to map our EPOW/DPO events to acpid supported events,
and a few events like the SPCN ones won't map straight away and might need
to be added in acpid as new events.

ACPI in the kernel has various drivers for fan, battery, laptop buttons
etc. They handle events and use the netlink mechanism to send these events
out to userspace. Looking into the ACPI code, it seems that we would be
reusing only a small chunk of it, while adding unnecessary complexity
because it supports a lot more than we need. Here too, mapping our
EPOW/DPO events to ACPI defined structures is needed, and we would need to
add new member variables to the ACPI event structures for unmapped events
like the SPCN ones.

In a nutshell, it seems that by using ACPI we would end up adding a lot
more complexity for little gain in code reuse.

Netlink:

On the technology side netlink seems to be a faster method compared to a
character driver, so it could be a good alternative for communication
between our pkvm driver and userspace. But EPOW/DPO events occur at a very
low rate, unlike the network subsystem, which receives data packets at a
very high rate. So netlink is probably faster, but given the low EPOW/DPO
event traffic a character driver might be sufficient.

We already have the ppc64-diag package, which is part of various distros,
so it would be used for hosting the daemon code. This removes the overhead
of convincing distros to add something extra.

These were my findings and opinions on the alternatives. Apologies for the
somewhat lengthy text :-)

Let me know if I missed anything, and any suggestions that you have.


Regards,
Vipin

On 02/11/2015 10:32 AM, Stewart Smith wrote:

Vipin K Parashar vi...@linux.vnet.ibm.com writes:

(1) Environmental and Power Warning (EPOW)
(2) Delayed Power Off (DPO)
The user interface for this driver is /dev/opal_event character
device file where the user space clients can poll and read for
new opal platform events. The expected sequence of events driven
from user space should be like the following.

(1) Open the character device file
(2) Poll on the file for POLLIN event
(3) When unblocked, must attempt to read OPAL_PLAT_EVENT_MAX_SIZE size
(4) Kernel driver will pass at most one opal_plat_event structure
(5) Poll again for more new events
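
The sequence above could look roughly like this from user space. This is a sketch under assumptions: the value given to OPAL_PLAT_EVENT_MAX_SIZE here is made up (the real one would come from the driver's header), the event payload is treated as opaque bytes, and read_one_event() is a hypothetical helper name.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed placeholder; the real constant is defined by the driver. */
#define OPAL_PLAT_EVENT_MAX_SIZE 4096

/*
 * Steps (2) to (4): wait for POLLIN on an already opened descriptor and
 * read at most one event into buf, which must hold at least
 * OPAL_PLAT_EVENT_MAX_SIZE bytes.
 * Returns the number of bytes read, 0 on timeout, or -1 on error.
 */
static ssize_t read_one_event(int fd, void *buf, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready = poll(&pfd, 1, timeout_ms);

	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;		/* timed out, no event pending */
	if (!(pfd.revents & POLLIN))
		return -1;		/* error or hangup on the descriptor */

	return read(fd, buf, OPAL_PLAT_EVENT_MAX_SIZE);
}
```

A daemon would open /dev/opal_event once (step 1) and then call this helper in a loop (step 5), handing each returned buffer to its event handling code.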

A few thoughts from discussing with Michael and Joel:
- not convinced that a chardev is the most ideal way to notify
   userspace. It seems like yet-another powerpc specific notification
   mechanism, which isn't ideal.
- netlink probably isn't right

Re: [PATCHi v2] ibmveth: Add function to enable live MAC address changes

2015-03-02 Thread Thomas Falcon

On 02/28/2015 02:59 AM, Jiri Pirko wrote:
> Sat, Feb 28, 2015 at 06:56:04AM CET, tlfal...@linux.vnet.ibm.com wrote:
>> Add a function that will enable changing the MAC address
>> of an ibmveth interface while it is still running.
>>
>> Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com
>> ---
>> v2:
>>   If h_change_logical_lan_mac fails, dev->dev_addr will not be changed.
>>
>> drivers/net/ethernet/ibm/ibmveth.c | 25 -
>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
>> index 21978cc..b6ac676 100644
>> --- a/drivers/net/ethernet/ibm/ibmveth.c
>> +++ b/drivers/net/ethernet/ibm/ibmveth.c
>> @@ -1327,6 +1327,29 @@ static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev)
>> 	return ret;
>> }
>>
>> +static int ibmveth_set_mac_addr(struct net_device *dev, void *p)
>> +{
>> +	struct ibmveth_adapter *adapter = netdev_priv(dev);
>> +	struct sockaddr *addr = p;
>> +	u64 mac_address;
>> +	int rc;
>> +
>> +	if (!is_valid_ether_addr(addr->sa_data))
>> +		return -EADDRNOTAVAIL;
>> +
>> +	mac_address = ibmveth_encode_mac_addr(addr->sa_data);
>> +	rc = h_change_logical_lan_mac(adapter->vdev->unit_address, mac_address);
>> +	if (rc) {
>> +		netdev_err(adapter->netdev, "h_change_logical_lan_mac failed "
>> +			   "with rc=%d\n", rc);
> Please do not wrap text in message. For that, 80-char limit does not apply.

I will send a new patch fixing this shortly.  Thanks to you, Brian, and Dave 
for reviewing this patch.



 +		return rc;
 +	}
 +
 +	ether_addr_copy(dev->dev_addr, addr->sa_data);
 +
 +	return 0;
 +}
 +
  static const struct net_device_ops ibmveth_netdev_ops = {
  	.ndo_open		= ibmveth_open,
  	.ndo_stop		= ibmveth_close,
 @@ -1337,7 +1360,7 @@ static const struct net_device_ops ibmveth_netdev_ops = {
  	.ndo_fix_features	= ibmveth_fix_features,
  	.ndo_set_features	= ibmveth_set_features,
  	.ndo_validate_addr	= eth_validate_addr,
 -	.ndo_set_mac_address	= eth_mac_addr,
 +	.ndo_set_mac_address	= ibmveth_set_mac_addr,
  #ifdef CONFIG_NET_POLL_CONTROLLER
  	.ndo_poll_controller	= ibmveth_poll_controller,
  #endif
 -- 
 1.8.3.1


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH v3] ibmveth: Add function to enable live MAC address changes

2015-03-02 Thread Thomas Falcon

Add a function that will enable changing the MAC address
of an ibmveth interface while it is still running.

Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com
---
v3:
   removed text wrapping in error message
v2:
   If h_change_logical_lan_mac fails, dev->dev_addr will not be changed.

 drivers/net/ethernet/ibm/ibmveth.c | 24 +++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 21978cc..072426a 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -1327,6 +1327,28 @@ static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev)
 	return ret;
 }
 
+static int ibmveth_set_mac_addr(struct net_device *dev, void *p)
+{
+	struct ibmveth_adapter *adapter = netdev_priv(dev);
+	struct sockaddr *addr = p;
+	u64 mac_address;
+	int rc;
+
+	if (!is_valid_ether_addr(addr->sa_data))
+		return -EADDRNOTAVAIL;
+
+	mac_address = ibmveth_encode_mac_addr(addr->sa_data);
+	rc = h_change_logical_lan_mac(adapter->vdev->unit_address, mac_address);
+	if (rc) {
+		netdev_err(adapter->netdev, "h_change_logical_lan_mac failed with rc=%d\n", rc);
+		return rc;
+	}
+
+	ether_addr_copy(dev->dev_addr, addr->sa_data);
+
+	return 0;
+}
+
 static const struct net_device_ops ibmveth_netdev_ops = {
 	.ndo_open		= ibmveth_open,
 	.ndo_stop		= ibmveth_close,
@@ -1337,7 +1359,7 @@ static const struct net_device_ops ibmveth_netdev_ops = {
 	.ndo_fix_features	= ibmveth_fix_features,
 	.ndo_set_features	= ibmveth_set_features,
 	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address	= eth_mac_addr,
+	.ndo_set_mac_address	= ibmveth_set_mac_addr,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= ibmveth_poll_controller,
 #endif
-- 
1.8.3.1
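
The ordering in this patch (validate, ask the hypervisor, and only then update the cached address) can be tried out in user space. The sketch below is an illustration under stated assumptions, not driver code: mac_is_valid(), mac_encode(), set_mac() and the two commit_* stubs are hypothetical names, the stub callback stands in for h_change_logical_lan_mac, and the big-endian byte packing is an assumption because the patch does not show the body of ibmveth_encode_mac_addr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Reject the all-zero address and any multicast address (low bit of byte 0). */
static bool mac_is_valid(const uint8_t mac[MAC_LEN])
{
	static const uint8_t zero[MAC_LEN];

	return !(mac[0] & 0x01) && memcmp(mac, zero, MAC_LEN) != 0;
}

/* Assumed encoding: six bytes packed into a u64, first byte most significant. */
static uint64_t mac_encode(const uint8_t mac[MAC_LEN])
{
	uint64_t v = 0;

	for (int i = 0; i < MAC_LEN; i++)
		v = (v << 8) | mac[i];
	return v;
}

/* Stand-in for the hypervisor call: returns 0 on success, non-zero on failure. */
typedef int (*commit_fn)(uint64_t encoded);

static int commit_ok(uint64_t encoded)   { (void)encoded; return 0; }
static int commit_fail(uint64_t encoded) { (void)encoded; return -5; }

/* Update 'current' only after the commit callback has accepted the address. */
static int set_mac(uint8_t current[MAC_LEN], const uint8_t wanted[MAC_LEN],
		   commit_fn commit)
{
	int rc;

	assert(current != NULL && wanted != NULL && commit != NULL);

	if (!mac_is_valid(wanted))
		return -1;

	rc = commit(mac_encode(wanted));
	if (rc)
		return rc;	/* cached address is left untouched */

	memcpy(current, wanted, MAC_LEN);
	return 0;
}
```

Calling set_mac() with commit_fail leaves the cached address as it was, which is the behaviour the v2 changelog describes for a failed h_change_logical_lan_mac.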


Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset

2015-03-02 Thread Wei Yang

On Tue, Feb 24, 2015 at 11:10:33AM -0600, Bjorn Helgaas wrote:
On Tue, Feb 24, 2015 at 3:00 AM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote:
 From: Wei Yang weiy...@linux.vnet.ibm.com

 On PowerNV platform, resource position in M64 implies the PE# the resource
 belongs to.  In some cases, adjustment of a resource is necessary to locate
 it to a correct position in M64.

 Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address
 according to an offset.

 [bhelgaas: rework loops, rework overlap check, index resource[]
 conventionally, remove pci_regs.h include, squashed with next patch]
 Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com
 Signed-off-by: Bjorn Helgaas bhelg...@google.com

 ...

 +#ifdef CONFIG_PCI_IOV
 +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset)
 +{
 +	struct pci_dn *pdn = pci_get_pdn(dev);
 +	int i;
 +	struct resource *res, res2;
 +	resource_size_t size;
 +	u16 vf_num;
 +
 +	if (!dev->is_physfn)
 +		return -EINVAL;
 +
 +	/*
 +	 * offset is in VFs.  The M64 windows are sized so that when they
 +	 * are segmented, each segment is the same size as the IOV BAR.
 +	 * Each segment is in a separate PE, and the high order bits of the
 +	 * address are the PE number.  Therefore, each VF's BAR is in a
 +	 * separate PE, and changing the IOV BAR start address changes the
 +	 * range of PEs the VFs are in.
 +	 */
 +	vf_num = pdn->vf_pes;
 +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 +		res = &dev->resource[i + PCI_IOV_RESOURCES];
 +		if (!res->flags || !res->parent)
 +			continue;
 +
 +		if (!pnv_pci_is_mem_pref_64(res->flags))
 +			continue;
 +
 +		/*
 +		 * The actual IOV BAR range is determined by the start address
 +		 * and the actual size for vf_num VFs BAR.  This check is to
 +		 * make sure that after shifting, the range will not overlap
 +		 * with another device.
 +		 */
 +		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 +		res2.flags = res->flags;
 +		res2.start = res->start + (size * offset);
 +		res2.end = res2.start + (size * vf_num) - 1;
 +
 +		if (res2.end > res->end) {
 +			dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n",
 +				i, &res2, res, vf_num, offset);
 +			return -EBUSY;
 +		}
 +	}
 +
 +	for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) {
 +		res = &dev->resource[i + PCI_IOV_RESOURCES];
 +		if (!res->flags || !res->parent)
 +			continue;
 +
 +		if (!pnv_pci_is_mem_pref_64(res->flags))
 +			continue;
 +
 +		size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
 +		res2 = *res;
 +		res->start += size * offset;

 I'm still not happy about this fiddling with res->start.

 Increasing res->start means that in principle, the size * offset bytes
 that we just removed from res are now available for allocation to somebody
 else.  I don't think we *will* give that space to anything else because of
 the alignment restrictions you're enforcing, but res now doesn't
 correctly describe the real resource map.

 Would you be able to just update the BAR here while leaving the struct
 resource alone?  In that case, it would look a little funny that lspci
 would show a BAR value in the middle of the region in /proc/iomem, but
 the /proc/iomem region would be more correct.

I guess this would also require a tweak where we compute the addresses
of each of the VF resources.  Today it's probably just base + VF_num
* size, where base is res->start.  We'd have to account for the
offset there if we don't adjust it here.
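
One way to read the tweak suggested above is that the per-VF address calculation would add the shift itself instead of relying on a modified res->start. The following is only a sketch of that arithmetic under that reading: vf_bar_addr() and its parameters are hypothetical and do not correspond to an existing kernel helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address of one VF's BAR when the struct resource start (iov_base) is
 * left untouched and the shift, counted in VFs, is applied here instead.
 * With offset == 0 this reduces to base + vf_index * size.
 */
static uint64_t vf_bar_addr(uint64_t iov_base, uint64_t vf_size,
			    unsigned int offset, unsigned int vf_index)
{
	assert(vf_size != 0);

	return iov_base + vf_size * ((uint64_t)offset + vf_index);
}
```

With this form the region reported for the resource stays at iov_base, while each VF's address still lands in the shifted segment.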


Oh, this is really an interesting idea.

I will do some tests to see the result.

 +
 +		dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d VFs shifted by %d)\n",
 +			 i, &res2, res, vf_num, offset);
 +		pci_update_resource(dev, i + PCI_IOV_RESOURCES);
 +	}
 +	pdn->max_vfs -= offset;
 +	return 0;
 +}
 +#endif /* CONFIG_PCI_IOV */
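
The bounds check in the first loop of the quoted function is plain range arithmetic and can be exercised outside the kernel. This is a minimal sketch under stated assumptions: struct addr_range and shift_iov_range() are hypothetical stand-ins for struct resource and the patch's logic, and it returns -1 where the patch returns -EBUSY.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive address range, loosely modelled on struct resource. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Shift the start of an IOV BAR by 'offset' VFs and check that vf_num
 * VFs of vf_size bytes each still fit inside the original range.
 * Returns 0 and fills *shifted on success, -1 if the shifted range
 * would extend past the end of the BAR.
 */
static int shift_iov_range(const struct addr_range *bar, uint64_t vf_size,
			   unsigned int vf_num, unsigned int offset,
			   struct addr_range *shifted)
{
	struct addr_range r;

	assert(vf_size != 0 && vf_num != 0);

	r.start = bar->start + vf_size * offset;
	r.end = r.start + vf_size * vf_num - 1;

	if (r.end > bar->end)
		return -1;

	*shifted = r;
	return 0;
}
```

For a BAR covering 16 segments of 0x100 bytes starting at 0x1000, enabling 4 VFs shifted by 12 exactly fills the tail of the range, while a shift of 13 is rejected.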

-- 
Richard Yang
Help you, Help me


Re: [PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically

2015-03-02 Thread Wei Yang

On Mon, Mar 02, 2015 at 06:56:19PM +1100, Benjamin Herrenschmidt wrote:
On Mon, 2015-03-02 at 15:50 +0800, Wei Yang wrote:
 
 Is there a hotplug remove path where we should also be calling
 iommu_free_table()?
 
 Without VF support, nothing calls this on the powernv platform.
 
 Each PCI bus is a PE with its own iommu table; even when a device is
 hotplugged, the iommu table is not released.

Actually, I believe Alexey's patches to add support for dynamic DMA
windows for KVM guests using VFIO will also alloc/free iommu tables. In
fact, his patches change quite a few things in that area, and
I'm currently reviewing them.

Yes, I have seen these changes before.


Wei, can you post a new series when you've finished syncing with
Bjorn? At that point, I'll try to work with Alexey to evaluate the
impact of his changes on your patches.

Sure, I will do it ASAP.


Cheers,
Ben.


-- 
Richard Yang
Help you, Help me

