[PATCH v2] seccomp: switch to using asm-generic for seccomp.h
Most architectures don't need to do anything special for the strict
seccomp syscall entries. Remove the redundant headers and reduce the
others.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
v2:
 - use Kbuild generic-y instead of explicit #include lines (sfr)
---
 arch/arm/include/asm/Kbuild             |  1 +
 arch/arm/include/asm/seccomp.h          | 11 ---
 arch/microblaze/include/asm/Kbuild      |  1 +
 arch/microblaze/include/asm/seccomp.h   | 16
 arch/mips/include/asm/seccomp.h         |  7 ++-
 arch/parisc/include/asm/Kbuild          |  1 +
 arch/parisc/include/asm/seccomp.h       | 16
 arch/powerpc/include/asm/Kbuild         |  1 +
 arch/powerpc/include/uapi/asm/Kbuild    |  1 -
 arch/powerpc/include/uapi/asm/seccomp.h | 16
 arch/s390/include/asm/Kbuild            |  1 +
 arch/s390/include/asm/seccomp.h         | 16
 arch/sh/include/asm/Kbuild              |  1 +
 arch/sh/include/asm/seccomp.h           | 10 --
 arch/sparc/include/asm/Kbuild           |  1 +
 arch/sparc/include/asm/seccomp.h        | 15 ---
 arch/x86/include/asm/seccomp.h          | 21 ++---
 arch/x86/include/asm/seccomp_32.h       | 11 ---
 arch/x86/include/asm/seccomp_64.h       | 17 -
 19 files changed, 27 insertions(+), 137 deletions(-)
 delete mode 100644 arch/arm/include/asm/seccomp.h
 delete mode 100644 arch/microblaze/include/asm/seccomp.h
 delete mode 100644 arch/parisc/include/asm/seccomp.h
 delete mode 100644 arch/powerpc/include/uapi/asm/seccomp.h
 delete mode 100644 arch/s390/include/asm/seccomp.h
 delete mode 100644 arch/sh/include/asm/seccomp.h
 delete mode 100644 arch/sparc/include/asm/seccomp.h
 delete mode 100644 arch/x86/include/asm/seccomp_32.h
 delete mode 100644 arch/x86/include/asm/seccomp_64.h

diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index fe74c0d1e485..d7be5a9fd171 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += preempt.h
 generic-y += resource.h
 generic-y += rwsem.h
 generic-y += scatterlist.h
+generic-y += seccomp.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += sembuf.h
diff --git a/arch/arm/include/asm/seccomp.h b/arch/arm/include/asm/seccomp.h
deleted file mode 100644
index 52b156b341f5..
--- a/arch/arm/include/asm/seccomp.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef _ASM_ARM_SECCOMP_H
-#define _ASM_ARM_SECCOMP_H
-
-#include <linux/unistd.h>
-
-#define __NR_seccomp_read __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit __NR_exit
-#define __NR_seccomp_sigreturn __NR_rt_sigreturn
-
-#endif /* _ASM_ARM_SECCOMP_H */
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index ab564a6db5c3..877e2f610655 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,5 +8,6 @@ generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += preempt.h
 generic-y += scatterlist.h
+generic-y += seccomp.h
 generic-y += syscalls.h
 generic-y += trace_clock.h
diff --git a/arch/microblaze/include/asm/seccomp.h b/arch/microblaze/include/asm/seccomp.h
deleted file mode 100644
index 0d912758a0d7..
--- a/arch/microblaze/include/asm/seccomp.h
+++ /dev/null
@@ -1,16 +0,0 @@
-#ifndef _ASM_MICROBLAZE_SECCOMP_H
-#define _ASM_MICROBLAZE_SECCOMP_H
-
-#include <linux/unistd.h>
-
-#define __NR_seccomp_read __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit __NR_exit
-#define __NR_seccomp_sigreturn __NR_sigreturn
-
-#define __NR_seccomp_read_32 __NR_read
-#define __NR_seccomp_write_32 __NR_write
-#define __NR_seccomp_exit_32 __NR_exit
-#define __NR_seccomp_sigreturn_32 __NR_sigreturn
-
-#endif /* _ASM_MICROBLAZE_SECCOMP_H */
diff --git a/arch/mips/include/asm/seccomp.h b/arch/mips/include/asm/seccomp.h
index f29c75cf83c6..1d8a2e2c75c1 100644
--- a/arch/mips/include/asm/seccomp.h
+++ b/arch/mips/include/asm/seccomp.h
@@ -2,11 +2,6 @@
 #include <linux/unistd.h>
 
-#define __NR_seccomp_read __NR_read
-#define __NR_seccomp_write __NR_write
-#define __NR_seccomp_exit __NR_exit
-#define __NR_seccomp_sigreturn __NR_rt_sigreturn
-
 /*
  * Kludge alert:
  *
@@ -29,4 +24,6 @@
 #endif /* CONFIG_MIPS32_O32 */
 
+#include <asm-generic/seccomp.h>
+
 #endif /* __ASM_SECCOMP_H */
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 8686237a3c3c..12b341d04f88 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -20,6 +20,7 @@ generic-y += param.h
 generic-y += percpu.h
 generic-y += poll.h
 generic-y += preempt.h
+generic-y += seccomp.h
 generic-y += segment.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/parisc/include/asm/seccomp.h b/arch/parisc/include/asm/seccomp.h
deleted file mode 100644
index
Re: [PATCH 0/5] split ET_DYN ASLR from mmap ASLR
On Thu, 26 Feb 2015 19:07:09 -0800 Kees Cook <keesc...@chromium.org> wrote:

> This separates ET_DYN ASLR from mmap ASLR, as already done on s390.
> The various architectures that are already randomizing mmap (arm,
> arm64, mips, powerpc, s390, and x86) have their various forms of
> arch_mmap_rnd() made available via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE.
> For these architectures, arch_randomize_brk() is collapsed as well.
>
> This is an alternative to the solutions in:
> https://lkml.org/lkml/2015/2/23/442

Hector's original patch had very useful descriptions of the bug, why it
occurred, how it was exploited, and how the patch fixes it. Your
changelogs contain none of this and can be summarized as "randomly churn
code around for no apparent reason".

Wanna try again? I guess the [0/5] and [4/5] changelogs are the ones to
fix.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/5] split ET_DYN ASLR from mmap ASLR
On Mon, Mar 2, 2015 at 1:26 PM, Andrew Morton <a...@linux-foundation.org> wrote:
> On Thu, 26 Feb 2015 19:07:09 -0800 Kees Cook <keesc...@chromium.org> wrote:
>> This separates ET_DYN ASLR from mmap ASLR, as already done on s390.
>> The various architectures that are already randomizing mmap (arm,
>> arm64, mips, powerpc, s390, and x86) have their various forms of
>> arch_mmap_rnd() made available via the new
>> CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
>> arch_randomize_brk() is collapsed as well.
>>
>> This is an alternative to the solutions in:
>> https://lkml.org/lkml/2015/2/23/442
>
> Hector's original patch had very useful descriptions of the bug, why
> it occurred, how it was exploited, and how the patch fixes it. Your
> changelogs contain none of this and can be summarized as "randomly
> churn code around for no apparent reason".
>
> Wanna try again? I guess the [0/5] and [4/5] changelogs are the ones
> to fix.

Ah, yes, absolutely. I will resend.

-Kees

-- 
Kees Cook
Chrome OS Security
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 03:37:28PM +0100, Milan Broz wrote:
> If the crypto API allows encrypting more sectors in one run (handling
> the IV internally), dmcrypt can be modified of course.
>
> But do not forget we can use another IV (not only a sequential number),
> e.g. ESSIV with XTS as well (even if it doesn't make much sense, some
> people are using it).

Interesting, I'd not considered using XTS with an IV other than
plain/64. The talitos hardware would not support aes/xts in any mode
other than plain/plain64, I don't think... although perhaps you could
push in an 8-byte IV and the hardware would interpret it as the
sector #.

> Maybe the following question would be if the dmcrypt sector IV
> algorithms should be moved into the crypto API as well.
>
> (But because I misused dmcrypt IV hooks for some additional operations
> for loopAES and old Truecrypt CBC mode, it is not so simple...)

Speaking again with talitos in mind, there would be no advantage for
this hardware. Although larger requests are possible, only a single IV
can be provided per request, so for algorithms like AES-CBC with
dm-crypt, 512-byte IOs are the only option (short of switching to 4kB
block size).

mh
-- 
Martin Hicks P.Eng. | m...@bork.org
Bork Consulting Inc. | +1 (613) 266-2296
Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
On 03/01/2015 08:19 PM, Cyril Bur wrote: On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: During suspend/migration operation we must wait for the VASI state reported by the hypervisor to become Suspending prior to making the ibm,suspend-me RTAS call. Calling routines to rtas_ibm_supend_me() pass a vasi_state variable that exposes the VASI state to the caller. This is unnecessary as the caller only really cares about the following three conditions; if there is an error we should bailout, success indicating we have suspended and woken back up so proceed to device tree updated, or we are not suspendable yet so try calling rtas_ibm_suspend_me again shortly. This patch removes the extraneous vasi_state variable and simply uses the return code to communicate how to proceed. We either succeed, fail, or get -EAGAIN in which case we sleep for a second before trying to call rtas_ibm_suspend_me again. Signed-off-by: Tyrel Datwyler tyr...@linux.vnet.ibm.com --- arch/powerpc/include/asm/rtas.h | 2 +- arch/powerpc/kernel/rtas.c| 15 +++ arch/powerpc/platforms/pseries/mobility.c | 8 +++- 3 files changed, 11 insertions(+), 14 deletions(-) diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h index 2e23e92..fc85eb0 100644 --- a/arch/powerpc/include/asm/rtas.h +++ b/arch/powerpc/include/asm/rtas.h @@ -327,7 +327,7 @@ extern int rtas_suspend_cpu(struct rtas_suspend_me_data *data); extern int rtas_suspend_last_cpu(struct rtas_suspend_me_data *data); extern int rtas_online_cpus_mask(cpumask_var_t cpus); extern int rtas_offline_cpus_mask(cpumask_var_t cpus); -extern int rtas_ibm_suspend_me(u64 handle, int *vasi_return); +extern int rtas_ibm_suspend_me(u64 handle); I like ditching vasi_return, I was never happy with myself for doing that! 
struct rtc_time; extern unsigned long rtas_get_boot_time(void); diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 21c45a2..603b928 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -897,7 +897,7 @@ int rtas_offline_cpus_mask(cpumask_var_t cpus) } EXPORT_SYMBOL(rtas_offline_cpus_mask); -int rtas_ibm_suspend_me(u64 handle, int *vasi_return) +int rtas_ibm_suspend_me(u64 handle) That definition is actually in an #ifdef CONFIG_PPC_PSERIES, you'll need to change the definition for !CONFIG_PPC_PSERIES Good catch. I'll fix it there too. { long state; long rc; @@ -919,13 +919,11 @@ int rtas_ibm_suspend_me(u64 handle, int *vasi_return) printk(KERN_ERR rtas_ibm_suspend_me: vasi_state returned %ld\n,rc); return rc; } else if (state == H_VASI_ENABLED) { -*vasi_return = RTAS_NOT_SUSPENDABLE; -return 0; +return -EAGAIN; } else if (state != H_VASI_SUSPENDING) { printk(KERN_ERR rtas_ibm_suspend_me: vasi_state returned state %ld\n, state); -*vasi_return = -1; -return 0; +return -EIO; I've had a look as to how these return values get passed back up the stack and admittedly were dealing with a confusing mess, I've compared back to before my patch (which wasn't perfect either it seems). Both the state == H_VASI_ENABLED and state == H_VASI_SUSPENDING cause ppc_rtas to go to the copy_return and return 0 (albeit with an error code in args.rets[0]), because rtas_ppc goes back to out userland, I hesitate to change any of that. Agreed, that this is a bit of a mess. The problem is we have two call paths into rtas_ibm_suspend_me(). The one from migrate_store() and one from ppc_rtas(). I'll address each with your other comments below. } if (!alloc_cpumask_var(offline_mask, GFP_TEMPORARY)) @@ -1060,9 +1058,10 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs) int vasi_rc = 0; This generates unused variable warning. Sloppy on my part. Will remove. 
u64 handle = ((u64)be32_to_cpu(args.args[0]) 32) | be32_to_cpu(args.args[1]); -rc = rtas_ibm_suspend_me(handle, vasi_rc); -args.rets[0] = cpu_to_be32(vasi_rc); -if (rc) +rc = rtas_ibm_suspend_me(handle); +if (rc == -EAGAIN) +args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE); (continuing on...) so perhaps here have rc = 0; else if (rc == -EIO) args.rets[0] = cpu_to_be32(-1); rc = 0; Which should keep the original behaviour, the last thing we want to do is break BE. The biggest problem here is we are making what basically equates to a fake rtas call from drmgr which we intercept in ppc_rtas(). From there we make this special call to rtas_ibm_suspend_me() to check VASI state and do a bunch of other specialized work that needs to be setup prior to making the actual ibm,suspend-me rtas call. Since, we are
Re: [PATCH 2/3] powerpc/pseries: Little endian fixes for post mobility device tree update
On 03/01/2015 09:20 PM, Cyril Bur wrote: On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: We currently use the device tree update code in the kernel after resuming from a suspend operation to re-sync the kernels view of the device tree with that of the hypervisor. The code as it stands is not endian safe as it relies on parsing buffers returned by RTAS calls that thusly contains data in big endian format. This patch annotates variables and structure members with __be types as well as performing necessary byte swaps to cpu endian for data that needs to be parsed. Signed-off-by: Tyrel Datwyler tyr...@linux.vnet.ibm.com --- arch/powerpc/platforms/pseries/mobility.c | 36 --- 1 file changed, 19 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 29e4f04..0b1f70e 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -25,10 +25,10 @@ static struct kobject *mobility_kobj; struct update_props_workarea { -u32 phandle; -u32 state; -u64 reserved; -u32 nprops; +__be32 phandle; +__be32 state; +__be64 reserved; +__be32 nprops; } __packed; #define NODE_ACTION_MASK0xff00 @@ -127,7 +127,7 @@ static int update_dt_property(struct device_node *dn, struct property **prop, return 0; } -static int update_dt_node(u32 phandle, s32 scope) +static int update_dt_node(__be32 phandle, s32 scope) { On line 153 of this function: dn = of_find_node_by_phandle(phandle); You're passing a __be32 to device tree code, if we can treat the phandle as a opaque value returned to us from the rtas call and pass it around like that then all good. Yes, of_find_node_by_phandle directly compares phandle passed in against the handle stored in each device_node when searching for a matching node. Since, the device tree is big endian it follows that the big endian phandle received in the rtas buffer needs no conversion. 
Further, we need to pass the phandle to ibm,update-properties in the work area which is also required to be big endian. So, again it seemed that converting to cpu endian was a waste of effort just to convert it back to big endian. Its also hard to be sure if these need to be BE and have always been that way because we've always run BE so they've never actually wanted CPU endian its just that CPU endian has always been BE (I think I started rambling...) Just want to check that *not* converting them is done on purpose. Yes, I explicitly did not convert them on purpose. As mentioned above we need phandle in BE for the ibm,update-properties rtas work area. Similarly, drc_index needs to be in BE for the ibm,configure-connector rtas work area. Outside, of that we do no other manipulation of those values. And having read on, I'm assuming the answer is yes since this observation is true for your changes which affect: delete_dt_node() update_dt_node() add_dt_node() Worth noting that you didn't change the definition of delete_dt_node() You are correct. Oversight. I will fix that as it should generate a sparse complaint. -Tyrel I'll have a look once you address the non compiling in patch 1/3 (I'm getting blocked the unused var because somehow Werror is on, odd it didn't trip you up) but I also suspect this will have sparse go a bit nuts. I wonder if there is a nice way of shutting sparse up. 
struct update_props_workarea *upwa; struct device_node *dn; @@ -136,6 +136,7 @@ static int update_dt_node(u32 phandle, s32 scope) char *prop_data; char *rtas_buf; int update_properties_token; +u32 nprops; u32 vd; update_properties_token = rtas_token(ibm,update-properties); @@ -162,6 +163,7 @@ static int update_dt_node(u32 phandle, s32 scope) break; prop_data = rtas_buf + sizeof(*upwa); +nprops = be32_to_cpu(upwa-nprops); /* On the first call to ibm,update-properties for a node the * the first property value descriptor contains an empty @@ -170,17 +172,17 @@ static int update_dt_node(u32 phandle, s32 scope) */ if (*prop_data == 0) { prop_data++; -vd = *(u32 *)prop_data; +vd = be32_to_cpu(*(__be32 *)prop_data); prop_data += vd + sizeof(vd); -upwa-nprops--; +nprops--; } -for (i = 0; i upwa-nprops; i++) { +for (i = 0; i nprops; i++) { char *prop_name; prop_name = prop_data; prop_data += strlen(prop_name) + 1; -vd = *(u32 *)prop_data; +vd = be32_to_cpu(*(__be32 *)prop_data);
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 04:44:19PM -0500, Martin Hicks wrote: Write (MB/s)Read (MB/s) Unencrypted 140 176 aes-xts-plain64 512b 113 115 aes-xts-plain64 4kB 71 56 I got the two AES lines backwards. Sorry about that. mh -- Martin Hicks P.Eng.| m...@bork.org Bork Consulting Inc. | +1 (613) 266-2296 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On Mon, Mar 02, 2015 at 03:25:56PM +0200, Horia Geantă wrote:
> On 2/20/2015 7:00 PM, Martin Hicks wrote:
>> This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2. One
>> of the nice things about this hardware is that it knows how to deal
>> with encrypt/decrypt requests that are larger than sector size, but
>> that also requires that the sector size be passed into the crypto
>> engine as an XTS cipher context parameter. When a request is larger
>> than the sector size, the sector number is incremented by the talitos
>> engine and the tweak key is re-calculated for the new sector.
>>
>> I've tested this with 256bit and 512bit keys (tweak and data keys of
>> 128bit and 256bit) to ensure interoperability with the software
>> AES-XTS implementation. All testing was done using dm-crypt/LUKS with
>> aes-xts-plain64.
>>
>> Is there a better solution than just hard coding the sector size to
>> (1 << SECTOR_SHIFT)? Maybe dm-crypt should be modified to pass the
>> sector size along with the plain/plain64 IV to an XTS algorithm?
>
> AFAICT, the SW implementation of xts mode in the kernel (crypto/xts.c)
> is not aware of a sector size (data unit size in IEEE P1619
> terminology): there's a hidden assumption that all the data sent to
> xts in one request belongs to a single sector. Even more, it's
> supposed that the first 16-byte block in the request is block 0 in the
> sector. These can be seen from the way the tweak (T) value is
> computed.
>
> (Side note: there's no support for ciphertext stealing in crypto/xts.c
> - i.e. sector sizes must be a multiple of the underlying block cipher
> size, that is 16B.)
>
> If dm-crypt were modified to pass the sector size somehow, all
> in-kernel xts implementations would have to be made aware of the
> change. I have nothing against this, but let's see what the crypto
> maintainers have to say...

Right. Additionally, there may be some requirement for the encryption
implementation to broadcast the maximum size that can be handled in a
single request.
For example, Talitos could handle XTS encrypt/decrypt requests of up to
64kB (regardless of the block device's sector size).

> BTW, there were some discussions back in 2013 wrt. being able to
> configure / increase the sector size, smth. crypto engines would
> benefit from:
> http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html
> (experimental patch)
> http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html
>
> The experimental patch sends the sector size as req->nbytes - hidden
> assumption: the data size sent in an xts crypto request equals a
> sector.

I found this last week, and used it as a starting point for some
testing. I modified it to keep the underlying sector size of the
dm-crypt mapping as 512 bytes, but allowed the code to combine requests
into IOs of up to 4kB. Doing greater request sizes would require
allocating additional pages... I plan to implement that to see how much
extra performance can be squeezed out. Patch below...

With regards to performance, with my low-powered Freescale P1022 board,
I see performance numbers like this on ext4, as measured by bonnie++:

                       Write (MB/s)   Read (MB/s)
  Unencrypted          140            176
  aes-xts-plain64 512b 113            115
  aes-xts-plain64 4kB  71             56

The more detailed bonnie++ output is here:
http://www.bork.org/~mort/dm-crypt-enc-blksize.html

The larger IO sizes are a huge win for this board. The patch I'm using
to send IOs up to 4kB to talitos follows.
Thanks, mh diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index 08981be..88e95b5 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -42,6 +42,7 @@ struct convert_context { struct bvec_iter iter_out; sector_t cc_sector; atomic_t cc_pending; + unsigned int block_size; struct ablkcipher_request *req; }; @@ -142,6 +143,8 @@ struct crypt_config { sector_t iv_offset; unsigned int iv_size; + unsigned int block_size; + /* ESSIV: struct crypto_cipher *essiv_tfm */ void *iv_private; struct crypto_ablkcipher **tfms; @@ -801,10 +804,17 @@ static void crypt_convert_init(struct crypt_config *cc, { ctx-bio_in = bio_in; ctx-bio_out = bio_out; - if (bio_in) + ctx-block_size = 0; + if (bio_in) { ctx-iter_in = bio_in-bi_iter; - if (bio_out) + ctx-block_size = max(ctx-block_size, bio_cur_bytes(bio_in)); + } + if (bio_out) { ctx-iter_out = bio_out-bi_iter; + ctx-block_size = max(ctx-block_size, bio_cur_bytes(bio_out)); + } + if (ctx-block_size cc-block_size) + ctx-block_size = cc-block_size; ctx-cc_sector = sector + cc-iv_offset; init_completion(ctx-restart); } @@ -844,15 +854,15 @@ static int crypt_convert_block(struct crypt_config *cc, dmreq-iv_sector = ctx-cc_sector; dmreq-ctx = ctx; sg_init_table(dmreq-sg_in, 1); -
[PATCH 4/5] mm: split ET_DYN ASLR from mmap ASLR
This fixes the "offset2lib" weakness in ASLR for arm, arm64, mips,
powerpc, and x86. The problem is that if there is a leak of ASLR from
the executable (ET_DYN), it means a leak of the shared library offset
as well (mmap), and vice versa. Further details and a PoC of this
attack are available here:
http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

With this patch, a PIE-linked executable (ET_DYN) has its own ASLR
region:

$ ./show_mmaps_pie
54859ccd6000-54859ccd7000 r-xp ... /tmp/show_mmaps_pie
54859ced6000-54859ced7000 r--p ... /tmp/show_mmaps_pie
54859ced7000-54859ced8000 rw-p ... /tmp/show_mmaps_pie
7f75be764000-7f75be91f000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6
7f75be91f000-7f75beb1f000 ---p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb1f000-7f75beb23000 r--p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb23000-7f75beb25000 rw-p ... /lib/x86_64-linux-gnu/libc.so.6
7f75beb25000-7f75beb2a000 rw-p ...
7f75beb2a000-7f75beb4d000 r-xp ... /lib64/ld-linux-x86-64.so.2
7f75bed45000-7f75bed46000 rw-p ...
7f75bed46000-7f75bed47000 r-xp ...
7f75bed47000-7f75bed4c000 rw-p ...
7f75bed4c000-7f75bed4d000 r--p ... /lib64/ld-linux-x86-64.so.2
7f75bed4d000-7f75bed4e000 rw-p ... /lib64/ld-linux-x86-64.so.2
7f75bed4e000-7f75bed4f000 rw-p ...
7fffb3741000-7fffb3762000 rw-p ... [stack]
7fffb377b000-7fffb377d000 r--p ... [vvar]
7fffb377d000-7fffb377f000 r-xp ... [vdso]

The change adds a call to the newly created arch_mmap_rnd() in the ELF
loader, handling ET_DYN ASLR in a separate region from mmap ASLR, as
already done on s390. Removes CONFIG_BINFMT_ELF_RANDOMIZE_PIE, which is
no longer needed.
Reported-by: Hector Marco-Gisbert <hecma...@upv.es>
Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 arch/arm/Kconfig            | 1 -
 arch/arm64/Kconfig          | 1 -
 arch/mips/Kconfig           | 1 -
 arch/powerpc/Kconfig        | 1 -
 arch/s390/include/asm/elf.h | 4 ++--
 arch/x86/Kconfig            | 1 -
 fs/Kconfig.binfmt           | 3 ---
 fs/binfmt_elf.c             | 17 ++---
 8 files changed, 4 insertions(+), 25 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 248d99cabaa8..e2f0ef9c6ee3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1,7 +1,6 @@
 config ARM
 	bool
 	default y
-	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5f469095e0e2..07e0fc7adc88 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1,6 +1,5 @@
 config ARM64
 	def_bool y
-	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 72ce5cece768..557c5f1772c1 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -23,7 +23,6 @@ config MIPS
 	select HAVE_KRETPROBES
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_SYSCALL_TRACEPOINTS
-	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
 	select RTC_LIB if !MACH_LOONGSON
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 14fe1c411489..910fa4f9ad1e 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -88,7 +88,6 @@ config PPC
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select BINFMT_ELF
-	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ELF_RANDOMIZE
 	select OF
 	select OF_EARLY_FLATTREE
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index 9ed68e7ee856..617f7fabdb0a 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -163,9 +163,9 @@ extern unsigned int vdso_enabled;
    the loader. We need to make sure that it is out of the way of the program
    that it will exec, and that there is sufficient room for the brk. 64-bit
    tasks are aligned to 4GB. */
-#define ELF_ET_DYN_BASE	(arch_mmap_rnd() + (is_32bit_task() ? \
-				(STACK_TOP / 3 * 2) : \
-				(STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
+#define ELF_ET_DYN_BASE	(is_32bit_task() ? \
+				(STACK_TOP / 3 * 2) : \
+				(STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
 
 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports. */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9aa91727fbf8..328be0fab910 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,7 +87,6 @@ config X86
 	select HAVE_ARCH_KMEMCHECK
 	select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP
 	select HAVE_USER_RETURN_NOTIFIER
-	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select
[PATCH 3/5] mm: move randomize_et_dyn into ELF_ET_DYN_BASE
In preparation for moving ET_DYN randomization into the ELF loader
(which requires a static ELF_ET_DYN_BASE), this redefines s390's
existing ET_DYN randomization away from a separate function
(randomize_et_dyn) and into ELF_ET_DYN_BASE and a call to
arch_mmap_rnd(). This refactoring results in the same ET_DYN
randomization on s390. Additionally removes a copy/pasted unused arm64
extern.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 arch/arm64/include/asm/elf.h | 1 -
 arch/s390/include/asm/elf.h  | 9 +
 arch/s390/mm/mmap.c          | 11 ---
 3 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 1f65be393139..f724db00b235 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -125,7 +125,6 @@ typedef struct user_fpsimd_state elf_fpregset_t;
  * the loader. We need to make sure that it is out of the way of the program
  * that it will exec, and that there is sufficient room for the brk.
  */
-extern unsigned long randomize_et_dyn(unsigned long base);
 #define ELF_ET_DYN_BASE	(2 * TASK_SIZE_64 / 3)
 
 /*
diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index c9df40b5c0ac..9ed68e7ee856 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,10 +161,11 @@ extern unsigned int vdso_enabled;
 /* This is the location that an ET_DYN program is loaded if exec'ed. Typical
    use of this is to invoke "./ld.so someprog" to test out a new version of
    the loader. We need to make sure that it is out of the way of the program
-   that it will exec, and that there is sufficient room for the brk. */
-
-extern unsigned long randomize_et_dyn(void);
-#define ELF_ET_DYN_BASE	randomize_et_dyn()
+   that it will exec, and that there is sufficient room for the brk. 64-bit
+   tasks are aligned to 4GB. */
+#define ELF_ET_DYN_BASE	(arch_mmap_rnd() + (is_32bit_task() ? \
+				(STACK_TOP / 3 * 2) : \
+				(STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)))
 
 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports. */
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index 77759e35671b..ec4c20448aef 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -179,17 +179,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	return addr;
 }
 
-unsigned long randomize_et_dyn(void)
-{
-	unsigned long base;
-
-	base = STACK_TOP / 3 * 2;
-	if (!is_32bit_task())
-		/* Align to 4GB */
-		base &= ~((1UL << 32) - 1);
-	return base + arch_mmap_rnd();
-}
-
 #ifndef CONFIG_64BIT
 
 /*
-- 
1.9.1
[PATCH 1/5] arm: factor out mmap ASLR into mmap_rnd
In preparation for exporting per-arch mmap randomization functions,
this moves the ASLR calculations for mmap on ARM into a separate
routine.

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 arch/arm/mm/mmap.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 5e85ed371364..0f8bc158f2c6 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,14 +169,21 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	return addr;
 }
 
-void arch_pick_mmap_layout(struct mm_struct *mm)
+static unsigned long mmap_rnd(void)
 {
-	unsigned long random_factor = 0UL;
+	unsigned long rnd = 0UL;
 
 	/* 8 bits of randomness in 20 address space bits */
 	if ((current->flags & PF_RANDOMIZE) &&
 	    !(current->personality & ADDR_NO_RANDOMIZE))
-		random_factor = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+		rnd = (get_random_int() % (1 << 8)) << PAGE_SHIFT;
+
+	return rnd;
+}
+
+void arch_pick_mmap_layout(struct mm_struct *mm)
+{
+	unsigned long random_factor = mmap_rnd();
 
 	if (mmap_is_legacy()) {
 		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
-- 
1.9.1
[PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR
To address the "offset2lib" ASLR weakness[1], this separates ET_DYN
ASLR from mmap ASLR, as already done on s390. The architectures that
are already randomizing mmap (arm, arm64, mips, powerpc, s390, and x86)
have their various forms of arch_mmap_rnd() made available via the new
CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures,
arch_randomize_brk() is collapsed as well.

This is an alternative to the solutions in:
https://lkml.org/lkml/2015/2/23/442

Thanks!

-Kees

[1] http://cybersecurity.upv.es/attacks/offset2lib/offset2lib.html

---
v2:
- verbosified the commit logs, especially 4/5 (akpm)
[PATCH 2/5] mm: expose arch_mmap_rnd when available
When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it (which is
a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already does
this without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook <keesc...@chromium.org>
---
 arch/Kconfig                  |  7 +++
 arch/arm/Kconfig              |  1 +
 arch/arm/mm/mmap.c            |  4 ++--
 arch/arm64/Kconfig            |  1 +
 arch/arm64/mm/mmap.c          |  4 ++--
 arch/mips/Kconfig             |  1 +
 arch/mips/mm/mmap.c           |  9 ++---
 arch/powerpc/Kconfig          |  1 +
 arch/powerpc/mm/mmap.c        |  4 ++--
 arch/s390/Kconfig             |  1 +
 arch/s390/mm/mmap.c           |  8
 arch/x86/Kconfig              |  1 +
 arch/x86/mm/mmap.c            |  6 +++---
 fs/binfmt_elf.c               |  1 +
 include/linux/elf-randomize.h | 10 ++
 15 files changed, 43 insertions(+), 16 deletions(-)
 create mode 100644 include/linux/elf-randomize.h

diff --git a/arch/Kconfig b/arch/Kconfig
index 05d7a8a458d5..e315cc79ebe7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -484,6 +484,13 @@ config HAVE_IRQ_EXIT_ON_IRQ_STACK
 	  This spares a stack switch and improves cache usage on softirq
 	  processing.
 
+config ARCH_HAS_ELF_RANDOMIZE
+	bool
+	help
+	  An architecture supports choosing randomized locations for
+	  stack, mmap, brk, and ET_DYN. Defined functions:
+	  - arch_mmap_rnd(), must respect (current->flags & PF_RANDOMIZE)
+
 #
 # ABI hall of shame
 #
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 9f1f09a2bc9b..248d99cabaa8 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
 	default y
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAVE_CUSTOM_GPIO_H
 	select ARCH_HAS_GCOV_PROFILE_ALL
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index 0f8bc158f2c6..3c1fedb034bb 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -169,7 +169,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	return addr;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
 	unsigned long rnd = 0UL;
 
@@ -183,7 +183,7 @@ static unsigned long mmap_rnd(void)
 
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {
-	unsigned long random_factor = mmap_rnd();
+	unsigned long random_factor = arch_mmap_rnd();
 
 	if (mmap_is_legacy()) {
 		mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..5f469095e0e2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
 	def_bool y
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 54922d1275b8..b7117cb4bc07 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -47,7 +47,7 @@ static int mmap_is_legacy(void)
 	return sysctl_legacy_va_layout;
 }
 
-static unsigned long mmap_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
 	unsigned long rnd = 0;
 
@@ -66,7 +66,7 @@ static unsigned long mmap_base(void)
 	else if (gap > MAX_GAP)
 		gap = MAX_GAP;
 
-	return PAGE_ALIGN(STACK_TOP - gap - mmap_rnd());
+	return PAGE_ALIGN(STACK_TOP - gap - arch_mmap_rnd());
 }
 
 /*
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index c7a16904cd03..72ce5cece768 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -24,6 +24,7 @@ config MIPS
 	select HAVE_DEBUG_KMEMLEAK
 	select HAVE_SYSCALL_TRACEPOINTS
 	select ARCH_BINFMT_ELF_RANDOMIZE_PIE
+	select ARCH_HAS_ELF_RANDOMIZE
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
 	select RTC_LIB if !MACH_LOONGSON
 	select GENERIC_ATOMIC64 if !64BIT
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index f1baadd56e82..d32490d99671 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -164,9 +164,12 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 	}
 }
 
-static inline unsigned long brk_rnd(void)
+unsigned long arch_mmap_rnd(void)
 {
-	unsigned long rnd = get_random_int();
+	unsigned long rnd = 0;
+
+	if
[PATCH 5/5] mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: Kees Cook keesc...@chromium.org --- arch/Kconfig | 1 + arch/arm/include/asm/elf.h | 4 arch/arm64/include/asm/elf.h | 4 arch/mips/include/asm/elf.h| 4 arch/powerpc/include/asm/elf.h | 4 arch/s390/include/asm/elf.h| 3 --- arch/x86/include/asm/elf.h | 3 --- fs/binfmt_elf.c| 4 +--- include/linux/elf-randomize.h | 12 9 files changed, 14 insertions(+), 25 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index e315cc79ebe7..1c7e98f137db 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -490,6 +490,7 @@ config ARCH_HAS_ELF_RANDOMIZE An architecture supports choosing randomized locations for stack, mmap, brk, and ET_DYN. Defined functions: - arch_mmap_rnd(), must respect (current-flags PF_RANDOMIZE) + - arch_randomize_brk() # # ABI hall of shame diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h index afb9cafd3786..c1ff8ab12914 100644 --- a/arch/arm/include/asm/elf.h +++ b/arch/arm/include/asm/elf.h @@ -125,10 +125,6 @@ int dump_task_regs(struct task_struct *t, elf_gregset_t *elfregs); extern void elf_set_personality(const struct elf32_hdr *); #define SET_PERSONALITY(ex)elf_set_personality((ex)) -struct mm_struct; -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - #ifdef CONFIG_MMU #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1 struct linux_binprm; diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h index f724db00b235..faad6df49e5b 100644 --- a/arch/arm64/include/asm/elf.h +++ b/arch/arm64/include/asm/elf.h @@ -156,10 +156,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm, #define STACK_RND_MASK (0x3 (PAGE_SHIFT - 12)) #endif -struct mm_struct; -extern 
unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - #ifdef CONFIG_COMPAT #ifdef __AARCH64EB__ diff --git a/arch/mips/include/asm/elf.h b/arch/mips/include/asm/elf.h index 535f196ffe02..31d747d46a23 100644 --- a/arch/mips/include/asm/elf.h +++ b/arch/mips/include/asm/elf.h @@ -410,10 +410,6 @@ struct linux_binprm; extern int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); -struct mm_struct; -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - struct arch_elf_state { int fp_abi; int interp_fp_abi; diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h index 57d289acb803..ee46ffef608e 100644 --- a/arch/powerpc/include/asm/elf.h +++ b/arch/powerpc/include/asm/elf.h @@ -128,10 +128,6 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm, (0x7ff (PAGE_SHIFT - 12)) : \ (0x3 (PAGE_SHIFT - 12))) -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - - #ifdef CONFIG_SPU_BASE /* Notes used in ET_CORE. Note name is SPU/fd/filename. 
*/ #define NT_SPU 1 diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index 617f7fabdb0a..7cc271003ff6 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ -226,9 +226,6 @@ struct linux_binprm; #define ARCH_HAS_SETUP_ADDITIONAL_PAGES 1 int arch_setup_additional_pages(struct linux_binprm *, int); -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - void *fill_cpu_elf_notes(void *ptr, struct save_area *sa, __vector128 *vxrs); #endif diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h index ca3347a9dab5..bbdace22daf8 100644 --- a/arch/x86/include/asm/elf.h +++ b/arch/x86/include/asm/elf.h @@ -338,9 +338,6 @@ extern int compat_arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp); #define compat_arch_setup_additional_pages compat_arch_setup_additional_pages -extern unsigned long arch_randomize_brk(struct mm_struct *mm); -#define arch_randomize_brk arch_randomize_brk - /* * True on X86_32 or when emulating IA32 on X86_64 */ diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 203c2e6f9a25..96459c18d1eb 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1041,15 +1041,13 @@ static int load_elf_binary(struct linux_binprm *bprm) current-mm-end_data = end_data; current-mm-start_stack = bprm-p; -#ifdef arch_randomize_brk if ((current-flags PF_RANDOMIZE) (randomize_va_space 1)) {
[PATCH 3/4 RFC] fsl/msi: Add MSI bank allocation for kernel owned devices
With this patch a context can allocate an MSI bank and use it for the devices in that context. The kernel/host context is NULL, so all devices owned by the kernel will share an MSI bank allocated with context = NULL. This patch is a step toward having separate MSI banks for the kernel context and userspace/VM contexts. We do not want two software contexts (kernel and VMs) to share an MSI bank, for safe/reliable interrupts with full isolation. A follow-up patch will add an interface to allocate an MSI bank for a userspace/VM context. NOTE: This RFC patch allows only one MSI bank to be allocated for the kernel context, which seems sufficient to me; if this turns out to limit some real use-case scenario, the limitation can be removed. One issue which still needs to be addressed is when to free the kernel context's allocated MSI bank: if all MSI-capable devices are assigned to VM/userspace, there is no need to keep any MSI bank reserved for the kernel context. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/sysdev/fsl_msi.c | 88 ++- arch/powerpc/sysdev/fsl_msi.h | 4 ++ 2 files changed, 83 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index 32ba1e3..027aeeb 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -142,6 +142,79 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev) return; } +/* + * Allocate a MSI Bank for the requested context. + * NULL context means that this request is to allocate + * MSI bank for kernel owned devices. And currently we + * assume that one MSI bank is sufficient for kernel. 
+ */ +static struct fsl_msi *fsl_msi_allocate_msi_bank(void *context) +{ + struct fsl_msi *msi_data; + + /* Kernel context (NULL) can reserve only one msi bank */ + if (!context) { + list_for_each_entry(msi_data, msi_head, list) { + if ((msi_data-reserved == MSI_RESERVED) + (msi_data-context == NULL)) + return NULL; + } + } + + list_for_each_entry(msi_data, msi_head, list) { + if (msi_data-reserved == MSI_FREE) { + msi_data-reserved = MSI_RESERVED; + msi_data-context = context; + return msi_data; + } + } + + return NULL; +} + +/* FIXME: Assumption that host kernel will allocate only one MSI bank */ + __attribute__ ((unused)) static int fsl_msi_free_msi_bank(void *context) +{ + struct fsl_msi *msi_data; + + list_for_each_entry(msi_data, msi_head, list) { + if ((msi_data-reserved == MSI_RESERVED) +(msi_data-context == context)) { + msi_data-reserved = MSI_FREE; + msi_data-context = NULL; + return 0; + } + } + return -ENODEV; +} + +/* This API returns the allocated MSI bank of context + * to which pdev device belongs. + * All kernel owned devices have NULL context. All devices + * in same context will share the allocated MSI bank. + * + * Note: If no MSI bank allocated to kernel context then + * we allocate a MSI bank here. 
+ */ +static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev) +{ + struct fsl_msi *msi_data = NULL; + void *context = NULL; + + list_for_each_entry(msi_data, msi_head, list) { + if ((msi_data-reserved == MSI_RESERVED) + (msi_data-context == context)) + return msi_data; + } + + /* If no MSI bank allocated for kernel owned device, allocate one */ + msi_data = fsl_msi_allocate_msi_bank(NULL); + if (msi_data) + return msi_data; + + return NULL; +} + static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq, struct msi_msg *msg, struct fsl_msi *fsl_msi_data) @@ -174,7 +247,7 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) struct pci_controller *hose = pci_bus_to_host(pdev-bus); struct device_node *np; phandle phandle = 0; - int rc, hwirq = -ENOMEM; + int rc = -ENODEV, hwirq = -ENOMEM; unsigned int virq; struct msi_desc *entry; struct msi_msg msg; @@ -231,15 +304,12 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) if (specific_msi_bank) { hwirq = msi_bitmap_alloc_hwirqs(msi_data-bitmap, 1); } else { - /* -* Loop over all the MSI devices until we find one that has an -* available interrupt. -*/ - list_for_each_entry(msi_data, msi_head, list) { -
Re: [PATCH v2 0/5] split ET_DYN ASLR from mmap ASLR
* Kees Cook keesc...@chromium.org wrote: To address the offset2lib ASLR weakness[1], this separates ET_DYN ASLR from mmap ASLR, as already done on s390. The architectures that are already randomizing mmap (arm, arm64, mips, powerpc, s390, and x86), have their various forms of arch_mmap_rnd() made available via the new CONFIG_ARCH_HAS_ELF_RANDOMIZE. For these architectures, arch_randomize_brk() is collapsed as well. This is an alternative to the solutions in: https://lkml.org/lkml/2015/2/23/442 Looks good so far: Reviewed-by: Ingo Molnar mi...@kernel.org While reviewing this series I also noticed that the following code could be factored out from architecture mmap code as well: - arch_pick_mmap_layout() uses very similar patterns across the platforms, with only few variations. Many architectures use the same duplicated mmap_is_legacy() helper as well. There's usually just trivial differences between mmap_legacy_base() approaches as well. - arch_mmap_rnd(): the PF_RANDOMIZE checks are needlessly exposed to the arch routine - the arch routine should only concentrate on arch details, not generic flags like PF_RANDOMIZE. In theory the mmap layout could be fully parametrized as well: i.e. no callback functions to architectures by default at all: just declarations of bits of randomization desired (or, available address space bits), and perhaps an arch helper to allow 32-bit vs. 64-bit address space distinctions. 'Weird' architectures could provide special routines, but only by overriding the default behavior, which should be generic, safe and robust. Thanks, Ingo ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 4/4 RFC] fsl/msi: Add interface to reserve/free msi bank
This patch allows a context (different from the kernel context) to reserve an MSI bank for itself; the devices in that context will then share the MSI bank. The VFIO meta driver is one typical user of these APIs: it will reserve an MSI bank for MSI interrupt support of PCI devices directly assigned to a guest. Patches for that will follow this one. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/include/asm/device.h | 2 + arch/powerpc/include/asm/fsl_msi.h | 26 ++ arch/powerpc/sysdev/fsl_msi.c | 169 +++-- arch/powerpc/sysdev/fsl_msi.h | 1 + 4 files changed, 173 insertions(+), 25 deletions(-) create mode 100644 arch/powerpc/include/asm/fsl_msi.h diff --git a/arch/powerpc/include/asm/device.h b/arch/powerpc/include/asm/device.h index 38faede..1c2bfd7 100644 --- a/arch/powerpc/include/asm/device.h +++ b/arch/powerpc/include/asm/device.h @@ -40,6 +40,8 @@ struct dev_archdata { #ifdef CONFIG_FAIL_IOMMU int fail_iommu; #endif + + void *context; }; struct pdev_archdata { diff --git a/arch/powerpc/include/asm/fsl_msi.h b/arch/powerpc/include/asm/fsl_msi.h new file mode 100644 index 000..e9041c2 --- /dev/null +++ b/arch/powerpc/include/asm/fsl_msi.h @@ -0,0 +1,26 @@ +/* + * Copyright (C) 2014 Freescale Semiconductor, Inc. All rights reserved. + * + * Author: Bharat Bhushan bharat.bhus...@freescale.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; version 2 of the + * License. 
+ * + */ + +#ifndef _POWERPC_FSL_MSI_H +#define _POWERPC_FSL_MSI_H + +extern int fsl_msi_set_msi_bank_region(struct iommu_domain *domain, + void *context, int win, + dma_addr_t iova, int prot); +extern int fsl_msi_clear_msi_bank_region(struct iommu_domain *domain, +struct iommu_group *iommu_group, +int win, dma_addr_t iova); +extern struct fsl_msi *fsl_msi_reserve_msi_bank(void *context); +extern int fsl_msi_unreserve_msi_bank(void *context); +extern int fsl_msi_set_msi_bank_in_dev(struct device *dev, void *data); + +#endif /* _POWERPC_FSL_MSI_H */ diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index 027aeeb..75cd196 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -25,6 +25,7 @@ #include asm/ppc-pci.h #include asm/mpic.h #include asm/fsl_hcalls.h +#include linux/iommu.h #include fsl_msi.h #include fsl_pci.h @@ -172,22 +173,6 @@ static struct fsl_msi *fsl_msi_allocate_msi_bank(void *context) return NULL; } -/* FIXME: Assumption that host kernel will allocate only one MSI bank */ - __attribute__ ((unused)) static int fsl_msi_free_msi_bank(void *context) -{ - struct fsl_msi *msi_data; - - list_for_each_entry(msi_data, msi_head, list) { - if ((msi_data-reserved == MSI_RESERVED) -(msi_data-context == context)) { - msi_data-reserved = MSI_FREE; - msi_data-context = NULL; - return 0; - } - } - return -ENODEV; -} - /* This API returns the allocated MSI bank of context * to which pdev device belongs. * All kernel owned devices have NULL context. 
All devices @@ -200,6 +185,12 @@ static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev) { struct fsl_msi *msi_data = NULL; void *context = NULL; + struct device *dev = pdev-dev; + + /* Device assigned to userspace if there is valid context */ + if (dev-archdata.context) { + context = dev-archdata.context; + } list_for_each_entry(msi_data, msi_head, list) { if ((msi_data-reserved == MSI_RESERVED) @@ -208,13 +199,133 @@ static struct fsl_msi *fsl_msi_get_reserved_msi_bank(struct pci_dev *pdev) } /* If no MSI bank allocated for kernel owned device, allocate one */ - msi_data = fsl_msi_allocate_msi_bank(NULL); - if (msi_data) - return msi_data; + if (!context) { + msi_data = fsl_msi_allocate_msi_bank(NULL); + if (msi_data) + return msi_data; + } return NULL; } +/* API to set context to which the device belongs */ +int fsl_msi_set_msi_bank_in_dev(struct device *dev, void *data) +{ + dev-archdata.context = data; + return 0; +} + +/* This API Allows a MSI bank to be reserved for a context. + * All devices in same context will share the allocated + * MSI bank. + * Typically this function will be called from meta + * driver like VFIO with a valid context. + */ +struct fsl_msi *fsl_msi_reserve_msi_bank(void *context) +{ + struct fsl_msi *msi_data; + +
Re: [PATCH 1/3] powerpc/pseries: Simplify check for suspendability during suspend/migration
On Mon, 2015-03-02 at 13:30 -0800, Tyrel Datwyler wrote: On 03/01/2015 08:19 PM, Cyril Bur wrote: On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: During suspend/migration operation we must wait for the VASI state reported by the hypervisor to become Suspending prior to making the ibm,suspend-me RTAS call. Calling routines to rtas_ibm_suspend_me() pass a vasi_state variable that exposes the VASI state to the caller. This is unnecessary as the caller only really cares about the following three conditions; if there is an error we should bail out, success indicating we have suspended and woken back up so proceed to the device tree update, or we are not suspendable yet so try calling rtas_ibm_suspend_me again shortly. This patch removes the extraneous vasi_state variable and simply uses the return code to communicate how to proceed. We either succeed, fail, or get -EAGAIN in which case we sleep for a second before trying to call rtas_ibm_suspend_me again. u64 handle = ((u64)be32_to_cpu(args.args[0]) 32) | be32_to_cpu(args.args[1]); - rc = rtas_ibm_suspend_me(handle, vasi_rc); - args.rets[0] = cpu_to_be32(vasi_rc); - if (rc) + rc = rtas_ibm_suspend_me(handle); + if (rc == -EAGAIN) + args.rets[0] = cpu_to_be32(RTAS_NOT_SUSPENDABLE); (continuing on...) so perhaps here have rc = 0; else if (rc == -EIO) args.rets[0] = cpu_to_be32(-1); rc = 0; Which should keep the original behaviour, the last thing we want to do is break BE. The biggest problem here is we are making what basically equates to a fake rtas call from drmgr which we intercept in ppc_rtas(). From there we make this special call to rtas_ibm_suspend_me() to check VASI state and do a bunch of other specialized work that needs to be setup prior to making the actual ibm,suspend-me rtas call. Since we are cheating PAPR here I guess we can really handle it however we want. I chose to simply fail the rtas call in the case where rtas_ibm_suspend_me() fails with something other than -EAGAIN. 
In user space librtas will log errno for the failure and return RTAS_IO_ASSERT to drmgr which in turn will log that error and fail. We don't want to change the return values of the syscall unless we absolutely have to. And I don't think that's the case here. Sure we think drmgr is the only thing that uses this crap, but we don't know for sure. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 3/3] powerpc/pseries: Expose post-migration in kernel device tree update to drmgr
On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: Traditionally after a migration operation drmgr has coordinated the device tree update with the kernel in userspace via the ugly /proc/ppc64/ofdt interface. This can be better done fully in the kernel where support already exists. Currently, drmgr makes a faux ibm,suspend-me RTAS call which we intercept in the kernel so that we can check VASI state for suspendability. After the LPAR resumes and returns to drmgr that is followed by the necessary update-nodes and update-properties RTAS calls which are parsed and communicated back to the kernel through /proc/ppc64/ofdt for the device tree update. The drmgr tool should instead initiate the migration using the already existing /sysfs/kernel/mobility/migration entry that performs all this work in the kernel. This patch adds a show function to the sysfs migration attribute that returns 1 to indicate the kernel will perform the device tree update after a migration operation and that drmgr should initiate the migration through the sysfs migration attribute. I don't understand why we need this? Can't drmgr just check if /sysfs/kernel/mobility/migration exists, and if so it knows it should use it and that the kernel will handle the whole procedure? cheers
[PATCH 1/2 v4] cpufreq: qoriq: Make the driver usable on all QorIQ platforms
From: Tang Yuantian yuantian.t...@freescale.com Freescale introduced new ARM core-based SoCs which support the dynamic frequency switching (DFS) feature. DFS on the new SoCs is compatible with current PowerPC CoreNet platforms. In order to support those new platforms, this driver needs to be updated. The main changes include: 1. Changed the names of functions in the driver. 2. Added two new functions get_cpu_physical_id() and get_bus_freq(). 3. Used a new way to get the mask of CPUs which share a clock wire. Signed-off-by: Tang Yuantian yuantian.t...@freescale.com Acked-by: Viresh Kumar viresh.ku...@linaro.org --- v4: - resolve unmet direct dependencies warning v3: - put the menu entries into Kconfig v2: - split the name change into a separate patch - use policy-driver_data instead of per_cpu variable drivers/cpufreq/Kconfig | 8 ++ drivers/cpufreq/Kconfig.powerpc | 9 -- drivers/cpufreq/ppc-corenet-cpufreq.c | 160 +- 3 files changed, 107 insertions(+), 70 deletions(-) diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig index a171fef..659879a 100644 --- a/drivers/cpufreq/Kconfig +++ b/drivers/cpufreq/Kconfig @@ -293,5 +293,13 @@ config SH_CPU_FREQ If unsure, say N. endif +config QORIQ_CPUFREQ + tristate CPU frequency scaling driver for Freescale QorIQ SoCs + depends on OF COMMON_CLK (PPC_E500MC || ARM) + select CLK_QORIQ + help + This adds the CPUFreq driver support for Freescale QorIQ SoCs + which are capable of changing the CPU's frequency dynamically. + endif endmenu diff --git a/drivers/cpufreq/Kconfig.powerpc b/drivers/cpufreq/Kconfig.powerpc index 7ea2441..3a0595b 100644 --- a/drivers/cpufreq/Kconfig.powerpc +++ b/drivers/cpufreq/Kconfig.powerpc @@ -23,15 +23,6 @@ config CPU_FREQ_MAPLE This adds support for frequency switching on Maple 970FX Evaluation Board and compatible boards (IBM JS2x blades). 
-config PPC_CORENET_CPUFREQ - tristate CPU frequency scaling driver for Freescale E500MC SoCs - depends on PPC_E500MC OF COMMON_CLK - select CLK_QORIQ - help - This adds the CPUFreq driver support for Freescale e500mc, - e5500 and e6500 series SoCs which are capable of changing - the CPU's frequency dynamically. - config CPU_FREQ_PMAC bool Support for Apple PowerBooks depends on ADB_PMU PPC32 diff --git a/drivers/cpufreq/ppc-corenet-cpufreq.c b/drivers/cpufreq/ppc-corenet-cpufreq.c index bee5df7..949d992 100644 --- a/drivers/cpufreq/ppc-corenet-cpufreq.c +++ b/drivers/cpufreq/ppc-corenet-cpufreq.c @@ -1,7 +1,7 @@ /* * Copyright 2013 Freescale Semiconductor, Inc. * - * CPU Frequency Scaling driver for Freescale PowerPC corenet SoCs. + * CPU Frequency Scaling driver for Freescale QorIQ SoCs. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as @@ -20,10 +20,9 @@ #include linux/of.h #include linux/slab.h #include linux/smp.h -#include sysdev/fsl_soc.h /** - * struct cpu_data - per CPU data struct + * struct cpu_data * @parent: the parent node of cpu clock * @table: frequency table */ @@ -67,17 +66,78 @@ static const struct soc_data sdata[] = { static u32 min_cpufreq; static const u32 *fmask; -static DEFINE_PER_CPU(struct cpu_data *, cpu_data); +#if defined(CONFIG_ARM) +static int get_cpu_physical_id(int cpu) +{ + return topology_core_id(cpu); +} +#else +static int get_cpu_physical_id(int cpu) +{ + return get_hard_smp_processor_id(cpu); +} +#endif -/* cpumask in a cluster */ -static DEFINE_PER_CPU(cpumask_var_t, cpu_mask); +static u32 get_bus_freq(void) +{ + struct device_node *soc; + u32 sysfreq; + + soc = of_find_node_by_type(NULL, soc); + if (!soc) + return 0; + + if (of_property_read_u32(soc, bus-frequency, sysfreq)) + sysfreq = 0; + + of_node_put(soc); + + return sysfreq; +} -#ifndef CONFIG_SMP -static inline const struct cpumask *cpu_core_mask(int cpu) +static struct 
device_node *cpu_to_clk_node(int cpu) { - return cpumask_of(0); + struct device_node *np, *clk_np; + + if (!cpu_present(cpu)) + return NULL; + + np = of_get_cpu_node(cpu, NULL); + if (!np) + return NULL; + + clk_np = of_parse_phandle(np, clocks, 0); + if (!clk_np) + return NULL; + + of_node_put(np); + + return clk_np; +} + +/* traverse cpu nodes to get cpu mask of sharing clock wire */ +static void set_affected_cpus(struct cpufreq_policy *policy) +{ + struct device_node *np, *clk_np; + struct cpumask *dstp = policy-cpus; + int i; + + np = cpu_to_clk_node(policy-cpu); + if (!np) + return; + + for_each_present_cpu(i) { + clk_np = cpu_to_clk_node(i); +
[PATCH 2/4 RFC] fsl/msi: Move fsl,msi mode specific MSI device search out of main loop
Move the specific MSI device search out of the main loop. The specific MSI device search is now placed with the other fsl,msi specific code in the same function. This is in preparation for MSI bank partitioning. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/sysdev/fsl_msi.c | 39 +-- 1 file changed, 29 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index ec3161b..32ba1e3 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -178,7 +178,8 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) unsigned int virq; struct msi_desc *entry; struct msi_msg msg; - struct fsl_msi *msi_data; + struct fsl_msi *msi_data = NULL; + bool specific_msi_bank = false; if (type == PCI_CAP_ID_MSIX) pr_debug(fslmsi: MSI-X untested, trying anyway.\n); @@ -199,12 +200,9 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) hose-dn-full_name, np-phandle); return -EINVAL; } - } - - list_for_each_entry(entry, pdev-msi_list, list) { /* -* Loop over all the MSI devices until we find one that has an -* available interrupt. +* Loop over all the MSI devices till we find +* specific MSI device. */ list_for_each_entry(msi_data, msi_head, list) { /* @@ -215,12 +213,33 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) * has the additional benefit of skipping over MSI * nodes that are not mapped in the PAMU. 
*/ - if (phandle (phandle != msi_data-phandle)) - continue; + if (phandle == msi_data-phandle) { + specific_msi_bank = true; + break; + } + } + if (!specific_msi_bank) { + dev_err(pdev-dev, + No specific MSI device found for node %s\n, + hose-dn-full_name); + return -EINVAL; + } + } + + list_for_each_entry(entry, pdev-msi_list, list) { + if (specific_msi_bank) { hwirq = msi_bitmap_alloc_hwirqs(msi_data-bitmap, 1); - if (hwirq = 0) - break; + } else { + /* +* Loop over all the MSI devices until we find one that has an +* available interrupt. +*/ + list_for_each_entry(msi_data, msi_head, list) { + hwirq = msi_bitmap_alloc_hwirqs(msi_data-bitmap, 1); + if (hwirq = 0) + break; + } } if (hwirq 0) { -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 1/4 RFC] fsl/msi: have msiir register address absolute rather than offset
Having absolute address simplifies the code and also removes the confusion around feature-msiir_offset and msi_data-msiir_offset. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/sysdev/fsl_msi.c | 9 +++-- arch/powerpc/sysdev/fsl_msi.h | 2 +- 2 files changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index 4bbb4b8..ec3161b 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -157,7 +157,7 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq, if (reg (len == sizeof(u64))) address = be64_to_cpup(reg); else - address = fsl_pci_immrbar_base(hose) + msi_data-msiir_offset; + address = msi_data-msiir; msg-address_lo = lower_32_bits(address); msg-address_hi = upper_32_bits(address); @@ -430,18 +430,15 @@ static int fsl_of_msi_probe(struct platform_device *dev) dev-dev.of_node-full_name); goto error_out; } - msi-msiir_offset = - features-msiir_offset + (res.start 0xf); /* * First read the MSIIR/MSIIR1 offset from dts * On failure use the hardcode MSIIR offset */ if (of_address_to_resource(dev-dev.of_node, 1, msiir)) - msi-msiir_offset = features-msiir_offset + - (res.start MSIIR_OFFSET_MASK); + msi-msiir = res.start + features-msiir_offset; else - msi-msiir_offset = msiir.start MSIIR_OFFSET_MASK; + msi-msiir = msiir.start; } msi-feature = features-fsl_pic_ip; diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h index 420cfcb..9b0ab84 100644 --- a/arch/powerpc/sysdev/fsl_msi.h +++ b/arch/powerpc/sysdev/fsl_msi.h @@ -34,7 +34,7 @@ struct fsl_msi { unsigned long cascade_irq; - u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */ + phys_addr_t msiir; /* MSIIR Address in CCSR */ u32 ibs_shift; /* Shift of interrupt bit select */ u32 srs_shift; /* Shift of the shared interrupt register select */ void __iomem *msi_regs; -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v6] dmaengine: Driver support for FSL RaidEngine device.
From: Xuelin Shi xuelin@freescale.com The RaidEngine is a new FSL hardware used for RAID5/6 acceleration. This patch enables the RaidEngine functionality and provides hardware offloading capability for memcpy, xor and pq computation. It works with async_tx. Signed-off-by: Harninder Rai harninder@freescale.com Signed-off-by: Xuelin Shi xuelin@freescale.com --- changes for v6: - use dev_err() instead of pr_err() - avoid BUG_ON with if changes for v5: - align symbol to fsl_re_xxx to avoid namespace issue. - switch back to tasklet - add xor/pq continuation in to support more than 16 srcs. drivers/dma/Kconfig| 11 + drivers/dma/Makefile | 1 + drivers/dma/fsl_raid.c | 904 + drivers/dma/fsl_raid.h | 306 + 4 files changed, 1222 insertions(+) create mode 100644 drivers/dma/fsl_raid.c create mode 100644 drivers/dma/fsl_raid.h diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index f2b2c4e..37397cd 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -125,6 +125,17 @@ config FSL_DMA EloPlus is on mpc85xx and mpc86xx and Pxxx parts, and the Elo3 is on some Txxx and Bxxx parts. +config FSL_RAID +tristate Freescale RAID engine Support +depends on FSL_SOC !ASYNC_TX_ENABLE_CHANNEL_SWITCH +select DMA_ENGINE +select DMA_ENGINE_RAID +---help--- + Enable support for Freescale RAID Engine. RAID Engine is + available on some QorIQ SoCs (like P5020/P5040). It has + the capability to offload memcpy, xor and pq computation + for raid5/6. 
+ config MPC512X_DMA tristate Freescale MPC512x built-in DMA engine support depends on PPC_MPC512x || PPC_MPC831x diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index 2022b54..b3f8d9e 100644 --- a/drivers/dma/Makefile +++ b/drivers/dma/Makefile @@ -44,6 +44,7 @@ obj-$(CONFIG_DMA_JZ4740) += dma-jz4740.o obj-$(CONFIG_TI_CPPI41) += cppi41.o obj-$(CONFIG_K3_DMA) += k3dma.o obj-$(CONFIG_MOXART_DMA) += moxart-dma.o +obj-$(CONFIG_FSL_RAID) += fsl_raid.o obj-$(CONFIG_FSL_EDMA) += fsl-edma.o obj-$(CONFIG_QCOM_BAM_DMA) += qcom_bam_dma.o obj-y += xilinx/ diff --git a/drivers/dma/fsl_raid.c b/drivers/dma/fsl_raid.c new file mode 100644 index 000..12778bd --- /dev/null +++ b/drivers/dma/fsl_raid.c @@ -0,0 +1,904 @@ +/* + * drivers/dma/fsl_raid.c + * + * Freescale RAID Engine device driver + * + * Author: + * Harninder Rai harninder@freescale.com + * Naveen Burmi naveenbu...@freescale.com + * + * Rewrite: + * Xuelin Shi xuelin@freescale.com + * + * Copyright (c) 2010-2014 Freescale Semiconductor, Inc. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of Freescale Semiconductor nor the + * names of its contributors may be used to endorse or promote products + * derived from this software without specific prior written permission. + * + * ALTERNATIVELY, this software may be distributed under the terms of the + * GNU General Public License (GPL) as published by the Free Software + * Foundation, either version 2 of that License or (at your option) any + * later version. 
+ * + * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE + * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR ANY + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Theory of operation: + * + * General capabilities: + * RAID Engine (RE) block is capable of offloading XOR, memcpy and P/Q + * calculations required in RAID5 and RAID6 operations. RE driver + * registers with Linux's ASYNC layer as dma driver. RE hardware + * maintains strict ordering of the requests through chained + * command queueing. + * + * Data flow: + * Software RAID layer of Linux (MD layer) maintains RAID partitions, + * strips, stripes etc. It sends
[PATCH 2/2 v4] cpufreq: qoriq: rename the driver
From: Tang Yuantian yuantian.t...@freescale.com This driver works on all QorIQ platforms, which include both ARM-based and PPC-based cores. Rename it to better represent its scope. Signed-off-by: Tang Yuantian yuantian.t...@freescale.com Acked-by: Viresh Kumar viresh.ku...@linaro.org --- v3, v4 - none v2: - use -C -M options when format-patch drivers/cpufreq/{ppc-corenet-cpufreq.c => qoriq-cpufreq.c} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename drivers/cpufreq/{ppc-corenet-cpufreq.c => qoriq-cpufreq.c} (100%) diff --git a/drivers/cpufreq/ppc-corenet-cpufreq.c b/drivers/cpufreq/qoriq-cpufreq.c similarity index 100% rename from drivers/cpufreq/ppc-corenet-cpufreq.c rename to drivers/cpufreq/qoriq-cpufreq.c -- 2.1.0.27.g96db324 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH 0/4 RFC] fsl/msi: Add support for MSI bank partitioning
With this patchset we add MSI bank partitioning support. MSI bank partitioning is required for supporting direct device assignment of MSI-capable PCI devices. One MSI bank will be allocated for kernel context. VFIO can allocate one MSI bank per context, and all devices in the context will share the MSI bank. We have a limited number of MSI banks (2-4), so to support a large number of contexts we need to allow sharing of MSI banks. This patchset does not support sharing of MSI banks yet, but that will be done soon once this patchset takes shape. These changes are tested with both kernel-owned PCI devices and devices directly assigned to a guest using VFIO. Bharat Bhushan (4): fsl/msi: have msiir register address absolute rather than offset fsl/msi: Move fsl,msi mode specific MSI device search out of main loop fsl/msi: Add MSI bank allocation for kernel owned devices fsl/msi: Add interface to reserve/free msi bank arch/powerpc/include/asm/device.h | 2 + arch/powerpc/include/asm/fsl_msi.h | 26 arch/powerpc/sysdev/fsl_msi.c | 249 + arch/powerpc/sysdev/fsl_msi.h | 7 +- 4 files changed, 261 insertions(+), 23 deletions(-) create mode 100644 arch/powerpc/include/asm/fsl_msi.h -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/3] powerpc/pseries: Fixes and cleanup of suspend/migration code
On Fri, 2015-02-27 at 18:24 -0800, Tyrel Datwyler wrote: This patchset simplifies the usage of rtas_ibm_suspend_me() by removing an extraneous function parameter, fixes device tree updating on little endian platforms, and adds a mechanism for informing drmgr that the kernel is capable of performing the whole migration including device tree update itself. Tyrel Datwyler (3): powerpc/pseries: Simplify check for suspendability during suspend/migration powerpc/pseries: Little endian fixes for post mobility device tree update powerpc/pseries: Expose post-migration in kernel device tree update to drmgr Hi Tyrel, Firstly let me say how much I hate this code, so thanks for working on it :) But I need you to split this series into 1) fixes for 4.0 (and stable?), and 2) the rest. I *think* that would be patch 2, and then patches 1 and 3, but I don't want to guess. So please resend. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On 03/02/2015 02:25 PM, Horia Geantă wrote: On 2/20/2015 7:00 PM, Martin Hicks wrote: This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2. One of the nice things about this hardware is that it knows how to deal with encrypt/decrypt requests that are larger than sector size, but that also requires that the sector size be passed into the crypto engine as an XTS cipher context parameter. When a request is larger than the sector size the sector number is incremented by the talitos engine and the tweak key is re-calculated for the new sector. I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit and 256bit) to ensure interoperability with the software AES-XTS implementation. All testing was done using dm-crypt/LUKS with aes-xts-plain64. Is there a better solution than just hard coding the sector size to (1 << SECTOR_SHIFT)? Maybe dm-crypt should be modified to pass the sector size along with the plain/plain64 IV to an XTS algorithm? AFAICT, the SW implementation of xts mode in the kernel (crypto/xts.c) is not aware of a sector size (data unit size in IEEE P1619 terminology): There's a hidden assumption that all the data sent to xts in one request belongs to a single sector. Even more, it's supposed that the first 16-byte block in the request is block 0 in the sector. These can be seen from the way the tweak (T) value is computed. (Side note: there's no support of ciphertext stealing in crypto/xts.c - i.e. sector sizes must be a multiple of the underlying block cipher size - that is 16B.) If dm-crypt would be modified to pass the sector size somehow, all in-kernel xts implementations would have to be made aware of the change. I have nothing against this, but let's see what crypto maintainers have to say... BTW, there were some discussions back in 2013 wrt. being able to configure / increase sector size, smth.
crypto engines would benefit from: http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html (experimental patch) http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html The experimental patch sends the sector size as req->nbytes - hidden assumption: the data size sent in an xts crypto request equals a sector. There was no follow-up but the idea is not yet abandoned :-) Dmcrypt will always use the sector as a minimal unit (and I believe sectors will always be a multiple of the block size, so no need for ciphertext stealing in XTS.) For now, dmcrypt always uses a 512-byte sector size. If the crypto API allows encrypting more sectors in one run (handling the IV internally) dmcrypt can be modified of course. But do not forget we can use another IV (not only a sequential number), e.g. ESSIV with XTS as well (even if it doesn't make much sense, some people are using it). Maybe the following question would be if the dmcrypt sector IV algorithms should be moved into the crypto API as well. (But because I misused the dmcrypt IV hooks for some additional operations for loopAES and old Truecrypt CBC mode, it is not so simple...) Milan ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] crypto: talitos: Add AES-XTS mode
On 2/20/2015 7:00 PM, Martin Hicks wrote: This adds the AES-XTS mode, supported by the Freescale SEC 3.3.2. One of the nice things about this hardware is that it knows how to deal with encrypt/decrypt requests that are larger than sector size, but that also requires that that the sector size be passed into the crypto engine as an XTS cipher context parameter. When a request is larger than the sector size the sector number is incremented by the talitos engine and the tweak key is re-calculated for the new sector. I've tested this with 256bit and 512bit keys (tweak and data keys of 128bit and 256bit) to ensure interoperability with the software AES-XTS implementation. All testing was done using dm-crypt/LUKS with aes-xts-plain64. Is there a better solution that just hard coding the sector size to (1SECTOR_SHIFT)? Maybe dm-crypt should be modified to pass the sector size along with the plain/plain64 IV to an XTS algorithm? AFAICT, SW implementation of xts mode in kernel (crypto/xts.c) is not aware of a sector size (data unit size in IEEE P1619 terminology): There's a hidden assumption that all the data send to xts in one request belongs to a single sector. Even more, it's supposed that the first 16-byte block in the request is block 0 in the sector. These can be seen from the way the tweak (T) value is computed. (Side note: there's no support of ciphertext stealing in crypto/xts.c - i.e. sector sizes must be a multiple of underlying block cipher size - that is 16B.) If dm-crypt would be modified to pass sector size somehow, all in-kernel xts implementations would have to be made aware of the change. I have nothing against this, but let's see what crypto maintainers have to say... BTW, there were some discussions back in 2013 wrt. being able to configure / increase sector size, smth. 
crypto engines would benefit from: http://www.saout.de/pipermail/dm-crypt/2013-January/003125.html (experimental patch) http://www.saout.de/pipermail/dm-crypt/2013-March/003202.html The experimental patch sends the sector size as req->nbytes - hidden assumption: the data size sent in an xts crypto request equals a sector. I am not sure if there was a follow-up though... Adding Milan - maybe he could shed some light. Thanks, Horia Martin Hicks (2): crypto: talitos: Clean ups and comment fixes for ablkcipher commands crypto: talitos: Add AES-XTS Support drivers/crypto/talitos.c | 45 + drivers/crypto/talitos.h |1 + 2 files changed, 38 insertions(+), 8 deletions(-) ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [v1,1/3] crypto: powerpc/sha1 - assembler
On Tue, Feb 24, 2015 at 08:36:40PM +0100, Markus Stockhausen wrote: This is the assembler code for the SHA1 implementation with the SIMD SPE instruction set. With the enhanced instruction set we can operate on two 32-bit words in parallel. That helps reduce the time to calculate W16-W79. To increase performance even more, the assembler function can compute hashes for more than one 64-byte input block. The state of the used SPE registers is preserved via the stack so we can run from interrupt context. There might be the case that we interrupt ourselves and push sensitive data from another context onto our stack. Clear this area in the stack afterwards to avoid information leakage. The code is endian independent. Signed-off-by: Markus Stockhausen stockhau...@collogia.de All applied. Thanks! -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] fsl: mpc85xx: call k(un)map_atomic other than k(un)map
From: Yanjiang Jin yanjiang@windriver.com The k(un)map function may be called in atomic context in the function map_and_flush(), so use k(un)map_atomic to replace it, else we would get the below warning during kdump: BUG: sleeping function called from invalid context at include/linux/highmem.h:58 in_atomic(): 1, irqs_disabled(): 1, pid: 736, name: sh INFO: lockdep is turned off. irq event stamp: 0 hardirqs last enabled at (0): [ (null)] (null) hardirqs last disabled at (0): [c0066d1c] .copy_process.part.44+0x50c/0x1360 softirqs last enabled at (0): [c0066d1c] .copy_process.part.44+0x50c/0x1360 softirqs last disabled at (0): [ (null)] (null) CPU: 1 PID: 736 Comm: sh Tainted: G D W 3.10.62-ltsi-WR6.0.0.0_standard #2 Call Trace: [c000f47cf120] [c000b150] .show_stack+0x170/0x290 (unreliable) [c000f47cf210] [c0b71334] .dump_stack+0x28/0x3c [c000f47cf280] [c00bb5d8] .__might_sleep+0x1a8/0x270 [c000f47cf310] [c00440cc] .map_and_flush+0x4c/0xc0 [c000f47cf390] [c00441cc] .mpc85xx_smp_machine_kexec+0x8c/0xec0 [c000f47cf420] [c002ae00] .machine_kexec+0x60/0x90 [c000f47cf4b0] [c010957c] .crash_kexec+0x8c/0x100 [c000f47cf6a0] [c0015df8] .die+0x348/0x450 [c000f47cf740] [c002f3a0] .bad_page_fault+0xe0/0x130 [c000f47cf7c0] [c001f3e4] storage_fault_common+0x40/0x44 Signed-off-by: Yanjiang Jin yanjiang@windriver.com --- arch/powerpc/platforms/85xx/smp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c index d7c1e69..8631ac5 100644 --- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -360,10 +360,10 @@ static void mpc85xx_smp_kexec_down(void *arg) static void map_and_flush(unsigned long paddr) { struct page *page = pfn_to_page(paddr >> PAGE_SHIFT); - unsigned long kaddr = (unsigned long)kmap(page); + unsigned long kaddr = (unsigned long)kmap_atomic(page); flush_dcache_range(kaddr, kaddr + PAGE_SIZE); - kunmap(page); + kunmap_atomic((void *)kaddr); } /** -- 1.9.1 ___ 
Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
Hello Scott, On 03/02/2015 09:32 AM, Emil Medve wrote: From: Igal Liberman igal.liber...@freescale.com Describe the PHY topology for all configurations supported by each board Based on prior work by Andy Fleming aflem...@gmail.com Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f Bah, I'll remove this... Signed-off-by: Igal Liberman igal.liber...@freescale.com Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com Signed-off-by: Emil Medve emilian.me...@freescale.com --- arch/powerpc/boot/dts/b4860qds.dts| 60 - arch/powerpc/boot/dts/b4qds.dtsi | 51 - arch/powerpc/boot/dts/p1023rdb.dts| 24 +- arch/powerpc/boot/dts/p2041rdb.dts| 92 +++- arch/powerpc/boot/dts/p3041ds.dts | 112 +- arch/powerpc/boot/dts/p4080ds.dts | 184 +++- arch/powerpc/boot/dts/p5020ds.dts | 112 +- arch/powerpc/boot/dts/p5040ds.dts | 234 +++- arch/powerpc/boot/dts/t1040rdb.dts| 32 ++- arch/powerpc/boot/dts/t1042rdb.dts| 30 ++- arch/powerpc/boot/dts/t1042rdb_pi.dts | 18 +- arch/powerpc/boot/dts/t104xqds.dtsi | 178 ++- arch/powerpc/boot/dts/t104xrdb.dtsi | 33 ++- arch/powerpc/boot/dts/t2080qds.dts| 158 +- arch/powerpc/boot/dts/t2080rdb.dts| 67 +- arch/powerpc/boot/dts/t2081qds.dts| 221 ++- arch/powerpc/boot/dts/t4240qds.dts| 400 +- arch/powerpc/boot/dts/t4240rdb.dts| 149 - 18 files changed, 2135 insertions(+), 20 deletions(-) Cheers, ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
From: Igal Liberman igal.liber...@freescale.com Describe the PHY topology for all configurations supported by each board Based on prior work by Andy Fleming aflem...@gmail.com Change-Id: I4fbcc5df9ee7c4f784afae9dab5d1e78cdc24f0f Signed-off-by: Igal Liberman igal.liber...@freescale.com Signed-off-by: Shruti Kanetkar kanetkar.shr...@gmail.com Signed-off-by: Emil Medve emilian.me...@freescale.com --- arch/powerpc/boot/dts/b4860qds.dts| 60 - arch/powerpc/boot/dts/b4qds.dtsi | 51 - arch/powerpc/boot/dts/p1023rdb.dts| 24 +- arch/powerpc/boot/dts/p2041rdb.dts| 92 +++- arch/powerpc/boot/dts/p3041ds.dts | 112 +- arch/powerpc/boot/dts/p4080ds.dts | 184 +++- arch/powerpc/boot/dts/p5020ds.dts | 112 +- arch/powerpc/boot/dts/p5040ds.dts | 234 +++- arch/powerpc/boot/dts/t1040rdb.dts| 32 ++- arch/powerpc/boot/dts/t1042rdb.dts| 30 ++- arch/powerpc/boot/dts/t1042rdb_pi.dts | 18 +- arch/powerpc/boot/dts/t104xqds.dtsi | 178 ++- arch/powerpc/boot/dts/t104xrdb.dtsi | 33 ++- arch/powerpc/boot/dts/t2080qds.dts| 158 +- arch/powerpc/boot/dts/t2080rdb.dts| 67 +- arch/powerpc/boot/dts/t2081qds.dts| 221 ++- arch/powerpc/boot/dts/t4240qds.dts| 400 +- arch/powerpc/boot/dts/t4240rdb.dts| 149 - 18 files changed, 2135 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/boot/dts/b4860qds.dts b/arch/powerpc/boot/dts/b4860qds.dts index 6bb3707..98b1ef4 100644 --- a/arch/powerpc/boot/dts/b4860qds.dts +++ b/arch/powerpc/boot/dts/b4860qds.dts @@ -1,7 +1,7 @@ /* * B4860DS Device Tree Source * - * Copyright 2012 Freescale Semiconductor Inc. + * Copyright 2012 - 2015 Freescale Semiconductor Inc. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: @@ -39,12 +39,69 @@ model = fsl,B4860QDS; compatible = fsl,B4860QDS; + aliases { + phy_sgmii_1e = phy_sgmii_1e; + phy_sgmii_1f = phy_sgmii_1f; + phy_xaui_slot1 = phy_xaui_slot1; + phy_xaui_slot2 = phy_xaui_slot2; + }; + ifc: localbus@ffe124000 { board-control@3,0 { compatible = fsl,b4860qds-fpga, fsl,fpga-qixis; }; }; + soc@ffe00 { + fman@40 { + ethernet@e8000 { + phy-handle = phy_sgmii_1e; + phy-connection-type = sgmii; + }; + + ethernet@ea000 { + phy-handle = phy_sgmii_1f; + phy-connection-type = sgmii; + }; + + ethernet@f { + phy-handle = phy_xaui_slot1; + phy-connection-type = xgmii; + }; + + ethernet@f2000 { + phy-handle = phy_xaui_slot2; + phy-connection-type = xgmii; + }; + + mdio@fc000 { + phy_sgmii_1e: ethernet-phy@1e { + reg = 0x1e; + status = disabled; + }; + + phy_sgmii_1f: ethernet-phy@1f { + reg = 0x1f; + status = disabled; + }; + }; + + mdio@fd000 { + phy_xaui_slot1: xaui-phy@slot1 { + compatible = ethernet-phy-ieee802.3-c45; + reg = 0x7; + status = disabled; + }; + + phy_xaui_slot2: xaui-phy@slot2 { + compatible = ethernet-phy-ieee802.3-c45; + reg = 0x6; + status = disabled; + }; + }; + }; + }; + rio: rapidio@ffe0c { reg = 0xf 0xfe0c 0 0x11000; @@ -55,7 +112,6 @@ ranges = 0 0 0xc 0x3000 0 0x1000; }; }; - }; /include/ fsl/b4860si-post.dtsi diff --git a/arch/powerpc/boot/dts/b4qds.dtsi b/arch/powerpc/boot/dts/b4qds.dtsi index 559d006..6ef6933 100644 --- a/arch/powerpc/boot/dts/b4qds.dtsi +++ b/arch/powerpc/boot/dts/b4qds.dtsi @@ -1,7 +1,7 @@ /* * B4420DS Device Tree Source * - * Copyright 2012 - 2014 Freescale Semiconductor, Inc. + *
Re: [PATCH] sata-fsl: Apply link speed limits
On Thu, Feb 19, 2015 at 03:05:47PM -0500, Martin Hicks wrote: The driver was ignoring limits requested by libata.force. The output would look like: fsl-sata ffe18000.sata: Sata FSL Platform/CSB Driver init ata1: FORCE: PHY spd limit set to 1.5Gbps ata1: SATA max UDMA/133 irq 74 ata1: Signature Update detected @ 0 msecs ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 310) Signed-off-by: Martin Hicks m...@bork.org Applied to libata/for-4.0-fixes. Thanks. -- tejun ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH V4] powerpc, powernv: Add OPAL platform event driver
Hi Stewart, Tried to fake ACPI via acpi_bus_generate_netlink_event and found that it needs other files which arch specific and use x86 assembly. Regards, Vipin On 02/24/2015 03:14 PM, Vipin K Parashar wrote: Hi Stewart, I looked into ACPI and found details about it. But before we go into discussing more details of it, would like to share a brief about OPAL platform events (EPOW/DPO) work and original design proposed. As if now OPAL platform events work supports two events: EPOW (Early Power Off Warning) and DPO (Delayed Power Off). On FSP based systems FSP notifies OPAL about EPOW and DPO events via mbox mechanism. Subsequently OPAL sends notifications for these events to pkvm kernel. Original design is to have a kernel driver maintain a queue and add these events to queue upon arrival. pkvm driver also provides a character device for host to consume these events. A daemon is proposed for pkvm host to poll/read these events from char device. This daemon would process these events and take action to log and shutdown host. Apart from this it would also send these event info to VMs which is handled by OSes running on VMs. Linux on VMs already has code in place to handle these events as it expects this info to reach it in PAPR format under EPOW (Environmental and Power Warnings) category. EPOW mbox msgs are received for below events: 1. UPS events - UPS Battery Low, UPS Bypassed, UPS Utility Failure, UPS On 2. SPCN events - Configuration Change, Log SPCN Fault, Impending Power Failure, Power Incomplete 3. Temprature events - Over Ambient temperature, Over internal temperature. Now ACPI: Looked into ACPI and tried to figure out how ACPI userspace/kernel framework can be helpful for our work. ACPI user space consists of below components. acpid - ACPI daemon to receive events from kernel acpid provides events and actions files in /etc/acpi dir to configure actions for various events. 
acpi, acpi_listen, acpitool - Commands to query and set various ACPI supported parameters. These tools work with various sysfs files to show/set various parameter values. As of today acpid and the other tools don't exist for POWER, so they would need to be ported. acpid is useful for our work, but the other tools might not be helpful as they look into various sysfs files created by various ACPI kernel drivers which we won't have. Also we would need to map our EPOW/DPO events to acpid-supported events, and a few events like the SPCN ones won't map straight away and might need to be added in acpid as new events. ACPI in the kernel has various drivers for fan, battery, laptop buttons etc. They handle events and use the netlink mechanism to send these events out to userspace. Now looking into the ACPI code it seems that we would be reusing a small chunk of acpi code but would instead end up adding unnecessary complexity by supporting a lot more stuff than we need. Here too, mapping our EPOW/DPO events to ACPI-defined structures is needed, and we would need to add new member variables in the ACPI event structures for unmapped events like the SPCN ones. In a nutshell it seems that by using ACPI we would end up adding a lot more complexity for a little gain in code reuse. Netlink: On the technology side netlink seems to be a faster method compared to a character driver. So that could be a good alternative to use as a method of communication between our pkvm driver and userspace. But EPOW/DPO events occur at a very low rate, unlike the network subsystem, which receives data packets at a very high rate. So probably netlink could be a faster method, but due to the slow EPOW/DPO event traffic a character driver might be sufficient. We already have the ppc64-diag package, which is part of various distros, so it would be used for hosting the daemon code. That takes off the overhead of convincing distros to add something extra. These were my findings and opinions on the alternatives. 
Apologies for a little lengthy text :-) Let me know if i missed out anything and any suggestions that you would have. Regards, Vipin On 02/11/2015 10:32 AM, Stewart Smith wrote: Vipin K Parashar vi...@linux.vnet.ibm.com writes: (1) Environmental and Power Warning (EPOW) (2) Delayed Power Off (DPO) The user interface for this driver is /dev/opal_event character device file where the user space clients can poll and read for new opal platform events. The expected sequence of events driven from user space should be like the following. (1) Open the character device file (2) Poll on the file for POLLIN event (3) When unblocked, must attempt to read OPAL_PLAT_EVENT_MAX_SIZE size (4) Kernel driver will pass at most one opal_plat_event structure (5) Poll again for more new events A few thoughts from discussing with Michael and Joel: - not convinced that a chardev is the most ideal way to notify userspace. It seems like yet-another powerpc specific notification mechanism, which isn't ideal. - netlink probably isn't right
Re: [PATCHi v2] ibmveth: Add function to enable live MAC address changes
On 02/28/2015 02:59 AM, Jiri Pirko wrote: Sat, Feb 28, 2015 at 06:56:04AM CET, tlfal...@linux.vnet.ibm.com wrote: Add a function that will enable changing the MAC address of an ibmveth interface while it is still running. Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com --- v2: If h_change_logical_lan_mac fails, dev-dev_addr will not be changed. drivers/net/ethernet/ibm/ibmveth.c | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 21978cc..b6ac676 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1327,6 +1327,29 @@ static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev) return ret; } +static int ibmveth_set_mac_addr(struct net_device *dev, void *p) +{ +struct ibmveth_adapter *adapter = netdev_priv(dev); +struct sockaddr *addr = p; +u64 mac_address; +int rc; + +if (!is_valid_ether_addr(addr-sa_data)) +return -EADDRNOTAVAIL; + +mac_address = ibmveth_encode_mac_addr(addr-sa_data); +rc = h_change_logical_lan_mac(adapter-vdev-unit_address, mac_address); +if (rc) { +netdev_err(adapter-netdev, h_change_logical_lan_mac failed + with rc=%d\n, rc); Please do not wrap text in message. For that, 80-char limit does not apply. I will send a new patch fixing this shortly. Thanks to you, Brian, and Dave for reviewing this patch. 
+return rc; +} + +ether_addr_copy(dev-dev_addr, addr-sa_data); + +return 0; +} + static const struct net_device_ops ibmveth_netdev_ops = { .ndo_open = ibmveth_open, .ndo_stop = ibmveth_close, @@ -1337,7 +1360,7 @@ static const struct net_device_ops ibmveth_netdev_ops = { .ndo_fix_features = ibmveth_fix_features, .ndo_set_features = ibmveth_set_features, .ndo_validate_addr = eth_validate_addr, -.ndo_set_mac_address= eth_mac_addr, +.ndo_set_mac_address= ibmveth_set_mac_addr, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller= ibmveth_poll_controller, #endif -- 1.8.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH v3] ibmveth: Add function to enable live MAC address changes
Add a function that will enable changing the MAC address of an ibmveth interface while it is still running. Signed-off-by: Thomas Falcon tlfal...@linux.vnet.ibm.com --- v3: removed text wrapping in error message v2: If h_change_logical_lan_mac fails, dev->dev_addr will not be changed. drivers/net/ethernet/ibm/ibmveth.c | 24 +++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index 21978cc..072426a 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1327,6 +1327,28 @@ static unsigned long ibmveth_get_desired_dma(struct vio_dev *vdev) return ret; } +static int ibmveth_set_mac_addr(struct net_device *dev, void *p) +{ + struct ibmveth_adapter *adapter = netdev_priv(dev); + struct sockaddr *addr = p; + u64 mac_address; + int rc; + + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + mac_address = ibmveth_encode_mac_addr(addr->sa_data); + rc = h_change_logical_lan_mac(adapter->vdev->unit_address, mac_address); + if (rc) { + netdev_err(adapter->netdev, "h_change_logical_lan_mac failed with rc=%d\n", rc); + return rc; + } + + ether_addr_copy(dev->dev_addr, addr->sa_data); + + return 0; +} + static const struct net_device_ops ibmveth_netdev_ops = { .ndo_open = ibmveth_open, .ndo_stop = ibmveth_close, @@ -1337,7 +1359,7 @@ static const struct net_device_ops ibmveth_netdev_ops = { .ndo_fix_features = ibmveth_fix_features, .ndo_set_features = ibmveth_set_features, .ndo_validate_addr = eth_validate_addr, - .ndo_set_mac_address = eth_mac_addr, + .ndo_set_mac_address = ibmveth_set_mac_addr, #ifdef CONFIG_NET_POLL_CONTROLLER .ndo_poll_controller = ibmveth_poll_controller, #endif -- 1.8.3.1 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 17/21] powerpc/powernv: Shift VF resource with an offset
On Tue, Feb 24, 2015 at 11:10:33AM -0600, Bjorn Helgaas wrote: On Tue, Feb 24, 2015 at 3:00 AM, Bjorn Helgaas bhelg...@google.com wrote: On Tue, Feb 24, 2015 at 02:34:57AM -0600, Bjorn Helgaas wrote: From: Wei Yang weiy...@linux.vnet.ibm.com On PowerNV platform, resource position in M64 implies the PE# the resource belongs to. In some cases, adjustment of a resource is necessary to locate it to a correct position in M64. Add pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR address according to an offset. [bhelgaas: rework loops, rework overlap check, index resource[] conventionally, remove pci_regs.h include, squashed with next patch] Signed-off-by: Wei Yang weiy...@linux.vnet.ibm.com Signed-off-by: Bjorn Helgaas bhelg...@google.com ... +#ifdef CONFIG_PCI_IOV +static int pnv_pci_vf_resource_shift(struct pci_dev *dev, int offset) +{ + struct pci_dn *pdn = pci_get_pdn(dev); + int i; + struct resource *res, res2; + resource_size_t size; + u16 vf_num; + + if (!dev-is_physfn) + return -EINVAL; + + /* + * offset is in VFs. The M64 windows are sized so that when they + * are segmented, each segment is the same size as the IOV BAR. + * Each segment is in a separate PE, and the high order bits of the + * address are the PE number. Therefore, each VF's BAR is in a + * separate PE, and changing the IOV BAR start address changes the + * range of PEs the VFs are in. + */ + vf_num = pdn-vf_pes; + for (i = 0; i PCI_SRIOV_NUM_BARS; i++) { + res = dev-resource[i + PCI_IOV_RESOURCES]; + if (!res-flags || !res-parent) + continue; + + if (!pnv_pci_is_mem_pref_64(res-flags)) + continue; + + /* + * The actual IOV BAR range is determined by the start address + * and the actual size for vf_num VFs BAR. This check is to + * make sure that after shifting, the range will not overlap + * with another device. 
+ */ + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); + res2.flags = res->flags; + res2.start = res->start + (size * offset); + res2.end = res2.start + (size * vf_num) - 1; + + if (res2.end > res->end) { + dev_err(&dev->dev, "VF BAR%d: %pR would extend past %pR (trying to enable %d VFs shifted by %d)\n", + i, &res2, res, vf_num, offset); + return -EBUSY; + } + } + + for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { + res = &dev->resource[i + PCI_IOV_RESOURCES]; + if (!res->flags || !res->parent) + continue; + + if (!pnv_pci_is_mem_pref_64(res->flags)) + continue; + + size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES); + res2 = *res; + res->start += size * offset; I'm still not happy about this fiddling with res->start. Increasing res->start means that in principle, the size * offset bytes that we just removed from res are now available for allocation to somebody else. I don't think we *will* give that space to anything else because of the alignment restrictions you're enforcing, but res now doesn't correctly describe the real resource map. Would you be able to just update the BAR here while leaving the struct resource alone? In that case, it would look a little funny that lspci would show a BAR value in the middle of the region in /proc/iomem, but the /proc/iomem region would be more correct. I guess this would also require a tweak where we compute the addresses of each of the VF resources. Today it's probably just base + VF_num * size, where base is res->start. We'd have to account for the offset there if we don't adjust it here. Oh, this is really an interesting idea. I will do some tests to see the result. 
+ + dev_info(&dev->dev, "VF BAR%d: %pR shifted to %pR (enabling %d VFs shifted by %d)\n", + i, &res2, res, vf_num, offset); + pci_update_resource(dev, i + PCI_IOV_RESOURCES); + } + pdn->max_vfs -= offset; + return 0; +} +#endif /* CONFIG_PCI_IOV */ -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v12 14/21] powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
On Mon, Mar 02, 2015 at 06:56:19PM +1100, Benjamin Herrenschmidt wrote: On Mon, 2015-03-02 at 15:50 +0800, Wei Yang wrote: Is there a hotplug remove path where we should also be calling iommu_free_table()? Before VFs are introduced, no one calls this on the powernv platform. Each PCI bus is a PE and it has its own iommu table; even if a device is hotplugged, the iommu table will not be released. Actually, I believe Alexey's patches to add support for dynamic DMA windows for KVM guests using VFIO will also alloc/free iommu tables. In fact his patches somewhat change quite a few things in that area, and I'm currently reviewing them. Yes, I have seen these changes before. Wei, can you post a new series when you've finished sync'ing with Bjorn ? At that point, I'll try to work with Alexey to evaluate the impact of his changes on your patches. Sure, I will do it ASAP. Cheers, Ben. -- Richard Yang Help you, Help me ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev