intel-iommu: Is this a bug?
Hi, I am analyzing network performance with intel-iommu enabled. And found that running with iommu=pt, for every dma map/unmap it executes this code: /* * At boot time, we don't yet know if devices will be 64-bit capable. * Assume that they will — if they turn out not to be, then we can * take them out of the 1:1 domain later. */ if (!startup) { /* * If the device's dma_mask is less than the system's memory * size then this is not a candidate for identity mapping. */ u64 dma_mask = *dev->dma_mask; if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) dma_mask = dev->coherent_dma_mask; return dma_mask >= dma_get_required_mask(dev); } Do we really need this check for every dma/unmap? Considering it should be only during startup, shouldn't it be, diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 582fd01..3c8f14e 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2929,7 +2929,7 @@ static int iommu_should_identity_map(struct device *dev, int startup) * Assume that they will — if they turn out not to be, then we can * take them out of the 1:1 domain later. */ - if (!startup) { + if (startup) { /* * If the device's dma_mask is less than the system's memory * size then this is not a candidate for identity mapping. Thanks. -Tushar
intel-iommu: Is this a bug?
Hi, I am analyzing network performance with intel-iommu enabled. And found that running with iommu=pt, for every dma map/unmap it executes this code: /* * At boot time, we don't yet know if devices will be 64-bit capable. * Assume that they will — if they turn out not to be, then we can * take them out of the 1:1 domain later. */ if (!startup) { /* * If the device's dma_mask is less than the system's memory * size then this is not a candidate for identity mapping. */ u64 dma_mask = *dev->dma_mask; if (dev->coherent_dma_mask && dev->coherent_dma_mask < dma_mask) dma_mask = dev->coherent_dma_mask; return dma_mask >= dma_get_required_mask(dev); } Do we really need this check for every dma/unmap? Considering it should be only during startup, shouldn't it be, diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 582fd01..3c8f14e 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2929,7 +2929,7 @@ static int iommu_should_identity_map(struct device *dev, int startup) * Assume that they will — if they turn out not to be, then we can * take them out of the 1:1 domain later. */ - if (!startup) { + if (startup) { /* * If the device's dma_mask is less than the system's memory * size then this is not a candidate for identity mapping. Thanks. -Tushar
Re: [PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h
On 02/28/2018 08:57 AM, Daniel Borkmann wrote: Hi Tushar, On 02/28/2018 01:33 AM, Tushar Dave wrote: Using bpf_probe_read_str() from samples/bpf causes compiler warning. e.g. warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99 [-Wimplicit-function-declaration] num = bpf_probe_read_str(buf, sizeof(buf), ctx->di); ^ 1 warning generated. Add bpf_probe_read_str() to bpf_helpers.h so it can be used by samples/bpf programs. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> In general no objections to it, but it would need an in-tree user first: $ git grep -n bpf_probe_read_str tools/ tools/include/uapi/linux/bpf.h:596: * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) $ Why not adding this along with a sample? Okay, I will send new patch along with new sample or add usage of bpf_probe_read_str() in one of our exiting sample :) Thanks. -Tushar PS: adding correct mail-list this time linux-kselft...@vger.kernel.org Thanks, Daniel
Re: [PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h
On 02/28/2018 08:57 AM, Daniel Borkmann wrote: Hi Tushar, On 02/28/2018 01:33 AM, Tushar Dave wrote: Using bpf_probe_read_str() from samples/bpf causes compiler warning. e.g. warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99 [-Wimplicit-function-declaration] num = bpf_probe_read_str(buf, sizeof(buf), ctx->di); ^ 1 warning generated. Add bpf_probe_read_str() to bpf_helpers.h so it can be used by samples/bpf programs. Signed-off-by: Tushar Dave In general no objections to it, but it would need an in-tree user first: $ git grep -n bpf_probe_read_str tools/ tools/include/uapi/linux/bpf.h:596: * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr) $ Why not adding this along with a sample? Okay, I will send new patch along with new sample or add usage of bpf_probe_read_str() in one of our exiting sample :) Thanks. -Tushar PS: adding correct mail-list this time linux-kselft...@vger.kernel.org Thanks, Daniel
[PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h
Using bpf_probe_read_str() from samples/bpf causes compiler warning. e.g. warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99 [-Wimplicit-function-declaration] num = bpf_probe_read_str(buf, sizeof(buf), ctx->di); ^ 1 warning generated. Add bpf_probe_read_str() to bpf_helpers.h so it can be used by samples/bpf programs. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- tools/testing/selftests/bpf/bpf_helpers.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index dde2c11..65a266d 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -65,6 +65,8 @@ static int (*bpf_xdp_adjust_head)(void *ctx, int offset) = (void *) BPF_FUNC_xdp_adjust_head; static int (*bpf_xdp_adjust_meta)(void *ctx, int offset) = (void *) BPF_FUNC_xdp_adjust_meta; +static int (*bpf_probe_read_str)(void *dst, int size, void *unsafe_ptr) = + (void *) BPF_FUNC_probe_read_str; static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval, int optlen) = (void *) BPF_FUNC_setsockopt; -- 1.9.1
[PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h
Using bpf_probe_read_str() from samples/bpf causes compiler warning. e.g. warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99 [-Wimplicit-function-declaration] num = bpf_probe_read_str(buf, sizeof(buf), ctx->di); ^ 1 warning generated. Add bpf_probe_read_str() to bpf_helpers.h so it can be used by samples/bpf programs. Signed-off-by: Tushar Dave --- tools/testing/selftests/bpf/bpf_helpers.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index dde2c11..65a266d 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -65,6 +65,8 @@ static int (*bpf_xdp_adjust_head)(void *ctx, int offset) = (void *) BPF_FUNC_xdp_adjust_head; static int (*bpf_xdp_adjust_meta)(void *ctx, int offset) = (void *) BPF_FUNC_xdp_adjust_meta; +static int (*bpf_probe_read_str)(void *dst, int size, void *unsafe_ptr) = + (void *) BPF_FUNC_probe_read_str; static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval, int optlen) = (void *) BPF_FUNC_setsockopt; -- 1.9.1
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 12/04/2017 05:03 PM, Fengguang Wu wrote: Hi Tushar, On Tue, Nov 28, 2017 at 01:01:23AM +0530, Tushar Dave wrote: On 11/23/2017 04:43 AM, Fengguang Wu wrote: On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote: On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. Tushar, it happens on QEMU boot testing, so do not rely on e1000 HW. Unless you'd like to prevent regressions on real HW. The original report attached a reproduce script to run the QEMU test. Or you may send me the patch for testing. Fengguang, Would you please try this patch and test. The patch is compile tested only. The patch is similar to how ixgbe handled the issue. Thanks. e1000: fix disabling already-disabled warning This patch adds check so that driver does not disable already disabled device. It works! I tried 100 boots and the "e1000 :00:03.0: disabling already-disabled device" error no longer show up. Tested-by: Fengguang Wu <fengguang...@intel.com> Fengguang, Thanks for testing. I will send patch soon. -Tushar Thanks, Fengguang Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- drivers/net/ethe
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 12/04/2017 05:03 PM, Fengguang Wu wrote: Hi Tushar, On Tue, Nov 28, 2017 at 01:01:23AM +0530, Tushar Dave wrote: On 11/23/2017 04:43 AM, Fengguang Wu wrote: On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote: On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. Tushar, it happens on QEMU boot testing, so do not rely on e1000 HW. Unless you'd like to prevent regressions on real HW. The original report attached a reproduce script to run the QEMU test. Or you may send me the patch for testing. Fengguang, Would you please try this patch and test. The patch is compile tested only. The patch is similar to how ixgbe handled the issue. Thanks. e1000: fix disabling already-disabled warning This patch adds check so that driver does not disable already disabled device. It works! I tried 100 boots and the "e1000 :00:03.0: disabling already-disabled device" error no longer show up. Tested-by: Fengguang Wu Fengguang, Thanks for testing. I will send patch soon. -Tushar Thanks, Fengguang Signed-off-by: Tushar Dave --- drivers/net/ethernet/intel/e1000/e1000.h | 3 ++- drivers/net/ethernet/intel/e
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 11/23/2017 04:43 AM, Fengguang Wu wrote: On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote: On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. Tushar, it happens on QEMU boot testing, so do not rely on e1000 HW. Unless you'd like to prevent regressions on real HW. The original report attached a reproduce script to run the QEMU test. Or you may send me the patch for testing. Fengguang, Would you please try this patch and test. The patch is compile tested only. The patch is similar to how ixgbe handled the issue. Thanks. e1000: fix disabling already-disabled warning This patch adds check so that driver does not disable already disabled device. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- drivers/net/ethernet/intel/e1000/e1000.h | 3 ++- drivers/net/ethernet/intel/e1000/e1000_main.c | 23 ++- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h index d7bdea7..8fd2458 100644 --- a/drivers/net/ethernet/intel/e1000/e1000.h +++ b/drivers/net/ethernet/i
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 11/23/2017 04:43 AM, Fengguang Wu wrote: On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote: On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. Tushar, it happens on QEMU boot testing, so do not rely on e1000 HW. Unless you'd like to prevent regressions on real HW. The original report attached a reproduce script to run the QEMU test. Or you may send me the patch for testing. Fengguang, Would you please try this patch and test. The patch is compile tested only. The patch is similar to how ixgbe handled the issue. Thanks. e1000: fix disabling already-disabled warning This patch adds check so that driver does not disable already disabled device. Signed-off-by: Tushar Dave --- drivers/net/ethernet/intel/e1000/e1000.h | 3 ++- drivers/net/ethernet/intel/e1000/e1000_main.c | 23 ++- 2 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h index d7bdea7..8fd2458 100644 --- a/drivers/net/ethernet/intel/e1000/e1000.h +++ b/drivers/net/ethernet/intel/e1000/e1000.h @@ -331,7 +331
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. -Tushar Thanks, Fengguang
Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device
On 11/21/2017 06:11 PM, Fengguang Wu wrote: Hello, FYI this happens in mainline kernel 4.14.0-01330-g3c07399. It happens since 4.13 . It occurs in 3 out of 162 boots. [ 44.637743] advantechwdt: Unexpected close, not stopping watchdog! [ 44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6 [ 45.013419] e1000 :00:03.0: disabling already-disabled device [ 45.013447] [ cut here ] [ 45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1 [ 45.017197] task: 88011bee9e40 task.stack: c986 [ 45.017987] RIP: 0010:pci_disable_device+0xa1/0x105: pci_disable_device at drivers/pci/pci.c:1640 [ 45.018603] RSP: :c9863e30 EFLAGS: 00010286 [ 45.019282] RAX: 0035 RBX: 88013a230008 RCX: [ 45.020182] RDX: RSI: RDI: 0203 [ 45.021084] RBP: 88013a3f31e8 R08: 0001 R09: [ 45.021986] R10: 827ec29c R11: 0002 R12: 0001 [ 45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f [ 45.023842] FS: () GS:88013fd0() knlGS: [ 45.024863] CS: 0010 DS: ES: CR0: 80050033 [ 45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0 [ 45.026478] Call Trace: [ 45.026811] __e1000_shutdown+0x1d4/0x1e2: __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162 [ 45.027344] ? rcu_perf_cleanup+0x2a1/0x2a1: rcu_perf_shutdown at kernel/rcu/rcuperf.c:627 [ 45.027883] e1000_shutdown+0x14/0x3a: e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235 [ 45.028351] device_shutdown+0x110/0x1aa: device_shutdown at drivers/base/core.c:2807 [ 45.028858] kernel_power_off+0x31/0x64: kernel_power_off at kernel/reboot.c:260 [ 45.029343] rcu_perf_shutdown+0x9b/0xa7: rcu_perf_shutdown at kernel/rcu/rcuperf.c:637 [ 45.029852] ? __wake_up_common_lock+0xa2/0xa2: autoremove_wake_function at kernel/sched/wait.c:376 [ 45.030414] kthread+0x126/0x12e: kthread at kernel/kthread.c:233 [ 45.030834] ? __kthread_bind_mask+0x8e/0x8e: kthread at kernel/kthread.c:190 [ 45.031399] ? ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.031883] ? kernel_init+0xa/0xf5: kernel_init at init/main.c:997 [ 45.032325] ret_from_fork+0x1f/0x30: ret_from_fork at arch/x86/entry/entry_64.S:443 [ 45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82 [ 45.035222] ---[ end trace c257137b1b1976ef ]--- [ 45.037838] ACPI: Preparing to enter system sleep state S5 Attached the full dmesg, kconfig and reproduce scripts. Looks like e1000 pci/pxi-x device is already suspended. And therefore call to e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() already had disabled the device. Disabling device again by e1000_shutdown handler during system shutdown causes warning at drivers/pci/pci.c:1641. I think function __e1000_shutdown should just return if device is already suspended! I don't have e1000 hardware to test right now. So if this seems logical to others I will send a patch. -Tushar Thanks, Fengguang
Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members
On 11/02/2017 11:58 PM, Sandipan Das wrote: For added security, the layout of some structures can be randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One such structure is task_struct. To build BPF programs, we use Clang which does not support this feature. So, if we attempt to read a field of a structure with a randomized layout within a BPF program, we do not get the expected value because of incorrect offsets. To observe this, it is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT enabled because the structure annotations/members added for this purpose are enough to cause this. So, all kernel builds are affected. For example, considering samples/bpf/offwaketime_kern.c, if we try to print the values of pid and comm inside the task_struct passed to waker() by adding the following lines of code at the appropriate place char fmt[] = "waker(): p->pid = %u, p->comm = %s\n"; bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm)); it is seen that upon rebuilding and running this sample followed by inspecting /sys/kernel/debug/tracing/trace, the output looks like the following _-=> irqs-off / _=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / delay TASK-PID CPU# TIMESTAMP FUNCTION | | | | | -0 [007] d.s. 1883.443594: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.453588: 0x0001: waker(): p->pid = 0, p->comm = -0 [007] d.s. 1883.463584: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.483586: 0x0001: waker(): p->pid = 0, p->comm = -0 [005] d.s. 1883.493583: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.503583: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.513578: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627660: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627704: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627723: 0x0001: waker(): p->pid = 0, p->comm = To avoid this, we add new BPF helpers that read the correct values for some of the important task_struct members such as pid, tgid, comm and flags which are extensively used in BPF-based analysis tools such as bcc. Since these helpers are built with GCC, they use the correct offsets when referencing a member. Just to add that we were seeing the same issue (but had no clue until looked at this patch , thanks). Its easy to reproduce by running bcc example task_switch.py where pid (prev_pid) is retrieved from struct task_struct and that is always zero. we tried printing other task_struct members such as 'comm' and see that as empty string as well. -Tushar Signed-off-by: Sandipan Das--- include/linux/bpf.h | 3 ++ include/uapi/linux/bpf.h | 13 ++ kernel/bpf/core.c | 3 ++ kernel/bpf/helpers.c | 75 +++ kernel/trace/bpf_trace.c | 6 +++ tools/testing/selftests/bpf/bpf_helpers.h | 9 6 files changed, 109 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f1af7d63d678..5993a0f5262b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -418,6 +418,9 @@ extern const struct bpf_func_proto bpf_ktime_get_ns_proto; extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto; extern const struct bpf_func_proto bpf_get_current_uid_gid_proto; extern const struct bpf_func_proto bpf_get_current_comm_proto; +extern const struct bpf_func_proto bpf_get_task_pid_tgid_proto; +extern const struct bpf_func_proto bpf_get_task_comm_proto; +extern const struct bpf_func_proto bpf_get_task_flags_proto; extern const struct bpf_func_proto bpf_skb_vlan_push_proto; extern const struct bpf_func_proto bpf_skb_vlan_pop_proto; extern const struct bpf_func_proto bpf_get_stackid_proto; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f90860d1f897..324508d27bd2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -338,6 +338,16 @@ union bpf_attr { * @skb: pointer to skb * Return: classid if != 0 * + * u64 bpf_get_task_pid_tgid(struct task_struct *task) + * Return: task->tgid << 32 | task->pid + * + * int bpf_get_task_comm(struct task_struct *task) + * Stores task->comm into buf + * Return: 0 on success or negative error + * + * u32 bpf_get_task_flags(struct task_struct *task) + * Return: task->flags + * * int bpf_skb_vlan_push(skb, vlan_proto, vlan_tci) * Return: 0 on success or negative error *
Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members
On 11/02/2017 11:58 PM, Sandipan Das wrote: For added security, the layout of some structures can be randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One such structure is task_struct. To build BPF programs, we use Clang which does not support this feature. So, if we attempt to read a field of a structure with a randomized layout within a BPF program, we do not get the expected value because of incorrect offsets. To observe this, it is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT enabled because the structure annotations/members added for this purpose are enough to cause this. So, all kernel builds are affected. For example, considering samples/bpf/offwaketime_kern.c, if we try to print the values of pid and comm inside the task_struct passed to waker() by adding the following lines of code at the appropriate place char fmt[] = "waker(): p->pid = %u, p->comm = %s\n"; bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm)); it is seen that upon rebuilding and running this sample followed by inspecting /sys/kernel/debug/tracing/trace, the output looks like the following _-=> irqs-off / _=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / delay TASK-PID CPU# TIMESTAMP FUNCTION | | | | | -0 [007] d.s. 1883.443594: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.453588: 0x0001: waker(): p->pid = 0, p->comm = -0 [007] d.s. 1883.463584: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.483586: 0x0001: waker(): p->pid = 0, p->comm = -0 [005] d.s. 1883.493583: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.503583: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.513578: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627660: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627704: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627723: 0x0001: waker(): p->pid = 0, p->comm = To avoid this, we add new BPF helpers that read the correct values for some of the important task_struct members such as pid, tgid, comm and flags which are extensively used in BPF-based analysis tools such as bcc. Since these helpers are built with GCC, they use the correct offsets when referencing a member. Just to add that we were seeing the same issue (but had no clue until looked at this patch , thanks). Its easy to reproduce by running bcc example task_switch.py where pid (prev_pid) is retrieved from struct task_struct and that is always zero. we tried printing other task_struct members such as 'comm' and see that as empty string as well. -Tushar Signed-off-by: Sandipan Das --- include/linux/bpf.h | 3 ++ include/uapi/linux/bpf.h | 13 ++ kernel/bpf/core.c | 3 ++ kernel/bpf/helpers.c | 75 +++ kernel/trace/bpf_trace.c | 6 +++ tools/testing/selftests/bpf/bpf_helpers.h | 9 6 files changed, 109 insertions(+) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f1af7d63d678..5993a0f5262b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -418,6 +418,9 @@ extern const struct bpf_func_proto bpf_ktime_get_ns_proto; extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto; extern const struct bpf_func_proto bpf_get_current_uid_gid_proto; extern const struct bpf_func_proto bpf_get_current_comm_proto; +extern const struct bpf_func_proto bpf_get_task_pid_tgid_proto; +extern const struct bpf_func_proto bpf_get_task_comm_proto; +extern const struct bpf_func_proto bpf_get_task_flags_proto; extern const struct bpf_func_proto bpf_skb_vlan_push_proto; extern const struct bpf_func_proto bpf_skb_vlan_pop_proto; extern const struct bpf_func_proto bpf_get_stackid_proto; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f90860d1f897..324508d27bd2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -338,6 +338,16 @@ union bpf_attr { * @skb: pointer to skb * Return: classid if != 0 * + * u64 bpf_get_task_pid_tgid(struct task_struct *task) + * Return: task->tgid << 32 | task->pid + * + * int bpf_get_task_comm(struct task_struct *task) + * Stores task->comm into buf + * Return: 0 on success or negative error + * + * u32 bpf_get_task_flags(struct task_struct *task) + * Return: task->flags + * * int bpf_skb_vlan_push(skb, vlan_proto, vlan_tci) * Return: 0 on success or negative error * @@ -602,6 +612,9 @@ union
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 09:02 PM, David Miller wrote: From: Tushar Dave <tushar.n.d...@oracle.com> Date: Tue, 11 Jul 2017 20:43:39 -0700 Yes, indeed the bug is in Linus's tree. However, 'sparc' tree doesn't have DMA API change (e.g. commit b02c2b0bfd7ae) yet that introduced the panic. You can simply make a note of this when you send the bug fix to me.:( yeah, I should have mentioned this when I sent patch to you. My bad. Will make sure to left you a note for this kid of occurrence in future! -Tushar -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 09:02 PM, David Miller wrote: From: Tushar Dave Date: Tue, 11 Jul 2017 20:43:39 -0700 Yes, indeed the bug is in Linus's tree. However, 'sparc' tree doesn't have DMA API change (e.g. commit b02c2b0bfd7ae) yet that introduced the panic. You can simply make a note of this when you send the bug fix to me.:( yeah, I should have mentioned this when I sent patch to you. My bad. Will make sure to left you a note for this kid of occurrence in future! -Tushar -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 05:34 PM, David Miller wrote: From: Tushar Dave <tushar.n.d...@oracle.com> Date: Tue, 11 Jul 2017 15:38:21 -0700 On 07/11/2017 02:48 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. Why sparc-next - it should go into 4.13 since 4.13 would break all niagara1 and niagara2 systems otherwise?This is sparc arch fix so I used sparc tree(in this case for sparc-next). I am open to maintainers suggestions. Thanks. If the bug is in Linus's tree the fix must target 'sparc' not 'sparc-next'. Dave, Yes, indeed the bug is in Linus's tree. However, 'sparc' tree doesn't have DMA API change (e.g. commit b02c2b0bfd7ae) yet that introduced the panic. Looks like the DMA API changes have not merged into 'sparc' tree yet. In other words, 'sparc' tree doesn't have mentioned panic issue, nothing to fix there! However, 'sparc-next' is up to date (or more close to) linus tree and has DMA API change that cause mentioned panic issue. So I have send patch targeted for sparc-next. Let me know what should be the best tree to get this fix in and I will send v2. Thanks. -Tushar -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 05:34 PM, David Miller wrote: From: Tushar Dave Date: Tue, 11 Jul 2017 15:38:21 -0700 On 07/11/2017 02:48 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. Why sparc-next - it should go into 4.13 since 4.13 would break all niagara1 and niagara2 systems otherwise?This is sparc arch fix so I used sparc tree(in this case for sparc-next). I am open to maintainers suggestions. Thanks. If the bug is in Linus's tree the fix must target 'sparc' not 'sparc-next'. Dave, Yes, indeed the bug is in Linus's tree. However, 'sparc' tree doesn't have DMA API change (e.g. commit b02c2b0bfd7ae) yet that introduced the panic. Looks like the DMA API changes have not merged into 'sparc' tree yet. In other words, 'sparc' tree doesn't have mentioned panic issue, nothing to fix there! However, 'sparc-next' is up to date (or more close to) linus tree and has DMA API change that cause mentioned panic issue. So I have send patch targeted for sparc-next. Let me know what should be the best tree to get this fix in and I will send v2. Thanks. -Tushar -- To unsubscribe from this list: send the line "unsubscribe sparclinux" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 02:48 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. Why sparc-next - it should go into 4.13 since 4.13 would break all niagara1 and niagara2 systems otherwise?This is sparc arch fix so I used sparc tree(in this case for sparc-next). I am open to maintainers suggestions. Thanks. -Tushar -Tushar commit b02c2b0bfd ("sparc: remove arch specific dma_supported implementations") introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. 64bit DMA only supported on sun4v equipped with ATU IOMMU HW. diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..0a32c57 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -Tushar -Tushar [0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29' [0.33] PROMLIB: Root node compatible: sun4v [0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017 [0.002047] bootconsole [earlyprom0] enabled [0.002383] ARCH: SUN4V [0.002668] Ethernet address: 00:14:4f:86:99:26 [0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39) [0.004089] MM: VMALLOC [0x0001 --> 0x6000] [0.004562] MM: VMEMMAP [0x6000 --> 0xc000] [0.095699] Kernel: Using 3 locked TLB entries for main kernel image. [0.096387] Remapping the kernel... [0.096400] done. [1.906342] OF stdout device is: /virtual-devices@100/console@1 [1.907160] PROM: Built device tree with 148821 bytes of memory. [1.907804] MDESC: Size is 42336 bytes. [1.910139] PLATFORM: banner-name [Sun Fire T200] [1.910564] PLATFORM: name [SUNW,Sun-Fire-T200] [1.910919] PLATFORM: hostid [84869926] [1.911224] PLATFORM: serial# [00ab4130] [1.911536] PLATFORM: stick-frequency [3b9aca00] [1.911894] PLATFORM: mac-address [144f869926] [1.912241] PLATFORM: watchdog-resolution [1000 ms] [1.912619] PLATFORM: watchdog-max-timeout [3153600 ms] [1.913042] PLATFORM: max-cpus [32] [1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000 [1.913936] Memory hole size: 132MB [2.279507] Allocated 16384 bytes for kernel page tables. [2.280578] Zone ranges: [2.280819] Normal [mem 0x0840-0x0003ffd33fff] [2.281292] Movable zone start for each node [2.281626] Early memory node ranges [2.281916] node 0: [mem 0x0840-0x0003ffc1] [2.282557] node 0: [mem 0x0003ffc28000-0x0003ffcfdfff] [2.283030] node 0: [mem 0x0003ffd0e000-0x0003ffd27fff] [2.283514] node 0: [mem 0x0003ffd2c000-0x0003ffd33fff] [2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff] [2.782262] Booting Linux... [2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32] [2.783255] CPU CAPS: [v8plus,ASIBlkInit] [2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144 [2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)] [2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634 [2.916160] Policy zone: Normal [2.916420] Kernel command line: root=/dev/sda1 ro [2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes) [2.919230] Sorting __ex_table... [3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved,
Re: sun4v+DMA related boot crash on 4.13-git
On 07/11/2017 02:48 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. Why sparc-next - it should go into 4.13 since 4.13 would break all niagara1 and niagara2 systems otherwise?This is sparc arch fix so I used sparc tree(in this case for sparc-next). I am open to maintainers suggestions. Thanks. -Tushar -Tushar commit b02c2b0bfd ("sparc: remove arch specific dma_supported implementations") introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. 64bit DMA only supported on sun4v equipped with ATU IOMMU HW. diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..0a32c57 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -Tushar -Tushar [0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29' [0.33] PROMLIB: Root node compatible: sun4v [0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017 [0.002047] bootconsole [earlyprom0] enabled [0.002383] ARCH: SUN4V [0.002668] Ethernet address: 00:14:4f:86:99:26 [0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39) [0.004089] MM: VMALLOC [0x0001 --> 0x6000] [0.004562] MM: VMEMMAP [0x6000 --> 0xc000] [0.095699] Kernel: Using 3 locked TLB entries for main kernel image. [0.096387] Remapping the kernel... [0.096400] done. [1.906342] OF stdout device is: /virtual-devices@100/console@1 [1.907160] PROM: Built device tree with 148821 bytes of memory. [1.907804] MDESC: Size is 42336 bytes. [1.910139] PLATFORM: banner-name [Sun Fire T200] [1.910564] PLATFORM: name [SUNW,Sun-Fire-T200] [1.910919] PLATFORM: hostid [84869926] [1.911224] PLATFORM: serial# [00ab4130] [1.911536] PLATFORM: stick-frequency [3b9aca00] [1.911894] PLATFORM: mac-address [144f869926] [1.912241] PLATFORM: watchdog-resolution [1000 ms] [1.912619] PLATFORM: watchdog-max-timeout [3153600 ms] [1.913042] PLATFORM: max-cpus [32] [1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000 [1.913936] Memory hole size: 132MB [2.279507] Allocated 16384 bytes for kernel page tables. [2.280578] Zone ranges: [2.280819] Normal [mem 0x0840-0x0003ffd33fff] [2.281292] Movable zone start for each node [2.281626] Early memory node ranges [2.281916] node 0: [mem 0x0840-0x0003ffc1] [2.282557] node 0: [mem 0x0003ffc28000-0x0003ffcfdfff] [2.283030] node 0: [mem 0x0003ffd0e000-0x0003ffd27fff] [2.283514] node 0: [mem 0x0003ffd2c000-0x0003ffd33fff] [2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff] [2.782262] Booting Linux... [2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32] [2.783255] CPU CAPS: [v8plus,ASIBlkInit] [2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144 [2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)] [2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634 [2.916160] Policy zone: Normal [2.916420] Kernel command line: root=/dev/sda1 ro [2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes) [2.919230] Sorting __ex_table... [3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved,
Re: sun4v+DMA related boot crash on 4.13-git
On 07/10/2017 10:05 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. -Tushar commit b02c2b0bfd ("sparc: remove arch specific dma_supported implementations") introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. 64bit DMA only supported on sun4v equipped with ATU IOMMU HW. diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..0a32c57 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -Tushar -Tushar [0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29' [0.33] PROMLIB: Root node compatible: sun4v [0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017 [0.002047] bootconsole [earlyprom0] enabled [0.002383] ARCH: SUN4V [0.002668] Ethernet address: 00:14:4f:86:99:26 [0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39) [0.004089] MM: VMALLOC [0x0001 --> 0x6000] [0.004562] MM: VMEMMAP [0x6000 --> 0xc000] [0.095699] Kernel: Using 3 locked TLB entries for main kernel image. [0.096387] Remapping the kernel... [0.096400] done. [1.906342] OF stdout device is: /virtual-devices@100/console@1 [1.907160] PROM: Built device tree with 148821 bytes of memory. [1.907804] MDESC: Size is 42336 bytes. [1.910139] PLATFORM: banner-name [Sun Fire T200] [1.910564] PLATFORM: name [SUNW,Sun-Fire-T200] [1.910919] PLATFORM: hostid [84869926] [1.911224] PLATFORM: serial# [00ab4130] [1.911536] PLATFORM: stick-frequency [3b9aca00] [1.911894] PLATFORM: mac-address [144f869926] [1.912241] PLATFORM: watchdog-resolution [1000 ms] [1.912619] PLATFORM: watchdog-max-timeout [3153600 ms] [1.913042] PLATFORM: max-cpus [32] [1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000 [1.913936] Memory hole size: 132MB [2.279507] Allocated 16384 bytes for kernel page tables. [2.280578] Zone ranges: [2.280819] Normal [mem 0x0840-0x0003ffd33fff] [2.281292] Movable zone start for each node [2.281626] Early memory node ranges [2.281916] node 0: [mem 0x0840-0x0003ffc1] [2.282557] node 0: [mem 0x0003ffc28000-0x0003ffcfdfff] [2.283030] node 0: [mem 0x0003ffd0e000-0x0003ffd27fff] [2.283514] node 0: [mem 0x0003ffd2c000-0x0003ffd33fff] [2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff] [2.782262] Booting Linux... [2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32] [2.783255] CPU CAPS: [v8plus,ASIBlkInit] [2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144 [2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)] [2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634 [2.916160] Policy zone: Normal [2.916420] Kernel command line: root=/dev/sda1 ro [2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes) [2.919230] Sorting __ex_table... [3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved, 0K cma-reserved) [3.223109] Hierarchical RCU implementation. [3.223452] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16. [3.223933] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16 [3.225508] NR_IRQS: 2048,
Re: sun4v+DMA related boot crash on 4.13-git
On 07/10/2017 10:05 PM, Meelis Roos wrote: I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed on boot with DMA-related stacktrace (below). Allt he machines are sun4v physical machines, not VM-s. Older sun4 machines do not exhibit this problem. Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get time. I see whats going on with panic. I will reproduce locally. Will get back soon. This patch should fix panic. Please give it a try. Yes, this patch fixes it. Thank you for fixing it quickly! Thanks for testing. Patch sent for sparc-next. -Tushar commit b02c2b0bfd ("sparc: remove arch specific dma_supported implementations") introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. 64bit DMA only supported on sun4v equipped with ATU IOMMU HW. diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..0a32c57 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -Tushar -Tushar [0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29' [0.33] PROMLIB: Root node compatible: sun4v [0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017 [0.002047] bootconsole [earlyprom0] enabled [0.002383] ARCH: SUN4V [0.002668] Ethernet address: 00:14:4f:86:99:26 [0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39) [0.004089] MM: VMALLOC [0x0001 --> 0x6000] [0.004562] MM: VMEMMAP [0x6000 --> 0xc000] [0.095699] Kernel: Using 3 locked TLB entries for main kernel image. [0.096387] Remapping the kernel... [0.096400] done. [1.906342] OF stdout device is: /virtual-devices@100/console@1 [1.907160] PROM: Built device tree with 148821 bytes of memory. [1.907804] MDESC: Size is 42336 bytes. [1.910139] PLATFORM: banner-name [Sun Fire T200] [1.910564] PLATFORM: name [SUNW,Sun-Fire-T200] [1.910919] PLATFORM: hostid [84869926] [1.911224] PLATFORM: serial# [00ab4130] [1.911536] PLATFORM: stick-frequency [3b9aca00] [1.911894] PLATFORM: mac-address [144f869926] [1.912241] PLATFORM: watchdog-resolution [1000 ms] [1.912619] PLATFORM: watchdog-max-timeout [3153600 ms] [1.913042] PLATFORM: max-cpus [32] [1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000 [1.913936] Memory hole size: 132MB [2.279507] Allocated 16384 bytes for kernel page tables. [2.280578] Zone ranges: [2.280819] Normal [mem 0x0840-0x0003ffd33fff] [2.281292] Movable zone start for each node [2.281626] Early memory node ranges [2.281916] node 0: [mem 0x0840-0x0003ffc1] [2.282557] node 0: [mem 0x0003ffc28000-0x0003ffcfdfff] [2.283030] node 0: [mem 0x0003ffd0e000-0x0003ffd27fff] [2.283514] node 0: [mem 0x0003ffd2c000-0x0003ffd33fff] [2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff] [2.782262] Booting Linux... [2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32] [2.783255] CPU CAPS: [v8plus,ASIBlkInit] [2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144 [2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)] [2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634 [2.916160] Policy zone: Normal [2.916420] Kernel command line: root=/dev/sda1 ro [2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes) [2.919230] Sorting __ex_table... [3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved, 0K cma-reserved) [3.223109] Hierarchical RCU implementation. [3.223452] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16. [3.223933] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16 [3.225508] NR_IRQS: 2048,
[sparc-next] SPARC64: Fix sun4v DMA panic
64bit DMA only supported on sun4v equipped with ATU IOMMU HW. 'Commit b02c2b0bfd7ae ("sparc: remove arch specific dma_supported implementations")' introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. This results into panic. Fix it. Reported-by: Meelis Roos <mr...@linux.ee> Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- arch/sparc/kernel/pci_sun4v.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..f10e2f7 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[sparc-next] SPARC64: Fix sun4v DMA panic
64bit DMA only supported on sun4v equipped with ATU IOMMU HW. 'Commit b02c2b0bfd7ae ("sparc: remove arch specific dma_supported implementations")' introduced a code that incorrectly allow dma_supported() to succeed for 64bit dma mask even if system doesn't have ATU IOMMU. This results into panic. Fix it. Reported-by: Meelis Roos Signed-off-by: Tushar Dave --- arch/sparc/kernel/pci_sun4v.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 24f21c7..f10e2f7 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist, static int dma_4v_supported(struct device *dev, u64 device_mask) { struct iommu *iommu = dev->archdata.iommu; - u64 dma_addr_mask; + u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask > DMA_BIT_MASK(32) && iommu->atu) - dma_addr_mask = iommu->atu->dma_addr_mask; - else - dma_addr_mask = iommu->dma_addr_mask; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH v3 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar v2->v3: - Patch #5 addresses comment by Joe Perches. -- use %s, __func__ instead of embedding the function name. v1->v2: - Patch #2 addresses comments by Dave M. -- use page allocator to allocate IOTSB. -- use true/false with boolean variables. Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 418 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 849 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH v3 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 242477c..d4208aa 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size, return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH v3 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar v2->v3: - Patch #5 addresses comment by Joe Perches. -- use %s, __func__ instead of embedding the function name. v1->v2: - Patch #2 addresses comments by Dave M. -- use page allocator to allocate IOTSB. -- use true/false with boolean variables. Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 418 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 849 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH v3 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 242477c..d4208aa 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size, return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH v3 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 140 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 529 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured,
[PATCH v3 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 140 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 529 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured, privileged access to the IOTSB memory is prohibited and + * creates undefined behavior. The only
[PATCH v3 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 216 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 211 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index d4208aa..06981cc 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,57 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("%s: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + __func__, + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("%s: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + __func__, + devhandle, iotsb_num, + index
[PATCH v3 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 216 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 211 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index d4208aa..06981cc 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,57 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("%s: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + __func__, + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("%s: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + __func__, + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; +
[PATCH v3 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp <dave.kleik...@oracle.com> This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com> Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index b23c76b..5202eb4 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -89,6 +89,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH v3 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp Signed-off-by: Tushar Dave --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index b23c76b..5202eb4 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -89,6 +89,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH v3 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 2afb86c..242477c 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[PATCH v3 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Reviewed-by: Sowmini Varadhan --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 2afb86c..242477c 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[PATCH v3 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 5202eb4..60145c9 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -93,6 +93,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH v3 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 5202eb4..60145c9 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -93,6 +93,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH v2 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 140 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 529 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured,
[PATCH v2 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 242477c..d4208aa 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size, return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH v2 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 140 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 529 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured, privileged access to the IOTSB memory is prohibited and + * creates undefined behavior. The only
[PATCH v2 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 242477c..d4208aa 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size, return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH v2 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index d4208aa..d6ab2b9 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; +
[PATCH v2 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp <dave.kleik...@oracle.com> This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com> Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index b23c76b..5202eb4 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -89,6 +89,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH v2 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp Signed-off-by: Tushar Dave --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index b23c76b..5202eb4 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -89,6 +89,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH v2 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index d4208aa..d6ab2b9 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; + } } - entry += num; npages -= n
[PATCH v2 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 2afb86c..242477c 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[PATCH v2 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Reviewed-by: Sowmini Varadhan --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 2afb86c..242477c 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[PATCH v2 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 5202eb4..60145c9 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -93,6 +93,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH v2 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 5202eb4..60145c9 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -93,6 +93,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH v2 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar v1->v2: - Patch #2 addresses comments by Dave M. -- use page allocator to allocate IOTSB. -- use true/false with boolean variables. Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 416 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 847 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH v2 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar v1->v2: - Patch #2 addresses comments by Dave M. -- use page allocator to allocate IOTSB. -- use true/false with boolean variables. Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 416 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 847 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH 1/2] sunqe: Fix compiler warnings
sunqe uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’: drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warnings. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> --- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c index 9b825780..9582948 100644 --- a/drivers/net/ethernet/sun/sunqe.c +++ b/drivers/net/ethernet/sun/sunqe.c @@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep) { struct qe_init_block *qb = qep->qe_block; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int i; qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0; @@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq) void __iomem *mregs = qep->mregs; void __iomem *gregs = qecp->gregs; unsigned char *e = >dev->dev_addr[0]; + __u32 qblk_dvma = (__u32)qep->qblock_dvma; u32 tmp; int i; @@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq) return -EAGAIN; /* Setup initial rx/tx init block pointers. */ - sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); - sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); + sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); + sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); /* Enable/mask the various irq's. */ sbus_writel(0, cregs + CREG_RIMASK); @@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep) struct net_device *dev = qep->dev; struct qe_rxd *this; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int elem = qep->rx_new; u32 flags; @@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct sunqe *qep = netdev_priv(dev); struct sunqe_buffers *qbufs = qep->buffers; - __u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma; + __u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma; unsigned char *txbuf; int len, entry; diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h index 581781b..ae190b7 100644 --- a/drivers/net/ethernet/sun/sunqe.h +++ b/drivers/net/ethernet/sun/sunqe.h @@ -334,12 +334,12 @@ struct sunqe { void __iomem*qcregs;/* QEC per-channel Registers */ void __iomem*mregs; /* Per-channel MACE Registers */ struct qe_init_block*qe_block; /* RX and TX descriptors */ - __u32 qblock_dvma;/* RX and TX descriptors */ + dma_addr_t qblock_dvma;/* RX and TX descriptors */ spinlock_t lock; /* Protects txfull state */ int rx_new, rx_old; /* RX ring extents */ int tx_new, tx_old; /* TX ring extents */ struct sunqe_buffers*buffers; /* CPU visible address. */ - __u32 buffers_dvma; /* DVMA visible address. */ + dma_addr_t buffers_dvma; /* DVMA visible address. */ struct sunqec *parent; u8 mconfig;/* Base MACE mconfig value */ struct platform_device *op;/* QE's OF device struct */ -- 1.9.1
[PATCH 1/2] sunqe: Fix compiler warnings
sunqe uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’: drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warnings. Signed-off-by: Tushar Dave Reviewed-by: chris hyser --- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c index 9b825780..9582948 100644 --- a/drivers/net/ethernet/sun/sunqe.c +++ b/drivers/net/ethernet/sun/sunqe.c @@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep) { struct qe_init_block *qb = qep->qe_block; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int i; qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0; @@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq) void __iomem *mregs = qep->mregs; void __iomem *gregs = qecp->gregs; unsigned char *e = >dev->dev_addr[0]; + __u32 qblk_dvma = (__u32)qep->qblock_dvma; u32 tmp; int i; @@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq) return -EAGAIN; /* Setup initial rx/tx init block pointers. */ - sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); - sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); + sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); + sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); /* Enable/mask the various irq's. */ sbus_writel(0, cregs + CREG_RIMASK); @@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep) struct net_device *dev = qep->dev; struct qe_rxd *this; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int elem = qep->rx_new; u32 flags; @@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct sunqe *qep = netdev_priv(dev); struct sunqe_buffers *qbufs = qep->buffers; - __u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma; + __u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma; unsigned char *txbuf; int len, entry; diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h index 581781b..ae190b7 100644 --- a/drivers/net/ethernet/sun/sunqe.h +++ b/drivers/net/ethernet/sun/sunqe.h @@ -334,12 +334,12 @@ struct sunqe { void __iomem*qcregs;/* QEC per-channel Registers */ void __iomem*mregs; /* Per-channel MACE Registers */ struct qe_init_block*qe_block; /* RX and TX descriptors */ - __u32 qblock_dvma;/* RX and TX descriptors */ + dma_addr_t qblock_dvma;/* RX and TX descriptors */ spinlock_t lock; /* Protects txfull state */ int rx_new, rx_old; /* RX ring extents */ int tx_new, tx_old; /* TX ring extents */ struct sunqe_buffers*buffers; /* CPU visible address. */ - __u32 buffers_dvma; /* DVMA visible address. */ + dma_addr_t buffers_dvma; /* DVMA visible address. */ struct sunqec *parent; u8 mconfig;/* Base MACE mconfig value */ struct platform_device *op;/* QE's OF device struct */ -- 1.9.1
[PATCH 2/2] sunbmac: Fix compiler warning
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’: drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warning. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> --- drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c index aa4f9d2..02f4527 100644 --- a/drivers/net/ethernet/sun/sunbmac.c +++ b/drivers/net/ethernet/sun/sunbmac.c @@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) void __iomem *gregs= bp->gregs; void __iomem *cregs= bp->creg; void __iomem *bregs= bp->bregs; + __u32 bblk_dvma = (__u32)bp->bblock_dvma; unsigned char *e = >dev->dev_addr[0]; /* Latch current counters into statistics. */ @@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) bregs + BMAC_XIFCFG); /* Tell the QEC where the ring descriptors are. */ - sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0), + sbus_writel(bblk_dvma + bib_offset(be_rxd, 0), cregs + CREG_RXDS); - sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0), + sbus_writel(bblk_dvma + bib_offset(be_txd, 0), cregs + CREG_TXDS); /* Setup the FIFO pointers into QEC local memory. */ diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h index 06dd217..532fc56 100644 --- a/drivers/net/ethernet/sun/sunbmac.h +++ b/drivers/net/ethernet/sun/sunbmac.h @@ -291,7 +291,7 @@ struct bigmac { void __iomem*bregs; /* BigMAC Registers */ void __iomem*tregs; /* BigMAC Transceiver */ struct bmac_init_block *bmac_block;/* RX and TX descriptors */ - __u32bblock_dvma; /* RX and TX descriptors */ + dma_addr_t bblock_dvma;/* RX and TX descriptors */ spinlock_t lock; -- 1.9.1
[PATCH 2/2] sunbmac: Fix compiler warning
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’: drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warning. Signed-off-by: Tushar Dave Reviewed-by: chris hyser --- drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c index aa4f9d2..02f4527 100644 --- a/drivers/net/ethernet/sun/sunbmac.c +++ b/drivers/net/ethernet/sun/sunbmac.c @@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) void __iomem *gregs= bp->gregs; void __iomem *cregs= bp->creg; void __iomem *bregs= bp->bregs; + __u32 bblk_dvma = (__u32)bp->bblock_dvma; unsigned char *e = >dev->dev_addr[0]; /* Latch current counters into statistics. */ @@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) bregs + BMAC_XIFCFG); /* Tell the QEC where the ring descriptors are. */ - sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0), + sbus_writel(bblk_dvma + bib_offset(be_rxd, 0), cregs + CREG_RXDS); - sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0), + sbus_writel(bblk_dvma + bib_offset(be_txd, 0), cregs + CREG_TXDS); /* Setup the FIFO pointers into QEC local memory. */ diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h index 06dd217..532fc56 100644 --- a/drivers/net/ethernet/sun/sunbmac.h +++ b/drivers/net/ethernet/sun/sunbmac.h @@ -291,7 +291,7 @@ struct bigmac { void __iomem*bregs; /* BigMAC Registers */ void __iomem*tregs; /* BigMAC Transceiver */ struct bmac_init_block *bmac_block;/* RX and TX descriptors */ - __u32bblock_dvma; /* RX and TX descriptors */ + dma_addr_t bblock_dvma;/* RX and TX descriptors */ spinlock_t lock; -- 1.9.1
[PATCH 0/2] sparc/net: Fix compiler warnings
Recently, ATU (iommu) changes are submitted to sparclinux that enables 64bit DMA on SPARC. However, this change also makes 'incompatible pointer type' compiler warnings inevitable on sunqe and sunbmac driver. The two patches in series fix compiler warnings. Tushar Dave (2): sunqe: Fix compiler warnings sunbmac: Fix compiler warning drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 4 files changed, 12 insertions(+), 10 deletions(-) -- 1.9.1
[PATCH 0/2] sparc/net: Fix compiler warnings
Recently, ATU (iommu) changes are submitted to sparclinux that enables 64bit DMA on SPARC. However, this change also makes 'incompatible pointer type' compiler warnings inevitable on sunqe and sunbmac driver. The two patches in series fix compiler warnings. Tushar Dave (2): sunqe: Fix compiler warnings sunbmac: Fix compiler warning drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 4 files changed, 12 insertions(+), 10 deletions(-) -- 1.9.1
[PATCH 2/2] sunbmac: Fix compiler warning
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’: drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warning. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> --- drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c index aa4f9d2..02f4527 100644 --- a/drivers/net/ethernet/sun/sunbmac.c +++ b/drivers/net/ethernet/sun/sunbmac.c @@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) void __iomem *gregs= bp->gregs; void __iomem *cregs= bp->creg; void __iomem *bregs= bp->bregs; + __u32 bblk_dvma = (__u32)bp->bblock_dvma; unsigned char *e = >dev->dev_addr[0]; /* Latch current counters into statistics. */ @@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) bregs + BMAC_XIFCFG); /* Tell the QEC where the ring descriptors are. */ - sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0), + sbus_writel(bblk_dvma + bib_offset(be_rxd, 0), cregs + CREG_RXDS); - sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0), + sbus_writel(bblk_dvma + bib_offset(be_txd, 0), cregs + CREG_TXDS); /* Setup the FIFO pointers into QEC local memory. */ diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h index 06dd217..532fc56 100644 --- a/drivers/net/ethernet/sun/sunbmac.h +++ b/drivers/net/ethernet/sun/sunbmac.h @@ -291,7 +291,7 @@ struct bigmac { void __iomem*bregs; /* BigMAC Registers */ void __iomem*tregs; /* BigMAC Transceiver */ struct bmac_init_block *bmac_block;/* RX and TX descriptors */ - __u32bblock_dvma; /* RX and TX descriptors */ + dma_addr_t bblock_dvma;/* RX and TX descriptors */ spinlock_t lock; -- 1.9.1
[PATCH 2/2] sunbmac: Fix compiler warning
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’: drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warning. Signed-off-by: Tushar Dave Reviewed-by: chris hyser --- drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c index aa4f9d2..02f4527 100644 --- a/drivers/net/ethernet/sun/sunbmac.c +++ b/drivers/net/ethernet/sun/sunbmac.c @@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) void __iomem *gregs= bp->gregs; void __iomem *cregs= bp->creg; void __iomem *bregs= bp->bregs; + __u32 bblk_dvma = (__u32)bp->bblock_dvma; unsigned char *e = >dev->dev_addr[0]; /* Latch current counters into statistics. */ @@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq) bregs + BMAC_XIFCFG); /* Tell the QEC where the ring descriptors are. */ - sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0), + sbus_writel(bblk_dvma + bib_offset(be_rxd, 0), cregs + CREG_RXDS); - sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0), + sbus_writel(bblk_dvma + bib_offset(be_txd, 0), cregs + CREG_TXDS); /* Setup the FIFO pointers into QEC local memory. */ diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h index 06dd217..532fc56 100644 --- a/drivers/net/ethernet/sun/sunbmac.h +++ b/drivers/net/ethernet/sun/sunbmac.h @@ -291,7 +291,7 @@ struct bigmac { void __iomem*bregs; /* BigMAC Registers */ void __iomem*tregs; /* BigMAC Transceiver */ struct bmac_init_block *bmac_block;/* RX and TX descriptors */ - __u32bblock_dvma; /* RX and TX descriptors */ + dma_addr_t bblock_dvma;/* RX and TX descriptors */ spinlock_t lock; -- 1.9.1
[PATCH 1/2] sunqe: Fix compiler warnings
sunqe uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’: drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warnings. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> --- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c index 9b825780..9582948 100644 --- a/drivers/net/ethernet/sun/sunqe.c +++ b/drivers/net/ethernet/sun/sunqe.c @@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep) { struct qe_init_block *qb = qep->qe_block; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int i; qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0; @@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq) void __iomem *mregs = qep->mregs; void __iomem *gregs = qecp->gregs; unsigned char *e = >dev->dev_addr[0]; + __u32 qblk_dvma = (__u32)qep->qblock_dvma; u32 tmp; int i; @@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq) return -EAGAIN; /* Setup initial rx/tx init block pointers. */ - sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); - sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); + sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); + sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); /* Enable/mask the various irq's. */ sbus_writel(0, cregs + CREG_RIMASK); @@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep) struct net_device *dev = qep->dev; struct qe_rxd *this; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int elem = qep->rx_new; u32 flags; @@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct sunqe *qep = netdev_priv(dev); struct sunqe_buffers *qbufs = qep->buffers; - __u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma; + __u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma; unsigned char *txbuf; int len, entry; diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h index 581781b..ae190b7 100644 --- a/drivers/net/ethernet/sun/sunqe.h +++ b/drivers/net/ethernet/sun/sunqe.h @@ -334,12 +334,12 @@ struct sunqe { void __iomem*qcregs;/* QEC per-channel Registers */ void __iomem*mregs; /* Per-channel MACE Registers */ struct qe_init_block*qe_block; /* RX and TX descriptors */ - __u32 qblock_dvma;/* RX and TX descriptors */ + dma_addr_t qblock_dvma;/* RX and TX descriptors */ spinlock_t lock; /* Protects txfull state */ int rx_new, rx_old; /* RX ring extents */ int tx_new, tx_old; /* TX ring extents */ struct sunqe_buffers*buffers; /* CPU visible address. */ - __u32 buffers_dvma; /* DVMA visible address. */ + dma_addr_t buffers_dvma; /* DVMA visible address. */ struct sunqec *parent; u8 mconfig;/* Base MACE mconfig value */ struct platform_device *op;/* QE's OF device struct */ -- 1.9.1
[PATCH 0/2] net: Fix compiler warnings
Recently, ATU (iommu) changes are submitted to linux-sparc that enables 64bit DMA on SPARC. However, this change also makes 'incompatible pointer type' compiler warnings inevitable on sunqe and sunbmac driver. The two patches in series fix compiler warnings. Tushar Dave (2): sunqe: Fix compiler warnings sunbmac: Fix compiler warning drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 4 files changed, 12 insertions(+), 10 deletions(-) -- 1.9.1
[PATCH 1/2] sunqe: Fix compiler warnings
sunqe uses '__u32' for dma handle while invoking kernel DMA APIs, instead of using dma_addr_t. This hasn't caused any 'incompatible pointer type' warning on SPARC because until now dma_addr_t is of type u32. However, recent changes in SPARC ATU (iommu) enables 64bit DMA and therefore dma_addr_t becomes of type u64. This makes 'incompatible pointer type' warnings inevitable. e.g. drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’: drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type ./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’ This patch resolves above compiler warnings. Signed-off-by: Tushar Dave Reviewed-by: chris hyser --- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 2 files changed, 8 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c index 9b825780..9582948 100644 --- a/drivers/net/ethernet/sun/sunqe.c +++ b/drivers/net/ethernet/sun/sunqe.c @@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep) { struct qe_init_block *qb = qep->qe_block; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int i; qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0; @@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq) void __iomem *mregs = qep->mregs; void __iomem *gregs = qecp->gregs; unsigned char *e = >dev->dev_addr[0]; + __u32 qblk_dvma = (__u32)qep->qblock_dvma; u32 tmp; int i; @@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq) return -EAGAIN; /* Setup initial rx/tx init block pointers. */ - sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); - sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); + sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS); + sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS); /* Enable/mask the various irq's. */ sbus_writel(0, cregs + CREG_RIMASK); @@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep) struct net_device *dev = qep->dev; struct qe_rxd *this; struct sunqe_buffers *qbufs = qep->buffers; - __u32 qbufs_dvma = qep->buffers_dvma; + __u32 qbufs_dvma = (__u32)qep->buffers_dvma; int elem = qep->rx_new; u32 flags; @@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct sunqe *qep = netdev_priv(dev); struct sunqe_buffers *qbufs = qep->buffers; - __u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma; + __u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma; unsigned char *txbuf; int len, entry; diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h index 581781b..ae190b7 100644 --- a/drivers/net/ethernet/sun/sunqe.h +++ b/drivers/net/ethernet/sun/sunqe.h @@ -334,12 +334,12 @@ struct sunqe { void __iomem*qcregs;/* QEC per-channel Registers */ void __iomem*mregs; /* Per-channel MACE Registers */ struct qe_init_block*qe_block; /* RX and TX descriptors */ - __u32 qblock_dvma;/* RX and TX descriptors */ + dma_addr_t qblock_dvma;/* RX and TX descriptors */ spinlock_t lock; /* Protects txfull state */ int rx_new, rx_old; /* RX ring extents */ int tx_new, tx_old; /* TX ring extents */ struct sunqe_buffers*buffers; /* CPU visible address. */ - __u32 buffers_dvma; /* DVMA visible address. */ + dma_addr_t buffers_dvma; /* DVMA visible address. */ struct sunqec *parent; u8 mconfig;/* Base MACE mconfig value */ struct platform_device *op;/* QE's OF device struct */ -- 1.9.1
[PATCH 0/2] net: Fix compiler warnings
Recently, ATU (iommu) changes are submitted to linux-sparc that enables 64bit DMA on SPARC. However, this change also makes 'incompatible pointer type' compiler warnings inevitable on sunqe and sunbmac driver. The two patches in series fix compiler warnings. Tushar Dave (2): sunqe: Fix compiler warnings sunbmac: Fix compiler warning drivers/net/ethernet/sun/sunbmac.c | 5 +++-- drivers/net/ethernet/sun/sunbmac.h | 2 +- drivers/net/ethernet/sun/sunqe.c | 11 ++- drivers/net/ethernet/sun/sunqe.h | 4 ++-- 4 files changed, 12 insertions(+), 10 deletions(-) -- 1.9.1
[PATCH 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index a7482bc..99bb845 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -94,6 +94,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 415 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 846 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 5404b33..c7f3a0b3 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ range_alloc_fail: return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp <dave.kleik...@oracle.com> This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com> Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index f5d60f1..a7482bc 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -90,6 +90,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -305,6 +309,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index a7482bc..99bb845 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -94,6 +94,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 415 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 846 insertions(+), 60 deletions(-) -- 1.9.1
[PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 5404b33..c7f3a0b3 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ range_alloc_fail: return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp Signed-off-by: Tushar Dave --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index f5d60f1..a7482bc 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -90,6 +90,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -305,6 +309,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[PATCH 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 139 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 528 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured,
[PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index c7f3a0b3..1ea6d30 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; +
[PATCH 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 139 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 528 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured, privileged access to the IOTSB memory is prohibited and + * creates undefined behavior. The only
[PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index c7f3a0b3..1ea6d30 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; + } } - entry += num; npages -= n
[PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index a1c4d5e..5404b33 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Reviewed-by: Sowmini Varadhan --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index a1c4d5e..5404b33 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[RFC PATCH 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 139 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 528 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured,
[RFC PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index c7f3a0b3..1ea6d30 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; +
[RFC PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 415 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 846 insertions(+), 60 deletions(-) -- 1.9.1
[RFC PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with 64bit DMA mask. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 6 + arch/sparc/kernel/pci_sun4v.c | 214 ++-- arch/sparc/kernel/pci_sun4v.h | 11 ++ arch/sparc/kernel/pci_sun4v_asm.S | 36 ++ 4 files changed, 209 insertions(+), 58 deletions(-) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 7b15df8..73cb897 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, * iotsb_index Zero-based IOTTE number within an IOTSB. */ +/* The index_count argument consists of two fields: + * bits 63:48 #iottes and bits 47:0 iotsb_index + */ +#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \ + (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index))) + /* pci_iotsb_conf() * TRAP: HV_FAST_TRAP * FUNCTION: HV_FAST_PCI_IOTSB_CONF diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index c7f3a0b3..1ea6d30 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns } /* Interrupts must be disabled. */ -static long iommu_batch_flush(struct iommu_batch *p) +static long iommu_batch_flush(struct iommu_batch *p, u64 mask) { struct pci_pbm_info *pbm = p->dev->archdata.host_controller; + u64 *pglist = p->pglist; + u64 index_count; unsigned long devhandle = pbm->devhandle; unsigned long prot = p->prot; unsigned long entry = p->entry; - u64 *pglist = p->pglist; unsigned long npages = p->npages; + unsigned long iotsb_num; + unsigned long ret; + long num; /* VPCI maj=1, min=[0,1] only supports read and write */ if (vpci_major < 2) prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE); while (npages != 0) { - long num; - - num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist)); - if (unlikely(num < 0)) { - if (printk_ratelimit()) - printk("iommu_batch_flush: IOMMU map of " - "[%08lx:%08llx:%lx:%lx:%lx] failed with " - "status %ld\n", - devhandle, HV_PCI_TSBID(0, entry), - npages, prot, __pa(pglist), num); - return -1; + if (mask <= DMA_BIT_MASK(32)) { + num = pci_sun4v_iommu_map(devhandle, + HV_PCI_TSBID(0, entry), + npages, + prot, + __pa(pglist)); + if (unlikely(num < 0)) { + pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n", + devhandle, + HV_PCI_TSBID(0, entry), + npages, prot, __pa(pglist), + num); + return -1; + } + } else { + index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry), + iotsb_num = pbm->iommu->atu->iotsb->iotsb_num; + ret = pci_sun4v_iotsb_map(devhandle, + iotsb_num, + index_count, + prot, + __pa(pglist), + ); + if (unlikely(ret != HV_EOK)) { + pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n", + devhandle, iotsb_num, + index_count, prot, + __pa(pglist), ret); + return -1; + } } - entry += num; npages -= n
[RFC PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with sun4v hypervisor PCI IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. For example, we recently experienced legacy IOMMU limitation while using i40e driver in system with large number of CPUs (e.g. 128). Four ports of i40e, each request 128 QP (Queue Pairs). Each queue has 512 (default) descriptors. So considering only RX queues (because RX premap DMA buffers), i40e takes 4*128*512 number of DMA entries in IOMMU table. Legacy IOMMU can have at max (2G/8K)- 1 entries available in table. So bringing up four instance of i40e alone saturate existing IOMMU resource. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. The patch set is tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7) and sun4u SPARC. Thanks. -Tushar Dave Kleikamp (1): sparc64: Add FORCE_MAX_ZONEORDER and default to 13 Tushar Dave (5): sparc64: Add ATU (new IOMMU) support sparc64: Initialize iommu_map_table and iommu_pool sparc64: Bind PCIe devices to use IOMMU v2 service sparc64: Enable sun4v dma ops to use IOMMU v2 APIs sparc64: Enable 64-bit DMA arch/sparc/Kconfig | 22 ++ arch/sparc/include/asm/hypervisor.h | 343 + arch/sparc/include/asm/iommu_64.h | 28 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/iommu.c | 8 +- arch/sparc/kernel/pci_sun4v.c | 415 +++- arch/sparc/kernel/pci_sun4v.h | 21 ++ arch/sparc/kernel/pci_sun4v_asm.S | 68 ++ 8 files changed, 846 insertions(+), 60 deletions(-) -- 1.9.1
[RFC PATCH 2/6] sparc64: Add ATU (new IOMMU) support
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with Hypervisor IOMMU v2 APIs. Current SPARC IOMMU supports only 32bit address ranges and one TSB per PCIe root complex that has a 2GB per root complex DVMA space limit. The limit has become a scalability bottleneck nowadays that a typical 10G/40G NIC can consume 300MB-500MB DVMA space per instance. When DVMA resource is exhausted, devices will not be usable since the driver can't allocate DVMA. ATU removes bottleneck by allowing guest os to create IOTSB of size 32G (or more) with 64bit address ranges available in ATU HW. 32G is more than enough DVMA space to be shared by all PCIe devices under root complex contrast to 2G space provided by legacy IOMMU. ATU allows PCIe devices to use 64bit DMA addressing. Devices which choose to use 32bit DMA mask will continue to work with the existing legacy IOMMU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/include/asm/hypervisor.h | 337 arch/sparc/include/asm/iommu_64.h | 26 +++ arch/sparc/kernel/hvapi.c | 1 + arch/sparc/kernel/pci_sun4v.c | 139 +++ arch/sparc/kernel/pci_sun4v.h | 7 + arch/sparc/kernel/pci_sun4v_asm.S | 18 ++ 6 files changed, 528 insertions(+) diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h index 666d5ba..7b15df8 100644 --- a/arch/sparc/include/asm/hypervisor.h +++ b/arch/sparc/include/asm/hypervisor.h @@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle, */ #define HV_FAST_PCI_MSG_SETVALID 0xd3 +/* PCI IOMMU v2 definitions and services + * + * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO + * definitions and services. + * + * CTE Clump Table Entry. First level table entry in the ATU. + * + * pci_device_list + * A 32-bit aligned list of pci_devices. + * + * pci_device_listp + * real address of a pci_device_list. 32-bit aligned. + * + * iotte IOMMU translation table entry. + * + * iotte_attributes + * IO Attributes for IOMMU v2 mappings. In addition to + * read, write IOMMU v2 supports relax ordering + * + * io_page_listA 64-bit aligned list of real addresses. Each real + * address in an io_page_list must be properly aligned + * to the pagesize of the given IOTSB. + * + * io_page_list_p Real address of an io_page_list, 64-bit aligned. + * + * IOTSB IO Translation Storage Buffer. An aligned table of + * IOTTEs. Each IOTSB has a pagesize, table size, and + * virtual address associated with it that must match + * a pagesize and table size supported by the un-derlying + * hardware implementation. The alignment requirements + * for an IOTSB depend on the pagesize used for that IOTSB. + * Each IOTTE in an IOTSB maps one pagesize-sized page. + * The size of the IOTSB dictates how large of a virtual + * address space the IOTSB is capable of mapping. + * + * iotsb_handleAn opaque identifier for an IOTSB. A devhandle plus + * iotsb_handle represents a binding of an IOTSB to a + * PCI root complex. + * + * iotsb_index Zero-based IOTTE number within an IOTSB. + */ + +/* pci_iotsb_conf() + * TRAP: HV_FAST_TRAP + * FUNCTION: HV_FAST_PCI_IOTSB_CONF + * ARG0: devhandle + * ARG1: r_addr + * ARG2: size + * ARG3: pagesize + * ARG4: iova + * RET0: status + * RET1: iotsb_handle + * ERRORS: EINVAL Invalid devhandle, size, iova, or pagesize + * EBADALIGN r_addr is not properly aligned + * ENORADDRr_addr is not a valid real address + * ETOOMANYNo further IOTSBs may be configured + * EBUSY Duplicate devhandle, raddir, iova combination + * + * Create an IOTSB suitable for the PCI root complex identified by devhandle, + * for the DMA virtual address defined by the argument iova. + * + * r_addr is the properly aligned base address of the IOTSB and size is the + * IOTSB (table) size in bytes.The IOTSB is required to be zeroed prior to + * being configured. If it contains any values other than zeros then the + * behavior is undefined. + * + * pagesize is the size of each page in the IOTSB. Note that the combination of + * size (table size) and pagesize must be valid. + * + * virt is the DMA virtual address this IOTSB will map. + * + * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1. + * Once configured, privileged access to the IOTSB memory is prohibited and + * creates undefined behavior. The only
[RFC PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index a1c4d5e..5404b33 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[RFC PATCH 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 338282d..78e7556 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -95,6 +95,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[RFC PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool
Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU. This change initializes iommu_map_table and iommu_pool for ATU. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Reviewed-by: Sowmini Varadhan --- arch/sparc/include/asm/iommu_64.h | 2 ++ arch/sparc/kernel/pci_sun4v.c | 19 +++ 2 files changed, 21 insertions(+) diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h index 93daa59..f24f356 100644 --- a/arch/sparc/include/asm/iommu_64.h +++ b/arch/sparc/include/asm/iommu_64.h @@ -45,8 +45,10 @@ struct atu_ranges { struct atu { struct atu_ranges *ranges; struct atu_iotsb *iotsb; + struct iommu_map_table tbl; u64 base; u64 size; + u64 dma_addr_mask; }; struct iommu { diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index a1c4d5e..5404b33 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) struct atu *atu = pbm->iommu->atu; unsigned long err; const u64 *ranges; + u64 map_size, num_iotte; + u64 dma_mask; const u32 *page_size; int len; @@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm) return err; } + /* Create ATU iommu map. +* One bit represents one iotte in IOTSB table. +*/ + dma_mask = (roundup_pow_of_two(atu->size) - 1UL); + num_iotte = atu->size / IO_PAGE_SIZE; + map_size = num_iotte / 8; + atu->tbl.table_map_base = atu->base; + atu->dma_addr_mask = dma_mask; + atu->tbl.map = kzalloc(map_size, GFP_KERNEL); + if (!atu->tbl.map) + return -ENOMEM; + + iommu_tbl_pool_init(>tbl, num_iotte, IO_PAGE_SHIFT, + NULL, false /* no large_pool */, + 0 /* default npools */, + false /* want span boundary checking */); + return 0; } -- 1.9.1
[RFC PATCH 6/6] sparc64: Enable 64-bit DMA
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities to use ATU for 64bit DMA. Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/Kconfig| 4 arch/sparc/kernel/iommu.c | 8 ++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 338282d..78e7556 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -95,6 +95,10 @@ config ARCH_ATU bool default y if SPARC64 +config ARCH_DMA_ADDR_T_64BIT + bool + default y if ARCH_ATU + config IOMMU_HELPER bool default y if SPARC64 diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c index 5c615ab..852a329 100644 --- a/arch/sparc/kernel/iommu.c +++ b/arch/sparc/kernel/iommu.c @@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask) struct iommu *iommu = dev->archdata.iommu; u64 dma_addr_mask = iommu->dma_addr_mask; - if (device_mask >= (1UL << 32UL)) - return 0; + if (device_mask > DMA_BIT_MASK(32)) { + if (iommu->atu) + dma_addr_mask = iommu->atu->dma_addr_mask; + else + return 0; + } if ((device_mask & dma_addr_mask) == dma_addr_mask) return 1; -- 1.9.1
[RFC PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> Reviewed-by: chris hyser <chris.hy...@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 5404b33..c7f3a0b3 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ range_alloc_fail: return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[RFC PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service
In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device has to be bound to IOTSB using HV API pci_iotsb_bind(). Signed-off-by: Tushar Dave Reviewed-by: chris hyser Acked-by: Sowmini Varadhan --- arch/sparc/kernel/pci_sun4v.c | 43 +++ arch/sparc/kernel/pci_sun4v.h | 3 +++ arch/sparc/kernel/pci_sun4v_asm.S | 14 + 3 files changed, 60 insertions(+) diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c index 5404b33..c7f3a0b3 100644 --- a/arch/sparc/kernel/pci_sun4v.c +++ b/arch/sparc/kernel/pci_sun4v.c @@ -216,6 +216,43 @@ range_alloc_fail: return NULL; } +unsigned long dma_4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + struct pci_bus *bus_dev) +{ + struct pci_dev *pdev; + unsigned long err; + unsigned int bus; + unsigned int device; + unsigned int fun; + + list_for_each_entry(pdev, _dev->devices, bus_list) { + if (pdev->subordinate) { + /* No need to bind pci bridge */ + dma_4v_iotsb_bind(devhandle, iotsb_num, + pdev->subordinate); + } else { + bus = bus_dev->number; + device = PCI_SLOT(pdev->devfn); + fun = PCI_FUNC(pdev->devfn); + err = pci_sun4v_iotsb_bind(devhandle, iotsb_num, + HV_PCI_DEVICE_BUILD(bus, + device, + fun)); + + /* If bind fails for one device it is going to fail +* for rest of the devices because we are sharing +* IOTSB. So in case of failure simply return with +* error. +*/ + if (err) + return err; + } + } + + return 0; +} + static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry, unsigned long npages) { @@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm) } iotsb->iotsb_num = iotsb_num; + err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus); + if (err) { + pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err); + goto iotsb_conf_failed; + } + return 0; iotsb_conf_failed: diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h index 0ef6d1c..1019e0f 100644 --- a/arch/sparc/kernel/pci_sun4v.h +++ b/arch/sparc/kernel/pci_sun4v.h @@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle, unsigned long page_size, unsigned long dvma_base, u64 *iotsb_num); +unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle, + unsigned long iotsb_num, + unsigned int pci_device); #endif /* !(_PCI_SUN4V_H) */ diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S index fd94d0e..22024a9 100644 --- a/arch/sparc/kernel/pci_sun4v_asm.S +++ b/arch/sparc/kernel/pci_sun4v_asm.S @@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf) retl stx%o1, [%g1] ENDPROC(pci_sun4v_iotsb_conf) + + /* +* %o0: devhandle +* %o1: iotsb_num/iotsb_handle +* %o2: pci_device +* +* returns %o0: status +*/ +ENTRY(pci_sun4v_iotsb_bind) + mov HV_FAST_PCI_IOTSB_BIND, %o5 + ta HV_FAST_TRAP + retl +nop +ENDPROC(pci_sun4v_iotsb_bind) -- 1.9.1
[RFC PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp <dave.kleik...@oracle.com> This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com> Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com> --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 59b0960..338282d 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -91,6 +91,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -306,6 +310,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1
[RFC PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13
From: Dave Kleikamp This change allows ATU (new IOMMU) in SPARC systems to request large (32M) contiguous memory during boot for creating IOTSB backing store. Signed-off-by: Dave Kleikamp Signed-off-by: Tushar Dave --- arch/sparc/Kconfig | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 59b0960..338282d 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -91,6 +91,10 @@ config ARCH_DEFCONFIG config ARCH_PROC_KCORE_TEXT def_bool y +config ARCH_ATU + bool + default y if SPARC64 + config IOMMU_HELPER bool default y if SPARC64 @@ -306,6 +310,20 @@ config ARCH_SPARSEMEM_ENABLE config ARCH_SPARSEMEM_DEFAULT def_bool y if SPARC64 +config FORCE_MAX_ZONEORDER + int "Maximum zone order" + default "13" + help + The kernel memory allocator divides physically contiguous memory + blocks into "zones", where each zone is a power of two number of + pages. This option selects the largest power of two that the kernel + keeps in the memory allocator. If you need to allocate very large + blocks of physically contiguous memory, then you may need to + increase this value. + + This config option is actually maximum order plus one. For example, + a value of 13 means that the largest free memory block is 2^12 pages. + source "mm/Kconfig" if SPARC64 -- 1.9.1