intel-iommu: Is this a bug?

2018-03-26 Thread Tushar Dave

Hi,

I am analyzing network performance with intel-iommu enabled and found
that, when running with iommu=pt, the following code is executed on every
DMA map/unmap:

	/*
	 * At boot time, we don't yet know if devices will be 64-bit capable.
	 * Assume that they will — if they turn out not to be, then we can
	 * take them out of the 1:1 domain later.
	 */
	if (!startup) {
		/*
		 * If the device's dma_mask is less than the system's memory
		 * size then this is not a candidate for identity mapping.
		 */
		u64 dma_mask = *dev->dma_mask;

		if (dev->coherent_dma_mask &&
		    dev->coherent_dma_mask < dma_mask)
			dma_mask = dev->coherent_dma_mask;

		return dma_mask >= dma_get_required_mask(dev);
	}


Do we really need this check on every DMA map/unmap?
Since it should only be needed during startup, shouldn't it be:

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01..3c8f14e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2929,7 +2929,7 @@ static int iommu_should_identity_map(struct device *dev, int startup)
 	 * Assume that they will — if they turn out not to be, then we can
 	 * take them out of the 1:1 domain later.
 	 */
-	if (!startup) {
+	if (startup) {
 		/*
 		 * If the device's dma_mask is less than the system's memory
 		 * size then this is not a candidate for identity mapping.

Thanks.

-Tushar


Re: [PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h

2018-02-28 Thread Tushar Dave



On 02/28/2018 08:57 AM, Daniel Borkmann wrote:

Hi Tushar,

On 02/28/2018 01:33 AM, Tushar Dave wrote:

Using bpf_probe_read_str() from samples/bpf causes a compiler warning, e.g.:
warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99
   [-Wimplicit-function-declaration]
 num = bpf_probe_read_str(buf, sizeof(buf), ctx->di);
   ^
1 warning generated.

Add bpf_probe_read_str() to bpf_helpers.h so it can be used by
samples/bpf programs.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>


In general no objections to it, but it would need an in-tree
user first:

$ git grep -n bpf_probe_read_str tools/
tools/include/uapi/linux/bpf.h:596: * int bpf_probe_read_str(void *dst, int size, const void *unsafe_ptr)
$

Why not add this along with a sample?

Okay, I will send a new patch along with a new sample, or add a use of
bpf_probe_read_str() to one of our existing samples :)

Thanks.
-Tushar
PS: adding the correct mailing list this time: linux-kselft...@vger.kernel.org


Thanks,
Daniel



[PATCH] selftests/bpf: Add bpf_probe_read_str to bpf_helpers.h

2018-02-27 Thread Tushar Dave
Using bpf_probe_read_str() from samples/bpf causes a compiler warning, e.g.:
warning: implicit declaration of function 'bpf_probe_read_str' is invalid in C99
  [-Wimplicit-function-declaration]
num = bpf_probe_read_str(buf, sizeof(buf), ctx->di);
  ^
1 warning generated.

Add bpf_probe_read_str() to bpf_helpers.h so it can be used by
samples/bpf programs.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 tools/testing/selftests/bpf/bpf_helpers.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index dde2c11..65a266d 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -65,6 +65,8 @@ static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
 	(void *) BPF_FUNC_xdp_adjust_head;
 static int (*bpf_xdp_adjust_meta)(void *ctx, int offset) =
 	(void *) BPF_FUNC_xdp_adjust_meta;
+static int (*bpf_probe_read_str)(void *dst, int size, void *unsafe_ptr) =
+	(void *) BPF_FUNC_probe_read_str;
 static int (*bpf_setsockopt)(void *ctx, int level, int optname, void *optval,
 			     int optlen) =
 	(void *) BPF_FUNC_setsockopt;
-- 
1.9.1



Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

2017-12-05 Thread Tushar Dave



On 12/04/2017 05:03 PM, Fengguang Wu wrote:

Hi Tushar,

On Tue, Nov 28, 2017 at 01:01:23AM +0530, Tushar Dave wrote:



On 11/23/2017 04:43 AM, Fengguang Wu wrote:

On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote:



On 11/21/2017 06:11 PM, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It has been happening since 4.13.

It occurs in 3 out of 162 boots.


[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] [ cut here ]
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1
[   45.017197] task: 88011bee9e40 task.stack: c986
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: :c9863e30 EFLAGS: 00010286
[   45.019282] RAX: 0035 RBX: 88013a230008 RCX:
[   45.020182] RDX:  RSI:  RDI: 0203
[   45.021084] RBP: 88013a3f31e8 R08: 0001 R09:
[   45.021986] R10: 827ec29c R11: 0002 R12: 0001
[   45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f
[   45.023842] FS:  () GS:88013fd0() knlGS:
[   45.024863] CS:  0010 DS:  ES:  CR0: 80050033
[   45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
    __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
    e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
    device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
    kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
    autoremove_wake_function at kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
    kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
    kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
    kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.

It looks like the e1000 PCI/PCI-X device is already suspended, so the call
chain e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() has
already disabled the device. Disabling the device again from the
e1000_shutdown handler during system shutdown then triggers the warning at
drivers/pci/pci.c:1641.

I think __e1000_shutdown() should just return if the device is already
suspended.

I don't have e1000 hardware to test with right now, so if this seems
logical to others, I will send a patch.


Tushar, it happens in QEMU boot testing, so you don't need to rely on e1000
HW, unless you'd like to prevent regressions on real HW.

The original report attached a reproduce script for running the QEMU test,
or you may send me the patch for testing.

Fengguang,

Would you please try this patch? It is compile-tested only and is similar
to how ixgbe handled the issue.
Thanks.

e1000: fix disabling already-disabled warning

This patch adds a check so that the driver does not disable an
already-disabled device.


It works! I tried 100 boots and the "e1000 0000:00:03.0: disabling
already-disabled device" error no longer shows up.

Tested-by: Fengguang Wu <fengguang...@intel.com>

Fengguang,

Thanks for testing. I will send the patch soon.

-Tushar


Thanks,
Fengguang


Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 drivers/net/ethe

Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

2017-11-27 Thread Tushar Dave



On 11/23/2017 04:43 AM, Fengguang Wu wrote:

On Wed, Nov 22, 2017 at 03:40:52AM +0530, Tushar Dave wrote:



On 11/21/2017 06:11 PM, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It has been happening since 4.13.

It occurs in 3 out of 162 boots.


[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] [ cut here ]
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1
[   45.017197] task: 88011bee9e40 task.stack: c986
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: :c9863e30 EFLAGS: 00010286
[   45.019282] RAX: 0035 RBX: 88013a230008 RCX:
[   45.020182] RDX:  RSI:  RDI: 0203
[   45.021084] RBP: 88013a3f31e8 R08: 0001 R09:
[   45.021986] R10: 827ec29c R11: 0002 R12: 0001
[   45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f
[   45.023842] FS:  () GS:88013fd0() knlGS:
[   45.024863] CS:  0010 DS:  ES:  CR0: 80050033
[   45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
    __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
    e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
    device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
    kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
    autoremove_wake_function at kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
    kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
    kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
    kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.

It looks like the e1000 PCI/PCI-X device is already suspended, so the call
chain e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() has
already disabled the device. Disabling the device again from the
e1000_shutdown handler during system shutdown then triggers the warning at
drivers/pci/pci.c:1641.

I think __e1000_shutdown() should just return if the device is already
suspended.

I don't have e1000 hardware to test with right now, so if this seems
logical to others, I will send a patch.


Tushar, it happens in QEMU boot testing, so you don't need to rely on e1000
HW, unless you'd like to prevent regressions on real HW.

The original report attached a reproduce script for running the QEMU test,
or you may send me the patch for testing.

Fengguang,

Would you please try this patch? It is compile-tested only and is similar
to how ixgbe handled the issue.

Thanks.

e1000: fix disabling already-disabled warning

This patch adds a check so that the driver does not disable an
already-disabled device.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 drivers/net/ethernet/intel/e1000/e1000.h  |  3 ++-
 drivers/net/ethernet/intel/e1000/e1000_main.c | 23 ++-
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000.h b/drivers/net/ethernet/intel/e1000/e1000.h
index d7bdea7..8fd2458 100644
--- a/drivers/net/ethernet/intel/e1000/e1000.h
+++ b/drivers/net/ethernet/i

Re: [e1000_shutdown] e1000 0000:00:03.0: disabling already-disabled device

2017-11-21 Thread Tushar Dave



On 11/21/2017 06:11 PM, Fengguang Wu wrote:

Hello,

FYI this happens in mainline kernel 4.14.0-01330-g3c07399.
It has been happening since 4.13.

It occurs in 3 out of 162 boots.


[   44.637743] advantechwdt: Unexpected close, not stopping watchdog!
[   44.997548] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input6
[   45.013419] e1000 0000:00:03.0: disabling already-disabled device
[   45.013447] [ cut here ]
[   45.014868] WARNING: CPU: 1 PID: 71 at drivers/pci/pci.c:1641 pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.016171] CPU: 1 PID: 71 Comm: rcu_perf_shutdo Not tainted 4.14.0-01330-g3c07399 #1
[   45.017197] task: 88011bee9e40 task.stack: c986
[   45.017987] RIP: 0010:pci_disable_device+0xa1/0x105:
    pci_disable_device at drivers/pci/pci.c:1640
[   45.018603] RSP: :c9863e30 EFLAGS: 00010286
[   45.019282] RAX: 0035 RBX: 88013a230008 RCX:
[   45.020182] RDX:  RSI:  RDI: 0203
[   45.021084] RBP: 88013a3f31e8 R08: 0001 R09:
[   45.021986] R10: 827ec29c R11: 0002 R12: 0001
[   45.022946] R13: 88013a230008 R14: 880117802b20 R15: c9863e8f
[   45.023842] FS:  () GS:88013fd0() knlGS:
[   45.024863] CS:  0010 DS:  ES:  CR0: 80050033
[   45.025583] CR2: c96d4000 CR3: 0220f000 CR4: 06a0
[   45.026478] Call Trace:
[   45.026811]  __e1000_shutdown+0x1d4/0x1e2:
    __e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5162
[   45.027344]  ? rcu_perf_cleanup+0x2a1/0x2a1:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:627
[   45.027883]  e1000_shutdown+0x14/0x3a:
    e1000_shutdown at drivers/net/ethernet/intel/e1000/e1000_main.c:5235
[   45.028351]  device_shutdown+0x110/0x1aa:
    device_shutdown at drivers/base/core.c:2807
[   45.028858]  kernel_power_off+0x31/0x64:
    kernel_power_off at kernel/reboot.c:260
[   45.029343]  rcu_perf_shutdown+0x9b/0xa7:
    rcu_perf_shutdown at kernel/rcu/rcuperf.c:637
[   45.029852]  ? __wake_up_common_lock+0xa2/0xa2:
    autoremove_wake_function at kernel/sched/wait.c:376
[   45.030414]  kthread+0x126/0x12e:
    kthread at kernel/kthread.c:233
[   45.030834]  ? __kthread_bind_mask+0x8e/0x8e:
    kthread at kernel/kthread.c:190
[   45.031399]  ? ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.031883]  ? kernel_init+0xa/0xf5:
    kernel_init at init/main.c:997
[   45.032325]  ret_from_fork+0x1f/0x30:
    ret_from_fork at arch/x86/entry/entry_64.S:443
[   45.032777] Code: 00 48 85 ed 75 07 48 8b ab a8 00 00 00 48 8d bb 98 00 00 00 e8 aa d1 11 00 48 89 ea 48 89 c6 48 c7 c7 d8 e4 0b 82 e8 55 7d da ff <0f> ff b9 01 00 00 00 31 d2 be 01 00 00 00 48 c7 c7 f0 b1 61 82
[   45.035222] ---[ end trace c257137b1b1976ef ]---
[   45.037838] ACPI: Preparing to enter system sleep state S5

Attached the full dmesg, kconfig and reproduce scripts.

It looks like the e1000 PCI/PCI-X device is already suspended, so the call
chain e1000_suspend() -> __e1000_shutdown() -> pci_disable_device() has
already disabled the device. Disabling the device again from the
e1000_shutdown handler during system shutdown then triggers the warning at
drivers/pci/pci.c:1641.

I think __e1000_shutdown() should just return if the device is already
suspended.

I don't have e1000 hardware to test with right now, so if this seems
logical to others, I will send a patch.

-Tushar


Thanks,
Fengguang



Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-06 Thread Tushar Dave



On 11/02/2017 11:58 PM, Sandipan Das wrote:

For added security, the layout of some structures can be
randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One
such structure is task_struct. To build BPF programs, we
use Clang which does not support this feature. So, if we
attempt to read a field of a structure with a randomized
layout within a BPF program, we do not get the expected
value because of incorrect offsets. To observe this, it
is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT
enabled because the structure annotations/members added
for this purpose are enough to cause this. So, all kernel
builds are affected.

For example, considering samples/bpf/offwaketime_kern.c,
if we try to print the values of pid and comm inside the
task_struct passed to waker() by adding the following
lines of code at the appropriate place

   char fmt[] = "waker(): p->pid = %u, p->comm = %s\n";
   bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm));

it is seen that upon rebuilding and running this sample
followed by inspecting /sys/kernel/debug/tracing/trace,
the output looks like the following

_-=> irqs-off
   / _=> need-resched
  | / _---=> hardirq/softirq
  || / _--=> preempt-depth
  ||| / delay
 TASK-PID   CPU#  TIMESTAMP  FUNCTION
| |   |      | |
   -0 [007] d.s.  1883.443594: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [018] d.s.  1883.453588: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [007] d.s.  1883.463584: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [009] d.s.  1883.483586: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [005] d.s.  1883.493583: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [009] d.s.  1883.503583: 0x0001: waker(): p->pid = 0, p->comm =
   -0 [018] d.s.  1883.513578: 0x0001: waker(): p->pid = 0, p->comm =
  systemd-journal-3140  [003] d...  1883.627660: 0x0001: waker(): p->pid = 0, p->comm =
  systemd-journal-3140  [003] d...  1883.627704: 0x0001: waker(): p->pid = 0, p->comm =
  systemd-journal-3140  [003] d...  1883.627723: 0x0001: waker(): p->pid = 0, p->comm =

To avoid this, we add new BPF helpers that read the
correct values for some of the important task_struct
members such as pid, tgid, comm and flags which are
extensively used in BPF-based analysis tools such as
bcc. Since these helpers are built with GCC, they use
the correct offsets when referencing a member.

Just to add: we were seeing the same issue (but had no clue until we
looked at this patch, thanks). It's easy to reproduce by running the bcc
example task_switch.py, where pid (prev_pid) is read from struct
task_struct and is always zero. We also tried printing other task_struct
members such as 'comm' and saw an empty string as well.



-Tushar


Signed-off-by: Sandipan Das 
---
  include/linux/bpf.h   |  3 ++
  include/uapi/linux/bpf.h  | 13 ++
  kernel/bpf/core.c |  3 ++
  kernel/bpf/helpers.c  | 75 +++
  kernel/trace/bpf_trace.c  |  6 +++
  tools/testing/selftests/bpf/bpf_helpers.h |  9 
  6 files changed, 109 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f1af7d63d678..5993a0f5262b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -418,6 +418,9 @@ extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
  extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
  extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
  extern const struct bpf_func_proto bpf_get_current_comm_proto;
+extern const struct bpf_func_proto bpf_get_task_pid_tgid_proto;
+extern const struct bpf_func_proto bpf_get_task_comm_proto;
+extern const struct bpf_func_proto bpf_get_task_flags_proto;
  extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
  extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
  extern const struct bpf_func_proto bpf_get_stackid_proto;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f90860d1f897..324508d27bd2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -338,6 +338,16 @@ union bpf_attr {
   * @skb: pointer to skb
   * Return: classid if != 0
   *
+ * u64 bpf_get_task_pid_tgid(struct task_struct *task)
+ * Return: task->tgid << 32 | task->pid
+ *
+ * int bpf_get_task_comm(struct task_struct *task)
+ * Stores task->comm into buf
+ * Return: 0 on success or negative error
+ *
+ * u32 bpf_get_task_flags(struct task_struct *task)
+ * Return: task->flags
+ *
   * int bpf_skb_vlan_push(skb, vlan_proto, vlan_tci)
   * Return: 0 on success or negative error
   *


Re: sun4v+DMA related boot crash on 4.13-git

2017-07-11 Thread Tushar Dave



On 07/11/2017 09:02 PM, David Miller wrote:

From: Tushar Dave <tushar.n.d...@oracle.com>
Date: Tue, 11 Jul 2017 20:43:39 -0700


Yes, indeed the bug is in Linus's tree. However, 'sparc' tree doesn't
have DMA API change (e.g. commit b02c2b0bfd7ae) yet that introduced
the panic.


You can simply make a note of this when you send the bug fix to me.

:( Yeah, I should have mentioned this when I sent the patch to you. My bad.

I will make sure to leave you a note for this kind of occurrence in the future!

-Tushar


--
To unsubscribe from this list: send the line "unsubscribe sparclinux" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html





Re: sun4v+DMA related boot crash on 4.13-git

2017-07-11 Thread Tushar Dave



On 07/11/2017 05:34 PM, David Miller wrote:

From: Tushar Dave <tushar.n.d...@oracle.com>
Date: Tue, 11 Jul 2017 15:38:21 -0700




On 07/11/2017 02:48 PM, Meelis Roos wrote:

Yesterday I tested 4.12+git on sparc64 to see if the sparc merge works
fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed
on boot with a DMA-related stack trace (below). All the machines are
sun4v physical machines, not VMs. Older sun4 machines do not exhibit
this problem.

Maybe DMA API related, maybe sparc64. Will try to bisect when I get
time.

I see what's going on with the panic. I will reproduce it locally and
get back soon.

This patch should fix the panic. Please give it a try.


Yes, this patch fixes it. Thank you for fixing it quickly!

Thanks for testing. Patch sent for sparc-next.

Why sparc-next? It should go into 4.13, since 4.13 would break all
niagara1 and niagara2 systems otherwise.

This is a sparc arch fix, so I used the sparc tree (in this case
sparc-next).

I am open to the maintainers' suggestions. Thanks.


If the bug is in Linus's tree the fix must target 'sparc' not
'sparc-next'.


Dave,

Yes, the bug is indeed in Linus's tree. However, the 'sparc' tree does
not yet have the DMA API change (e.g. commit b02c2b0bfd7ae) that
introduced the panic; it looks like the DMA API changes have not been
merged into the 'sparc' tree. In other words, the 'sparc' tree does not
have this panic, so there is nothing to fix there!
'sparc-next', however, is up to date with (or closer to) Linus's tree
and has the DMA API change that causes the panic. So I sent the patch
targeted at sparc-next.


Let me know which tree is best for getting this fix in and I will
send a v2.


Thanks.

-Tushar






Re: sun4v+DMA related boot crash on 4.13-git

2017-07-11 Thread Tushar Dave



On 07/11/2017 02:48 PM, Meelis Roos wrote:

Yesterday I tested 4.12+git on sparc64 to see if the sparc merge works
fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed
on boot with a DMA-related stack trace (below). All the machines are
sun4v physical machines, not VMs. Older sun4 machines do not exhibit
this problem.

Maybe DMA API related, maybe sparc64. Will try to bisect when I get
time.

I see what's going on with the panic. I will reproduce it locally and
get back soon.

This patch should fix the panic. Please give it a try.


Yes, this patch fixes it. Thank you for fixing it quickly!

Thanks for testing. Patch sent for sparc-next.


Why sparc-next? It should go into 4.13, since 4.13 would break all
niagara1 and niagara2 systems otherwise.

This is a sparc arch fix, so I used the sparc tree (in this case
sparc-next).

I am open to the maintainers' suggestions. Thanks.

-Tushar




-Tushar


commit b02c2b0bfd ("sparc: remove arch specific dma_supported
implementations") introduced code that incorrectly allows dma_supported()
to succeed for a 64-bit DMA mask even if the system doesn't have an ATU
IOMMU. 64-bit DMA is only supported on sun4v systems equipped with ATU
IOMMU HW.

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 24f21c7..0a32c57 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist,
 static int dma_4v_supported(struct device *dev, u64 device_mask)
 {
 	struct iommu *iommu = dev->archdata.iommu;
-	u64 dma_addr_mask;
+	u64 dma_addr_mask = iommu->dma_addr_mask;
 
-	if (device_mask > DMA_BIT_MASK(32) && iommu->atu)
-		dma_addr_mask = iommu->atu->dma_addr_mask;
-	else
-		dma_addr_mask = iommu->dma_addr_mask;
+	if (device_mask > DMA_BIT_MASK(32)) {
+		if (iommu->atu)
+			dma_addr_mask = iommu->atu->dma_addr_mask;
+		else
+			return 0;
+	}
 
 	if ((device_mask & dma_addr_mask) == dma_addr_mask)
 		return 1;


-Tushar




-Tushar



[0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29'
[0.33] PROMLIB: Root node compatible: sun4v
[0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017
[0.002047] bootconsole [earlyprom0] enabled
[0.002383] ARCH: SUN4V
[0.002668] Ethernet address: 00:14:4f:86:99:26
[0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39)
[0.004089] MM: VMALLOC [0x0001 --> 0x6000]
[0.004562] MM: VMEMMAP [0x6000 --> 0xc000]
[0.095699] Kernel: Using 3 locked TLB entries for main kernel image.
[0.096387] Remapping the kernel...
[0.096400] done.
[1.906342] OF stdout device is: /virtual-devices@100/console@1
[1.907160] PROM: Built device tree with 148821 bytes of memory.
[1.907804] MDESC: Size is 42336 bytes.
[1.910139] PLATFORM: banner-name [Sun Fire T200]
[1.910564] PLATFORM: name [SUNW,Sun-Fire-T200]
[1.910919] PLATFORM: hostid [84869926]
[1.911224] PLATFORM: serial# [00ab4130]
[1.911536] PLATFORM: stick-frequency [3b9aca00]
[1.911894] PLATFORM: mac-address [144f869926]
[1.912241] PLATFORM: watchdog-resolution [1000 ms]
[1.912619] PLATFORM: watchdog-max-timeout [3153600 ms]
[1.913042] PLATFORM: max-cpus [32]
[1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000
[1.913936] Memory hole size: 132MB
[2.279507] Allocated 16384 bytes for kernel page tables.
[2.280578] Zone ranges:
[2.280819]   Normal   [mem 0x0840-0x0003ffd33fff]
[2.281292] Movable zone start for each node
[2.281626] Early memory node ranges
[2.281916]   node   0: [mem 0x0840-0x0003ffc1]
[2.282557]   node   0: [mem 0x0003ffc28000-0x0003ffcfdfff]
[2.283030]   node   0: [mem 0x0003ffd0e000-0x0003ffd27fff]
[2.283514]   node   0: [mem 0x0003ffd2c000-0x0003ffd33fff]
[2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff]
[2.782262] Booting Linux...
[2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32]
[2.783255] CPU CAPS: [v8plus,ASIBlkInit]
[2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144
[2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)]
[2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634
[2.916160] Policy zone: Normal
[2.916420] Kernel command line: root=/dev/sda1 ro
[2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes)
[2.919230] Sorting __ex_table...
[3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved, 


Re: sun4v+DMA related boot crash on 4.13-git

2017-07-11 Thread Tushar Dave



On 07/10/2017 10:05 PM, Meelis Roos wrote:

Yesterday I tested 4.12+git on sparc64 to see if the sparc merge works
fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed
on boot with a DMA-related stack trace (below). All the machines are
sun4v physical machines, not VMs. Older sun4 machines do not exhibit
this problem.

Maybe DMA API related, maybe sparc64. Will try to bisect when I get
time.

I see what's going on with the panic. I will reproduce it locally and
get back soon.

This patch should fix the panic. Please give it a try.


Yes, this patch fixes it. Thank you for fixing it quickly!

Thanks for testing. Patch sent for sparc-next.

-Tushar


commit b02c2b0bfd ("sparc: remove arch specific dma_supported
implementations") introduced code that incorrectly allows dma_supported()
to succeed for a 64-bit DMA mask even if the system doesn't have an ATU
IOMMU. 64-bit DMA is only supported on sun4v systems equipped with ATU
IOMMU HW.

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 24f21c7..0a32c57 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist,
 static int dma_4v_supported(struct device *dev, u64 device_mask)
 {
 	struct iommu *iommu = dev->archdata.iommu;
-	u64 dma_addr_mask;
+	u64 dma_addr_mask = iommu->dma_addr_mask;
 
-	if (device_mask > DMA_BIT_MASK(32) && iommu->atu)
-		dma_addr_mask = iommu->atu->dma_addr_mask;
-	else
-		dma_addr_mask = iommu->dma_addr_mask;
+	if (device_mask > DMA_BIT_MASK(32)) {
+		if (iommu->atu)
+			dma_addr_mask = iommu->atu->dma_addr_mask;
+		else
+			return 0;
+	}
 
 	if ((device_mask & dma_addr_mask) == dma_addr_mask)
 		return 1;


-Tushar




-Tushar



[0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29'
[0.33] PROMLIB: Root node compatible: sun4v
[0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017
[0.002047] bootconsole [earlyprom0] enabled
[0.002383] ARCH: SUN4V
[0.002668] Ethernet address: 00:14:4f:86:99:26
[0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39)
[0.004089] MM: VMALLOC [0x0001 --> 0x6000]
[0.004562] MM: VMEMMAP [0x6000 --> 0xc000]
[0.095699] Kernel: Using 3 locked TLB entries for main kernel image.
[0.096387] Remapping the kernel...
[0.096400] done.
[1.906342] OF stdout device is: /virtual-devices@100/console@1
[1.907160] PROM: Built device tree with 148821 bytes of memory.
[1.907804] MDESC: Size is 42336 bytes.
[1.910139] PLATFORM: banner-name [Sun Fire T200]
[1.910564] PLATFORM: name [SUNW,Sun-Fire-T200]
[1.910919] PLATFORM: hostid [84869926]
[1.911224] PLATFORM: serial# [00ab4130]
[1.911536] PLATFORM: stick-frequency [3b9aca00]
[1.911894] PLATFORM: mac-address [144f869926]
[1.912241] PLATFORM: watchdog-resolution [1000 ms]
[1.912619] PLATFORM: watchdog-max-timeout [3153600 ms]
[1.913042] PLATFORM: max-cpus [32]
[1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000
[1.913936] Memory hole size: 132MB
[2.279507] Allocated 16384 bytes for kernel page tables.
[2.280578] Zone ranges:
[2.280819]   Normal   [mem 0x0840-0x0003ffd33fff]
[2.281292] Movable zone start for each node
[2.281626] Early memory node ranges
[2.281916]   node   0: [mem 0x0840-0x0003ffc1]
[2.282557]   node   0: [mem 0x0003ffc28000-0x0003ffcfdfff]
[2.283030]   node   0: [mem 0x0003ffd0e000-0x0003ffd27fff]
[2.283514]   node   0: [mem 0x0003ffd2c000-0x0003ffd33fff]
[2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff]
[2.782262] Booting Linux...
[2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32]
[2.783255] CPU CAPS: [v8plus,ASIBlkInit]
[2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144
[2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)]
[2.915492] Built 1 zonelists in Node order, mobility grouping on. Total pages: 2063634
[2.916160] Policy zone: Normal
[2.916420] Kernel command line: root=/dev/sda1 ro
[2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes)
[2.919230] Sorting __ex_table...
[3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved, 0K cma-reserved)
[3.223109] Hierarchical RCU implementation.
[3.223452] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16.
[3.223933] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[3.225508] NR_IRQS: 2048, 

Re: sun4v+DMA related boot crash on 4.13-git

2017-07-11 Thread Tushar Dave



On 07/10/2017 10:05 PM, Meelis Roos wrote:

I tested yesterdayd 4.12+git on sparc64 to see if the sparc merge works
fine, and on all of my sun4v machines (T1000, T2000, T5120) it crashed
on boot with DMA-related stacktrace (below). Allt he machines are sun4v
physical machines, not VM-s. Older sun4 machines do not exhibit this
problem.

Maybae DMA APi realted, maybe sparc64. Will try to bisect when I get
time.

I see whats going on with panic. I will reproduce locally. Will get back
soon.

This patch should fix panic. Please give it a try.


Yes, this patch fixes it. Thank you for fixing it quickly!

Thanks for testing. Patch sent for sparc-next.

-Tushar


commit b02c2b0bfd ("sparc: remove arch specific dma_supported
implementations") introduced a code that incorrectly allow dma_supported() to
succeed for 64bit dma mask even if system doesn't have ATU IOMMU. 64bit DMA
only supported on sun4v equipped with ATU IOMMU HW.

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 24f21c7..0a32c57 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct
scatterlist *sglist,
  static int dma_4v_supported(struct device *dev, u64 device_mask)
  {
 struct iommu *iommu = dev->archdata.iommu;
-   u64 dma_addr_mask;
+   u64 dma_addr_mask = iommu->dma_addr_mask;

-   if (device_mask > DMA_BIT_MASK(32) && iommu->atu)
-   dma_addr_mask = iommu->atu->dma_addr_mask;
-   else
-   dma_addr_mask = iommu->dma_addr_mask;
+   if (device_mask > DMA_BIT_MASK(32)) {
+   if (iommu->atu)
+   dma_addr_mask = iommu->atu->dma_addr_mask;
+   else
+   return 0;
+   }

 if ((device_mask & dma_addr_mask) == dma_addr_mask)
 return 1;


-Tushar




-Tushar



[0.24] PROMLIB: Sun IEEE Boot Prom 'OBP 4.30.4.d 2011/07/06 14:29'
[0.33] PROMLIB: Root node compatible: sun4v
[0.79] Linux version 4.12.0-08915-gf263fbb (mroos@t2000) (gcc
version 4.9.2 (Debian 4.9.2-20)) #141 SMP Sun Jul 9 17:51:12 EEST 2017
[0.002047] bootconsole [earlyprom0] enabled
[0.002383] ARCH: SUN4V
[0.002668] Ethernet address: 00:14:4f:86:99:26
[0.003406] MM: PAGE_OFFSET is 0x8000 (max_phys_bits == 39)
[0.004089] MM: VMALLOC [0x0001 --> 0x6000]
[0.004562] MM: VMEMMAP [0x6000 --> 0xc000]
[0.095699] Kernel: Using 3 locked TLB entries for main kernel image.
[0.096387] Remapping the kernel...
[0.096400] done.
[1.906342] OF stdout device is: /virtual-devices@100/console@1
[1.907160] PROM: Built device tree with 148821 bytes of memory.
[1.907804] MDESC: Size is 42336 bytes.
[1.910139] PLATFORM: banner-name [Sun Fire T200]
[1.910564] PLATFORM: name [SUNW,Sun-Fire-T200]
[1.910919] PLATFORM: hostid [84869926]
[1.911224] PLATFORM: serial# [00ab4130]
[1.911536] PLATFORM: stick-frequency [3b9aca00]
[1.911894] PLATFORM: mac-address [144f869926]
[1.912241] PLATFORM: watchdog-resolution [1000 ms]
[1.912619] PLATFORM: watchdog-max-timeout [3153600 ms]
[1.913042] PLATFORM: max-cpus [32]
[1.913501] Top of RAM: 0x3ffd34000, Total RAM: 0x3f7918000
[1.913936] Memory hole size: 132MB
[2.279507] Allocated 16384 bytes for kernel page tables.
[2.280578] Zone ranges:
[2.280819]   Normal   [mem 0x0840-0x0003ffd33fff]
[2.281292] Movable zone start for each node
[2.281626] Early memory node ranges
[2.281916]   node   0: [mem 0x0840-0x0003ffc1]
[2.282557]   node   0: [mem 0x0003ffc28000-0x0003ffcfdfff]
[2.283030]   node   0: [mem 0x0003ffd0e000-0x0003ffd27fff]
[2.283514]   node   0: [mem 0x0003ffd2c000-0x0003ffd33fff]
[2.283994] Initmem setup node 0 [mem 0x0840-0x0003ffd33fff]
[2.782262] Booting Linux...
[2.782734] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,mul32,div32]
[2.783255] CPU CAPS: [v8plus,ASIBlkInit]
[2.897543] percpu: Embedded 12 pages/cpu @8003ff80 s55872 r8192 d34240 u262144
[2.913264] SUN4V: Mondo queue sizes [cpu(4096) dev(16384) r(8192) nr(256)]
[2.915492] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 2063634
[2.916160] Policy zone: Normal
[2.916420] Kernel command line: root=/dev/sda1 ro
[2.918743] PID hash table entries: 4096 (order: 2, 32768 bytes)
[2.919230] Sorting __ex_table...
[3.220450] Memory: 16497120K/16639072K available (5521K kernel code, 530K rwdata, 1224K rodata, 336K init, 699K bss, 141952K reserved, 0K cma-reserved)
[3.223109] Hierarchical RCU implementation.
[3.223452] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16.
[3.223933] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=16
[3.225508] NR_IRQS: 2048, 

[sparc-next] SPARC64: Fix sun4v DMA panic

2017-07-11 Thread Tushar Dave
64-bit DMA is only supported on sun4v systems equipped with ATU IOMMU
hardware. Commit b02c2b0bfd7ae ("sparc: remove arch specific
dma_supported implementations") introduced code that incorrectly allows
dma_supported() to succeed for a 64-bit DMA mask even if the system
doesn't have an ATU IOMMU. This results in a panic.

Fix it.

Reported-by: Meelis Roos <mr...@linux.ee>
Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 arch/sparc/kernel/pci_sun4v.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 24f21c7..f10e2f7 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -673,12 +673,14 @@ static void dma_4v_unmap_sg(struct device *dev, struct scatterlist *sglist,
 static int dma_4v_supported(struct device *dev, u64 device_mask)
 {
 	struct iommu *iommu = dev->archdata.iommu;
-	u64 dma_addr_mask;
+	u64 dma_addr_mask = iommu->dma_addr_mask;
 
-	if (device_mask > DMA_BIT_MASK(32) && iommu->atu)
-		dma_addr_mask = iommu->atu->dma_addr_mask;
-	else
-		dma_addr_mask = iommu->dma_addr_mask;
+	if (device_mask > DMA_BIT_MASK(32)) {
+		if (iommu->atu)
+			dma_addr_mask = iommu->atu->dma_addr_mask;
+		else
+			return 0;
+	}
 
 	if ((device_mask & dma_addr_mask) == dma_addr_mask)
 		return 1;
-- 
1.9.1
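
The hunk above changes only the control flow of dma_4v_supported(). The minimal user-space sketch below contrasts the old and new conditions to show why the old one accepted a 64-bit mask on systems without an ATU. The structs `atu` and `iommu` are hypothetical, stripped-down stand-ins (not the kernel's definitions), and the mask values in the usage are illustrative. The quoted hunk ends at `return 1;`, so this sketch simply returns 0 in every other case, which may not match the full kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

static_assert(DMA_BIT_MASK(32) == 0xffffffffULL, "32-bit DMA mask");

/* Hypothetical, stripped-down stand-ins for the kernel structures. */
struct atu   { u64 dma_addr_mask; };
struct iommu { u64 dma_addr_mask; struct atu *atu; };

/* Before the fix: without an ATU, a 64-bit device mask falls through to
 * the legacy 32-bit IOMMU mask, and the subset test wrongly succeeds. */
int dma_supported_before(const struct iommu *iommu, u64 device_mask)
{
	u64 dma_addr_mask;

	if (device_mask > DMA_BIT_MASK(32) && iommu->atu)
		dma_addr_mask = iommu->atu->dma_addr_mask;
	else
		dma_addr_mask = iommu->dma_addr_mask;

	return (device_mask & dma_addr_mask) == dma_addr_mask;
}

/* After the fix: a mask wider than 32 bits is rejected outright when
 * the system has no ATU. */
int dma_supported_after(const struct iommu *iommu, u64 device_mask)
{
	u64 dma_addr_mask = iommu->dma_addr_mask;

	if (device_mask > DMA_BIT_MASK(32)) {
		if (iommu->atu)
			dma_addr_mask = iommu->atu->dma_addr_mask;
		else
			return 0;
	}

	/* Further fallback checks in the real function are omitted. */
	return (device_mask & dma_addr_mask) == dma_addr_mask;
}
```

For example, with `struct iommu legacy_only = { DMA_BIT_MASK(32), NULL };`, the old condition returns 1 for a 64-bit mask (`~0ULL`) and the new one returns 0. A 32-bit mask is still accepted, and a system with an ATU still accepts the 64-bit mask.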



[PATCH v3 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

2016-10-28 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported by
the sun4v hypervisor PCI IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical
10G/40G NIC can consume 500MB of DVMA space per instance. When DVMA
resources are exhausted, devices become unusable because the driver
cannot allocate DVMA.

For example, we recently hit the legacy IOMMU limit while using the
i40e driver on a system with a large number of CPUs (e.g. 128). Four
i40e ports each request 128 QPs (Queue Pairs), and each queue has 512
descriptors by default. Counting only RX queues (because RX premaps its
DMA buffers), i40e needs 4*128*512 = 262,144 DMA entries in the IOMMU
table, while the legacy IOMMU table holds at most (2G/8K) - 1 = 262,143
entries. Bringing up four i40e instances alone therefore saturates the
existing IOMMU resources.

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64-bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that
choose a 32-bit DMA mask will continue to work with the existing legacy
IOMMU.

The patch set has been tested on sun4v (T1000, T2000, T3, T4, T5, T7,
S7) and sun4u SPARC systems.

Thanks.
-Tushar

v2->v3:
- Patch #5 addresses comment by Joe Perches.
 -- use %s, __func__ instead of embedding the function name.

v1->v2:
- Patch #2 addresses comments by Dave M.
 -- use page allocator to allocate IOTSB.
 -- use true/false with boolean variables.


Dave Kleikamp (1):
  sparc64: Add FORCE_MAX_ZONEORDER and default to 13

Tushar Dave (5):
  sparc64: Add ATU (new IOMMU) support
  sparc64: Initialize iommu_map_table and iommu_pool
  sparc64: Bind PCIe devices to use IOMMU v2 service
  sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
  sparc64: Enable 64-bit DMA

 arch/sparc/Kconfig  |  22 ++
 arch/sparc/include/asm/hypervisor.h | 343 +
 arch/sparc/include/asm/iommu_64.h   |  28 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/iommu.c   |   8 +-
 arch/sparc/kernel/pci_sun4v.c   | 418 +++-
 arch/sparc/kernel/pci_sun4v.h   |  21 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  68 ++
 8 files changed, 849 insertions(+), 60 deletions(-)

-- 
1.9.1



[PATCH v3 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service

2016-10-28 Thread Tushar Dave
To use the Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe device
has to be bound to an IOTSB using the HV API pci_iotsb_bind().

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/kernel/pci_sun4v.c | 43 +++
 arch/sparc/kernel/pci_sun4v.h |  3 +++
 arch/sparc/kernel/pci_sun4v_asm.S | 14 +
 3 files changed, 60 insertions(+)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 242477c..d4208aa 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size,
 	return NULL;
 }
 
+unsigned long dma_4v_iotsb_bind(unsigned long devhandle,
+				unsigned long iotsb_num,
+				struct pci_bus *bus_dev)
+{
+	struct pci_dev *pdev;
+	unsigned long err;
+	unsigned int bus;
+	unsigned int device;
+	unsigned int fun;
+
+	list_for_each_entry(pdev, &bus_dev->devices, bus_list) {
+		if (pdev->subordinate) {
+			/* No need to bind pci bridge */
+			dma_4v_iotsb_bind(devhandle, iotsb_num,
+					  pdev->subordinate);
+		} else {
+			bus = bus_dev->number;
+			device = PCI_SLOT(pdev->devfn);
+			fun = PCI_FUNC(pdev->devfn);
+			err = pci_sun4v_iotsb_bind(devhandle, iotsb_num,
+						   HV_PCI_DEVICE_BUILD(bus,
+								       device,
+								       fun));
+
+			/* If bind fails for one device it is going to fail
+			 * for rest of the devices because we are sharing
+			 * IOTSB. So in case of failure simply return with
+			 * error.
+			 */
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry,
   unsigned long npages)
 {
@@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm)
 	}
 	iotsb->iotsb_num = iotsb_num;
 
+	err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus);
+	if (err) {
+		pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err);
+		goto iotsb_conf_failed;
+	}
+
 	return 0;
 
 iotsb_conf_failed:
diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h
index 0ef6d1c..1019e0f 100644
--- a/arch/sparc/kernel/pci_sun4v.h
+++ b/arch/sparc/kernel/pci_sun4v.h
@@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle,
 				   unsigned long page_size,
 				   unsigned long dvma_base,
 				   u64 *iotsb_num);
+unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle,
+				   unsigned long iotsb_num,
+				   unsigned int pci_device);
 #endif /* !(_PCI_SUN4V_H) */
diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S
index fd94d0e..22024a9 100644
--- a/arch/sparc/kernel/pci_sun4v_asm.S
+++ b/arch/sparc/kernel/pci_sun4v_asm.S
@@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf)
 	retl
 	 stx	%o1, [%g1]
 ENDPROC(pci_sun4v_iotsb_conf)
+
+	/*
+	 * %o0: devhandle
+	 * %o1: iotsb_num/iotsb_handle
+	 * %o2: pci_device
+	 *
+	 * returns %o0: status
+	 */
+ENTRY(pci_sun4v_iotsb_bind)
+	mov	HV_FAST_PCI_IOTSB_BIND, %o5
+	ta	HV_FAST_TRAP
+	retl
+	 nop
+ENDPROC(pci_sun4v_iotsb_bind)
-- 
1.9.1
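
The recursive walk in dma_4v_iotsb_bind() can be exercised outside the kernel. This minimal sketch rests on stated assumptions. `sk_bus` and `sk_dev` are hypothetical, simplified stand-ins for struct pci_bus and struct pci_dev, using an array instead of a list_head list. `fake_iotsb_bind()` is a hypothetical stub in place of the pci_sun4v_iotsb_bind() hypervisor call. PCI_SLOT() and PCI_FUNC() use the same definitions as Linux. The sketch differs from the quoted patch in one deliberate way: the patch ignores the return value of the recursive call on a bridge's subordinate bus, while this sketch propagates it.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as Linux's PCI_SLOT()/PCI_FUNC(). */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

struct sk_bus;

/* Hypothetical stand-ins for struct pci_dev / struct pci_bus. */
struct sk_dev {
	unsigned int devfn;
	struct sk_bus *subordinate;	/* non-NULL for a bridge */
};

struct sk_bus {
	unsigned int number;
	struct sk_dev *devices;
	size_t ndevices;
};

/* (bus, device, function) triples recorded by the stub. */
static unsigned int bound[16][3];
static size_t nbound;

/* Stub for the hypervisor bind call: records the triple, returns 0. */
static unsigned long fake_iotsb_bind(unsigned int bus, unsigned int device,
				     unsigned int fun)
{
	if (nbound >= 16)
		return 1;	/* pretend the hypervisor refused */
	bound[nbound][0] = bus;
	bound[nbound][1] = device;
	bound[nbound][2] = fun;
	nbound++;
	return 0;
}

/* Bind every endpoint below bus_dev. Bridges are not bound themselves;
 * their subordinate bus is walked instead. */
unsigned long sketch_iotsb_bind(struct sk_bus *bus_dev)
{
	assert(bus_dev != NULL);
	for (size_t i = 0; i < bus_dev->ndevices; i++) {
		struct sk_dev *pdev = &bus_dev->devices[i];
		unsigned long err;

		if (pdev->subordinate)
			err = sketch_iotsb_bind(pdev->subordinate);
		else
			err = fake_iotsb_bind(bus_dev->number,
					      PCI_SLOT(pdev->devfn),
					      PCI_FUNC(pdev->devfn));
		/* All devices share one IOTSB: stop at the first failure. */
		if (err)
			return err;
	}
	return 0;
}
```

With an endpoint at devfn 0x08 on bus 0 and a bridge at devfn 0x10 leading to bus 1 (devfns 0x00 and 0x01), the stub records (0,1,0), (1,0,0), and (1,0,1), in that order.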



[PATCH v3 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-28 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported by
the Hypervisor IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical
10G/40G NIC can consume 300MB-500MB of DVMA space per instance. When
DVMA resources are exhausted, devices become unusable because the
driver cannot allocate DVMA.

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64-bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that
choose a 32-bit DMA mask will continue to work with the existing legacy
IOMMU.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 140 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 529 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above are still valid, IOMMU v2 adds new
+ * PCI IO definitions and services.
+ *
+ * CTE			Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ *			A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ *			real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte		IOMMU translation table entry.
+ *
+ * iotte_attributes
+ *			IO Attributes for IOMMU v2 mappings. In addition to
+ *			read and write, IOMMU v2 supports relaxed ordering.
+ *
+ * io_page_list		A 64-bit aligned list of real addresses. Each real
+ *			address in an io_page_list must be properly aligned
+ *			to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p	Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB		IO Translation Storage Buffer. An aligned table of
+ *			IOTTEs. Each IOTSB has a pagesize, table size, and
+ *			virtual address associated with it that must match
+ *			a pagesize and table size supported by the underlying
+ *			hardware implementation. The alignment requirements
+ *			for an IOTSB depend on the pagesize used for that IOTSB.
+ *			Each IOTTE in an IOTSB maps one pagesize-sized page.
+ *			The size of the IOTSB dictates how large a virtual
+ *			address space the IOTSB is capable of mapping.
+ *
+ * iotsb_handle		An opaque identifier for an IOTSB. A devhandle plus
+ *			iotsb_handle represents a binding of an IOTSB to a
+ *			PCI root complex.
+ *
+ * iotsb_index		Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:	HV_FAST_TRAP
+ * FUNCTION:	HV_FAST_PCI_IOTSB_CONF
+ * ARG0:	devhandle
+ * ARG1:	r_addr
+ * ARG2:	size
+ * ARG3:	pagesize
+ * ARG4:	iova
+ * RET0:	status
+ * RET1:	iotsb_handle
+ * ERRORS:	EINVAL		Invalid devhandle, size, iova, or pagesize
+ *		EBADALIGN	r_addr is not properly aligned
+ *		ENORADDR	r_addr is not a valid real address
+ *		ETOOMANY	No further IOTSBs may be configured
+ *		EBUSY		Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured, privileged access to the IOTSB memory is prohibited and
+ * creates undefined behavior. The only
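
The pci_iotsb_conf() description says each IOTTE maps one pagesize-sized page, so the DVMA span an IOTSB covers follows directly from its table size. The minimal sketch below relates the two. It assumes 8-byte (64-bit) IOTTEs, and the helper names are hypothetical, not part of the hypervisor API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: each IOTTE is one 64-bit word. */
#define IOTTE_BYTES 8ULL

/* IOTSB size in bytes needed to map `span` bytes of DVMA space. */
uint64_t iotsb_table_bytes(uint64_t span, uint64_t pagesize)
{
	assert(pagesize != 0 && span % pagesize == 0);
	return (span / pagesize) * IOTTE_BYTES;
}

/* DVMA span in bytes covered by an IOTSB of `table_bytes` bytes. */
uint64_t iotsb_span_bytes(uint64_t table_bytes, uint64_t pagesize)
{
	assert(table_bytes % IOTTE_BYTES == 0);
	return (table_bytes / IOTTE_BYTES) * pagesize;
}
```

Under these assumptions, a 32G DVMA space with 8K pages needs a 32M IOTSB. That matches the "large (32M) contiguous memory" the FORCE_MAX_ZONEORDER patch requests for the IOTSB backing store.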

[PATCH v3 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-28 Thread Tushar Dave
Add the Hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe
devices with a 64-bit DMA mask.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 216 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 211 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index d4208aa..06981cc 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,57 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
 	struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+	u64 *pglist = p->pglist;
+	u64 index_count;
 	unsigned long devhandle = pbm->devhandle;
 	unsigned long prot = p->prot;
 	unsigned long entry = p->entry;
-	u64 *pglist = p->pglist;
 	unsigned long npages = p->npages;
+	unsigned long iotsb_num;
+	unsigned long ret;
+	long num;
 
 	/* VPCI maj=1, min=[0,1] only supports read and write */
 	if (vpci_major < 2)
 		prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
 	while (npages != 0) {
-		long num;
-
-		num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
-					  npages, prot, __pa(pglist));
-		if (unlikely(num < 0)) {
-			if (printk_ratelimit())
-				printk("iommu_batch_flush: IOMMU map of "
-				       "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-				       "status %ld\n",
-				       devhandle, HV_PCI_TSBID(0, entry),
-				       npages, prot, __pa(pglist), num);
-			return -1;
+		if (mask <= DMA_BIT_MASK(32)) {
+			num = pci_sun4v_iommu_map(devhandle,
+						  HV_PCI_TSBID(0, entry),
+						  npages,
+						  prot,
+						  __pa(pglist));
+			if (unlikely(num < 0)) {
+				pr_err_ratelimited("%s: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+						   __func__,
+						   devhandle,
+						   HV_PCI_TSBID(0, entry),
+						   npages, prot, __pa(pglist),
+						   num);
+				return -1;
+			}
+		} else {
+			index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry),
+			iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+			ret = pci_sun4v_iotsb_map(devhandle,
+						  iotsb_num,
+						  index_count,
+						  prot,
+						  __pa(pglist),
+						  &num);
+			if (unlikely(ret != HV_EOK)) {
+				pr_err_ratelimited("%s: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+						   __func__,
+						   devhandle, iotsb_num,
+						   index_count, prot,
+						   __pa(pglist), ret);
+				return -1;
+
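
HV_PCI_IOTSB_INDEX_COUNT() packs the number of IOTTEs into bits 63:48 and the starting iotsb_index into bits 47:0. iommu_batch_flush() then chooses the legacy map call or the ATU call based on the device's DMA mask. The sketch below copies the macro from the patch and adds two kinds of hypothetical helpers to show both ideas: unpacking helpers, and a `use_atu()` predicate that mirrors the mask test. None of the helpers is part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Same definition as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Copied from the patch: bits 63:48 #iottes, bits 47:0 iotsb_index. */
#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
	(((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))

static_assert(HV_PCI_IOTSB_INDEX_COUNT(1, 0) == (1ULL << 48),
	      "the IOTTE count starts at bit 48");

/* Hypothetical helpers that undo the packing. They only round-trip
 * when #iottes < 2^16 and iotsb_index < 2^48. */
static inline u64 index_count_iottes(u64 ic)
{
	return ic >> 48;
}

static inline u64 index_count_index(u64 ic)
{
	return ic & ((1ULL << 48) - 1);
}

/* Hypothetical predicate with the same test iommu_batch_flush() uses:
 * masks wider than 32 bits go through the ATU (pci_sun4v_iotsb_map),
 * everything else through the legacy IOMMU (pci_sun4v_iommu_map). */
static inline int use_atu(u64 mask)
{
	return mask > DMA_BIT_MASK(32);
}
```

For example, mapping 16 pages starting at IOTTE 0x1234 yields an index_count of 0x0010000000001234.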

[PATCH v3 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13

2016-10-28 Thread Tushar Dave
From: Dave Kleikamp <dave.kleik...@oracle.com>

This change allows the ATU (new IOMMU) in SPARC systems to request a
large (32M) contiguous memory block during boot for the IOTSB backing
store.

Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 arch/sparc/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index b23c76b..5202eb4 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -89,6 +89,10 @@ config ARCH_DEFCONFIG
 config ARCH_PROC_KCORE_TEXT
def_bool y
 
+config ARCH_ATU
+   bool
+   default y if SPARC64
+
 config IOMMU_HELPER
bool
default y if SPARC64
@@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64
 
+config FORCE_MAX_ZONEORDER
+   int "Maximum zone order"
+   default "13"
+   help
+	  The kernel memory allocator divides physically contiguous memory
+	  blocks into "zones", where each zone is a power of two number of
+	  pages.  This option selects the largest power of two that the kernel
+	  keeps in the memory allocator.  If you need to allocate very large
+	  blocks of physically contiguous memory, then you may need to
+	  increase this value.
+
+	  This config option is actually maximum order plus one. For example,
+	  a value of 13 means that the largest free memory block is 2^12 pages.
+
 source "mm/Kconfig"
 
 if SPARC64
-- 
1.9.1
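
Because the option is "maximum order plus one", the largest block the allocator can hand out is 2^(FORCE_MAX_ZONEORDER - 1) pages. The arithmetic sketch below assumes the sparc64 default of 8 KiB base pages (PAGE_SHIFT 13). It shows how the default of 13 yields the 32M block requested above, compared with 8M under the generic default order of 11, which applies when FORCE_MAX_ZONEORDER is not set:

```c
#include <assert.h>
#include <stdint.h>

/* Largest free block in bytes. FORCE_MAX_ZONEORDER is "maximum order
 * plus one", so the largest block is 2^(FORCE_MAX_ZONEORDER - 1) pages
 * of (1 << page_shift) bytes each. */
static uint64_t max_block_bytes(unsigned int force_max_zoneorder,
				unsigned int page_shift)
{
	assert(force_max_zoneorder >= 1);
	assert(force_max_zoneorder - 1 + page_shift < 64);
	return (uint64_t)1 << (force_max_zoneorder - 1 + page_shift);
}
```

With PAGE_SHIFT 13, `max_block_bytes(13, 13)` is 2^25 bytes (32 MiB), and `max_block_bytes(11, 13)` is 2^23 bytes (8 MiB).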



[PATCH v3 3/6] sparc64: Initialize iommu_map_table and iommu_pool

2016-10-28 Thread Tushar Dave
Like the legacy IOMMU, use the common iommu_map_table and iommu_pool for
the ATU. This change initializes both for the ATU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/iommu_64.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 93daa59..f24f356 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -45,8 +45,10 @@ struct atu_ranges {
 struct atu {
struct  atu_ranges  *ranges;
struct  atu_iotsb   *iotsb;
+   struct  iommu_map_table tbl;
u64 base;
u64 size;
+   u64 dma_addr_mask;
 };
 
 struct iommu {
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 2afb86c..242477c 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
struct atu *atu = pbm->iommu->atu;
unsigned long err;
const u64 *ranges;
+   u64 map_size, num_iotte;
+   u64 dma_mask;
const u32 *page_size;
int len;
 
@@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
return err;
}
 
+   /* Create ATU iommu map.
+* One bit represents one iotte in IOTSB table.
+*/
+   dma_mask = (roundup_pow_of_two(atu->size) - 1UL);
+   num_iotte = atu->size / IO_PAGE_SIZE;
+   map_size = num_iotte / 8;
+   atu->tbl.table_map_base = atu->base;
+   atu->dma_addr_mask = dma_mask;
+   atu->tbl.map = kzalloc(map_size, GFP_KERNEL);
+   if (!atu->tbl.map)
+   return -ENOMEM;
+
+   iommu_tbl_pool_init(&atu->tbl, num_iotte, IO_PAGE_SHIFT,
+   NULL, false /* no large_pool */,
+   0 /* default npools */,
+   false /* want span boundary checking */);
+
return 0;
 }
 
-- 
1.9.1
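
The sizing in pci_sun4v_atu_init() is simple arithmetic. Each IOTTE gets one bitmap bit, each IO page gets one IOTTE, and the DMA address mask comes from the ATU window size rounded up to a power of two. The sketch below performs the same computation. It assumes 8 KiB IO pages (IO_PAGE_SHIFT 13) and the 32G window mentioned in the ATU patch. `roundup_pow_of_two_u64()` is a local stand-in for the kernel's roundup_pow_of_two(), and `compute_atu_map()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 KiB IO pages, as on sparc64 (IO_PAGE_SHIFT 13). */
#define IO_PAGE_SHIFT 13
#define IO_PAGE_SIZE (1ULL << IO_PAGE_SHIFT)

/* Local stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow_of_two_u64(uint64_t n)
{
	uint64_t p = 1;

	assert(n != 0 && n <= (1ULL << 63));
	while (p < n)
		p <<= 1;
	return p;
}

struct atu_map_sizes {
	uint64_t dma_mask;   /* mask covering the rounded-up window */
	uint64_t num_iotte;  /* one IOTTE per IO page */
	uint64_t map_bytes;  /* bitmap size: one bit per IOTTE */
};

static struct atu_map_sizes compute_atu_map(uint64_t atu_size)
{
	struct atu_map_sizes s;

	s.dma_mask = roundup_pow_of_two_u64(atu_size) - 1;
	s.num_iotte = atu_size / IO_PAGE_SIZE;
	s.map_bytes = s.num_iotte / 8;
	return s;
}
```

For a 32G window this gives 4M IOTTEs, a 512K bitmap, and a mask of 0x7ffffffff. Because `num_iotte / 8` truncates, the computation assumes the window is a multiple of 8 IO pages.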



[PATCH v3 6/6] sparc64: Enable 64-bit DMA

2016-10-28 Thread Tushar Dave
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/Kconfig| 4 
 arch/sparc/kernel/iommu.c | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 5202eb4..60145c9 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -93,6 +93,10 @@ config ARCH_ATU
bool
default y if SPARC64
 
+config ARCH_DMA_ADDR_T_64BIT
+   bool
+   default y if ARCH_ATU
+
 config IOMMU_HELPER
bool
default y if SPARC64
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5c615ab..852a329 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask)
struct iommu *iommu = dev->archdata.iommu;
u64 dma_addr_mask = iommu->dma_addr_mask;
 
-   if (device_mask >= (1UL << 32UL))
-   return 0;
+   if (device_mask > DMA_BIT_MASK(32)) {
+   if (iommu->atu)
+   dma_addr_mask = iommu->atu->dma_addr_mask;
+   else
+   return 0;
+   }
 
if ((device_mask & dma_addr_mask) == dma_addr_mask)
return 1;
-- 
1.9.1
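
The updated dma_supported() accepts masks wider than 32 bits only when an ATU is present. In that case it tests the device mask against the ATU's dma_addr_mask instead of the legacy one. The user-space sketch below models only that mask test. It uses hypothetical `iommu_sketch`/`atu_sketch` structs in place of the kernel's struct iommu and struct atu. The rest of dma_supported() after the `return 1;` shown above is not modeled, and the mask values in the example are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct atu_sketch {
	uint64_t dma_addr_mask;   /* mask covering the ATU DVMA window */
};

struct iommu_sketch {
	uint64_t dma_addr_mask;   /* mask covering the legacy 32-bit window */
	struct atu_sketch *atu;   /* NULL when the root complex has no ATU */
};

/* Returns 1 if the device mask covers the relevant window, else 0. */
static int dma_mask_ok(const struct iommu_sketch *iommu, uint64_t device_mask)
{
	uint64_t dma_addr_mask = iommu->dma_addr_mask;

	if (device_mask > DMA_BIT_MASK(32)) {
		if (!iommu->atu)
			return 0;     /* masks wider than 32 bits need an ATU */
		dma_addr_mask = iommu->atu->dma_addr_mask;
	}
	return (device_mask & dma_addr_mask) == dma_addr_mask;
}
```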



[PATCH v2 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-27 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported by the
Hypervisor IOMMU v2 APIs.

The current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical
10G/40G NIC can consume 300MB-500MB of DVMA space per instance. When
DVMA space is exhausted, devices become unusable because their drivers
cannot allocate DVMA.

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.

ATU allows PCIe devices to use 64bit DMA addressing. Devices that
choose a 32bit DMA mask continue to work with the existing legacy
IOMMU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 140 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 529 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above are still valid, IOMMU v2 adds new
+ * PCI IO definitions and services.
+ *
+ * CTE Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ * A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ * Real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte   IOMMU translation table entry.
+ *
+ * iotte_attributes
+ * IO Attributes for IOMMU v2 mappings. In addition to
+ * read and write, IOMMU v2 supports relaxed ordering.
+ *
+ * io_page_list    A 64-bit aligned list of real addresses. Each real
+ * address in an io_page_list must be properly aligned
+ * to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p  Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB   IO Translation Storage Buffer. An aligned table of
+ * IOTTEs. Each IOTSB has a pagesize, table size, and
+ * virtual address associated with it that must match
+ * a pagesize and table size supported by the underlying
+ * hardware implementation. The alignment requirements
+ * for an IOTSB depend on the pagesize used for that IOTSB.
+ * Each IOTTE in an IOTSB maps one pagesize-sized page.
+ * The size of the IOTSB dictates how large a virtual
+ * address space the IOTSB is capable of mapping.
+ *
+ * iotsb_handle    An opaque identifier for an IOTSB. A devhandle plus
+ * iotsb_handle represents a binding of an IOTSB to a
+ * PCI root complex.
+ *
+ * iotsb_index Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:   HV_FAST_TRAP
+ * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
+ * ARG0:   devhandle
+ * ARG1:   r_addr
+ * ARG2:   size
+ * ARG3:   pagesize
+ * ARG4:   iova
+ * RET0:   status
+ * RET1:   iotsb_handle
+ * ERRORS: EINVAL  Invalid devhandle, size, iova, or pagesize
+ * EBADALIGN   r_addr is not properly aligned
+ * ENORADDR    r_addr is not a valid real address
+ * ETOOMANY    No further IOTSBs may be configured
+ * EBUSY   Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured, privileged access to the IOTSB memory is prohibited and
+ * creates undefined behavior. The only

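Each IOTTE maps one pagesize-sized page, so the IOTSB table size scales linearly with the DVMA space it covers. The sketch below assumes 8-byte IOTTEs; the patch text does not state the entry size. Under that assumption, a 32G IOTSB with 8K pages needs 4M entries, which is a 32M table. That matches the 32M contiguous allocation that the FORCE_MAX_ZONEORDER patch enables. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each IOTTE is 8 bytes (not stated in the patch). */
#define IOTTE_BYTES 8ULL

/* One IOTTE per pagesize-sized page of DVMA space. */
static uint64_t iotsb_table_bytes(uint64_t dvma_bytes, uint64_t pagesize)
{
	assert(pagesize != 0 && dvma_bytes % pagesize == 0);
	return (dvma_bytes / pagesize) * IOTTE_BYTES;
}

/* Inverse: DVMA space covered by a table of the given size. */
static uint64_t iotsb_dvma_bytes(uint64_t table_bytes, uint64_t pagesize)
{
	return (table_bytes / IOTTE_BYTES) * pagesize;
}
```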
[PATCH v2 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service

2016-10-27 Thread Tushar Dave
To use the Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device has to be bound to an IOTSB using the HV API pci_iotsb_bind().

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/kernel/pci_sun4v.c | 43 +++
 arch/sparc/kernel/pci_sun4v.h |  3 +++
 arch/sparc/kernel/pci_sun4v_asm.S | 14 +
 3 files changed, 60 insertions(+)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 242477c..d4208aa 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -216,6 +216,43 @@ static void *dma_4v_alloc_coherent(struct device *dev, size_t size,
return NULL;
 }
 
+unsigned long dma_4v_iotsb_bind(unsigned long devhandle,
+   unsigned long iotsb_num,
+   struct pci_bus *bus_dev)
+{
+   struct pci_dev *pdev;
+   unsigned long err;
+   unsigned int bus;
+   unsigned int device;
+   unsigned int fun;
+
+   list_for_each_entry(pdev, &bus_dev->devices, bus_list) {
+   if (pdev->subordinate) {
+   /* No need to bind pci bridge */
+   dma_4v_iotsb_bind(devhandle, iotsb_num,
+ pdev->subordinate);
+   } else {
+   bus = bus_dev->number;
+   device = PCI_SLOT(pdev->devfn);
+   fun = PCI_FUNC(pdev->devfn);
+   err = pci_sun4v_iotsb_bind(devhandle, iotsb_num,
+  HV_PCI_DEVICE_BUILD(bus,
+  device,
+  fun));
+
+   /* If bind fails for one device it is going to fail
+* for rest of the devices because we are sharing
+* IOTSB. So in case of failure simply return with
+* error.
+*/
+   if (err)
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry,
   unsigned long npages)
 {
@@ -629,6 +666,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm)
}
iotsb->iotsb_num = iotsb_num;
 
+   err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus);
+   if (err) {
+   pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err);
+   goto iotsb_conf_failed;
+   }
+
return 0;
 
 iotsb_conf_failed:
diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h
index 0ef6d1c..1019e0f 100644
--- a/arch/sparc/kernel/pci_sun4v.h
+++ b/arch/sparc/kernel/pci_sun4v.h
@@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle,
   unsigned long page_size,
   unsigned long dvma_base,
   u64 *iotsb_num);
+unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle,
+  unsigned long iotsb_num,
+  unsigned int pci_device);
 #endif /* !(_PCI_SUN4V_H) */
diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S
index fd94d0e..22024a9 100644
--- a/arch/sparc/kernel/pci_sun4v_asm.S
+++ b/arch/sparc/kernel/pci_sun4v_asm.S
@@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf)
retl
 stx%o1, [%g1]
 ENDPROC(pci_sun4v_iotsb_conf)
+
+   /*
+* %o0: devhandle
+* %o1: iotsb_num/iotsb_handle
+* %o2: pci_device
+*
+* returns %o0: status
+*/
+ENTRY(pci_sun4v_iotsb_bind)
+   mov HV_FAST_PCI_IOTSB_BIND, %o5
+   ta  HV_FAST_TRAP
+   retl
+nop
+ENDPROC(pci_sun4v_iotsb_bind)
-- 
1.9.1
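
dma_4v_iotsb_bind() walks the bus hierarchy depth-first. Bridges are not bound themselves; instead, their subordinate buses are recursed into, and every endpoint is bound by bus/device/function. The simplified user-space sketch below follows that walk. It uses hypothetical `sk_bus`/`sk_dev` structs and a `bind_fn` callback in place of struct pci_bus, struct pci_dev, and pci_sun4v_iotsb_bind(). The `SK_PCI_SLOT`/`SK_PCI_FUNC` macros mirror the kernel's PCI_SLOT()/PCI_FUNC():

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the kernel's PCI_SLOT()/PCI_FUNC() devfn decoding. */
#define SK_PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define SK_PCI_FUNC(devfn) ((devfn) & 0x07)

struct sk_bus;

/* Hypothetical, simplified PCI device: a bridge has a subordinate bus. */
struct sk_dev {
	unsigned int devfn;
	struct sk_bus *subordinate;   /* NULL for endpoints */
};

/* Hypothetical, simplified PCI bus holding an array of devices. */
struct sk_bus {
	unsigned int number;
	struct sk_dev *devs;
	size_t ndevs;
};

/* Stand-in for the hypervisor bind call; returns 0 on success. */
typedef unsigned long (*bind_fn)(unsigned int bus, unsigned int dev,
				 unsigned int fn, void *ctx);

/* Depth-first walk: recurse into bridges, bind endpoints, and return the
 * first nonzero status, including one from a subordinate bus. */
static unsigned long sk_iotsb_bind(const struct sk_bus *bus, bind_fn bind,
				   void *ctx)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		const struct sk_dev *d = &bus->devs[i];
		unsigned long err;

		if (d->subordinate)
			err = sk_iotsb_bind(d->subordinate, bind, ctx);
		else
			err = bind(bus->number, SK_PCI_SLOT(d->devfn),
				   SK_PCI_FUNC(d->devfn), ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: counts successful binds and fails on one bus. */
struct bind_log {
	unsigned int count;
	int fail_bus;                 /* -1 to never fail */
};

static unsigned long log_bind(unsigned int bus, unsigned int dev,
			      unsigned int fn, void *ctx)
{
	struct bind_log *log = ctx;

	(void)dev;
	(void)fn;
	if ((int)bus == log->fail_bus)
		return 1;             /* arbitrary nonzero error status */
	log->count++;
	return 0;
}
```

The sketch differs from the patch in one respect. The patch ignores the return value of the recursive call for a bridge, so a bind failure on a subordinate bus is not reported to the caller. The sketch propagates that failure.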



[PATCH v2 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-27 Thread Tushar Dave
Add the Hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe
devices with a 64bit DMA mask.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index d4208aa..d6ab2b9 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry),
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+   }
}
-
entry += num;
npages -= n

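HV_PCI_IOTSB_INDEX_COUNT packs two values into one 64-bit argument: the number of IOTTEs in bits 63:48 and the starting iotsb_index in bits 47:0. The sketch below packs and unpacks values under that layout. It adds range checks that the macro itself does not perform. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the hypervisor.h comment: bits 63:48 #iottes,
 * bits 47:0 iotsb_index. */
#define IOTSB_INDEX_BITS 48
#define IOTSB_INDEX_MASK ((1ULL << IOTSB_INDEX_BITS) - 1)

static uint64_t index_count_pack(uint64_t iottes, uint64_t iotsb_index)
{
	/* Both fields must fit; the kernel macro does not check this. */
	assert(iottes <= 0xffffULL && iotsb_index <= IOTSB_INDEX_MASK);
	return (iottes << IOTSB_INDEX_BITS) | iotsb_index;
}

static uint64_t index_count_iottes(uint64_t v)
{
	return v >> IOTSB_INDEX_BITS;
}

static uint64_t index_count_index(uint64_t v)
{
	return v & IOTSB_INDEX_MASK;
}
```

For example, packing 16 IOTTEs starting at index 0x1234 yields 0x0010000000001234.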
[PATCH v2 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13

2016-10-27 Thread Tushar Dave
From: Dave Kleikamp <dave.kleik...@oracle.com>

This change allows the ATU (new IOMMU) in SPARC systems to request a
large (32M) contiguous memory block during boot for the IOTSB backing
store.

Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 arch/sparc/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index b23c76b..5202eb4 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -89,6 +89,10 @@ config ARCH_DEFCONFIG
 config ARCH_PROC_KCORE_TEXT
def_bool y
 
+config ARCH_ATU
+   bool
+   default y if SPARC64
+
 config IOMMU_HELPER
bool
default y if SPARC64
@@ -304,6 +308,20 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64
 
+config FORCE_MAX_ZONEORDER
+   int "Maximum zone order"
+   default "13"
+   help
+ The kernel memory allocator divides physically contiguous memory
+ blocks into "zones", where each zone is a power of two number of
+ pages.  This option selects the largest power of two that the kernel
+ keeps in the memory allocator.  If you need to allocate very large
+ blocks of physically contiguous memory, then you may need to
+ increase this value.
+
+ This config option is actually maximum order plus one. For example,
+ a value of 13 means that the largest free memory block is 2^12 pages.
+
 source "mm/Kconfig"
 
 if SPARC64
-- 
1.9.1



[PATCH v2 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-27 Thread Tushar Dave
Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and
enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with
64bit DMA mask.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h 
b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long 
dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index d4208aa..d6ab2b9 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, 
unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry);
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+   }
}
-
entry += num;
npages -= n

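A minimal user-space sketch of the HV_PCI_IOTSB_INDEX_COUNT() packing described in the hypervisor.h comment above: bits 63:48 carry the IOTTE count and bits 47:0 carry the zero-based IOTSB index. The u64 typedef is a stand-in for the kernel type, and index_count_iottes(), index_count_index() and make_index_count() are hypothetical helpers added for illustration only; the patch itself defines just the macro, which does no range checking.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64; /* stand-in for the kernel's u64 */

/* Same packing as the patch: bits 63:48 = #iottes, bits 47:0 = iotsb_index. */
#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
	(((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))

/* Hypothetical helper (not in the patch): extract the IOTTE count. */
static inline u64 index_count_iottes(u64 index_count)
{
	return index_count >> 48;
}

/* Hypothetical helper (not in the patch): extract the IOTSB index. */
static inline u64 index_count_index(u64 index_count)
{
	return index_count & ((1ULL << 48) - 1);
}

/*
 * Hypothetical checked wrapper: the macro does not mask its inputs, so an
 * index wider than 48 bits would silently corrupt the count field.
 */
static inline u64 make_index_count(u64 iottes, u64 index)
{
	assert(iottes < (1ULL << 16));
	assert(index < (1ULL << 48));
	return HV_PCI_IOTSB_INDEX_COUNT(iottes, index);
}
```

For example, iommu_batch_flush() builds its argument as HV_PCI_IOTSB_INDEX_COUNT(npages, entry), so a batch of 16 pages starting at IOTTE 0x1234 would be encoded as 0x0010000000001234.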
[PATCH v2 3/6] sparc64: Initialize iommu_map_table and iommu_pool

2016-10-27 Thread Tushar Dave
Like the legacy IOMMU, ATU uses the common iommu_map_table and
iommu_pool. This change initializes iommu_map_table and iommu_pool for ATU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/iommu_64.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 93daa59..f24f356 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -45,8 +45,10 @@ struct atu_ranges {
 struct atu {
struct  atu_ranges  *ranges;
struct  atu_iotsb   *iotsb;
+   struct  iommu_map_table tbl;
u64 base;
u64 size;
+   u64 dma_addr_mask;
 };
 
 struct iommu {
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 2afb86c..242477c 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -644,6 +644,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
struct atu *atu = pbm->iommu->atu;
unsigned long err;
const u64 *ranges;
+   u64 map_size, num_iotte;
+   u64 dma_mask;
const u32 *page_size;
int len;
 
@@ -682,6 +684,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
return err;
}
 
+   /* Create ATU iommu map.
+* One bit represents one iotte in IOTSB table.
+*/
+   dma_mask = (roundup_pow_of_two(atu->size) - 1UL);
+   num_iotte = atu->size / IO_PAGE_SIZE;
+   map_size = num_iotte / 8;
+   atu->tbl.table_map_base = atu->base;
+   atu->dma_addr_mask = dma_mask;
+   atu->tbl.map = kzalloc(map_size, GFP_KERNEL);
+   if (!atu->tbl.map)
+   return -ENOMEM;
+
+   iommu_tbl_pool_init(&atu->tbl, num_iotte, IO_PAGE_SHIFT,
+   NULL, false /* no large_pool */,
+   0 /* default npools */,
+   false /* want span boundary checking */);
+
return 0;
 }
 
-- 
1.9.1


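The sizing in pci_sun4v_atu_init() uses one bitmap bit per IOTTE, so the map needs atu->size / IO_PAGE_SIZE bits (that count divided by 8 in bytes), and dma_addr_mask is the ATU size rounded up to a power of two, minus one. A minimal sketch of that arithmetic, assuming an 8 KB IO_PAGE_SIZE and the 32G ATU size quoted in the cover letter; roundup_pow_of_two_u64() is a user-space stand-in for the kernel's roundup_pow_of_two(), and struct atu_sizing is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 13UL                  /* assumed: 8 KB IO pages */
#define IO_PAGE_SIZE  (1UL << IO_PAGE_SHIFT)

/* User-space stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow_of_two_u64(uint64_t n)
{
	uint64_t p = 1;

	assert(n != 0 && n <= (1ULL << 63)); /* avoid an endless loop */
	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical container for the values pci_sun4v_atu_init() computes. */
struct atu_sizing {
	uint64_t dma_addr_mask; /* highest DVMA address the ATU covers */
	uint64_t num_iotte;     /* one IOTTE per IO page */
	uint64_t map_bytes;     /* bitmap size: one bit per IOTTE */
};

static struct atu_sizing atu_size_map(uint64_t atu_size)
{
	struct atu_sizing s;

	s.dma_addr_mask = roundup_pow_of_two_u64(atu_size) - 1;
	s.num_iotte = atu_size / IO_PAGE_SIZE;
	s.map_bytes = s.num_iotte / 8;
	return s;
}
```

Under these assumptions a 32G ATU range gives 4194304 IOTTEs, a 512 KB bitmap and a 35-bit dma_addr_mask. Because num_iotte / 8 rounds down, this sizing only covers every IOTTE when the ATU size is a multiple of 64 KB (8 IO pages).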

[PATCH v2 6/6] sparc64: Enable 64-bit DMA

2016-10-27 Thread Tushar Dave
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/Kconfig| 4 
 arch/sparc/kernel/iommu.c | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 5202eb4..60145c9 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -93,6 +93,10 @@ config ARCH_ATU
bool
default y if SPARC64
 
+config ARCH_DMA_ADDR_T_64BIT
+   bool
+   default y if ARCH_ATU
+
 config IOMMU_HELPER
bool
default y if SPARC64
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5c615ab..852a329 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask)
struct iommu *iommu = dev->archdata.iommu;
u64 dma_addr_mask = iommu->dma_addr_mask;
 
-   if (device_mask >= (1UL << 32UL))
-   return 0;
+   if (device_mask > DMA_BIT_MASK(32)) {
+   if (iommu->atu)
+   dma_addr_mask = iommu->atu->dma_addr_mask;
+   else
+   return 0;
+   }
 
if ((device_mask & dma_addr_mask) == dma_addr_mask)
return 1;
-- 
1.9.1

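With this change, dma_supported() accepts a mask wider than 32 bits only when the root complex has an ATU, and then compares the device mask against the ATU's dma_addr_mask instead of the legacy IOMMU's. A simplified model of that decision follows; struct iommu_model and struct atu_model are hypothetical stand-ins for the kernel's struct iommu and struct atu, DMA_BIT_MASK() is redefined for user space, and the model simply returns 0 when the comparison fails, whereas the real function continues past the lines shown in the hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

struct atu_model {
	uint64_t dma_addr_mask;  /* ATU (64-bit capable) range */
};

struct iommu_model {
	uint64_t dma_addr_mask;  /* legacy 32-bit IOMMU range */
	struct atu_model *atu;   /* NULL when no ATU is present */
};

static int dma_supported_model(const struct iommu_model *iommu,
			       uint64_t device_mask)
{
	uint64_t dma_addr_mask;

	assert(iommu != NULL);
	dma_addr_mask = iommu->dma_addr_mask;

	/* Wider than 32 bits: only possible through the ATU. */
	if (device_mask > DMA_BIT_MASK(32)) {
		if (iommu->atu)
			dma_addr_mask = iommu->atu->dma_addr_mask;
		else
			return 0;
	}

	/* The device must be able to drive every address bit in use. */
	return (device_mask & dma_addr_mask) == dma_addr_mask;
}
```

Note that a device mask narrower than the ATU range (for example 34 bits against a 35-bit ATU mask) is still rejected, because the device could not reach the top of the DVMA window.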


[PATCH v2 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

2016-10-27 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU on SPARC, supported
through the sun4v hypervisor PCI IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one
TSB per PCIe root complex, which limits DVMA space to 2GB per root
complex. This limit has become a scalability bottleneck now that a
typical 10G/40G NIC can consume 500MB of DVMA space per instance.
When DVMA space is exhausted, devices become unusable because their
drivers cannot allocate DVMA.

For example, we recently hit the legacy IOMMU limit while using the
i40e driver on a system with a large number of CPUs (e.g. 128). Four
i40e ports each request 128 QPs (Queue Pairs), and each queue has 512
descriptors by default. Counting only RX queues (because RX premaps
DMA buffers), i40e needs 4*128*512 DMA entries in the IOMMU table,
while the legacy IOMMU table has at most (2G/8K) - 1 entries. So
bringing up four i40e instances alone saturates the existing IOMMU
resources.

ATU removes this bottleneck by allowing the guest OS to create an
IOTSB of 32G (or more) using the 64-bit address ranges available in
the ATU hardware. 32G is more than enough DVMA space to be shared by
all PCIe devices under a root complex, in contrast to the 2G space
provided by the legacy IOMMU.
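The i40e and ATU figures above can be checked with a few lines of arithmetic. This sketch assumes 8 KB IO pages (the 8K in (2G/8K) - 1) and, like the text, counts RX descriptors only; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PAGE_SIZE (8ULL << 10)  /* assumed 8 KB, per the (2G/8K) figure */

/* DMA entries premapped by RX rings: ports x queue pairs x descriptors. */
static uint64_t i40e_rx_entries(uint64_t ports, uint64_t qps, uint64_t descs)
{
	return ports * qps * descs;
}

/* Number of IO-page-sized entries in a DVMA window of the given size. */
static uint64_t dvma_entries(uint64_t window_bytes)
{
	assert(window_bytes % IO_PAGE_SIZE == 0);
	return window_bytes / IO_PAGE_SIZE;
}
```

Four ports need 262144 RX entries, while the legacy table offers 262143 (2^18 - 1): the RX rings alone overrun it by one entry, before any TX mappings or other devices are counted. A 32G IOTSB provides 4194304 entries, 16 times the legacy window.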

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that
use a 32-bit DMA mask continue to work with the existing legacy
IOMMU.

The patch set has been tested on sun4v (T1000, T2000, T3, T4, T5, T7,
S7) and sun4u SPARC.

Thanks.
-Tushar

v1->v2:
- Patch #2 addresses comments by Dave M.
 -- use page allocator to allocate IOTSB.
 -- use true/false with boolean variables.


Dave Kleikamp (1):
  sparc64: Add FORCE_MAX_ZONEORDER and default to 13

Tushar Dave (5):
  sparc64: Add ATU (new IOMMU) support
  sparc64: Initialize iommu_map_table and iommu_pool
  sparc64: Bind PCIe devices to use IOMMU v2 service
  sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
  sparc64: Enable 64-bit DMA

 arch/sparc/Kconfig  |  22 ++
 arch/sparc/include/asm/hypervisor.h | 343 +
 arch/sparc/include/asm/iommu_64.h   |  28 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/iommu.c   |   8 +-
 arch/sparc/kernel/pci_sun4v.c   | 416 +++-
 arch/sparc/kernel/pci_sun4v.h   |  21 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  68 ++
 8 files changed, 847 insertions(+), 60 deletions(-)

-- 
1.9.1



[PATCH 1/2] sunqe: Fix compiler warnings

2016-10-17 Thread Tushar Dave
sunqe uses '__u32' instead of dma_addr_t for the DMA handles it passes
to kernel DMA APIs. This has not caused any 'incompatible pointer
type' warnings on SPARC because until now dma_addr_t has been u32.
However, recent SPARC ATU (IOMMU) changes enable 64-bit DMA, which
makes dma_addr_t u64, so 'incompatible pointer type' warnings become
inevitable.

e.g.
drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’:
drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’

This patch resolves the above compiler warnings.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
---
 drivers/net/ethernet/sun/sunqe.c | 11 ++-
 drivers/net/ethernet/sun/sunqe.h |  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c
index 9b825780..9582948 100644
--- a/drivers/net/ethernet/sun/sunqe.c
+++ b/drivers/net/ethernet/sun/sunqe.c
@@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep)
 {
struct qe_init_block *qb = qep->qe_block;
struct sunqe_buffers *qbufs = qep->buffers;
-   __u32 qbufs_dvma = qep->buffers_dvma;
+   __u32 qbufs_dvma = (__u32)qep->buffers_dvma;
int i;
 
qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0;
@@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq)
void __iomem *mregs = qep->mregs;
void __iomem *gregs = qecp->gregs;
unsigned char *e = &qep->dev->dev_addr[0];
+   __u32 qblk_dvma = (__u32)qep->qblock_dvma;
u32 tmp;
int i;
 
@@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq)
return -EAGAIN;
 
/* Setup initial rx/tx init block pointers. */
-   sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS);
-   sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS);
+   sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS);
+   sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS);
 
/* Enable/mask the various irq's. */
sbus_writel(0, cregs + CREG_RIMASK);
@@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep)
struct net_device *dev = qep->dev;
struct qe_rxd *this;
struct sunqe_buffers *qbufs = qep->buffers;
-   __u32 qbufs_dvma = qep->buffers_dvma;
+   __u32 qbufs_dvma = (__u32)qep->buffers_dvma;
int elem = qep->rx_new;
u32 flags;
 
@@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct sunqe *qep = netdev_priv(dev);
struct sunqe_buffers *qbufs = qep->buffers;
-   __u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma;
+   __u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma;
unsigned char *txbuf;
int len, entry;
 
diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h
index 581781b..ae190b7 100644
--- a/drivers/net/ethernet/sun/sunqe.h
+++ b/drivers/net/ethernet/sun/sunqe.h
@@ -334,12 +334,12 @@ struct sunqe {
void __iomem *qcregs;    /* QEC per-channel Registers   */
void __iomem *mregs;     /* Per-channel MACE Registers  */
struct qe_init_block *qe_block;  /* RX and TX descriptors   */
-   __u32   qblock_dvma;    /* RX and TX descriptors   */
+   dma_addr_t  qblock_dvma;    /* RX and TX descriptors   */
spinlock_t  lock;   /* Protects txfull state   */
int rx_new, rx_old; /* RX ring extents   */
int tx_new, tx_old; /* TX ring extents   */
struct sunqe_buffers *buffers;   /* CPU visible address.   */
-   __u32   buffers_dvma;   /* DVMA visible address.   */
+   dma_addr_t  buffers_dvma;   /* DVMA visible address.   */
struct sunqec   *parent;
u8  mconfig;    /* Base MACE mconfig value */
struct platform_device  *op;    /* QE's OF device struct   */
-- 
1.9.1

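The pattern in these fixes (store the DMA handle as dma_addr_t, and narrow it to 32 bits only where it is written to a 32-bit SBUS register) can be modelled in user space. In this sketch dma_addr_t is assumed to be a 64-bit type, as it becomes once ARCH_DMA_ADDR_T_64BIT is set; fake_dma_alloc() and fake_writel() are hypothetical stand-ins for dma_alloc_coherent() and sbus_writel(), the handle value is made up, and the range assert in program_ring_base() is an illustrative addition that the patch does not contain.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;  /* assumed 64-bit, as with ARCH_DMA_ADDR_T_64BIT */

/*
 * Hypothetical stand-in: like dma_alloc_coherent(), it returns the handle
 * through a dma_addr_t pointer. Passing the address of a uint32_t field
 * here is exactly the 'incompatible pointer type' the patch removes.
 */
static void fake_dma_alloc(dma_addr_t *handle)
{
	*handle = 0x00000000fe001000ULL;  /* made-up address below 4 GB */
}

/* Hypothetical stand-in for sbus_writel(): the register is 32 bits wide. */
static uint32_t last_reg;

static void fake_writel(uint32_t val)
{
	last_reg = val;
}

static void program_ring_base(dma_addr_t block_dvma, uint32_t offset)
{
	uint32_t blk;

	/* Illustrative check: the narrowing cast must not drop address bits. */
	assert(block_dvma <= UINT32_MAX);

	/* Narrow once, explicitly, mirroring (__u32)qep->qblock_dvma. */
	blk = (uint32_t)block_dvma;
	fake_writel(blk + offset);
}
```

Keeping the struct field as dma_addr_t satisfies the DMA API's pointer type, while the explicit local __u32 copy documents that the hardware register itself only takes 32 bits.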


[PATCH 2/2] sunbmac: Fix compiler warning

2016-10-17 Thread Tushar Dave
sunbmac uses '__u32' instead of dma_addr_t for the DMA handle it passes
to kernel DMA APIs. This has not caused any 'incompatible pointer
type' warnings on SPARC because until now dma_addr_t has been u32.
However, recent SPARC ATU (IOMMU) changes enable 64-bit DMA, which
makes dma_addr_t u64, so 'incompatible pointer type' warnings become
inevitable.

e.g.
drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’:
drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’

This patch resolves the above compiler warning.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
---
 drivers/net/ethernet/sun/sunbmac.c | 5 +++--
 drivers/net/ethernet/sun/sunbmac.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c
index aa4f9d2..02f4527 100644
--- a/drivers/net/ethernet/sun/sunbmac.c
+++ b/drivers/net/ethernet/sun/sunbmac.c
@@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
void __iomem *gregs= bp->gregs;
void __iomem *cregs= bp->creg;
void __iomem *bregs= bp->bregs;
+   __u32 bblk_dvma = (__u32)bp->bblock_dvma;
unsigned char *e = &bp->dev->dev_addr[0];
 
/* Latch current counters into statistics. */
@@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
bregs + BMAC_XIFCFG);
 
/* Tell the QEC where the ring descriptors are. */
-   sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_rxd, 0),
cregs + CREG_RXDS);
-   sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_txd, 0),
cregs + CREG_TXDS);
 
/* Setup the FIFO pointers into QEC local memory. */
diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h
index 06dd217..532fc56 100644
--- a/drivers/net/ethernet/sun/sunbmac.h
+++ b/drivers/net/ethernet/sun/sunbmac.h
@@ -291,7 +291,7 @@ struct bigmac {
void __iomem*bregs; /* BigMAC Registers   */
void __iomem*tregs; /* BigMAC Transceiver */
struct bmac_init_block  *bmac_block;/* RX and TX descriptors */
-   __u32bblock_dvma;   /* RX and TX descriptors */
+   dma_addr_t  bblock_dvma;/* RX and TX descriptors */
 
spinlock_t  lock;
 
-- 
1.9.1



[PATCH 0/2] sparc/net: Fix compiler warnings

2016-10-17 Thread Tushar Dave
ATU (IOMMU) changes that enable 64-bit DMA on SPARC were recently
submitted to sparclinux. However, these changes also make
'incompatible pointer type' compiler warnings inevitable in the sunqe
and sunbmac drivers.

The two patches in this series fix those compiler warnings.

Tushar Dave (2):
  sunqe: Fix compiler warnings
  sunbmac: Fix compiler warning

 drivers/net/ethernet/sun/sunbmac.c |  5 +++--
 drivers/net/ethernet/sun/sunbmac.h |  2 +-
 drivers/net/ethernet/sun/sunqe.c   | 11 ++-
 drivers/net/ethernet/sun/sunqe.h   |  4 ++--
 4 files changed, 12 insertions(+), 10 deletions(-)

-- 
1.9.1



[PATCH 2/2] sunbmac: Fix compiler warning

2016-10-14 Thread Tushar Dave
sunbmac uses '__u32' instead of dma_addr_t for the DMA handle it passes
to kernel DMA APIs. This has not caused any 'incompatible pointer
type' warnings on SPARC because until now dma_addr_t has been u32.
However, recent SPARC ATU (IOMMU) changes enable 64-bit DMA, which
makes dma_addr_t u64, so 'incompatible pointer type' warnings become
inevitable.

e.g.
drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’:
drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’

This patch resolves the above compiler warning.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
---
 drivers/net/ethernet/sun/sunbmac.c | 5 +++--
 drivers/net/ethernet/sun/sunbmac.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunbmac.c b/drivers/net/ethernet/sun/sunbmac.c
index aa4f9d2..02f4527 100644
--- a/drivers/net/ethernet/sun/sunbmac.c
+++ b/drivers/net/ethernet/sun/sunbmac.c
@@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
void __iomem *gregs= bp->gregs;
void __iomem *cregs= bp->creg;
void __iomem *bregs= bp->bregs;
+   __u32 bblk_dvma = (__u32)bp->bblock_dvma;
unsigned char *e = &bp->dev->dev_addr[0];
 
/* Latch current counters into statistics. */
@@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
bregs + BMAC_XIFCFG);
 
/* Tell the QEC where the ring descriptors are. */
-   sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_rxd, 0),
cregs + CREG_RXDS);
-   sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_txd, 0),
cregs + CREG_TXDS);
 
/* Setup the FIFO pointers into QEC local memory. */
diff --git a/drivers/net/ethernet/sun/sunbmac.h b/drivers/net/ethernet/sun/sunbmac.h
index 06dd217..532fc56 100644
--- a/drivers/net/ethernet/sun/sunbmac.h
+++ b/drivers/net/ethernet/sun/sunbmac.h
@@ -291,7 +291,7 @@ struct bigmac {
void __iomem*bregs; /* BigMAC Registers   */
void __iomem*tregs; /* BigMAC Transceiver */
struct bmac_init_block  *bmac_block;/* RX and TX descriptors */
-   __u32bblock_dvma;   /* RX and TX descriptors */
+   dma_addr_t  bblock_dvma;/* RX and TX descriptors */
 
spinlock_t  lock;
 
-- 
1.9.1



[PATCH 2/2] sunbmac: Fix compiler warning

2016-10-14 Thread Tushar Dave
sunbmac uses '__u32' for dma handle while invoking kernel DMA APIs,
instead of using dma_addr_t. This hasn't caused any 'incompatible
pointer type' warning on SPARC because until now dma_addr_t is of
type u32. However, recent changes in SPARC ATU (iommu) enables 64bit
DMA and therefore dma_addr_t becomes of type u64. This makes
'incompatible pointer type' warnings inevitable.

e.g.
drivers/net/ethernet/sun/sunbmac.c: In function ‘bigmac_ether_init’:
drivers/net/ethernet/sun/sunbmac.c:1166: warning: passing argument 3 of 
‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument 
is of type ‘__u32 *’

This patch resolves above compiler warning.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
---
 drivers/net/ethernet/sun/sunbmac.c | 5 +++--
 drivers/net/ethernet/sun/sunbmac.h | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunbmac.c 
b/drivers/net/ethernet/sun/sunbmac.c
index aa4f9d2..02f4527 100644
--- a/drivers/net/ethernet/sun/sunbmac.c
+++ b/drivers/net/ethernet/sun/sunbmac.c
@@ -623,6 +623,7 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
void __iomem *gregs= bp->gregs;
void __iomem *cregs= bp->creg;
void __iomem *bregs= bp->bregs;
+   __u32 bblk_dvma = (__u32)bp->bblock_dvma;
unsigned char *e = >dev->dev_addr[0];
 
/* Latch current counters into statistics. */
@@ -671,9 +672,9 @@ static int bigmac_init_hw(struct bigmac *bp, int from_irq)
bregs + BMAC_XIFCFG);
 
/* Tell the QEC where the ring descriptors are. */
-   sbus_writel(bp->bblock_dvma + bib_offset(be_rxd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_rxd, 0),
cregs + CREG_RXDS);
-   sbus_writel(bp->bblock_dvma + bib_offset(be_txd, 0),
+   sbus_writel(bblk_dvma + bib_offset(be_txd, 0),
cregs + CREG_TXDS);
 
/* Setup the FIFO pointers into QEC local memory. */
diff --git a/drivers/net/ethernet/sun/sunbmac.h 
b/drivers/net/ethernet/sun/sunbmac.h
index 06dd217..532fc56 100644
--- a/drivers/net/ethernet/sun/sunbmac.h
+++ b/drivers/net/ethernet/sun/sunbmac.h
@@ -291,7 +291,7 @@ struct bigmac {
 	void __iomem		*bregs;		/* BigMAC Registers   */
 	void __iomem		*tregs;		/* BigMAC Transceiver */
 	struct bmac_init_block	*bmac_block;	/* RX and TX descriptors */
-	__u32			bblock_dvma;	/* RX and TX descriptors */
+	dma_addr_t		bblock_dvma;	/* RX and TX descriptors */
 
 	spinlock_t		lock;
 
-- 
1.9.1
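A minimal userspace sketch of the pattern both driver fixes follow, assuming a 64-bit dma_addr_t as on sparc64 once ARCH_DMA_ADDR_T_64BIT is set. The typedef and fake_dma_alloc() are hypothetical stand-ins (not kernel code) for the kernel type and for the `dma_addr_t *` out-parameter of dma_alloc_coherent(): the handle is stored at full width, and the explicit (__u32) cast happens only where the value is written to a 32-bit SBus register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: dma_addr_t is 64 bits wide once
 * ARCH_DMA_ADDR_T_64BIT is enabled. */
typedef uint64_t dma_addr_t;

/* Hypothetical stand-in for dma_alloc_coherent(): the bus address comes
 * back through a dma_addr_t * out-parameter, so passing a uint32_t *
 * here is what produces "incompatible pointer type". */
static void fake_dma_alloc(dma_addr_t *handle, dma_addr_t bus_addr)
{
	*handle = bus_addr;
}

/* Keep the full-width handle in driver state and narrow it only where a
 * 32-bit register is programmed, as in
 * "__u32 bblk_dvma = (__u32)bp->bblock_dvma;". */
static uint32_t ring_reg_value(dma_addr_t handle, uint32_t offset)
{
	uint32_t base = (uint32_t)handle; /* explicit, visible truncation */

	return base + offset;
}

/* The narrowing cast is lossless only while the handle fits in 32 bits. */
static bool fits_in_32_bits(dma_addr_t handle)
{
	return handle <= UINT32_MAX;
}
```

The explicit cast documents the narrowing instead of hiding it behind a mistyped struct field; fits_in_32_bits() shows the condition under which that narrowing loses nothing.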



[PATCH 1/2] sunqe: Fix compiler warnings

2016-10-14 Thread Tushar Dave
sunqe uses '__u32' instead of dma_addr_t for the DMA handles it passes
to the kernel DMA APIs. This has not triggered any 'incompatible pointer
type' warnings on SPARC so far because dma_addr_t has been a u32.
However, recent changes to the SPARC ATU (IOMMU) enable 64-bit DMA,
which makes dma_addr_t a u64, so these 'incompatible pointer type'
warnings become unavoidable.

e.g.
drivers/net/ethernet/sun/sunqe.c: In function ‘qec_ether_init’:
drivers/net/ethernet/sun/sunqe.c:883: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’
drivers/net/ethernet/sun/sunqe.c:885: warning: passing argument 3 of ‘dma_alloc_coherent’ from incompatible pointer type
./include/linux/dma-mapping.h:445: note: expected ‘dma_addr_t *’ but argument is of type ‘__u32 *’

This patch resolves the above compiler warnings.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
---
 drivers/net/ethernet/sun/sunqe.c | 11 ++-
 drivers/net/ethernet/sun/sunqe.h |  4 ++--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunqe.c b/drivers/net/ethernet/sun/sunqe.c
index 9b825780..9582948 100644
--- a/drivers/net/ethernet/sun/sunqe.c
+++ b/drivers/net/ethernet/sun/sunqe.c
@@ -124,7 +124,7 @@ static void qe_init_rings(struct sunqe *qep)
 {
 	struct qe_init_block *qb = qep->qe_block;
 	struct sunqe_buffers *qbufs = qep->buffers;
-	__u32 qbufs_dvma = qep->buffers_dvma;
+	__u32 qbufs_dvma = (__u32)qep->buffers_dvma;
 	int i;
 
 	qep->rx_new = qep->rx_old = qep->tx_new = qep->tx_old = 0;
@@ -144,6 +144,7 @@ static int qe_init(struct sunqe *qep, int from_irq)
 	void __iomem *mregs = qep->mregs;
 	void __iomem *gregs = qecp->gregs;
 	unsigned char *e = &qep->dev->dev_addr[0];
+	__u32 qblk_dvma = (__u32)qep->qblock_dvma;
 	u32 tmp;
 	int i;
 
@@ -152,8 +153,8 @@ static int qe_init(struct sunqe *qep, int from_irq)
 		return -EAGAIN;
 
 	/* Setup initial rx/tx init block pointers. */
-	sbus_writel(qep->qblock_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS);
-	sbus_writel(qep->qblock_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS);
+	sbus_writel(qblk_dvma + qib_offset(qe_rxd, 0), cregs + CREG_RXDS);
+	sbus_writel(qblk_dvma + qib_offset(qe_txd, 0), cregs + CREG_TXDS);
 
 	/* Enable/mask the various irq's. */
 	sbus_writel(0, cregs + CREG_RIMASK);
@@ -413,7 +414,7 @@ static void qe_rx(struct sunqe *qep)
 	struct net_device *dev = qep->dev;
 	struct qe_rxd *this;
 	struct sunqe_buffers *qbufs = qep->buffers;
-	__u32 qbufs_dvma = qep->buffers_dvma;
+	__u32 qbufs_dvma = (__u32)qep->buffers_dvma;
 	int elem = qep->rx_new;
 	u32 flags;
 
@@ -572,7 +573,7 @@ static int qe_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct sunqe *qep = netdev_priv(dev);
 	struct sunqe_buffers *qbufs = qep->buffers;
-	__u32 txbuf_dvma, qbufs_dvma = qep->buffers_dvma;
+	__u32 txbuf_dvma, qbufs_dvma = (__u32)qep->buffers_dvma;
 	unsigned char *txbuf;
 	int len, entry;
 
diff --git a/drivers/net/ethernet/sun/sunqe.h b/drivers/net/ethernet/sun/sunqe.h
index 581781b..ae190b7 100644
--- a/drivers/net/ethernet/sun/sunqe.h
+++ b/drivers/net/ethernet/sun/sunqe.h
@@ -334,12 +334,12 @@ struct sunqe {
 	void __iomem		*qcregs;	/* QEC per-channel Registers */
 	void __iomem		*mregs;		/* Per-channel MACE Registers */
 	struct qe_init_block	*qe_block;	/* RX and TX descriptors */
-	__u32			qblock_dvma;	/* RX and TX descriptors */
+	dma_addr_t		qblock_dvma;	/* RX and TX descriptors */
 	spinlock_t		lock;		/* Protects txfull state */
 	int			rx_new, rx_old;	/* RX ring extents */
 	int			tx_new, tx_old;	/* TX ring extents */
 	struct sunqe_buffers	*buffers;	/* CPU visible address. */
-	__u32			buffers_dvma;	/* DVMA visible address. */
+	dma_addr_t		buffers_dvma;	/* DVMA visible address. */
 	struct sunqec		*parent;
 	u8			mconfig;	/* Base MACE mconfig value */
 	struct platform_device	*op;		/* QE's OF device struct */
-- 
1.9.1



[PATCH 0/2] net: Fix compiler warnings

2016-10-14 Thread Tushar Dave
Recently, ATU (IOMMU) changes that enable 64-bit DMA on SPARC were
submitted to linux-sparc. However, these changes also make
'incompatible pointer type' compiler warnings inevitable in the sunqe
and sunbmac drivers.

The two patches in this series fix those compiler warnings.

Tushar Dave (2):
  sunqe: Fix compiler warnings
  sunbmac: Fix compiler warning

 drivers/net/ethernet/sun/sunbmac.c |  5 +++--
 drivers/net/ethernet/sun/sunbmac.h |  2 +-
 drivers/net/ethernet/sun/sunqe.c   | 11 ++-
 drivers/net/ethernet/sun/sunqe.h   |  4 ++--
 4 files changed, 12 insertions(+), 10 deletions(-)

-- 
1.9.1



[PATCH 6/6] sparc64: Enable 64-bit DMA

2016-10-10 Thread Tushar Dave
ATU 64-bit addressing allows PCIe devices with 64-bit DMA capability
to use the ATU for 64-bit DMA.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/Kconfig| 4 
 arch/sparc/kernel/iommu.c | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index a7482bc..99bb845 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -94,6 +94,10 @@ config ARCH_ATU
 	bool
 	default y if SPARC64
 
+config ARCH_DMA_ADDR_T_64BIT
+	bool
+	default y if ARCH_ATU
+
 config IOMMU_HELPER
 	bool
 	default y if SPARC64
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5c615ab..852a329 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask)
 	struct iommu *iommu = dev->archdata.iommu;
 	u64 dma_addr_mask = iommu->dma_addr_mask;
 
-	if (device_mask >= (1UL << 32UL))
-		return 0;
+	if (device_mask > DMA_BIT_MASK(32)) {
+		if (iommu->atu)
+			dma_addr_mask = iommu->atu->dma_addr_mask;
+		else
+			return 0;
+	}
 
 	if ((device_mask & dma_addr_mask) == dma_addr_mask)
 		return 1;
-- 
1.9.1
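A simplified userspace model of the check the hunk above adds to dma_supported(), under stated assumptions: struct model_iommu is a hypothetical stand-in for struct iommu and its atu pointer, and the mask values in the usage below are made up for illustration. Only the lines shown in the hunk are modeled; the code after `return 1;` is not part of the diff, so the sketch simply returns 0 where the real function would continue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, simplified stand-in for the sparc64 struct iommu. */
struct model_iommu {
	uint64_t dma_addr_mask;     /* legacy IOMMU DVMA range */
	bool has_atu;               /* models "iommu->atu != NULL" */
	uint64_t atu_dma_addr_mask; /* ATU DVMA range, if present */
};

static int model_dma_supported(const struct model_iommu *iommu,
			       uint64_t device_mask)
{
	uint64_t dma_addr_mask = iommu->dma_addr_mask;

	/* Masks wider than 32 bits are only usable through the ATU. */
	if (device_mask > DMA_BIT_MASK(32)) {
		if (!iommu->has_atu)
			return 0;
		dma_addr_mask = iommu->atu_dma_addr_mask;
	}

	/* The device must be able to reach the whole DVMA window. */
	return (device_mask & dma_addr_mask) == dma_addr_mask;
}
```

With an ATU present, a 64-bit mask is checked against the ATU window while a 32-bit mask still selects the legacy window, which is how 32-bit devices keep working with the existing IOMMU.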



[PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

2016-10-10 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU on SPARC, supported
through the sun4v hypervisor PCI IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one
TSB per PCIe root complex, which limits DVMA space to 2GB per root
complex. This limit has become a scalability bottleneck now that a
typical 10G/40G NIC can consume 500MB of DVMA space per instance.
When DVMA resources are exhausted, devices become unusable because
their drivers cannot allocate DVMA.

For example, we recently hit the legacy IOMMU limitation while using
the i40e driver on a system with a large number of CPUs (e.g. 128).
Four i40e ports each request 128 QPs (Queue Pairs), and each queue
has 512 descriptors by default. Counting only RX queues (because RX
premaps its DMA buffers), i40e needs 4*128*512 = 262144 DMA entries
in the IOMMU table, while the legacy IOMMU has at most
(2G/8K) - 1 = 262143 entries available. So bringing up four i40e
instances alone saturates the existing IOMMU resources.

ATU removes this bottleneck by allowing the guest OS to create an
IOTSB of 32G (or more) using the 64-bit address ranges available in
the ATU hardware. 32G is more than enough DVMA space to be shared by
all PCIe devices under a root complex, in contrast to the 2G provided
by the legacy IOMMU.
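The cover letter's arithmetic, checked in C: 4*128*512 and 2G/8K are both 2^18, so the i40e RX rings alone need exactly one entry more than the legacy table's (2G/8K) - 1. The 32G figure assumes the ATU IOTSB uses the same 8K IO page size as the legacy table; the cover letter does not state the ATU page size, so treat that comparison as an illustration.

```c
#include <stdint.h>

#define SZ_8K  (8ULL << 10)
#define SZ_2G  (2ULL << 30)
#define SZ_32G (32ULL << 30)

/* DMA entries needed when only RX queues premap buffers:
 * ports x queue pairs per port x descriptors per queue. */
static uint64_t rx_dma_entries(uint64_t ports, uint64_t qps,
			       uint64_t descs)
{
	return ports * qps * descs;
}

/* Translation entries a DVMA window can hold at a given IO page size. */
static uint64_t dvma_entries(uint64_t window_bytes, uint64_t page_bytes)
{
	return window_bytes / page_bytes;
}
```

Usage: `rx_dma_entries(4, 128, 512)` gives 262144 against `dvma_entries(SZ_2G, SZ_8K) - 1` = 262143 legacy entries, while a 32G window at 8K pages holds 4194304 entries, 16 times the legacy table.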

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that
use a 32-bit DMA mask continue to work with the existing legacy
IOMMU.

The patch set has been tested on sun4v (T1000, T2000, T3, T4, T5, T7,
S7) and sun4u SPARC.

Thanks.
-Tushar

Dave Kleikamp (1):
  sparc64: Add FORCE_MAX_ZONEORDER and default to 13

Tushar Dave (5):
  sparc64: Add ATU (new IOMMU) support
  sparc64: Initialize iommu_map_table and iommu_pool
  sparc64: Bind PCIe devices to use IOMMU v2 service
  sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
  sparc64: Enable 64-bit DMA

 arch/sparc/Kconfig  |  22 ++
 arch/sparc/include/asm/hypervisor.h | 343 +
 arch/sparc/include/asm/iommu_64.h   |  28 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/iommu.c   |   8 +-
 arch/sparc/kernel/pci_sun4v.c   | 415 +++-
 arch/sparc/kernel/pci_sun4v.h   |  21 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  68 ++
 8 files changed, 846 insertions(+), 60 deletions(-)

-- 
1.9.1



[PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service

2016-10-10 Thread Tushar Dave
To use the Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device must be bound to an IOTSB using the HV API pci_iotsb_bind().

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/kernel/pci_sun4v.c | 43 +++
 arch/sparc/kernel/pci_sun4v.h |  3 +++
 arch/sparc/kernel/pci_sun4v_asm.S | 14 +
 3 files changed, 60 insertions(+)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 5404b33..c7f3a0b3 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -216,6 +216,43 @@ range_alloc_fail:
 	return NULL;
 }
 
+unsigned long dma_4v_iotsb_bind(unsigned long devhandle,
+				unsigned long iotsb_num,
+				struct pci_bus *bus_dev)
+{
+	struct pci_dev *pdev;
+	unsigned long err;
+	unsigned int bus;
+	unsigned int device;
+	unsigned int fun;
+
+	list_for_each_entry(pdev, &bus_dev->devices, bus_list) {
+		if (pdev->subordinate) {
+			/* No need to bind pci bridge */
+			dma_4v_iotsb_bind(devhandle, iotsb_num,
+					  pdev->subordinate);
+		} else {
+			bus = bus_dev->number;
+			device = PCI_SLOT(pdev->devfn);
+			fun = PCI_FUNC(pdev->devfn);
+			err = pci_sun4v_iotsb_bind(devhandle, iotsb_num,
+						   HV_PCI_DEVICE_BUILD(bus,
+								       device,
+								       fun));
+
+			/* If bind fails for one device it is going to fail
+			 * for rest of the devices because we are sharing
+			 * IOTSB. So in case of failure simply return with
+			 * error.
+			 */
+			if (err)
+				return err;
+		}
+	}
+
+	return 0;
+}
+
 static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry,
 			       unsigned long npages)
 {
@@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm)
 	}
 	iotsb->iotsb_num = iotsb_num;
 
+	err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus);
+	if (err) {
+		pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err);
+		goto iotsb_conf_failed;
+	}
+
 	return 0;
 
 iotsb_conf_failed:
diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h
index 0ef6d1c..1019e0f 100644
--- a/arch/sparc/kernel/pci_sun4v.h
+++ b/arch/sparc/kernel/pci_sun4v.h
@@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle,
 				   unsigned long page_size,
 				   unsigned long dvma_base,
 				   u64 *iotsb_num);
+unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle,
+				   unsigned long iotsb_num,
+				   unsigned int pci_device);
 #endif /* !(_PCI_SUN4V_H) */
diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S
index fd94d0e..22024a9 100644
--- a/arch/sparc/kernel/pci_sun4v_asm.S
+++ b/arch/sparc/kernel/pci_sun4v_asm.S
@@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf)
 	retl
 	 stx	%o1, [%g1]
 ENDPROC(pci_sun4v_iotsb_conf)
+
+	/*
+	 * %o0:	devhandle
+	 * %o1:	iotsb_num/iotsb_handle
+	 * %o2:	pci_device
+	 *
+	 * returns %o0:	status
+	 */
+ENTRY(pci_sun4v_iotsb_bind)
+	mov	HV_FAST_PCI_IOTSB_BIND, %o5
+	ta	HV_FAST_TRAP
+	retl
+	 nop
+ENDPROC(pci_sun4v_iotsb_bind)
-- 
1.9.1
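A userspace sketch of the traversal dma_4v_iotsb_bind() performs, using a hypothetical bus/device model: model_bus, model_dev and the bind_fn callback are illustrative stand-ins, not kernel types, and the real code's HV_PCI_DEVICE_BUILD() encoding is not reproduced. Endpoints are bound with their bus/slot/function; bridges are not bound themselves, but the bus below them is walked recursively. One deliberate difference: the patch ignores the return value of the recursive call, whereas this sketch propagates it, in line with the patch's comment that a failure for one device will fail for the rest.

```c
#include <stddef.h>

/* Standard PCI devfn decoding, same definitions as the kernel's. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

struct model_bus;

/* Hypothetical device model: a bridge has a subordinate bus. */
struct model_dev {
	unsigned int devfn;
	struct model_bus *subordinate; /* NULL for endpoint functions */
};

/* Hypothetical bus model standing in for struct pci_bus. */
struct model_bus {
	unsigned int number;
	struct model_dev *devs;
	size_t ndevs;
};

/* Stand-in for the pci_sun4v_iotsb_bind() hypervisor call. */
typedef unsigned long (*bind_fn)(unsigned int bus, unsigned int dev,
				 unsigned int fn, void *ctx);

/* Depth-first walk: bind every endpoint, recurse below bridges, and
 * stop at the first error because all devices share one IOTSB. */
static unsigned long model_iotsb_bind(const struct model_bus *bus,
				      bind_fn bind, void *ctx)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		const struct model_dev *pdev = &bus->devs[i];
		unsigned long err;

		if (pdev->subordinate)
			err = model_iotsb_bind(pdev->subordinate, bind, ctx);
		else
			err = bind(bus->number, PCI_SLOT(pdev->devfn),
				   PCI_FUNC(pdev->devfn), ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: counts successful binds in *(unsigned int *)ctx. */
static unsigned long count_bind(unsigned int bus, unsigned int dev,
				unsigned int fn, void *ctx)
{
	(void)bus; (void)dev; (void)fn;
	(*(unsigned int *)ctx)++;
	return 0;
}

/* Example callback that fails for every device on bus 2. */
static unsigned long fail_on_bus2(unsigned int bus, unsigned int dev,
				  unsigned int fn, void *ctx)
{
	if (bus == 2)
		return 1; /* arbitrary nonzero status for the example */
	return count_bind(bus, dev, fn, ctx);
}
```

For a root bus holding one bridge (with two functions behind it on bus 2) and one endpoint, count_bind sees three binds, while fail_on_bus2 stops at the first device under the bridge before anything is counted.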



[PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13

2016-10-10 Thread Tushar Dave
From: Dave Kleikamp <dave.kleik...@oracle.com>

This change allows the ATU (new IOMMU) on SPARC systems to request a
large (32M) contiguous memory block during boot to create the IOTSB
backing store.

Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 arch/sparc/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index f5d60f1..a7482bc 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -90,6 +90,10 @@ config ARCH_DEFCONFIG
 config ARCH_PROC_KCORE_TEXT
 	def_bool y
 
+config ARCH_ATU
+	bool
+	default y if SPARC64
+
 config IOMMU_HELPER
 	bool
 	default y if SPARC64
@@ -305,6 +309,20 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
 	def_bool y if SPARC64
 
+config FORCE_MAX_ZONEORDER
+	int "Maximum zone order"
+	default "13"
+	help
+	  The kernel memory allocator divides physically contiguous memory
+	  blocks into "zones", where each zone is a power of two number of
+	  pages.  This option selects the largest power of two that the kernel
+	  keeps in the memory allocator.  If you need to allocate very large
+	  blocks of physically contiguous memory, then you may need to
+	  increase this value.
+
+	  This config option is actually maximum order plus one. For example,
+	  a value of 13 means that the largest free memory block is 2^12 pages.
+
 source "mm/Kconfig"
 
 if SPARC64
-- 
1.9.1
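The help text's arithmetic, worked through in C for sparc64's 8K base page: a value of 13 gives a largest block of 2^12 pages, i.e. 32M, the size quoted in the commit message, while the generic default of 11 (MAX_ORDER when FORCE_MAX_ZONEORDER is not set) tops out at 8M. The second helper assumes one 8-byte IOTTE per 8K IO page of a 32G DVMA window; that entry size is an assumption of this sketch, chosen because it is consistent with the 32M backing store the commit message asks for.

```c
#include <stdint.h>

/* Largest buddy-allocator block for a FORCE_MAX_ZONEORDER value. The
 * option is "maximum order plus one", so the largest block is
 * 2^(zoneorder - 1) pages. */
static uint64_t max_block_bytes(unsigned int force_max_zoneorder,
				uint64_t page_bytes)
{
	return (1ULL << (force_max_zoneorder - 1)) * page_bytes;
}

/* IOTSB backing store size, assuming one 8-byte IOTTE per IO page. */
static uint64_t iotsb_bytes(uint64_t dvma_bytes, uint64_t io_page_bytes)
{
	return (dvma_bytes / io_page_bytes) * 8;
}
```

Usage: `max_block_bytes(13, 8192)` and `iotsb_bytes(32ULL << 30, 8192)` both come to 32M, whereas `max_block_bytes(11, 8192)` is only 8M, which is why the default order has to be raised for a single contiguous IOTSB allocation.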



[PATCH 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-10 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU on SPARC, supported
through the Hypervisor IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one
TSB per PCIe root complex, which limits DVMA space to 2GB per root
complex. This limit has become a scalability bottleneck now that a
typical 10G/40G NIC can consume 300MB-500MB of DVMA space per
instance. When DVMA resources are exhausted, devices become unusable
because their drivers cannot allocate DVMA.

ATU removes this bottleneck by allowing the guest OS to create an
IOTSB of 32G (or more) using the 64-bit address ranges available in
the ATU hardware. 32G is more than enough DVMA space to be shared by
all PCIe devices under a root complex, in contrast to the 2G provided
by the legacy IOMMU.

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that
use a 32-bit DMA mask continue to work with the existing legacy
IOMMU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 139 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 528 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO
+ * definitions and services.
+ *
+ * CTE Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ * A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ * real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte   IOMMU translation table entry.
+ *
+ * iotte_attributes
+ * IO Attributes for IOMMU v2 mappings. In addition to
+ * read, write IOMMU v2 supports relax ordering
+ *
+ * io_page_list    A 64-bit aligned list of real addresses. Each real
+ * address in an io_page_list must be properly aligned
+ * to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p  Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB   IO Translation Storage Buffer. An aligned table of
+ * IOTTEs. Each IOTSB has a pagesize, table size, and
+ * virtual address associated with it that must match
+ * a pagesize and table size supported by the underlying
+ * hardware implementation. The alignment requirements
+ * for an IOTSB depend on the pagesize used for that IOTSB.
+ * Each IOTTE in an IOTSB maps one pagesize-sized page.
+ * The size of the IOTSB dictates how large of a virtual
+ * address space the IOTSB is capable of mapping.
+ *
+ * iotsb_handle    An opaque identifier for an IOTSB. A devhandle plus
+ * iotsb_handle represents a binding of an IOTSB to a
+ * PCI root complex.
+ *
+ * iotsb_index Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:   HV_FAST_TRAP
+ * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
+ * ARG0:   devhandle
+ * ARG1:   r_addr
+ * ARG2:   size
+ * ARG3:   pagesize
+ * ARG4:   iova
+ * RET0:   status
+ * RET1:   iotsb_handle
+ * ERRORS: EINVAL  Invalid devhandle, size, iova, or pagesize
+ * EBADALIGN   r_addr is not properly aligned
+ * ENORADDR    r_addr is not a valid real address
+ * ETOOMANY    No further IOTSBs may be configured
+ * EBUSY   Duplicate devhandle, raddir, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * virt is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured,

[PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-10 Thread Tushar Dave
Add the Hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe
devices with a 64-bit DMA mask.
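A userspace sketch of the two pieces this patch adds to the map path: the index_count packing defined by HV_PCI_IOTSB_INDEX_COUNT() (bits 63:48 for the IOTTE count, bits 47:0 for the starting iotsb_index) and the mask test in iommu_batch_flush() that picks the legacy or the ATU map call. The unpacking helpers and the enum are hypothetical names for illustration. Like the patch's macro, the packing does not mask the index, so callers must keep it below 2^48.

```c
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Same packing as the patch's HV_PCI_IOTSB_INDEX_COUNT(). */
#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
	(((uint64_t)(__iottes) << 48) | ((uint64_t)(__iotsb_index)))

/* Hypothetical helpers that undo the packing. */
static uint64_t index_count_iottes(uint64_t ic)
{
	return ic >> 48;
}

static uint64_t index_count_index(uint64_t ic)
{
	return ic & ((1ULL << 48) - 1);
}

/* Hypothetical names for the two branches of iommu_batch_flush(). */
enum map_path { LEGACY_IOMMU_MAP, ATU_IOTSB_MAP };

/* Masks up to 32 bits keep using pci_sun4v_iommu_map(); wider masks
 * go through pci_sun4v_iotsb_map() and the ATU. */
static enum map_path choose_map_path(uint64_t mask)
{
	return mask <= DMA_BIT_MASK(32) ? LEGACY_IOMMU_MAP : ATU_IOTSB_MAP;
}
```

Usage: mapping 16 IOTTEs starting at index 0x1234 packs to 0x0010000000001234, and a device with DMA_BIT_MASK(64) takes the ATU branch while DMA_BIT_MASK(32) stays on the legacy IOMMU.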

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index c7f3a0b3..1ea6d30 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry);
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+ 

[PATCH 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-10 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported through
the hypervisor IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical 10G/40G
NIC can consume 300MB-500MB of DVMA space per instance. When DVMA space
is exhausted, devices become unusable because their drivers cannot
allocate DVMA.

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64-bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.
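For a rough sense of scale, the sketch below counts the translation entries each DVMA window needs and the size of a table holding one entry per page. It assumes 8K IO pages (the page size implied by the cover letter's 2G/8K figure) and 8-byte table entries; the entry size and the helper names are assumptions made for this illustration.

```c
#include <stdint.h>

/* Assumptions for this sketch: 8K IO pages and 8-byte table entries. */
#define SKETCH_PAGE_BYTES  (UINT64_C(8) << 10)
#define SKETCH_ENTRY_BYTES UINT64_C(8)

/* One translation entry per IO page of the DVMA window. */
static uint64_t dvma_entries(uint64_t dvma_bytes)
{
	return dvma_bytes / SKETCH_PAGE_BYTES;
}

/* Bytes needed for a table with one entry per IO page. */
static uint64_t dvma_table_bytes(uint64_t dvma_bytes)
{
	return dvma_entries(dvma_bytes) * SKETCH_ENTRY_BYTES;
}
```

Under those assumptions a 2G window has room for 262,144 entries and a 32G window for 4,194,304, and the table for the 32G window occupies 32 MiB.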

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that use
a 32-bit DMA mask continue to work with the existing legacy IOMMU.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 139 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 528 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above remain valid, IOMMU v2 adds new PCI IO
+ * definitions and services.
+ *
+ * CTE Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ * A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ * real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte   IOMMU translation table entry.
+ *
+ * iotte_attributes
+ * IO Attributes for IOMMU v2 mappings. In addition to
+ * read and write, IOMMU v2 supports relaxed ordering.
+ *
+ * io_page_list    A 64-bit aligned list of real addresses. Each real
+ * address in an io_page_list must be properly aligned
+ * to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p  Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB   IO Translation Storage Buffer. An aligned table of
+ * IOTTEs. Each IOTSB has a pagesize, table size, and
+ * virtual address associated with it that must match
+ * a pagesize and table size supported by the underlying
+ * hardware implementation. The alignment requirements
+ * for an IOTSB depend on the pagesize used for that IOTSB.
+ * Each IOTTE in an IOTSB maps one pagesize-sized page.
+ * The size of the IOTSB determines how large a virtual
+ * address space the IOTSB can map.
+ *
+ * iotsb_handle    An opaque identifier for an IOTSB. A devhandle plus
+ * iotsb_handle represents a binding of an IOTSB to a
+ * PCI root complex.
+ *
+ * iotsb_index Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:   HV_FAST_TRAP
+ * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
+ * ARG0:   devhandle
+ * ARG1:   r_addr
+ * ARG2:   size
+ * ARG3:   pagesize
+ * ARG4:   iova
+ * RET0:   status
+ * RET1:   iotsb_handle
+ * ERRORS: EINVAL  Invalid devhandle, size, iova, or pagesize
+ * EBADALIGN   r_addr is not properly aligned
+ * ENORADDR    r_addr is not a valid real address
+ * ETOOMANY    No further IOTSBs may be configured
+ * EBUSY   Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured, privileged access to the IOTSB memory is prohibited and
+ * creates undefined behavior. The only

[PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-10 Thread Tushar Dave
Add the hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe devices
with a 64-bit DMA mask.
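The routing rule in iommu_batch_flush() comes down to one comparison: a mask no wider than DMA_BIT_MASK(32) goes to the legacy IOMMU through pci_sun4v_iommu_map(), and anything wider goes to the ATU through pci_sun4v_iotsb_map(). A minimal userspace sketch of that decision follows; dma_bit_mask() reimplements the semantics of the kernel's DMA_BIT_MASK(), and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Same semantics as the kernel's DMA_BIT_MASK(n): the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

enum dma_backend { BACKEND_LEGACY_IOMMU, BACKEND_ATU };

/* Mirrors the test in iommu_batch_flush(): masks up to 32 bits use the
 * legacy IOMMU, wider masks use the ATU. */
static enum dma_backend pick_backend(uint64_t dma_mask)
{
	return dma_mask <= dma_bit_mask(32) ? BACKEND_LEGACY_IOMMU : BACKEND_ATU;
}
```

Because the test is `<=`, a device with exactly a 32-bit mask stays on the legacy IOMMU, which matches the cover letter's statement that 32-bit devices keep working as before.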

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index c7f3a0b3..1ea6d30 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry);
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+   }
}
-
entry += num;
npages -= n

[PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool

2016-10-10 Thread Tushar Dave
As with the legacy IOMMU, use the common iommu_map_table and iommu_pool
for ATU. This change initializes both for ATU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/iommu_64.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 93daa59..f24f356 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -45,8 +45,10 @@ struct atu_ranges {
 struct atu {
struct  atu_ranges  *ranges;
struct  atu_iotsb   *iotsb;
+   struct  iommu_map_table tbl;
u64 base;
u64 size;
+   u64 dma_addr_mask;
 };
 
 struct iommu {
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index a1c4d5e..5404b33 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
struct atu *atu = pbm->iommu->atu;
unsigned long err;
const u64 *ranges;
+   u64 map_size, num_iotte;
+   u64 dma_mask;
const u32 *page_size;
int len;
 
@@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
return err;
}
 
+   /* Create ATU iommu map.
+* One bit represents one iotte in IOTSB table.
+*/
+   dma_mask = (roundup_pow_of_two(atu->size) - 1UL);
+   num_iotte = atu->size / IO_PAGE_SIZE;
+   map_size = num_iotte / 8;
+   atu->tbl.table_map_base = atu->base;
+   atu->dma_addr_mask = dma_mask;
+   atu->tbl.map = kzalloc(map_size, GFP_KERNEL);
+   if (!atu->tbl.map)
+   return -ENOMEM;
+
+   iommu_tbl_pool_init(&atu->tbl, num_iotte, IO_PAGE_SHIFT,
+   NULL, false /* no large_pool */,
+   0 /* default npools */,
+   false /* want span boundary checking */);
+
return 0;
 }
 
-- 
1.9.1
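The sizing arithmetic in the pci_sun4v_atu_init() hunk above can be checked in isolation. The sketch below assumes 8K IO pages (IO_PAGE_SHIFT of 13, as on sparc64) and substitutes a simple loop for the kernel's roundup_pow_of_two(); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8K IO pages (IO_PAGE_SHIFT == 13 on sparc64). */
#define SKETCH_IO_PAGE_SHIFT 13
#define SKETCH_IO_PAGE_SIZE  (UINT64_C(1) << SKETCH_IO_PAGE_SHIFT)

/* Userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t x)
{
	uint64_t p = 1;

	assert(x <= (UINT64_C(1) << 63)); /* larger values would overflow p */
	while (p < x)
		p <<= 1;
	return p;
}

struct atu_map_sizes {
	uint64_t num_iotte; /* one IOTTE per IO page of the ATU range */
	uint64_t map_bytes; /* allocation bitmap, one bit per IOTTE */
	uint64_t dma_mask;  /* all-ones mask covering the range */
};

/* Same arithmetic as the patch: mask, IOTTE count, bitmap bytes. */
static struct atu_map_sizes compute_atu_map_sizes(uint64_t atu_size)
{
	struct atu_map_sizes s;

	s.dma_mask = roundup_pow2(atu_size) - 1;
	s.num_iotte = atu_size / SKETCH_IO_PAGE_SIZE;
	s.map_bytes = s.num_iotte / 8; /* 8 bits per byte */
	return s;
}
```

For a 32G ATU range this gives 4,194,304 IOTTEs, a 512K bitmap, and a DMA address mask of 0x7ffffffff. A range that is not a power of two, such as 3G, gets the mask of the next power of two (0xffffffff).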



[RFC PATCH 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-06 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported through
the hypervisor IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical 10G/40G
NIC can consume 300MB-500MB of DVMA space per instance. When DVMA space
is exhausted, devices become unusable because their drivers cannot
allocate DVMA.

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64-bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that use
a 32-bit DMA mask continue to work with the existing legacy IOMMU.
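The io_page_list rules defined in this patch's hypervisor.h comment (the list itself 64-bit aligned, and every real address in it aligned to the IOTSB pagesize) can be expressed as a simple check. This is a hypothetical helper for illustration only; the hypervisor performs its own validation, and the function name and parameters are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator for the io_page_list rules: the list's real
 * address (list_pa) must be 8-byte (64-bit) aligned, and every real
 * address in the list must be aligned to the IOTSB pagesize. */
static bool io_page_list_valid(uint64_t list_pa, const uint64_t *list,
			       size_t n, uint64_t pagesize)
{
	/* pagesize must be a nonzero power of two for the mask test below */
	assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

	if (list_pa & 7)
		return false;
	for (size_t i = 0; i < n; i++) {
		if (list[i] & (pagesize - 1))
			return false;
	}
	return true;
}
```

With an 8K pagesize, 0x2000 and 0x6000 pass, while 0x3000 fails because its low 13 bits are not zero.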

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 139 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 528 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above remain valid, IOMMU v2 adds new PCI IO
+ * definitions and services.
+ *
+ * CTE Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ * A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ * real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte   IOMMU translation table entry.
+ *
+ * iotte_attributes
+ * IO Attributes for IOMMU v2 mappings. In addition to
+ * read and write, IOMMU v2 supports relaxed ordering.
+ *
+ * io_page_list    A 64-bit aligned list of real addresses. Each real
+ * address in an io_page_list must be properly aligned
+ * to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p  Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB   IO Translation Storage Buffer. An aligned table of
+ * IOTTEs. Each IOTSB has a pagesize, table size, and
+ * virtual address associated with it that must match
+ * a pagesize and table size supported by the underlying
+ * hardware implementation. The alignment requirements
+ * for an IOTSB depend on the pagesize used for that IOTSB.
+ * Each IOTTE in an IOTSB maps one pagesize-sized page.
+ * The size of the IOTSB determines how large a virtual
+ * address space the IOTSB can map.
+ *
+ * iotsb_handle    An opaque identifier for an IOTSB. A devhandle plus
+ * iotsb_handle represents a binding of an IOTSB to a
+ * PCI root complex.
+ *
+ * iotsb_index Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:   HV_FAST_TRAP
+ * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
+ * ARG0:   devhandle
+ * ARG1:   r_addr
+ * ARG2:   size
+ * ARG3:   pagesize
+ * ARG4:   iova
+ * RET0:   status
+ * RET1:   iotsb_handle
+ * ERRORS: EINVAL  Invalid devhandle, size, iova, or pagesize
+ * EBADALIGN   r_addr is not properly aligned
+ * ENORADDR    r_addr is not a valid real address
+ * ETOOMANY    No further IOTSBs may be configured
+ * EBUSY   Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured,

[RFC PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-06 Thread Tushar Dave
Add the hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe devices
with a 64-bit DMA mask.
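iommu_batch_flush() loops on npages because one map hypercall may map fewer entries than requested; it advances entry and shrinks npages by the returned count until nothing is left. The sketch below models that loop with a hypothetical fake_map() that maps at most max_per_call entries per call. It is not the kernel code, and a real implementation would also advance its page-list pointer so that the next call starts at the first unmapped page.

```c
/* Hypothetical stand-in for a map hypercall: maps up to max_per_call of
 * the npages requested and returns how many it mapped, or -1 on error. */
static long fake_map(unsigned long entry, unsigned long npages,
		     unsigned long max_per_call)
{
	(void)entry; /* a real call would map starting at this entry */
	if (max_per_call == 0)
		return -1;
	return (long)(npages < max_per_call ? npages : max_per_call);
}

/* Modeled on the iommu_batch_flush() loop: retry until every page is
 * mapped, advancing by the count each call reports. Returns the number
 * of calls made, or -1 if a call fails. */
static long flush_batch(unsigned long entry, unsigned long npages,
			unsigned long max_per_call)
{
	long calls = 0;

	while (npages != 0) {
		long num = fake_map(entry, npages, max_per_call);

		if (num < 0)
			return -1;
		entry += (unsigned long)num;
		npages -= (unsigned long)num;
		calls++;
	}
	return calls;
}
```

For example, flushing 10 pages with a per-call limit of 4 takes three calls (4, 4, then 2).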

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index c7f3a0b3..1ea6d30 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry);
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+ 

[RFC PATCH 0/6] sparc: Enable sun4v hypervisor PCI IOMMU v2 APIs and ATU

2016-10-06 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC, supported through
the sun4v hypervisor PCI IOMMU v2 APIs.

The current SPARC IOMMU supports only 32-bit address ranges and one TSB
per PCIe root complex, which limits DVMA space to 2GB per root complex.
This limit has become a scalability bottleneck now that a typical 10G/40G
NIC can consume 500MB of DVMA space per instance. When DVMA space is
exhausted, devices become unusable because their drivers cannot allocate
DVMA.

For example, we recently hit the legacy IOMMU limit while using the i40e
driver on a system with a large number of CPUs (e.g. 128). Four i40e
ports each request 128 QPs (queue pairs), and each queue has 512
descriptors by default. Counting only RX queues (because RX premaps its
DMA buffers), i40e needs 4*128*512 = 262,144 DMA entries in the IOMMU
table, while the legacy IOMMU has at most (2G/8K) - 1 = 262,143 entries
available. So bringing up four i40e instances alone saturates the
existing IOMMU resources.
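The arithmetic in the i40e example can be checked directly. The helper names below are illustrative; the inputs are the figures from this cover letter (4 ports, 128 queue pairs, 512 RX descriptors per queue, and a 2G window of 8K pages).

```c
#include <stdint.h>

/* DMA entries premapped for RX: ports x queue pairs x descriptors. */
static uint64_t rx_dma_entries(uint64_t ports, uint64_t qps, uint64_t descs)
{
	return ports * qps * descs;
}

/* Entries available in the legacy IOMMU table: (window / page) - 1. */
static uint64_t legacy_iommu_entries(uint64_t window_bytes, uint64_t page_bytes)
{
	return window_bytes / page_bytes - 1;
}
```

Four ports need one more entry than the whole legacy table holds, before any TX mappings or other devices are counted.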

ATU removes this bottleneck by allowing the guest OS to create an IOTSB
of 32G (or more) using the 64-bit address ranges available in the ATU
hardware. 32G is more than enough DVMA space to be shared by all PCIe
devices under a root complex, in contrast to the 2G provided by the
legacy IOMMU.

ATU allows PCIe devices to use 64-bit DMA addressing. Devices that use
a 32-bit DMA mask continue to work with the existing legacy IOMMU.

The patch set has been tested on sun4v (T1000, T2000, T3, T4, T5, T7, S7)
and sun4u SPARC systems.

Thanks.
-Tushar

Dave Kleikamp (1):
  sparc64: Add FORCE_MAX_ZONEORDER and default to 13

Tushar Dave (5):
  sparc64: Add ATU (new IOMMU) support
  sparc64: Initialize iommu_map_table and iommu_pool
  sparc64: Bind PCIe devices to use IOMMU v2 service
  sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
  sparc64: Enable 64-bit DMA

 arch/sparc/Kconfig  |  22 ++
 arch/sparc/include/asm/hypervisor.h | 343 +
 arch/sparc/include/asm/iommu_64.h   |  28 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/iommu.c   |   8 +-
 arch/sparc/kernel/pci_sun4v.c   | 415 +++-
 arch/sparc/kernel/pci_sun4v.h   |  21 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  68 ++
 8 files changed, 846 insertions(+), 60 deletions(-)

-- 
1.9.1



[RFC PATCH 5/6] sparc64: Enable sun4v dma ops to use IOMMU v2 APIs

2016-10-06 Thread Tushar Dave
Add the hypervisor IOMMU v2 APIs pci_iotsb_map() and pci_iotsb_demap(),
and enable the sun4v DMA ops to use the IOMMU v2 API for all PCIe devices
with a 64-bit DMA mask.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h |   6 +
 arch/sparc/kernel/pci_sun4v.c   | 214 ++--
 arch/sparc/kernel/pci_sun4v.h   |  11 ++
 arch/sparc/kernel/pci_sun4v_asm.S   |  36 ++
 4 files changed, 209 insertions(+), 58 deletions(-)

diff --git a/arch/sparc/include/asm/hypervisor.h b/arch/sparc/include/asm/hypervisor.h
index 7b15df8..73cb897 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2377,6 +2377,12 @@ unsigned long sun4v_vintr_set_target(unsigned long dev_handle,
  * iotsb_index Zero-based IOTTE number within an IOTSB.
  */
 
+/* The index_count argument consists of two fields:
+ * bits 63:48 #iottes and bits 47:0 iotsb_index
+ */
+#define HV_PCI_IOTSB_INDEX_COUNT(__iottes, __iotsb_index) \
+   (((u64)(__iottes) << 48UL) | ((u64)(__iotsb_index)))
+
 /* pci_iotsb_conf()
  * TRAP:   HV_FAST_TRAP
  * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index c7f3a0b3..1ea6d30 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -72,34 +72,55 @@ static inline void iommu_batch_start(struct device *dev, unsigned long prot, uns
 }
 
 /* Interrupts must be disabled.  */
-static long iommu_batch_flush(struct iommu_batch *p)
+static long iommu_batch_flush(struct iommu_batch *p, u64 mask)
 {
struct pci_pbm_info *pbm = p->dev->archdata.host_controller;
+   u64 *pglist = p->pglist;
+   u64 index_count;
unsigned long devhandle = pbm->devhandle;
unsigned long prot = p->prot;
unsigned long entry = p->entry;
-   u64 *pglist = p->pglist;
unsigned long npages = p->npages;
+   unsigned long iotsb_num;
+   unsigned long ret;
+   long num;
 
/* VPCI maj=1, min=[0,1] only supports read and write */
if (vpci_major < 2)
prot &= (HV_PCI_MAP_ATTR_READ | HV_PCI_MAP_ATTR_WRITE);
 
while (npages != 0) {
-   long num;
-
-   num = pci_sun4v_iommu_map(devhandle, HV_PCI_TSBID(0, entry),
- npages, prot, __pa(pglist));
-   if (unlikely(num < 0)) {
-   if (printk_ratelimit())
-   printk("iommu_batch_flush: IOMMU map of "
-  "[%08lx:%08llx:%lx:%lx:%lx] failed with "
-  "status %ld\n",
-  devhandle, HV_PCI_TSBID(0, entry),
-  npages, prot, __pa(pglist), num);
-   return -1;
+   if (mask <= DMA_BIT_MASK(32)) {
+   num = pci_sun4v_iommu_map(devhandle,
+ HV_PCI_TSBID(0, entry),
+ npages,
+ prot,
+ __pa(pglist));
+   if (unlikely(num < 0)) {
+   pr_err_ratelimited("iommu_batch_flush: IOMMU map of [%08lx:%08llx:%lx:%lx:%lx] failed with status %ld\n",
+  devhandle,
+  HV_PCI_TSBID(0, entry),
+  npages, prot, __pa(pglist),
+  num);
+   return -1;
+   }
+   } else {
+   index_count = HV_PCI_IOTSB_INDEX_COUNT(npages, entry);
+   iotsb_num = pbm->iommu->atu->iotsb->iotsb_num;
+   ret = pci_sun4v_iotsb_map(devhandle,
+ iotsb_num,
+ index_count,
+ prot,
+ __pa(pglist),
+ &num);
+   if (unlikely(ret != HV_EOK)) {
+   pr_err_ratelimited("iommu_batch_flush: ATU map of [%08lx:%lx:%llx:%lx:%lx] failed with status %ld\n",
+  devhandle, iotsb_num,
+  index_count, prot,
+  __pa(pglist), ret);
+   return -1;
+   }
}
-
entry += num;
npages -= n

[RFC PATCH 2/6] sparc64: Add ATU (new IOMMU) support

2016-10-06 Thread Tushar Dave
ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
Hypervisor IOMMU v2 APIs.

Current SPARC IOMMU supports only 32bit address ranges and one TSB
per PCIe root complex that has a 2GB per root complex DVMA space
limit. The limit has become a scalability bottleneck nowadays that
a typical 10G/40G NIC can consume 300MB-500MB DVMA space per
instance. When DVMA resource is exhausted, devices will not be usable
since the driver can't allocate DVMA.

ATU removes bottleneck by allowing guest os to create IOTSB of size
32G (or more) with 64bit address ranges available in ATU HW. 32G is
more than enough DVMA space to be shared by all PCIe devices under
root complex contrast to 2G space provided by legacy IOMMU.

ATU allows PCIe devices to use 64bit DMA addressing. Devices
which choose to use 32bit DMA mask will continue to work with the
existing legacy IOMMU.

Signed-off-by: Tushar Dave 
Reviewed-by: chris hyser 
Acked-by: Sowmini Varadhan 
---
 arch/sparc/include/asm/hypervisor.h | 337 
 arch/sparc/include/asm/iommu_64.h   |  26 +++
 arch/sparc/kernel/hvapi.c   |   1 +
 arch/sparc/kernel/pci_sun4v.c   | 139 +++
 arch/sparc/kernel/pci_sun4v.h   |   7 +
 arch/sparc/kernel/pci_sun4v_asm.S   |  18 ++
 6 files changed, 528 insertions(+)

diff --git a/arch/sparc/include/asm/hypervisor.h 
b/arch/sparc/include/asm/hypervisor.h
index 666d5ba..7b15df8 100644
--- a/arch/sparc/include/asm/hypervisor.h
+++ b/arch/sparc/include/asm/hypervisor.h
@@ -2335,6 +2335,342 @@ unsigned long sun4v_vintr_set_target(unsigned long 
dev_handle,
  */
 #define HV_FAST_PCI_MSG_SETVALID   0xd3
 
+/* PCI IOMMU v2 definitions and services
+ *
+ * While the PCI IO definitions above is valid IOMMU v2 adds new PCI IO
+ * definitions and services.
+ *
+ * CTE Clump Table Entry. First level table entry in the ATU.
+ *
+ * pci_device_list
+ * A 32-bit aligned list of pci_devices.
+ *
+ * pci_device_listp
+ * real address of a pci_device_list. 32-bit aligned.
+ *
+ * iotte   IOMMU translation table entry.
+ *
+ * iotte_attributes
+ * IO Attributes for IOMMU v2 mappings. In addition to
+ * read, write IOMMU v2 supports relax ordering
+ *
+ * io_page_list    A 64-bit aligned list of real addresses. Each real
+ * address in an io_page_list must be properly aligned
+ * to the pagesize of the given IOTSB.
+ *
+ * io_page_list_p  Real address of an io_page_list, 64-bit aligned.
+ *
+ * IOTSB   IO Translation Storage Buffer. An aligned table of
+ * IOTTEs. Each IOTSB has a pagesize, table size, and
+ * virtual address associated with it that must match
+ * a pagesize and table size supported by the underlying
+ * hardware implementation. The alignment requirements
+ * for an IOTSB depend on the pagesize used for that IOTSB.
+ * Each IOTTE in an IOTSB maps one pagesize-sized page.
+ * The size of the IOTSB dictates how large a virtual
+ * address space the IOTSB is capable of mapping.
+ *
+ * iotsb_handle    An opaque identifier for an IOTSB. A devhandle plus
+ * iotsb_handle represents a binding of an IOTSB to a
+ * PCI root complex.
+ *
+ * iotsb_index Zero-based IOTTE number within an IOTSB.
+ */
+
+/* pci_iotsb_conf()
+ * TRAP:   HV_FAST_TRAP
+ * FUNCTION:   HV_FAST_PCI_IOTSB_CONF
+ * ARG0:   devhandle
+ * ARG1:   r_addr
+ * ARG2:   size
+ * ARG3:   pagesize
+ * ARG4:   iova
+ * RET0:   status
+ * RET1:   iotsb_handle
+ * ERRORS: EINVAL  Invalid devhandle, size, iova, or pagesize
+ * EBADALIGN   r_addr is not properly aligned
+ * ENORADDR    r_addr is not a valid real address
+ * ETOOMANY    No further IOTSBs may be configured
+ * EBUSY   Duplicate devhandle, r_addr, iova combination
+ *
+ * Create an IOTSB suitable for the PCI root complex identified by devhandle,
+ * for the DMA virtual address defined by the argument iova.
+ *
+ * r_addr is the properly aligned base address of the IOTSB and size is the
+ * IOTSB (table) size in bytes. The IOTSB is required to be zeroed prior to
+ * being configured. If it contains any values other than zeros then the
+ * behavior is undefined.
+ *
+ * pagesize is the size of each page in the IOTSB. Note that the combination of
+ * size (table size) and pagesize must be valid.
+ *
+ * iova is the DMA virtual address this IOTSB will map.
+ *
+ * If successful, the opaque 64-bit handle iotsb_handle is returned in ret1.
+ * Once configured, privileged access to the IOTSB memory is prohibited and
+ * creates undefined behavior. The only

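The IOTSB definition above implies a sizing rule: each IOTTE maps one pagesize-sized page, so a table covering a given DVMA range needs one IOTTE per page. The sketch below is a hypothetical helper, not part of this patch, and assumes an 8-byte (64-bit) IOTTE and an 8K pagesize; neither value is stated in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define IOTTE_BYTES 8ULL  /* assumed: one 64-bit IOTTE per page */

/* Hypothetical helper: bytes of IOTSB table needed to map dvma_bytes
 * of DMA virtual address space with pages of pagesize bytes. */
static uint64_t iotsb_table_bytes(uint64_t dvma_bytes, uint64_t pagesize)
{
	assert(pagesize != 0 && dvma_bytes % pagesize == 0);
	return (dvma_bytes / pagesize) * IOTTE_BYTES;
}

/* Inverse: DVMA space that an IOTSB of table_bytes can map. */
static uint64_t iotsb_mapped_bytes(uint64_t table_bytes, uint64_t pagesize)
{
	assert(table_bytes % IOTTE_BYTES == 0);
	return (table_bytes / IOTTE_BYTES) * pagesize;
}
```

Under those assumptions, a 32G DVMA range needs a 32M table, which matches the 32M contiguous allocation cited in the FORCE_MAX_ZONEORDER patch.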
[RFC PATCH 3/6] sparc64: Initialize iommu_map_table and iommu_pool

2016-10-06 Thread Tushar Dave
As with the legacy IOMMU, use the common iommu_map_table and iommu_pool
for ATU. This change initializes iommu_map_table and iommu_pool for ATU.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/include/asm/iommu_64.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c | 19 +++
 2 files changed, 21 insertions(+)

diff --git a/arch/sparc/include/asm/iommu_64.h b/arch/sparc/include/asm/iommu_64.h
index 93daa59..f24f356 100644
--- a/arch/sparc/include/asm/iommu_64.h
+++ b/arch/sparc/include/asm/iommu_64.h
@@ -45,8 +45,10 @@ struct atu_ranges {
 struct atu {
struct  atu_ranges  *ranges;
struct  atu_iotsb   *iotsb;
+   struct  iommu_map_table tbl;
u64 base;
u64 size;
+   u64 dma_addr_mask;
 };
 
 struct iommu {
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index a1c4d5e..5404b33 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -643,6 +643,8 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
struct atu *atu = pbm->iommu->atu;
unsigned long err;
const u64 *ranges;
+   u64 map_size, num_iotte;
+   u64 dma_mask;
const u32 *page_size;
int len;
 
@@ -681,6 +683,23 @@ static int pci_sun4v_atu_init(struct pci_pbm_info *pbm)
return err;
}
 
+   /* Create ATU iommu map.
+* One bit represents one iotte in IOTSB table.
+*/
+   dma_mask = (roundup_pow_of_two(atu->size) - 1UL);
+   num_iotte = atu->size / IO_PAGE_SIZE;
+   map_size = num_iotte / 8;
+   atu->tbl.table_map_base = atu->base;
+   atu->dma_addr_mask = dma_mask;
+   atu->tbl.map = kzalloc(map_size, GFP_KERNEL);
+   if (!atu->tbl.map)
+   return -ENOMEM;
+
+   iommu_tbl_pool_init(&atu->tbl, num_iotte, IO_PAGE_SHIFT,
+   NULL, false /* no large_pool */,
+   0 /* default npools */,
+   false /* want span boundary checking */);
+
return 0;
 }
 
-- 
1.9.1
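The map arithmetic in pci_sun4v_atu_init() can be checked in user space. This sketch mirrors the three computations in the patch (DMA mask, IOTTE count, bitmap bytes). IO_PAGE_SHIFT = 13 (8K IO pages) is an assumption here, roundup_pow2() is a local stand-in for the kernel's roundup_pow_of_two(), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PAGE_SHIFT 13                     /* assumed: 8K IO pages */
#define IO_PAGE_SIZE  (1ULL << IO_PAGE_SHIFT)

/* User-space stand-in for the kernel's roundup_pow_of_two(). */
static uint64_t roundup_pow2(uint64_t n)
{
	uint64_t p = 1;

	assert(n != 0 && n <= (1ULL << 63));
	while (p < n)
		p <<= 1;
	return p;
}

struct atu_map_params {      /* hypothetical container for the results */
	uint64_t dma_mask;
	uint64_t num_iotte;
	uint64_t map_size;   /* bytes of allocation bitmap */
};

/* Mirrors the patch: one bitmap bit per IOTTE in the IOTSB. */
static struct atu_map_params compute_atu_map(uint64_t atu_size)
{
	struct atu_map_params p;

	p.dma_mask  = roundup_pow2(atu_size) - 1;
	p.num_iotte = atu_size / IO_PAGE_SIZE;
	p.map_size  = p.num_iotte / 8;  /* exact only if num_iotte % 8 == 0 */
	return p;
}
```

For a 32G ATU window this yields a 0x7ffffffff mask, 4M IOTTEs and a 512K bitmap. The division by 8 truncates, so it is exact only when num_iotte is a multiple of 8, which holds for power-of-two windows of at least 64K with 8K pages.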



[RFC PATCH 6/6] sparc64: Enable 64-bit DMA

2016-10-06 Thread Tushar Dave
ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
to use ATU for 64bit DMA.

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/Kconfig| 4 
 arch/sparc/kernel/iommu.c | 8 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 338282d..78e7556 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -95,6 +95,10 @@ config ARCH_ATU
bool
default y if SPARC64
 
+config ARCH_DMA_ADDR_T_64BIT
+   bool
+   default y if ARCH_ATU
+
 config IOMMU_HELPER
bool
default y if SPARC64
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index 5c615ab..852a329 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -760,8 +760,12 @@ int dma_supported(struct device *dev, u64 device_mask)
struct iommu *iommu = dev->archdata.iommu;
u64 dma_addr_mask = iommu->dma_addr_mask;
 
-   if (device_mask >= (1UL << 32UL))
-   return 0;
+   if (device_mask > DMA_BIT_MASK(32)) {
+   if (iommu->atu)
+   dma_addr_mask = iommu->atu->dma_addr_mask;
+   else
+   return 0;
+   }
 
if ((device_mask & dma_addr_mask) == dma_addr_mask)
return 1;
-- 
1.9.1
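The mask selection in the patched dma_supported() can be exercised in isolation. The structs below are hypothetical stand-ins for struct iommu and struct atu, the example mask values used with it are assumed, and anything the kernel function does after the lines shown in the patch is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical stand-ins for struct atu / struct iommu. */
struct atu_sketch {
	uint64_t dma_addr_mask;
};
struct iommu_sketch {
	uint64_t dma_addr_mask;          /* legacy 32-bit DVMA window */
	const struct atu_sketch *atu;    /* NULL when ATU is not available */
};

/* Mirrors the mask selection shown in the patch. */
static int dma_supported_sketch(const struct iommu_sketch *iommu,
				uint64_t device_mask)
{
	uint64_t dma_addr_mask;

	assert(iommu != NULL);
	dma_addr_mask = iommu->dma_addr_mask;
	if (device_mask > DMA_BIT_MASK(32)) {
		if (iommu->atu)
			dma_addr_mask = iommu->atu->dma_addr_mask;
		else
			return 0;    /* 64-bit mask without ATU: reject */
	}
	/* Device must be able to address the whole selected window. */
	return (device_mask & dma_addr_mask) == dma_addr_mask;
}
```

For unsigned values, the new test device_mask > DMA_BIT_MASK(32) is equivalent to the old device_mask >= (1UL << 32UL). The behavioral change is only that wide masks now fall through to the ATU mask instead of being rejected outright.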



[RFC PATCH 4/6] sparc64: Bind PCIe devices to use IOMMU v2 service

2016-10-06 Thread Tushar Dave
To use the Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
device has to be bound to an IOTSB using the HV API pci_iotsb_bind().

Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
Reviewed-by: chris hyser <chris.hy...@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
 arch/sparc/kernel/pci_sun4v.c | 43 +++
 arch/sparc/kernel/pci_sun4v.h |  3 +++
 arch/sparc/kernel/pci_sun4v_asm.S | 14 +
 3 files changed, 60 insertions(+)

diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 5404b33..c7f3a0b3 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -216,6 +216,43 @@ range_alloc_fail:
return NULL;
 }
 
+unsigned long dma_4v_iotsb_bind(unsigned long devhandle,
+   unsigned long iotsb_num,
+   struct pci_bus *bus_dev)
+{
+   struct pci_dev *pdev;
+   unsigned long err;
+   unsigned int bus;
+   unsigned int device;
+   unsigned int fun;
+
+   list_for_each_entry(pdev, &bus_dev->devices, bus_list) {
+   if (pdev->subordinate) {
+   /* No need to bind pci bridge */
+   dma_4v_iotsb_bind(devhandle, iotsb_num,
+ pdev->subordinate);
+   } else {
+   bus = bus_dev->number;
+   device = PCI_SLOT(pdev->devfn);
+   fun = PCI_FUNC(pdev->devfn);
+   err = pci_sun4v_iotsb_bind(devhandle, iotsb_num,
+  HV_PCI_DEVICE_BUILD(bus,
+  device,
+  fun));
+
+   /* If bind fails for one device, it is going to fail
+* for the rest of the devices because they share the
+* IOTSB. So in case of failure, simply return the
+* error.
+*/
+   if (err)
+   return err;
+   }
+   }
+
+   return 0;
+}
+
 static void dma_4v_iommu_demap(void *demap_arg, unsigned long entry,
   unsigned long npages)
 {
@@ -628,6 +665,12 @@ static int pci_sun4v_atu_alloc_iotsb(struct pci_pbm_info *pbm)
}
iotsb->iotsb_num = iotsb_num;
 
+   err = dma_4v_iotsb_bind(pbm->devhandle, iotsb_num, pbm->pci_bus);
+   if (err) {
+   pr_err(PFX "pci_iotsb_bind failed error: %ld\n", err);
+   goto iotsb_conf_failed;
+   }
+
return 0;
 
 iotsb_conf_failed:
diff --git a/arch/sparc/kernel/pci_sun4v.h b/arch/sparc/kernel/pci_sun4v.h
index 0ef6d1c..1019e0f 100644
--- a/arch/sparc/kernel/pci_sun4v.h
+++ b/arch/sparc/kernel/pci_sun4v.h
@@ -96,4 +96,7 @@ unsigned long pci_sun4v_iotsb_conf(unsigned long devhandle,
   unsigned long page_size,
   unsigned long dvma_base,
   u64 *iotsb_num);
+unsigned long pci_sun4v_iotsb_bind(unsigned long devhandle,
+  unsigned long iotsb_num,
+  unsigned int pci_device);
 #endif /* !(_PCI_SUN4V_H) */
diff --git a/arch/sparc/kernel/pci_sun4v_asm.S b/arch/sparc/kernel/pci_sun4v_asm.S
index fd94d0e..22024a9 100644
--- a/arch/sparc/kernel/pci_sun4v_asm.S
+++ b/arch/sparc/kernel/pci_sun4v_asm.S
@@ -378,3 +378,17 @@ ENTRY(pci_sun4v_iotsb_conf)
retl
 stx%o1, [%g1]
 ENDPROC(pci_sun4v_iotsb_conf)
+
+   /*
+* %o0: devhandle
+* %o1: iotsb_num/iotsb_handle
+* %o2: pci_device
+*
+* returns %o0: status
+*/
+ENTRY(pci_sun4v_iotsb_bind)
+   mov HV_FAST_PCI_IOTSB_BIND, %o5
+   ta  HV_FAST_TRAP
+   retl
+nop
+ENDPROC(pci_sun4v_iotsb_bind)
-- 
1.9.1
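The bus walk in dma_4v_iotsb_bind() can be modeled with a small mock PCI tree. Everything below is hypothetical scaffolding: mock_bus/mock_dev stand in for struct pci_bus/pci_dev, bind_fn stands in for pci_sun4v_iotsb_bind(), and the HV_PCI_DEVICE_BUILD bit layout is an assumption to verify against asm/hypervisor.h. One deliberate difference: the patch ignores the return value of the recursive call for bridges, whereas this sketch propagates it, in line with the patch's own comment that a failed bind should return the error.

```c
#include <assert.h>
#include <stddef.h>

/* Standard PCI devfn decoding, as in the kernel's PCI headers. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/* Assumed bus/device/function encoding; verify against asm/hypervisor.h. */
#define HV_PCI_DEVICE_BUILD(b, d, f) \
	((((b) & 0xff) << 16) | (((d) & 0x1f) << 11) | (((f) & 0x07) << 8))

struct mock_bus;
struct mock_dev {                    /* hypothetical stand-in for pci_dev */
	unsigned int devfn;
	struct mock_bus *subordinate;    /* non-NULL for a bridge */
};
struct mock_bus {                    /* hypothetical stand-in for pci_bus */
	unsigned int number;
	struct mock_dev *devs;
	size_t ndevs;
};

/* Hypothetical bind callback standing in for pci_sun4v_iotsb_bind(). */
typedef unsigned long (*bind_fn)(unsigned int pci_device, void *ctx);

/* Recurse into bridges, bind endpoints, stop at the first error. */
static unsigned long iotsb_bind_walk(const struct mock_bus *bus,
				     bind_fn bind, void *ctx)
{
	size_t i;

	assert(bus != NULL && bind != NULL);
	for (i = 0; i < bus->ndevs; i++) {
		const struct mock_dev *pdev = &bus->devs[i];
		unsigned long err;

		if (pdev->subordinate)
			err = iotsb_bind_walk(pdev->subordinate, bind, ctx);
		else
			err = bind(HV_PCI_DEVICE_BUILD(bus->number,
						       PCI_SLOT(pdev->devfn),
						       PCI_FUNC(pdev->devfn)),
				   ctx);
		if (err)
			return err;
	}
	return 0;
}

/* Example callback: records each encoded device and fails once
 * 'limit' binds have succeeded, to exercise error propagation. */
struct bind_log {
	unsigned int ids[8];
	size_t n, limit;
};

static unsigned long record_bind(unsigned int pci_device, void *ctx)
{
	struct bind_log *trace = ctx;

	if (trace->n >= trace->limit || trace->n >= 8)
		return 1;    /* arbitrary nonzero "error" status */
	trace->ids[trace->n++] = pci_device;
	return 0;
}
```

Because every device under the root complex shares one IOTSB, stopping at the first failure (including failures inside a bridge) avoids a half-bound tree going unnoticed.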



[RFC PATCH 1/6] sparc64: Add FORCE_MAX_ZONEORDER and default to 13

2016-10-06 Thread Tushar Dave
From: Dave Kleikamp <dave.kleik...@oracle.com>

This change allows the ATU (new IOMMU) in SPARC systems to request a
large (32M) contiguous memory region during boot for the IOTSB backing
store.

Signed-off-by: Dave Kleikamp <dave.kleik...@oracle.com>
Signed-off-by: Tushar Dave <tushar.n.d...@oracle.com>
---
 arch/sparc/Kconfig | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 59b0960..338282d 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -91,6 +91,10 @@ config ARCH_DEFCONFIG
 config ARCH_PROC_KCORE_TEXT
def_bool y
 
+config ARCH_ATU
+   bool
+   default y if SPARC64
+
 config IOMMU_HELPER
bool
default y if SPARC64
@@ -306,6 +310,20 @@ config ARCH_SPARSEMEM_ENABLE
 config ARCH_SPARSEMEM_DEFAULT
def_bool y if SPARC64
 
+config FORCE_MAX_ZONEORDER
+   int "Maximum zone order"
+   default "13"
+   help
+ The kernel memory allocator divides physically contiguous memory
+ blocks into "zones", where each zone is a power of two number of
+ pages.  This option selects the largest power of two that the kernel
+ keeps in the memory allocator.  If you need to allocate very large
+ blocks of physically contiguous memory, then you may need to
+ increase this value.
+
+ This config option is actually maximum order plus one. For example,
+ a value of 13 means that the largest free memory block is 2^12 pages.
+
 source "mm/Kconfig"
 
 if SPARC64
-- 
1.9.1
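The help text's order arithmetic explains the default of 13. A small sketch, assuming sparc64's 8K base page (PAGE_SHIFT 13), which the patch does not state; the helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SPARC64_PAGE_SHIFT 13  /* assumed: 8K base pages on sparc64 */

/* Per the help text, FORCE_MAX_ZONEORDER is the maximum order plus one,
 * so the largest free block is 2^(zoneorder - 1) pages. */
static uint64_t max_block_bytes(unsigned int zoneorder, unsigned int page_shift)
{
	assert(zoneorder >= 1 && zoneorder - 1 + page_shift < 64);
	return (1ULL << (zoneorder - 1)) << page_shift;
}
```

With a zone order of 13, the largest block is 2^12 pages of 8K, i.e. 32M, the IOTSB backing-store size cited in the commit message; a zone order of 11 would cap it at 8M.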


