from:"Brian King"

[PATCH] powerpc: Enable support for 32 bit MSI-X vectors

2024-01-17 Thread Brian King

Some devices are not capable of addressing 64 bits
via DMA, which includes MSI-X vectors. This allows
us to ensure these devices use MSI-X vectors in
32 bit space.

Signed-off-by: Brian King 
---
 arch/powerpc/platforms/pseries/msi.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 423ee1d5bd94..6dfb55b52d36 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -26,6 +26,7 @@ static int query_token, change_token;
 #define RTAS_CHANGE_MSI_FN 3
 #define RTAS_CHANGE_MSIX_FN4
 #define RTAS_CHANGE_32MSI_FN   5
+#define RTAS_CHANGE_32MSIX_FN  6
 
 /* RTAS Helpers */
 
@@ -41,7 +42,7 @@ static int rtas_change_msi(struct pci_dn *pdn, u32 func, u32 
num_irqs)
seq_num = 1;
do {
if (func == RTAS_CHANGE_MSI_FN || func == RTAS_CHANGE_MSIX_FN ||
-   func == RTAS_CHANGE_32MSI_FN)
+   func == RTAS_CHANGE_32MSI_FN || func == 
RTAS_CHANGE_32MSIX_FN)
rc = rtas_call(change_token, 6, 4, rtas_ret, addr,
BUID_HI(buid), BUID_LO(buid),
func, num_irqs, seq_num);
@@ -406,8 +407,12 @@ static int rtas_prepare_msi_irqs(struct pci_dev *pdev, int 
nvec_in, int type,
 
if (use_32bit_msi_hack && rc > 0)
rtas_hack_32bit_msi_gen2(pdev);
-   } else
-   rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
+   } else {
+   if (pdev->no_64bit_msi)
+   rc = rtas_change_msi(pdn, RTAS_CHANGE_32MSIX_FN, nvec);
+   else
+   rc = rtas_change_msi(pdn, RTAS_CHANGE_MSIX_FN, nvec);
+   }
 
if (rc != nvec) {
if (nvec != nvec_in) {
-- 
2.39.3

Re: [PATCH] scsi: ipr: Remove obsolete check for old CPUs

2023-11-27 Thread Brian King

Acked-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH] powerpc: Fix device node refcounting

2023-02-09 Thread Brian King

On 2/9/23 11:11 AM, Nathan Lynch wrote:
> Brian King  writes:
>> On 2/7/23 9:14 AM, Nathan Lynch wrote:
>>> Brian King  writes:
>>>> While testing fixes to the hvcs hotplug code, kmemleak was reporting
>>>> potential memory leaks. This was tracked down to the struct device_node
>>>> object associated with the hvcs device. Looking at the leaked
>>>> object in crash showed that the kref in the kobject in the device_node
>>>> had a reference count of 1 still, and the release function was never
>>>> getting called as a result of this. This adds an of_node_put in
>>>> pSeries_reconfig_remove_node in order to balance the refcounting
>>>> so that we actually free the device_node in the case of it being
>>>> allocated in pSeries_reconfig_add_node.
>>>
>>> My concern here would be whether the additional put is the right thing
>>> to do in all cases. The questions it raises for me are:
>>>
>>> - Is it safe for nodes that were present at boot, instead of added
>>>   dynamically?
>>
>> Yes. of_node_release has a check to see if OF_DYNAMIC is set. If it is not 
>> set,
>> the release function is a noop.
> 
> Yes, but to be more specific - does the additional of_node_put() risk
> underflowing the refcount on nodes without the OF_DYNAMIC flag? I
> suspect it's OK. If it's not, then I would expect to see warnings from
> the refcount code when that case is exercised.

Agreed. I have not seen any refcount underflow warnings in the testing I've done
so far.

> 
>>
>>> - Is it correct for all types of nodes, or is there something specific
>>>   to hvcs that leaves a dangling refcount?
>>
>> I would welcome more testing and I shared the same concern. I did do some
>> DLPARs of a virtual ethernet device with the change along with 
>> CONFIG_PAGE_POISONING
>> enabled and did not run into any issues. However if I do a DLPAR remove of a 
>> virtual
>> ethernet device without the change with kmemleak enabled it does not detect 
>> any
>> leaked memory.
> 
> Seems odd. If the change is generically correct, then without it applied
> I would expect kmemleak to flag a leak on removal of any type of
> dynamically-added node. On the other hand, if the change is for some
> reason not correct for virtual ethernet devices, then I would expect it
> to cause complaints from the refcount code and/or allocator debug
> facilities. But if I understand correctly, neither of those things is
> happening.

Agreed. I'll do some more testing with and without the change and see
what that yields.

-Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH] powerpc: Fix device node refcounting

2023-02-09 Thread Brian King

On 2/7/23 9:14 AM, Nathan Lynch wrote:
> 
> (cc'ing a few possibly interested people)
> 
> Brian King  writes:
>> While testing fixes to the hvcs hotplug code, kmemleak was reporting
>> potential memory leaks. This was tracked down to the struct device_node
>> object associated with the hvcs device. Looking at the leaked
>> object in crash showed that the kref in the kobject in the device_node
>> had a reference count of 1 still, and the release function was never
>> getting called as a result of this. This adds an of_node_put in
>> pSeries_reconfig_remove_node in order to balance the refcounting
>> so that we actually free the device_node in the case of it being
>> allocated in pSeries_reconfig_add_node.
> 
> My concern here would be whether the additional put is the right thing
> to do in all cases. The questions it raises for me are:
> 
> - Is it safe for nodes that were present at boot, instead of added
>   dynamically?

Yes. of_node_release has a check to see if OF_DYNAMIC is set. If it is not set,
the release function is a noop. 

> - Is it correct for all types of nodes, or is there something specific
>   to hvcs that leaves a dangling refcount?

I would welcome more testing and I shared the same concern. I did do some
DLPARs of a virtual ethernet device with the change along with 
CONFIG_PAGE_POISONING
enabled and did not run into any issues. However if I do a DLPAR remove of a 
virtual
ethernet device without the change with kmemleak enabled it does not detect any
leaked memory.

Thanks,

Brian

> 
> Just hoping we're not stepping into a situation where we're preventing
> leaks in some situations but doing use-after-free in others. :-)
> 
>>
>> Signed-off-by: Brian King 
>> ---
>>  arch/powerpc/platforms/pseries/reconfig.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
>> b/arch/powerpc/platforms/pseries/reconfig.c
>> index 599bd2c78514..8cb7309b19a4 100644
>> --- a/arch/powerpc/platforms/pseries/reconfig.c
>> +++ b/arch/powerpc/platforms/pseries/reconfig.c
>> @@ -77,6 +77,7 @@ static int pSeries_reconfig_remove_node(struct device_node 
>> *np)
>>  }
>>  
>>  of_detach_node(np);
>> +of_node_put(np);
>>  of_node_put(parent);
>>  return 0;
> 
> In a situation like this where the of_node_put() call isn't obviously
> connected to one of the of_ iterator APIs or similar, I would prefer a
> comment indicating which "get" it balances. I suppose it corresponds to
> the node initialization itself, i.e. the of_node_init() call sites in
> pSeries_reconfig_add_node() and drivers/of/fdt.c::populate_node().

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

[PATCH v3 5/5] hvcs: Synchronize hotplug remove with port free

2023-02-03 Thread Brian King

Synchronizes hotplug remove with the freeing of the port.
This ensures we have freed all the memory associated with
this port and are not leaking memory.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index ecf24195b1e9..1de1a09bf82d 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -52,6 +52,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -285,6 +286,7 @@ struct hvcs_struct {
char p_location_code[HVCS_CLC_LENGTH + 1]; /* CLC + Null Term */
struct list_head next; /* list management */
struct vio_dev *vdev;
+   struct completion *destroyed;
 };
 
 static LIST_HEAD(hvcs_structs);
@@ -663,11 +665,13 @@ static void hvcs_destruct_port(struct tty_port *p)
 {
struct hvcs_struct *hvcsd = container_of(p, struct hvcs_struct, port);
struct vio_dev *vdev;
+   struct completion *comp;
unsigned long flags;
 
spin_lock(_structs_lock);
spin_lock_irqsave(>lock, flags);
 
+   comp = hvcsd->destroyed;
/* the list_del poisons the pointers */
list_del(&(hvcsd->next));
 
@@ -687,6 +691,7 @@ static void hvcs_destruct_port(struct tty_port *p)
 
hvcsd->p_unit_address = 0;
hvcsd->p_partition_ID = 0;
+   hvcsd->destroyed = NULL;
hvcs_return_index(hvcsd->index);
memset(>p_location_code[0], 0x00, HVCS_CLC_LENGTH + 1);
 
@@ -694,6 +699,8 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock(_structs_lock);
 
kfree(hvcsd);
+   if (comp)
+   complete(comp);
 }
 
 static const struct tty_port_operations hvcs_port_ops = {
@@ -792,6 +799,7 @@ static int hvcs_probe(
 static void hvcs_remove(struct vio_dev *dev)
 {
struct hvcs_struct *hvcsd = dev_get_drvdata(>dev);
+   DECLARE_COMPLETION_ONSTACK(comp);
unsigned long flags;
struct tty_struct *tty;
 
@@ -799,16 +807,11 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
+   hvcsd->destroyed = 
tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
-   /*
-* Let the last holder of this object cause it to be removed, which
-* would probably be tty_hangup below.
-*/
-   tty_port_put(>port);
-
/*
 * The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
@@ -818,6 +821,8 @@ static void hvcs_remove(struct vio_dev *dev)
tty_kref_put(tty);
}
 
+   tty_port_put(>port);
+   wait_for_completion();
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
 };
@@ -1171,7 +1176,10 @@ static void hvcs_close(struct tty_struct *tty, struct 
file *filp)
hvcsd = tty->driver_data;
 
spin_lock_irqsave(>lock, flags);
-   if (--hvcsd->port.count == 0) {
+   if (hvcsd->port.count == 0) {
+   spin_unlock_irqrestore(>lock, flags);
+   return;
+   } else if (--hvcsd->port.count == 0) {
 
vio_disable_interrupts(hvcsd->vdev);
 
@@ -1227,11 +1235,7 @@ static void hvcs_hangup(struct tty_struct * tty)
vio_disable_interrupts(hvcsd->vdev);
 
hvcsd->todo_mask = 0;
-
-   /* I don't think the tty needs the hvcs_struct pointer after a hangup */
-   tty->driver_data = NULL;
hvcsd->port.tty = NULL;
-
hvcsd->port.count = 0;
 
/* This will drop any buffered data on the floor which is OK in a hangup
-- 
2.31.1

[PATCH v3 4/5] hvcs: Use vhangup in hotplug remove

2023-02-03 Thread Brian King

When hotplug removing an hvcs device, we need to ensure the
hangup processing is done prior to exiting the remove function,
so use tty_vhangup to do the hangup processing directly
rather than using tty_hangup which simply schedules the hangup
work for later execution.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 8d40b20de277..ecf24195b1e9 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -810,12 +810,11 @@ static void hvcs_remove(struct vio_dev *dev)
tty_port_put(>port);
 
/*
-* The hangup is a scheduled function which will auto chain call
-* hvcs_hangup.  The tty should always be valid at this time unless a
+* The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
if (tty) {
-   tty_hangup(tty);
+   tty_vhangup(tty);
tty_kref_put(tty);
}
 
-- 
2.31.1

[PATCH v3 3/5] hvcs: Get reference to tty in remove

2023-02-03 Thread Brian King

Grab a reference to the tty when removing the hvcs to ensure
it does not get freed unexpectedly.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 522910716025..8d40b20de277 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -799,7 +799,7 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
-   tty = hvcsd->port.tty;
+   tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
@@ -814,8 +814,10 @@ static void hvcs_remove(struct vio_dev *dev)
 * hvcs_hangup.  The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
-   if (tty)
+   if (tty) {
tty_hangup(tty);
+   tty_kref_put(tty);
+   }
 
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
-- 
2.31.1

[PATCH v3 2/5] hvcs: Use driver groups to manage driver attributes

2023-02-03 Thread Brian King

Rather than manually creating attributes for the hvcs driver,
let the driver core do this for us. This also fixes some hotplug
remove issues and ensures that cleanup of these attributes
is done in the right order.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 0416601357e1..522910716025 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -466,6 +466,13 @@ static ssize_t rescan_store(struct device_driver *ddp, 
const char * buf,
 
 static DRIVER_ATTR_RW(rescan);
 
+static struct attribute *hvcs_attrs[] = {
+   _attr_rescan.attr,
+   NULL,
+};
+
+ATTRIBUTE_GROUPS(hvcs);
+
 static void hvcs_kick(void)
 {
hvcs_kicked = 1;
@@ -820,6 +827,7 @@ static struct vio_driver hvcs_vio_driver = {
.remove = hvcs_remove,
.name   = hvcs_driver_name,
.driver = {
+   .groups = hvcs_groups,
.dev_groups = hvcs_dev_groups,
},
 };
@@ -1498,13 +1506,6 @@ static int __init hvcs_module_init(void)
 
pr_info("HVCS: Driver registered.\n");
 
-   /* This needs to be done AFTER the vio_register_driver() call or else
-* the kobjects won't be initialized properly.
-*/
-   rc = driver_create_file(&(hvcs_vio_driver.driver), _attr_rescan);
-   if (rc)
-   pr_warn("HVCS: Failed to create rescan file (err %d)\n", rc);
-
return 0;
 }
 
@@ -1529,8 +1530,6 @@ static void __exit hvcs_module_exit(void)
hvcs_pi_buff = NULL;
spin_unlock(_pi_lock);
 
-   driver_remove_file(_vio_driver.driver, _attr_rescan);
-
tty_unregister_driver(hvcs_tty_driver);
 
hvcs_free_index_list();
-- 
2.31.1

[PATCH v3 1/5] hvcs: Use dev_groups to manage hvcs device attributes

2023-02-03 Thread Brian King

Use the dev_groups functionality to manage the attribute groups
for hvcs devices. This simplifies the code and also eliminates
errors coming from kernfs when attempting to remove a console
device that is in use.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index faf5ccfc561e..0416601357e1 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -432,7 +432,7 @@ static ssize_t hvcs_index_show(struct device *dev, struct 
device_attribute *attr
 
 static DEVICE_ATTR(index, S_IRUGO, hvcs_index_show, NULL);
 
-static struct attribute *hvcs_attrs[] = {
+static struct attribute *hvcs_dev_attrs[] = {
_attr_partner_vtys.attr,
_attr_partner_clcs.attr,
_attr_current_vty.attr,
@@ -441,9 +441,7 @@ static struct attribute *hvcs_attrs[] = {
NULL,
 };
 
-static struct attribute_group hvcs_attr_group = {
-   .attrs = hvcs_attrs,
-};
+ATTRIBUTE_GROUPS(hvcs_dev);
 
 static ssize_t rescan_show(struct device_driver *ddp, char *buf)
 {
@@ -688,8 +686,6 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock_irqrestore(>lock, flags);
spin_unlock(_structs_lock);
 
-   sysfs_remove_group(>dev.kobj, _attr_group);
-
kfree(hvcsd);
 }
 
@@ -721,7 +717,6 @@ static int hvcs_probe(
 {
struct hvcs_struct *hvcsd;
int index, rc;
-   int retval;
 
if (!dev || !id) {
printk(KERN_ERR "HVCS: probed with invalid parameter.\n");
@@ -778,13 +773,6 @@ static int hvcs_probe(
list_add_tail(&(hvcsd->next), _structs);
spin_unlock(_structs_lock);
 
-   retval = sysfs_create_group(>dev.kobj, _attr_group);
-   if (retval) {
-   printk(KERN_ERR "HVCS: Can't create sysfs attrs for 
vty-server@%X\n",
-  hvcsd->vdev->unit_address);
-   return retval;
-   }
-
printk(KERN_INFO "HVCS: vty-server@%X added to the vio bus.\n", 
dev->unit_address);
 
/*
@@ -831,6 +819,9 @@ static struct vio_driver hvcs_vio_driver = {
.probe  = hvcs_probe,
.remove = hvcs_remove,
.name   = hvcs_driver_name,
+   .driver = {
+   .dev_groups = hvcs_dev_groups,
+   },
 };
 
 /* Only called from hvcs_get_pi please */
-- 
2.31.1

[PATCH v3 0/5] hvcs: Various hvcs device hotplug fixes

2023-02-03 Thread Brian King

This patch series fixes a number of issues with hotplugging
hvcs devices including memory leaks as well as, the inability
to reconnect to a console device after it has been hot added
back, since it was not getting cleaned up properly on the
hotplug remove path.

Changes since version 2:
- Change attribute groups to use ATTRIBUTE_GROUPS macros

Changes since initial version:
- Change to use driver default groups to manage attribute lifecycle


Brian King (5):
  hvcs: Use dev_groups to manage hvcs device attributes
  hvcs: Use driver groups to manage driver attributes
  hvcs: Get reference to tty in remove
  hvcs: Use vhangup in hotplug remove
  hvcs: Synchronize hotplug remove with port free

 drivers/tty/hvc/hvcs.c | 73 --
 1 file changed, 34 insertions(+), 39 deletions(-)

-- 
2.31.1

[PATCH v2 6/6] hvcs: Synchronize hotplug remove with port free

2023-02-02 Thread Brian King

Synchronizes hotplug remove with the freeing of the port.
This ensures we have freed all the memory associated with
this port and are not leaking memory.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 2e9e45f06916..c360965e9c1b 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -52,6 +52,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -285,6 +286,7 @@ struct hvcs_struct {
char p_location_code[HVCS_CLC_LENGTH + 1]; /* CLC + Null Term */
struct list_head next; /* list management */
struct vio_dev *vdev;
+   struct completion *destroyed;
 };
 
 static LIST_HEAD(hvcs_structs);
@@ -677,11 +679,13 @@ static void hvcs_destruct_port(struct tty_port *p)
 {
struct hvcs_struct *hvcsd = container_of(p, struct hvcs_struct, port);
struct vio_dev *vdev;
+   struct completion *comp;
unsigned long flags;
 
spin_lock(_structs_lock);
spin_lock_irqsave(>lock, flags);
 
+   comp = hvcsd->destroyed;
/* the list_del poisons the pointers */
list_del(&(hvcsd->next));
 
@@ -701,6 +705,7 @@ static void hvcs_destruct_port(struct tty_port *p)
 
hvcsd->p_unit_address = 0;
hvcsd->p_partition_ID = 0;
+   hvcsd->destroyed = NULL;
hvcs_return_index(hvcsd->index);
memset(>p_location_code[0], 0x00, HVCS_CLC_LENGTH + 1);
 
@@ -708,6 +713,8 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock(_structs_lock);
 
kfree(hvcsd);
+   if (comp)
+   complete(comp);
 }
 
 static const struct tty_port_operations hvcs_port_ops = {
@@ -806,6 +813,7 @@ static int hvcs_probe(
 static void hvcs_remove(struct vio_dev *dev)
 {
struct hvcs_struct *hvcsd = dev_get_drvdata(>dev);
+   DECLARE_COMPLETION_ONSTACK(comp);
unsigned long flags;
struct tty_struct *tty;
 
@@ -813,16 +821,11 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
+   hvcsd->destroyed = 
tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
-   /*
-* Let the last holder of this object cause it to be removed, which
-* would probably be tty_hangup below.
-*/
-   tty_port_put(>port);
-
/*
 * The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
@@ -832,6 +835,8 @@ static void hvcs_remove(struct vio_dev *dev)
tty_kref_put(tty);
}
 
+   tty_port_put(>port);
+   wait_for_completion();
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
 };
@@ -1185,7 +1190,10 @@ static void hvcs_close(struct tty_struct *tty, struct 
file *filp)
hvcsd = tty->driver_data;
 
spin_lock_irqsave(>lock, flags);
-   if (--hvcsd->port.count == 0) {
+   if (hvcsd->port.count == 0) {
+   spin_unlock_irqrestore(>lock, flags);
+   return;
+   } else if (--hvcsd->port.count == 0) {
 
vio_disable_interrupts(hvcsd->vdev);
 
@@ -1241,11 +1249,7 @@ static void hvcs_hangup(struct tty_struct * tty)
vio_disable_interrupts(hvcsd->vdev);
 
hvcsd->todo_mask = 0;
-
-   /* I don't think the tty needs the hvcs_struct pointer after a hangup */
-   tty->driver_data = NULL;
hvcsd->port.tty = NULL;
-
hvcsd->port.count = 0;
 
/* This will drop any buffered data on the floor which is OK in a hangup
-- 
2.31.1

[PATCH v2 5/6] hvcs: Use vhangup in hotplug remove

2023-02-02 Thread Brian King

When hotplug removing an hvcs device, we need to ensure the
hangup processing is done prior to exiting the remove function,
so use tty_vhangup to do the hangup processing directly
rather than using tty_hangup which simply schedules the hangup
work for later execution.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 04c58ac4fec2..2e9e45f06916 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -824,12 +824,11 @@ static void hvcs_remove(struct vio_dev *dev)
tty_port_put(>port);
 
/*
-* The hangup is a scheduled function which will auto chain call
-* hvcs_hangup.  The tty should always be valid at this time unless a
+* The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
if (tty) {
-   tty_hangup(tty);
+   tty_vhangup(tty);
tty_kref_put(tty);
}
 
-- 
2.31.1

[PATCH v2 4/6] hvcs: Get reference to tty in remove

2023-02-02 Thread Brian King

Grab a reference to the tty when removing the hvcs to ensure
it does not get freed unexpectedly.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 5de7ad40..04c58ac4fec2 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -813,7 +813,7 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
-   tty = hvcsd->port.tty;
+   tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
@@ -828,8 +828,10 @@ static void hvcs_remove(struct vio_dev *dev)
 * hvcs_hangup.  The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
-   if (tty)
+   if (tty) {
tty_hangup(tty);
+   tty_kref_put(tty);
+   }
 
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
-- 
2.31.1

[PATCH v2 3/6] hvcs: Use driver groups to manage driver attributes

2023-02-02 Thread Brian King

Rather than manually creating attributes for the hvcs driver,
let the driver core do this for us. This also fixes some hotplug
remove issues and ensures that cleanup of these attributes
is done in the right order.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 7f79444b4d89..5de7ad40 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -473,6 +473,20 @@ static ssize_t rescan_store(struct device_driver *ddp, 
const char * buf,
 
 static DRIVER_ATTR_RW(rescan);
 
+static struct attribute *hvcs_attrs[] = {
+   _attr_rescan.attr,
+   NULL,
+};
+
+static struct attribute_group hvcs_attr_group = {
+   .attrs = hvcs_attrs,
+};
+
+const static struct attribute_group *hvcs_attr_groups[] = {
+   _attr_group,
+   NULL,
+};
+
 static void hvcs_kick(void)
 {
hvcs_kicked = 1;
@@ -827,6 +841,7 @@ static struct vio_driver hvcs_vio_driver = {
.remove = hvcs_remove,
.name   = hvcs_driver_name,
.driver = {
+   .groups = hvcs_attr_groups,
.dev_groups = hvcs_attr_dev_groups,
},
 };
@@ -1505,13 +1520,6 @@ static int __init hvcs_module_init(void)
 
pr_info("HVCS: Driver registered.\n");
 
-   /* This needs to be done AFTER the vio_register_driver() call or else
-* the kobjects won't be initialized properly.
-*/
-   rc = driver_create_file(&(hvcs_vio_driver.driver), _attr_rescan);
-   if (rc)
-   pr_warn("HVCS: Failed to create rescan file (err %d)\n", rc);
-
return 0;
 }
 
@@ -1536,8 +1544,6 @@ static void __exit hvcs_module_exit(void)
hvcs_pi_buff = NULL;
spin_unlock(_pi_lock);
 
-   driver_remove_file(_vio_driver.driver, _attr_rescan);
-
tty_unregister_driver(hvcs_tty_driver);
 
hvcs_free_index_list();
-- 
2.31.1

[PATCH v2 2/6] hvcs: Use dev_groups to manage hvcs device attributes

2023-02-02 Thread Brian King

Use the dev_groups functionality to manage the attribute groups
for hvcs devices. This simplifies the code and also eliminates
errors coming from kernfs when attempting to remove a console
device that is in use.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index faf5ccfc561e..7f79444b4d89 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -432,7 +432,7 @@ static ssize_t hvcs_index_show(struct device *dev, struct 
device_attribute *attr
 
 static DEVICE_ATTR(index, S_IRUGO, hvcs_index_show, NULL);
 
-static struct attribute *hvcs_attrs[] = {
+static struct attribute *hvcs_dev_attrs[] = {
_attr_partner_vtys.attr,
_attr_partner_clcs.attr,
_attr_current_vty.attr,
@@ -441,8 +441,13 @@ static struct attribute *hvcs_attrs[] = {
NULL,
 };
 
-static struct attribute_group hvcs_attr_group = {
-   .attrs = hvcs_attrs,
+static struct attribute_group hvcs_attr_dev_group = {
+   .attrs = hvcs_dev_attrs,
+};
+
+const static struct attribute_group *hvcs_attr_dev_groups[] = {
+   _attr_dev_group,
+   NULL,
 };
 
 static ssize_t rescan_show(struct device_driver *ddp, char *buf)
@@ -688,8 +693,6 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock_irqrestore(>lock, flags);
spin_unlock(_structs_lock);
 
-   sysfs_remove_group(>dev.kobj, _attr_group);
-
kfree(hvcsd);
 }
 
@@ -721,7 +724,6 @@ static int hvcs_probe(
 {
struct hvcs_struct *hvcsd;
int index, rc;
-   int retval;
 
if (!dev || !id) {
printk(KERN_ERR "HVCS: probed with invalid parameter.\n");
@@ -778,13 +780,6 @@ static int hvcs_probe(
list_add_tail(&(hvcsd->next), _structs);
spin_unlock(_structs_lock);
 
-   retval = sysfs_create_group(>dev.kobj, _attr_group);
-   if (retval) {
-   printk(KERN_ERR "HVCS: Can't create sysfs attrs for 
vty-server@%X\n",
-  hvcsd->vdev->unit_address);
-   return retval;
-   }
-
printk(KERN_INFO "HVCS: vty-server@%X added to the vio bus.\n", 
dev->unit_address);
 
/*
@@ -831,6 +826,9 @@ static struct vio_driver hvcs_vio_driver = {
.probe  = hvcs_probe,
.remove = hvcs_remove,
.name   = hvcs_driver_name,
+   .driver = {
+   .dev_groups = hvcs_attr_dev_groups,
+   },
 };
 
 /* Only called from hvcs_get_pi please */
-- 
2.31.1

[PATCH v2 1/6] hvcs: Fix hvcs port reference counting

2023-02-02 Thread Brian King

The hvcs driver only ever gets two references to the port. One
at initialization time, and one at install time. Remove the code
that was trying to do multiple port puts for each open, which
would result in more puts than gets.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 4ba24963685e..faf5ccfc561e 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -1215,12 +1215,9 @@ static void hvcs_hangup(struct tty_struct * tty)
 {
struct hvcs_struct *hvcsd = tty->driver_data;
unsigned long flags;
-   int temp_open_count;
int irq;
 
spin_lock_irqsave(>lock, flags);
-   /* Preserve this so that we know how many kref refs to put */
-   temp_open_count = hvcsd->port.count;
 
/*
 * Don't kref put inside the spinlock because the destruction
@@ -1247,21 +1244,6 @@ static void hvcs_hangup(struct tty_struct * tty)
spin_unlock_irqrestore(>lock, flags);
 
free_irq(irq, hvcsd);
-
-   /*
-* We need to kref_put() for every open_count we have since the
-* tty_hangup() function doesn't invoke a close per open connection on a
-* non-console device.
-*/
-   while(temp_open_count) {
-   --temp_open_count;
-   /*
-* The final put will trigger destruction of the hvcs_struct.
-* NOTE:  If this hangup was signaled from user space then the
-* final put will never happen.
-*/
-   tty_port_put(>port);
-   }
 }
 
 /*
-- 
2.31.1

[PATCH v2 0/6] hvcs: Various hvcs device hotplug fixes

2023-02-02 Thread Brian King

This patch series fixes a number of issues with hotplugging
hvcs devices including memory leaks as well as, the inability
to reconnect to a console device after it has been hot added
back, since it was not getting cleaned up properly on the
hotplug remove path.

Changes since initial version:
- Change to use driver default groups to manage attribute lifecycle

Brian King (6):
  hvcs: Fix hvcs port reference counting
  hvcs: Use dev_groups to manage hvcs device attributes
  hvcs: Use driver groups to manage driver attributes
  hvcs: Get reference to tty in remove
  hvcs: Use vhangup in hotplug remove
  hvcs: Synchronize hotplug remove with port free

 drivers/tty/hvc/hvcs.c | 103 +++--
 1 file changed, 47 insertions(+), 56 deletions(-)

-- 
2.31.1

[PATCH] powerpc: Fix device node refcounting

2023-02-01 Thread Brian King

While testing fixes to the hvcs hotplug code, kmemleak was reporting
potential memory leaks. This was tracked down to the struct device_node
object associated with the hvcs device. Looking at the leaked
object in crash showed that the kref in the kobject in the device_node
had a reference count of 1 still, and the release function was never
getting called as a result of this. This adds an of_node_put in
pSeries_reconfig_remove_node in order to balance the refcounting
so that we actually free the device_node in the case of it being
allocated in pSeries_reconfig_add_node.

Signed-off-by: Brian King 
---
 arch/powerpc/platforms/pseries/reconfig.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
b/arch/powerpc/platforms/pseries/reconfig.c
index 599bd2c78514..8cb7309b19a4 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -77,6 +77,7 @@ static int pSeries_reconfig_remove_node(struct device_node 
*np)
}
 
of_detach_node(np);
+   of_node_put(np);
of_node_put(parent);
return 0;
 }
-- 
2.31.1

[PATCH 6/6] hvcs: Synchronize hotplug remove with port free

2023-02-01 Thread Brian King

Synchronizes hotplug remove with the freeing of the port.
This ensures we have freed all the memory associated with
this port and are not leaking memory.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 24541fc53625..2b038d4b3a63 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -52,6 +52,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -285,6 +286,7 @@ struct hvcs_struct {
char p_location_code[HVCS_CLC_LENGTH + 1]; /* CLC + Null Term */
struct list_head next; /* list management */
struct vio_dev *vdev;
+   struct completion *destroyed;
 };
 
 static LIST_HEAD(hvcs_structs);
@@ -658,11 +660,13 @@ static void hvcs_destruct_port(struct tty_port *p)
 {
struct hvcs_struct *hvcsd = container_of(p, struct hvcs_struct, port);
struct vio_dev *vdev;
+   struct completion *comp;
unsigned long flags;
 
spin_lock(_structs_lock);
spin_lock_irqsave(>lock, flags);
 
+   comp = hvcsd->destroyed;
/* the list_del poisons the pointers */
list_del(&(hvcsd->next));
 
@@ -682,6 +686,7 @@ static void hvcs_destruct_port(struct tty_port *p)
 
hvcsd->p_unit_address = 0;
hvcsd->p_partition_ID = 0;
+   hvcsd->destroyed = NULL;
hvcs_return_index(hvcsd->index);
memset(>p_location_code[0], 0x00, HVCS_CLC_LENGTH + 1);
 
@@ -689,6 +694,8 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock(_structs_lock);
 
kfree(hvcsd);
+   if (comp)
+   complete(comp);
 }
 
 static const struct tty_port_operations hvcs_port_ops = {
@@ -795,6 +802,7 @@ static int hvcs_probe(
 static void hvcs_remove(struct vio_dev *dev)
 {
struct hvcs_struct *hvcsd = dev_get_drvdata(>dev);
+   DECLARE_COMPLETION_ONSTACK(comp);
unsigned long flags;
struct tty_struct *tty;
 
@@ -802,16 +810,11 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
+   hvcsd->destroyed = 
tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
-   /*
-* Let the last holder of this object cause it to be removed, which
-* would probably be tty_hangup below.
-*/
-   tty_port_put(>port);
-
sysfs_remove_group(>dev.kobj, _attr_group);
 
/*
@@ -823,6 +826,8 @@ static void hvcs_remove(struct vio_dev *dev)
tty_kref_put(tty);
}
 
+   tty_port_put(>port);
+   wait_for_completion();
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
 };
@@ -1172,7 +1177,10 @@ static void hvcs_close(struct tty_struct *tty, struct 
file *filp)
hvcsd = tty->driver_data;
 
spin_lock_irqsave(>lock, flags);
-   if (--hvcsd->port.count == 0) {
+   if (hvcsd->port.count == 0) {
+   spin_unlock_irqrestore(>lock, flags);
+   return;
+   } else if (--hvcsd->port.count == 0) {
 
vio_disable_interrupts(hvcsd->vdev);
 
@@ -1228,11 +1236,7 @@ static void hvcs_hangup(struct tty_struct * tty)
vio_disable_interrupts(hvcsd->vdev);
 
hvcsd->todo_mask = 0;
-
-   /* I don't think the tty needs the hvcs_struct pointer after a hangup */
-   tty->driver_data = NULL;
hvcsd->port.tty = NULL;
-
hvcsd->port.count = 0;
 
/* This will drop any buffered data on the floor which is OK in a hangup
-- 
2.31.1

[PATCH 5/6] hvcs: Use vhangup in hotplug remove

2023-02-01 Thread Brian King

When hotplug removing an hvcs device, we need to ensure the
hangup processing is done prior to exiting the remove function,
so use tty_vhangup to do the hangup processing directly
rather than using tty_hangup which simply schedules the hangup
work for later execution.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index b28ddfc46e42..24541fc53625 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -815,12 +815,11 @@ static void hvcs_remove(struct vio_dev *dev)
sysfs_remove_group(>dev.kobj, _attr_group);
 
/*
-* The hangup is a scheduled function which will auto chain call
-* hvcs_hangup.  The tty should always be valid at this time unless a
+* The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
if (tty) {
-   tty_hangup(tty);
+   tty_vhangup(tty);
tty_kref_put(tty);
}
 
-- 
2.31.1

[PATCH 3/6] hvcs: Remove sysfs group earlier

2023-02-01 Thread Brian King

Cleanup the sysfs group earlier in remove. This eliminates
errors coming from kernfs when attempting to remove a console
device that is in use.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 9131dcb2e8d8..9c5887d0c882 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -688,8 +688,6 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock_irqrestore(>lock, flags);
spin_unlock(_structs_lock);
 
-   sysfs_remove_group(>dev.kobj, _attr_group);
-
kfree(hvcsd);
 }
 
@@ -814,6 +812,8 @@ static void hvcs_remove(struct vio_dev *dev)
 */
tty_port_put(>port);
 
+   sysfs_remove_group(>dev.kobj, _attr_group);
+
/*
 * The hangup is a scheduled function which will auto chain call
 * hvcs_hangup.  The tty should always be valid at this time unless a
-- 
2.31.1

[PATCH 1/6] hvcs: Fix hvcs port reference counting

2023-02-01 Thread Brian King

The hvcs driver only ever gets two references to the port. One
at initialization time, and one at install time. Remove the code
that was trying to do multiple port puts for each open, which
would result in more puts than gets.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 4ba24963685e..faf5ccfc561e 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -1215,12 +1215,9 @@ static void hvcs_hangup(struct tty_struct * tty)
 {
struct hvcs_struct *hvcsd = tty->driver_data;
unsigned long flags;
-   int temp_open_count;
int irq;
 
spin_lock_irqsave(>lock, flags);
-   /* Preserve this so that we know how many kref refs to put */
-   temp_open_count = hvcsd->port.count;
 
/*
 * Don't kref put inside the spinlock because the destruction
@@ -1247,21 +1244,6 @@ static void hvcs_hangup(struct tty_struct * tty)
spin_unlock_irqrestore(>lock, flags);
 
free_irq(irq, hvcsd);
-
-   /*
-* We need to kref_put() for every open_count we have since the
-* tty_hangup() function doesn't invoke a close per open connection on a
-* non-console device.
-*/
-   while(temp_open_count) {
-   --temp_open_count;
-   /*
-* The final put will trigger destruction of the hvcs_struct.
-* NOTE:  If this hangup was signaled from user space then the
-* final put will never happen.
-*/
-   tty_port_put(>port);
-   }
 }
 
 /*
-- 
2.31.1

[PATCH 4/6] hvcs: Get reference to tty in remove

2023-02-01 Thread Brian King

Grab a reference to the tty when removing the hvcs to ensure
it does not get freed unexpectedly.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 9c5887d0c882..b28ddfc46e42 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -802,7 +802,7 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
-   tty = hvcsd->port.tty;
+   tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
@@ -819,8 +819,10 @@ static void hvcs_remove(struct vio_dev *dev)
 * hvcs_hangup.  The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
-   if (tty)
+   if (tty) {
tty_hangup(tty);
+   tty_kref_put(tty);
+   }
 
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
-- 
2.31.1

[PATCH 2/6] hvcs: Remove sysfs file prior to vio unregister

2023-02-01 Thread Brian King

This moves the removal of the rescan sysfs attribute to occur
before the call to unregister the vio to ensure the removal
does not fail due to the vio driver already being freed.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index faf5ccfc561e..9131dcb2e8d8 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -1519,6 +1519,8 @@ static int __init hvcs_module_init(void)
 
 static void __exit hvcs_module_exit(void)
 {
+   driver_remove_file(_vio_driver.driver, _attr_rescan);
+
/*
 * This driver receives hvcs_remove callbacks for each device upon
 * module removal.
@@ -1538,8 +1540,6 @@ static void __exit hvcs_module_exit(void)
hvcs_pi_buff = NULL;
spin_unlock(_pi_lock);
 
-   driver_remove_file(_vio_driver.driver, _attr_rescan);
-
tty_unregister_driver(hvcs_tty_driver);
 
hvcs_free_index_list();
-- 
2.31.1

[PATCH 0/6] hvcs: Various hvcs device hotplug fixes

2023-02-01 Thread Brian King

This patch series fixes a number of issues with hotplugging
hvcs devices including memory leaks as well as, the inability
to reconnect to a console device after it has been hot added
back, since it was not getting cleaned up properly on the
hotplug remove path.

Brian King (6):
  hvcs: Fix hvcs port reference counting
  hvcs: Remove sysfs file prior to vio unregister
  hvcs: Remove sysfs group earlier
  hvcs: Get reference to tty in remove
  hvcs: Use vhangup in hotplug remove
  hvcs: Synchronize hotplug remove with port free

 drivers/tty/hvc/hvcs.c | 61 +-
 1 file changed, 24 insertions(+), 37 deletions(-)

-- 
2.31.1

Re: [PATCH 0/7] hvcs: Various hvcs device hotplug fixes

2023-02-01 Thread Brian King

On 2/1/23 4:32 AM, Christophe Leroy wrote:
> 
> 
> Le 30/01/2023 à 23:43, Brian King a écrit :
>> This patch series fixes a number of issues with hotplugging
>> hvcs devices including memory leaks as well as, the inability
>> to reconnect to a console device after it has been hot added
>> back, since it was not getting cleaned up properly on the
>> hotplug remove path.
>>
>> Brian King (7):
>>hvcs: Fix hvcs port reference counting
>>hvcs: Remove sysfs file prior to vio unregister
>>hvcs: Remove sysfs group earlier
>>hvcs: Get reference to tty in remove
>>hvcs: Use vhangup in hotplug remove
>>hvcs: Synchronize hotplug remove with port free
>>powerpc: Fix device node refcounting
>>
>>   arch/powerpc/platforms/pseries/reconfig.c |  1 +
>>   drivers/tty/hvc/hvcs.c| 61 +--
>>   2 files changed, 25 insertions(+), 37 deletions(-)
>>
> 
> As far as I can see, most recent patches in drivers/tty/hvc/ are taken 
> by Greg Kroah-Hartman  so I'd suggest you 
> send you series to Greg as he maintains the tty tree.

Sure. I just followed what was in the maintainers file, but I can resend
and cc Greg and linux-serial if that makes sense.


> By the way, does last patch require the 6 previous ones or can it be 
> applied independently ?

Yes. It is independent, so I'll pull this out of the series and can send
it separately to this list only.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

[PATCH 5/7] hvcs: Use vhangup in hotplug remove

2023-01-30 Thread Brian King

When hotplug removing an hvcs device, we need to ensure the
hangup processing is done prior to exiting the remove function,
so use tty_vhangup to do the hangup processing directly
rather than using tty_hangup which simply schedules the hangup
work for later execution.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index b28ddfc46e42..24541fc53625 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -815,12 +815,11 @@ static void hvcs_remove(struct vio_dev *dev)
sysfs_remove_group(>dev.kobj, _attr_group);
 
/*
-* The hangup is a scheduled function which will auto chain call
-* hvcs_hangup.  The tty should always be valid at this time unless a
+* The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
if (tty) {
-   tty_hangup(tty);
+   tty_vhangup(tty);
tty_kref_put(tty);
}
 
-- 
2.31.1

[PATCH 6/7] hvcs: Synchronize hotplug remove with port free

2023-01-30 Thread Brian King

Synchronizes hotplug remove with the freeing of the port.
This ensures we have freed all the memory associated with
this port and are not leaking memory.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 26 +++---
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 24541fc53625..2b038d4b3a63 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -52,6 +52,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -285,6 +286,7 @@ struct hvcs_struct {
char p_location_code[HVCS_CLC_LENGTH + 1]; /* CLC + Null Term */
struct list_head next; /* list management */
struct vio_dev *vdev;
+   struct completion *destroyed;
 };
 
 static LIST_HEAD(hvcs_structs);
@@ -658,11 +660,13 @@ static void hvcs_destruct_port(struct tty_port *p)
 {
struct hvcs_struct *hvcsd = container_of(p, struct hvcs_struct, port);
struct vio_dev *vdev;
+   struct completion *comp;
unsigned long flags;
 
spin_lock(_structs_lock);
spin_lock_irqsave(>lock, flags);
 
+   comp = hvcsd->destroyed;
/* the list_del poisons the pointers */
list_del(&(hvcsd->next));
 
@@ -682,6 +686,7 @@ static void hvcs_destruct_port(struct tty_port *p)
 
hvcsd->p_unit_address = 0;
hvcsd->p_partition_ID = 0;
+   hvcsd->destroyed = NULL;
hvcs_return_index(hvcsd->index);
memset(>p_location_code[0], 0x00, HVCS_CLC_LENGTH + 1);
 
@@ -689,6 +694,8 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock(_structs_lock);
 
kfree(hvcsd);
+   if (comp)
+   complete(comp);
 }
 
 static const struct tty_port_operations hvcs_port_ops = {
@@ -795,6 +802,7 @@ static int hvcs_probe(
 static void hvcs_remove(struct vio_dev *dev)
 {
struct hvcs_struct *hvcsd = dev_get_drvdata(>dev);
+   DECLARE_COMPLETION_ONSTACK(comp);
unsigned long flags;
struct tty_struct *tty;
 
@@ -802,16 +810,11 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
+   hvcsd->destroyed = 
tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
-   /*
-* Let the last holder of this object cause it to be removed, which
-* would probably be tty_hangup below.
-*/
-   tty_port_put(>port);
-
sysfs_remove_group(>dev.kobj, _attr_group);
 
/*
@@ -823,6 +826,8 @@ static void hvcs_remove(struct vio_dev *dev)
tty_kref_put(tty);
}
 
+   tty_port_put(>port);
+   wait_for_completion();
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
 };
@@ -1172,7 +1177,10 @@ static void hvcs_close(struct tty_struct *tty, struct 
file *filp)
hvcsd = tty->driver_data;
 
spin_lock_irqsave(>lock, flags);
-   if (--hvcsd->port.count == 0) {
+   if (hvcsd->port.count == 0) {
+   spin_unlock_irqrestore(>lock, flags);
+   return;
+   } else if (--hvcsd->port.count == 0) {
 
vio_disable_interrupts(hvcsd->vdev);
 
@@ -1228,11 +1236,7 @@ static void hvcs_hangup(struct tty_struct * tty)
vio_disable_interrupts(hvcsd->vdev);
 
hvcsd->todo_mask = 0;
-
-   /* I don't think the tty needs the hvcs_struct pointer after a hangup */
-   tty->driver_data = NULL;
hvcsd->port.tty = NULL;
-
hvcsd->port.count = 0;
 
/* This will drop any buffered data on the floor which is OK in a hangup
-- 
2.31.1

[PATCH 7/7] powerpc: Fix device node refcounting

2023-01-30 Thread Brian King

While testing fixes to the hvcs hotplug code, kmemleak was reporting
potential memory leaks. This was tracked down to the struct device_node
object associated with the hvcs device. Looking at the leaked
object in crash showed that the kref in the kobject in the device_node
had a reference count of 1 still, and the release function was never
getting called as a result of this. This adds an of_node_put in
pSeries_reconfig_remove_node in order to balance the refcounting
so that we actually free the device_node in the case of it being
allocated in pSeries_reconfig_add_node.

Signed-off-by: Brian King 
---
 arch/powerpc/platforms/pseries/reconfig.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/reconfig.c 
b/arch/powerpc/platforms/pseries/reconfig.c
index 599bd2c78514..8cb7309b19a4 100644
--- a/arch/powerpc/platforms/pseries/reconfig.c
+++ b/arch/powerpc/platforms/pseries/reconfig.c
@@ -77,6 +77,7 @@ static int pSeries_reconfig_remove_node(struct device_node 
*np)
}
 
of_detach_node(np);
+   of_node_put(np);
of_node_put(parent);
return 0;
 }
-- 
2.31.1

[PATCH 4/7] hvcs: Get reference to tty in remove

2023-01-30 Thread Brian King

Grab a reference to the tty when removing the hvcs to ensure
it does not get freed unexpectedly.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 9c5887d0c882..b28ddfc46e42 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -802,7 +802,7 @@ static void hvcs_remove(struct vio_dev *dev)
 
spin_lock_irqsave(>lock, flags);
 
-   tty = hvcsd->port.tty;
+   tty = tty_port_tty_get(>port);
 
spin_unlock_irqrestore(>lock, flags);
 
@@ -819,8 +819,10 @@ static void hvcs_remove(struct vio_dev *dev)
 * hvcs_hangup.  The tty should always be valid at this time unless a
 * simultaneous tty close already cleaned up the hvcs_struct.
 */
-   if (tty)
+   if (tty) {
tty_hangup(tty);
+   tty_kref_put(tty);
+   }
 
printk(KERN_INFO "HVCS: vty-server@%X removed from the"
" vio bus.\n", dev->unit_address);
-- 
2.31.1

[PATCH 2/7] hvcs: Remove sysfs file prior to vio unregister

2023-01-30 Thread Brian King

This moves the removal of the rescan sysfs attribute to occur
before the call to unregister the vio to ensure the removal
does not fail due to the vio driver already being freed.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index faf5ccfc561e..9131dcb2e8d8 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -1519,6 +1519,8 @@ static int __init hvcs_module_init(void)
 
 static void __exit hvcs_module_exit(void)
 {
+   driver_remove_file(_vio_driver.driver, _attr_rescan);
+
/*
 * This driver receives hvcs_remove callbacks for each device upon
 * module removal.
@@ -1538,8 +1540,6 @@ static void __exit hvcs_module_exit(void)
hvcs_pi_buff = NULL;
spin_unlock(_pi_lock);
 
-   driver_remove_file(_vio_driver.driver, _attr_rescan);
-
tty_unregister_driver(hvcs_tty_driver);
 
hvcs_free_index_list();
-- 
2.31.1

[PATCH 3/7] hvcs: Remove sysfs group earlier

2023-01-30 Thread Brian King

Cleanup the sysfs group earlier in remove. This eliminates
errors coming from kernfs when attempting to remove a console
device that is in use.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 9131dcb2e8d8..9c5887d0c882 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -688,8 +688,6 @@ static void hvcs_destruct_port(struct tty_port *p)
spin_unlock_irqrestore(>lock, flags);
spin_unlock(_structs_lock);
 
-   sysfs_remove_group(>dev.kobj, _attr_group);
-
kfree(hvcsd);
 }
 
@@ -814,6 +812,8 @@ static void hvcs_remove(struct vio_dev *dev)
 */
tty_port_put(>port);
 
+   sysfs_remove_group(>dev.kobj, _attr_group);
+
/*
 * The hangup is a scheduled function which will auto chain call
 * hvcs_hangup.  The tty should always be valid at this time unless a
-- 
2.31.1

[PATCH 0/7] hvcs: Various hvcs device hotplug fixes

2023-01-30 Thread Brian King

This patch series fixes a number of issues with hotplugging
hvcs devices including memory leaks as well as, the inability
to reconnect to a console device after it has been hot added
back, since it was not getting cleaned up properly on the
hotplug remove path.

Brian King (7):
  hvcs: Fix hvcs port reference counting
  hvcs: Remove sysfs file prior to vio unregister
  hvcs: Remove sysfs group earlier
  hvcs: Get reference to tty in remove
  hvcs: Use vhangup in hotplug remove
  hvcs: Synchronize hotplug remove with port free
  powerpc: Fix device node refcounting

 arch/powerpc/platforms/pseries/reconfig.c |  1 +
 drivers/tty/hvc/hvcs.c| 61 +--
 2 files changed, 25 insertions(+), 37 deletions(-)

-- 
2.31.1

[PATCH 1/7] hvcs: Fix hvcs port reference counting

2023-01-30 Thread Brian King

The hvcs driver only ever gets two references to the port. One
at initialization time, and one at install time. Remove the code
that was trying to do multiple port puts for each open, which
would result in more puts than gets.

Signed-off-by: Brian King 
---
 drivers/tty/hvc/hvcs.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/tty/hvc/hvcs.c b/drivers/tty/hvc/hvcs.c
index 4ba24963685e..faf5ccfc561e 100644
--- a/drivers/tty/hvc/hvcs.c
+++ b/drivers/tty/hvc/hvcs.c
@@ -1215,12 +1215,9 @@ static void hvcs_hangup(struct tty_struct * tty)
 {
struct hvcs_struct *hvcsd = tty->driver_data;
unsigned long flags;
-   int temp_open_count;
int irq;
 
spin_lock_irqsave(>lock, flags);
-   /* Preserve this so that we know how many kref refs to put */
-   temp_open_count = hvcsd->port.count;
 
/*
 * Don't kref put inside the spinlock because the destruction
@@ -1247,21 +1244,6 @@ static void hvcs_hangup(struct tty_struct * tty)
spin_unlock_irqrestore(>lock, flags);
 
free_irq(irq, hvcsd);
-
-   /*
-* We need to kref_put() for every open_count we have since the
-* tty_hangup() function doesn't invoke a close per open connection on a
-* non-console device.
-*/
-   while(temp_open_count) {
-   --temp_open_count;
-   /*
-* The final put will trigger destruction of the hvcs_struct.
-* NOTE:  If this hangup was signaled from user space then the
-* final put will never happen.
-*/
-   tty_port_put(>port);
-   }
 }
 
 /*
-- 
2.31.1

[PATCH] ibmvfc: Avoid path failures during live migration

2022-10-26 Thread Brian King

This patch fixes an issue reported when performing a live
migration when multipath is configured with a short fast
fail timeout of 5 seconds and also to have no_path_retry
set to fail. In this scenario, all paths would go into the
devloss state while the ibmvfc driver went through discovery
to log back in. On a loaded system, the discovery might
take longer than 5 seconds, which was resulting in all paths
being marked failed, which then resulted in a read only filesystem.

This patch changes the migration code in ibmvfc to
avoid deleting rports at all in this scenario, so we
avoid losing all paths.

Signed-off-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 00684e11976b..1a0c0b7289d2 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -708,8 +708,13 @@ static void ibmvfc_init_host(struct ibmvfc_host *vhost)
memset(vhost->async_crq.msgs.async, 0, PAGE_SIZE);
vhost->async_crq.cur = 0;
 
-   list_for_each_entry(tgt, >targets, queue)
-   ibmvfc_del_tgt(tgt);
+   list_for_each_entry(tgt, >targets, queue) {
+   if (vhost->client_migrated)
+   tgt->need_login = 1;
+   else
+   ibmvfc_del_tgt(tgt);
+   }
+
scsi_block_requests(vhost->host);
ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT);
vhost->job_step = ibmvfc_npiv_login;
@@ -3235,9 +3240,12 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
struct ibmvfc_host *vhost,
/* We need to re-setup the interpartition connection */
dev_info(vhost->dev, "Partition migrated, Re-enabling 
adapter\n");
vhost->client_migrated = 1;
+
+   scsi_block_requests(vhost->host);
ibmvfc_purge_requests(vhost, DID_REQUEUE);
-   ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN);
+   ibmvfc_set_host_state(vhost, IBMVFC_LINK_DOWN);
ibmvfc_set_host_action(vhost, 
IBMVFC_HOST_ACTION_REENABLE);
+   wake_up(>work_wait_q);
} else if (crq->format == IBMVFC_PARTNER_FAILED || crq->format 
== IBMVFC_PARTNER_DEREGISTER) {
dev_err(vhost->dev, "Host partner adapter deregistered 
or failed (rc=%d)\n", crq->format);
ibmvfc_purge_requests(vhost, DID_ERROR);
-- 
2.31.1

Re: [PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-27 Thread Brian King

On 10/26/21 12:39 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 10/26/21 01:40, Brian King wrote:
>> On 10/23/21 7:18 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 23/10/2021 07:18, Brian King wrote:
>>>> On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:
>>>>>
>>>>>
>>>>> On 22/10/2021 04:44, Brian King wrote:
>>>>>> If ibm,pmemory is installed in the system, it can appear anywhere
>>>>>> in the address space. This patch enhances how we handle DMA for devices 
>>>>>> when
>>>>>> ibm,pmemory is present. In the case where we have enough DMA space to
>>>>>> direct map all of RAM, but not ibm,pmemory, we use direct DMA for
>>>>>> I/O to RAM and use the default window to dynamically map ibm,pmemory.
>>>>>> In the case where we only have a single DMA window, this won't work, > 
>>>>>> so if the window is not big enough to map the entire address range,
>>>>>> we cannot direct map.
>>>>>
>>>>> but we want the pmem range to be mapped into the huge DMA window too if 
>>>>> we can, why skip it?
>>>>
>>>> This patch should simply do what the comment in this commit mentioned 
>>>> below suggests, which says that
>>>> ibm,pmemory can appear anywhere in the address space. If the DMA window is 
>>>> large enough
>>>> to map all of MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for 
>>>> everything,
>>>> including the pmem. If we do not have a big enough window to do that, we 
>>>> will do
>>>> direct DMA for DRAM and dynamic mapping for pmem.
>>>
>>>
>>> Right, and this is what we do already, do not we? I missing something here.
>>
>> The upstream code does not work correctly that I can see. If I boot an 
>> upstream kernel
>> with an nvme device and vpmem assigned to the LPAR, and enable dev_dbg in 
>> arch/powerpc/platforms/pseries/iommu.c,
>> I see the following in the logs:
>>
>> [2.157549] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 
>> 800 2121 returned 0
>> [2.157561] nvme 0121:50:00.0: Skipping ibm,pmemory
>> [2.157567] nvme 0121:50:00.0: can't map partition max 0x8 
>> with 16777216 65536-sized pages
>> [2.170150] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 
>> 800 2121 10 28 returned 0 (liobn = 0x7121 starting addr = 
>> 800 0)
>> [2.170170] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
>> /pci@8002121/pci1014,683@0
>> [2.356260] nvme 0121:50:00.0: node is /pci@8002121/pci1014,683@0
>>
>> This means we are heading down the leg in enable_ddw where we do not set 
>> direct_mapping to true. We use
>> create the DDW window, but don't do any direct DMA. This is because the 
>> window is not large enough to
>> map 2PB of memory, which is what ddw_memory_hotplug_max returns without my 
>> patch. 
>>
>> With my patch applied, I get this in the logs:
>>
>> [2.204866] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 
>> 800 2121 returned 0
>> [2.204875] nvme 0121:50:00.0: Skipping ibm,pmemory
>> [2.205058] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 500000 
>> 800 2121 10 21 returned 0 (liobn = 0x7121 starting addr = 
>> 800 0)
>> [2.205068] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
>> /pci@8002121/pci1014,683@0
>> [2.215898] nvme 0121:50:00.0: iommu: 64-bit OK but direct DMA is limited 
>> by 802
>>
> 
> 
> ah I see. then...
> 
> 
>>
>> Thanks,
>>
>> Brian
>>
>>
>>>
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Brian
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Brian King 
>>>>>> ---
>>>>>>    arch/powerpc/platforms/pseries/iommu.c | 19 ++-
>>>>>>    1 file changed, 10 insertions(+), 9 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>>>>> b/arch/powerpc/platforms/pseries/iommu.c
>>>>>&g

Re: [PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-25 Thread Brian King

On 10/23/21 7:18 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 23/10/2021 07:18, Brian King wrote:
>> On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 22/10/2021 04:44, Brian King wrote:
>>>> If ibm,pmemory is installed in the system, it can appear anywhere
>>>> in the address space. This patch enhances how we handle DMA for devices 
>>>> when
>>>> ibm,pmemory is present. In the case where we have enough DMA space to
>>>> direct map all of RAM, but not ibm,pmemory, we use direct DMA for
>>>> I/O to RAM and use the default window to dynamically map ibm,pmemory.
>>>> In the case where we only have a single DMA window, this won't work, > so 
>>>> if the window is not big enough to map the entire address range,
>>>> we cannot direct map.
>>>
>>> but we want the pmem range to be mapped into the huge DMA window too if we 
>>> can, why skip it?
>>
>> This patch should simply do what the comment in this commit mentioned below 
>> suggests, which says that
>> ibm,pmemory can appear anywhere in the address space. If the DMA window is 
>> large enough
>> to map all of MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for 
>> everything,
>> including the pmem. If we do not have a big enough window to do that, we 
>> will do
>> direct DMA for DRAM and dynamic mapping for pmem.
> 
> 
> Right, and this is what we do already, do not we? I missing something here.

The upstream code does not work correctly that I can see. If I boot an upstream 
kernel
with an nvme device and vpmem assigned to the LPAR, and enable dev_dbg in 
arch/powerpc/platforms/pseries/iommu.c,
I see the following in the logs:

[2.157549] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 
2121 returned 0
[2.157561] nvme 0121:50:00.0: Skipping ibm,pmemory
[2.157567] nvme 0121:50:00.0: can't map partition max 0x8 with 
16777216 65536-sized pages
[2.170150] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 
2121 10 28 returned 0 (liobn = 0x7121 starting addr = 800 0)
[2.170170] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
/pci@8002121/pci1014,683@0
[2.356260] nvme 0121:50:00.0: node is /pci@8002121/pci1014,683@0

This means we are heading down the leg in enable_ddw where we do not set 
direct_mapping to true. We use
create the DDW window, but don't do any direct DMA. This is because the window 
is not large enough to
map 2PB of memory, which is what ddw_memory_hotplug_max returns without my 
patch. 

With my patch applied, I get this in the logs:

[2.204866] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 50 800 
2121 returned 0
[2.204875] nvme 0121:50:00.0: Skipping ibm,pmemory
[2.205058] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 50 800 
2121 10 21 returned 0 (liobn = 0x7121 starting addr = 800 0)
[2.205068] nvme 0121:50:00.0: created tce table LIOBN 0x7121 for 
/pci@8002121/pci1014,683@0
[2.215898] nvme 0121:50:00.0: iommu: 64-bit OK but direct DMA is limited by 
802


Thanks,

Brian


> 
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48
>>
>>
>> Thanks,
>>
>> Brian
>>
>>
>>>
>>>
>>>>
>>>> Signed-off-by: Brian King 
>>>> ---
>>>>    arch/powerpc/platforms/pseries/iommu.c | 19 ++-
>>>>    1 file changed, 10 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>>> b/arch/powerpc/platforms/pseries/iommu.c
>>>> index 269f61d519c2..d9ae985d10a4 100644
>>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>>> @@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>>>    phys_addr_t max_addr = memory_hotplug_max();
>>>>    struct device_node *memory;
>>>>    -    /*
>>>> - * The "ibm,pmemory" can appear anywhere in the address space.
>>>> - * Assuming it is still backed by page structs, set the upper limit
>>>> - * for the huge DMA window as MAX_PHYSMEM_BITS.
>>>> - */
>>>> -    if (of_find_node_by_type(NULL, "ibm,pmemory"))
>>>> -    return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
>>>> -    (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
>>

Re: [PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-22 Thread Brian King

On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 22/10/2021 04:44, Brian King wrote:
>> If ibm,pmemory is installed in the system, it can appear anywhere
>> in the address space. This patch enhances how we handle DMA for devices when
>> ibm,pmemory is present. In the case where we have enough DMA space to
>> direct map all of RAM, but not ibm,pmemory, we use direct DMA for
>> I/O to RAM and use the default window to dynamically map ibm,pmemory.
>> In the case where we only have a single DMA window, this won't work, > so if 
>> the window is not big enough to map the entire address range,
>> we cannot direct map.
> 
> but we want the pmem range to be mapped into the huge DMA window too if we 
> can, why skip it?

This patch should simply do what the comment in this commit mentioned below 
suggests, which says that
ibm,pmemory can appear anywhere in the address space. If the DMA window is 
large enough
to map all of MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for 
everything,
including the pmem. If we do not have a big enough window to do that, we will do
direct DMA for DRAM and dynamic mapping for pmem.


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48


Thanks,

Brian


> 
> 
>>
>> Signed-off-by: Brian King 
>> ---
>>   arch/powerpc/platforms/pseries/iommu.c | 19 ++-
>>   1 file changed, 10 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>> b/arch/powerpc/platforms/pseries/iommu.c
>> index 269f61d519c2..d9ae985d10a4 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>   phys_addr_t max_addr = memory_hotplug_max();
>>   struct device_node *memory;
>>   -    /*
>> - * The "ibm,pmemory" can appear anywhere in the address space.
>> - * Assuming it is still backed by page structs, set the upper limit
>> - * for the huge DMA window as MAX_PHYSMEM_BITS.
>> - */
>> -    if (of_find_node_by_type(NULL, "ibm,pmemory"))
>> -    return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
>> -    (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
>> -
>>   for_each_node_by_type(memory, "memory") {
>>   unsigned long start, size;
>>   int n_mem_addr_cells, n_mem_size_cells, len;
>> @@ -1341,6 +1332,16 @@ static bool enable_ddw(struct pci_dev *dev, struct 
>> device_node *pdn)
>>    */
>>   len = max_ram_len;
>>   if (pmem_present) {
>> +    if (default_win_removed) {
>> +    /*
>> + * If we only have one DMA window and have pmem present,
>> + * then we need to be able to map the entire address
>> + * range in order to be able to do direct DMA to RAM.
>> + */
>> +    len = order_base_2((sizeof(phys_addr_t) * 8 <= 
>> MAX_PHYSMEM_BITS) ?
>> +    (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS));
>> +    }
>> +
>>   if (query.largest_available_block >=
>>   (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
>>   len = MAX_PHYSMEM_BITS;
>>
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

[PATCH] powerpc: Enhance pmem DMA bypass handling

2021-10-21 Thread Brian King

If ibm,pmemory is installed in the system, it can appear anywhere
in the address space. This patch enhances how we handle DMA for devices when
ibm,pmemory is present. In the case where we have enough DMA space to
direct map all of RAM, but not ibm,pmemory, we use direct DMA for
I/O to RAM and use the default window to dynamically map ibm,pmemory.
In the case where we only have a single DMA window, this won't work,
so if the window is not big enough to map the entire address range,
we cannot direct map.

Signed-off-by: Brian King 
---
 arch/powerpc/platforms/pseries/iommu.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 269f61d519c2..d9ae985d10a4 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
phys_addr_t max_addr = memory_hotplug_max();
struct device_node *memory;
 
-   /*
-* The "ibm,pmemory" can appear anywhere in the address space.
-* Assuming it is still backed by page structs, set the upper limit
-* for the huge DMA window as MAX_PHYSMEM_BITS.
-*/
-   if (of_find_node_by_type(NULL, "ibm,pmemory"))
-   return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
-   (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
-
for_each_node_by_type(memory, "memory") {
unsigned long start, size;
int n_mem_addr_cells, n_mem_size_cells, len;
@@ -1341,6 +1332,16 @@ static bool enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
 */
len = max_ram_len;
if (pmem_present) {
+   if (default_win_removed) {
+   /*
+* If we only have one DMA window and have pmem present,
+* then we need to be able to map the entire address
+* range in order to be able to do direct DMA to RAM.
+*/
+   len = order_base_2((sizeof(phys_addr_t) * 8 <= 
MAX_PHYSMEM_BITS) ?
+   (phys_addr_t) -1 : (1ULL << 
MAX_PHYSMEM_BITS));
+   }
+
if (query.largest_available_block >=
(1ULL << (MAX_PHYSMEM_BITS - page_shift)))
len = MAX_PHYSMEM_BITS;
-- 
2.27.0

Re: [PATCH 2/2] ibmvfc: make ibmvfc_wait_for_ops MQ aware

2021-03-19 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 1/2] ibmvfc: fix potential race in ibmvfc_wait_for_ops

2021-03-19 Thread Brian King

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-26 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 2/5] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-26 Thread Brian King

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 5/5] ibmvfc: reinitialize sub-CRQs and perform channel enquiry after LPM

2021-02-25 Thread Brian King

On 2/25/21 2:48 PM, Tyrel Datwyler wrote:
> A live partition migration (LPM) results in a CRQ disconnect similar to
> a hard reset. In this LPM case the hypervisor moslty perserves the CRQ
> transport such that it simply needs to be reenabled. However, the
> capabilities may have changed such as fewer channels, or no channels at
> all. Further, its possible that there may be sub-CRQ support, but no
> channel support. The CRQ reenable path currently doesn't take any of
> this into consideration.
> 
> For simpilicty release and reinitialize sub-CRQs during reenable, and
> set do_enquiry and using_channels with the appropriate values to trigger
> channel renegotiation.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 4ac2c442e1e2..9ae6be56e375 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -903,6 +903,9 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>  {
>   int rc = 0;
>   struct vio_dev *vdev = to_vio_dev(vhost->dev);
> + unsigned long flags;
> +
> + ibmvfc_release_sub_crqs(vhost);
>  
>   /* Re-enable the CRQ */
>   do {
> @@ -914,6 +917,16 @@ static int ibmvfc_reenable_crq_queue(struct ibmvfc_host 
> *vhost)
>   if (rc)
>   dev_err(vhost->dev, "Error enabling adapter (rc=%d)\n", rc);
>  
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + spin_lock(vhost->crq.q_lock);
> + vhost->do_enquiry = 1;
> + vhost->using_channels = 0;
> +
> + ibmvfc_init_sub_crqs(vhost);
> +
> + spin_unlock(vhost->crq.q_lock);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);

ibmvfc_init_sub_crqs can sleep, for multiple reasons, so you can't hold
a lock when you call it. There is a GFP_KERNEL allocation in it, and the
patch before this one adds an msleep in an error path.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 4/5] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-25 Thread Brian King

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 4/4] ibmvfc: store return code of H_FREE_SUB_CRQ during cleanup

2021-02-16 Thread Brian King

On 2/11/21 12:57 PM, Tyrel Datwyler wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ba6fcf9cbc57..23b803ac4a13 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5670,7 +5670,7 @@ static int ibmvfc_register_scsi_channel(struct 
> ibmvfc_host *vhost,
>  
>  irq_failed:
>   do {
> - plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
> scrq->cookie);
> + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address, 
> scrq->cookie);
>   } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));

Other places in the driver where we get a busy return code back we have an 
msleep(100).
Should we be doing that here as well?

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 3/4] ibmvfc: treat H_CLOSED as success during sub-CRQ registration

2021-02-16 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 2/4] ibmvfc: fix invalid sub-CRQ handles after hard reset

2021-02-16 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 1/4] ibmvfc: simplify handling of sub-CRQ initialization

2021-02-15 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v5 00/21] ibmvfc: initial MQ development/enablement

2021-01-14 Thread Brian King

Tyrel,

I think this patch series is looking pretty good. I don't think we need
to wait for resolution of the can_queue issue being discussed, since
that is an issue that exists prior to this patch series and this patch
series doesn't make the issue any worse. Let's work that separately.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v5 21/21] ibmvfc: provide modules parameters for MQ settings

2021-01-14 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v5 18/21] ibmvfc: send Cancel MAD down each hw scsi channel

2021-01-14 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-14 Thread Brian King

On 1/13/21 7:27 PM, Ming Lei wrote:
> On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
>> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
>>> On 1/12/21 2:54 PM, Brian King wrote:
>>>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>>>> Introduce several new vhost fields for managing MQ state of the adapter
>>>>> as well as initial defaults for MQ enablement.
>>>>>
>>>>> Signed-off-by: Tyrel Datwyler 
>>>>> ---
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 
>>>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
>>>>>  2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c 
>>>>> b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> index ba95438a8912..9200fe49c57e 100644
>>>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>>>   .max_sectors = IBMVFC_MAX_SECTORS,
>>>>>   .shost_attrs = ibmvfc_attrs,
>>>>>   .track_queue_depth = 1,
>>>>> + .host_tagset = 1,
>>>>
>>>> This doesn't seem right. You are setting host_tagset, which means you want 
>>>> a
>>>> shared, host wide, tag set for commands. It also means that the total
>>>> queue depth for the host is can_queue. However, it looks like you are 
>>>> allocating
>>>> max_requests events for each sub crq, which means you are over allocating 
>>>> memory.
>>>
>>> With the shared tagset yes the queue depth for the host is can_queue, but 
>>> this
>>> also implies that the max queue depth for each hw queue is also can_queue. 
>>> So,
>>> in the worst case that all commands are queued down the same hw queue we 
>>> need an
>>> event pool with can_queue commands.
>>>
>>>>
>>>> Looking at this closer, we might have bigger problems. There is a host wide
>>>> max number of commands that the VFC host supports, which gets returned on
>>>> NPIV Login. This value can change across a live migration event.
>>>
>>> From what I understand the max commands can only become less.
>>>
>>>>
>>>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>>>> can_queue on the scsi_host *after* the tag set has been allocated. This 
>>>> looks
>>>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>>>> we look at can_queue once the tag set is setup, and I'm not seeing a good 
>>>> way
>>>> to dynamically change the host queue depth once the tag set is setup. 
>>>>
>>>> Unless I'm missing something, our best options appear to either be to 
>>>> implement
>>>> our own host wide busy reference counting, which doesn't sound very good, 
>>>> or
>>>> we need to add some API to block / scsi that allows us to dynamically 
>>>> change
>>>> can_queue.
>>>
>>> Changing can_queue won't do use any good with the shared tagset becasue each
>>> queue still needs to be able to queue can_queue number of commands in the 
>>> worst
>>> case.
>>
>> The issue I'm trying to highlight here is the following scenario:
>>
>> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag 
>> set.
>>
>> 2. On our NPIV login response from the VIOS, we might get a lower value than 
>> we
>> initially set in shost->can_queue, so we update it, but nobody ever looks at 
>> it
>> again, and we don't have any protection against sending too many commands to 
>> the host.
>>
>>
>> Basically, we no longer have any code that ensures we don't send more
>> commands to the VIOS than we are told it supports. According to the 
>> architecture,
>> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
>> of a bug on our part.
>>
>> I don't think it was ever clearly defined in the API that a driver can
>> change shost->can_queue after calling scsi_add_host, but up until
>> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
>> it doesn't. 
> 
> Actually it isn't related with commit 6eb045e092ef, because 
> blk_mq_alloc_tag_set()
> uses .can_queue to create driver tag sbitmap and request pool.
> 
> So even thought without 6eb045e092ef

Re: [PATCH v4 00/21] ibmvfc: initial MQ development

2021-01-13 Thread Brian King

With the exception of the few comments I've shared, the rest of this looks
good to me and you can add my:

Reviewed-by: Brian King 

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 21/21] ibmvfc: provide modules parameters for MQ settings

2021-01-13 Thread Brian King

On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> @@ -5880,12 +5936,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
> struct vio_device_id *id)
>  
>   shost->transportt = ibmvfc_transport_template;
>   shost->can_queue = max_requests;
> + shost->can_queue = (max_requests / nr_scsi_hw_queues);

This looks to be in conflict with the fact that the first patch requested a 
shared tag set, right?

>   shost->max_lun = max_lun;
>   shost->max_id = max_targets;
>   shost->max_sectors = IBMVFC_MAX_SECTORS;
>   shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>   shost->unique_id = shost->host_no;
> - shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1;
> + shost->nr_hw_queues = mq_enabled ? nr_scsi_hw_queues : 1;

You might want to range check this, to make sure its sane.
>  
>   vhost = shost_priv(shost);
>   INIT_LIST_HEAD(>targets);
> @@ -5897,8 +5954,8 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
> struct vio_device_id *id)
>   vhost->log_level = log_level;
>   vhost->task_set = 1;
>  
> - vhost->mq_enabled = IBMVFC_MQ;
> - vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
> + vhost->mq_enabled = mq_enabled;
> + vhost->client_scsi_channels = min(nr_scsi_hw_queues, nr_scsi_channels);
>   vhost->using_channels = 0;
>   vhost->do_enquiry = 1;
>  
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-13 Thread Brian King

On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> On 1/12/21 2:54 PM, Brian King wrote:
>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler 
>>> ---
>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 
>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
>>>  2 files changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index ba95438a8912..9200fe49c57e 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>> .max_sectors = IBMVFC_MAX_SECTORS,
>>> .shost_attrs = ibmvfc_attrs,
>>> .track_queue_depth = 1,
>>> +   .host_tagset = 1,
>>
>> This doesn't seem right. You are setting host_tagset, which means you want a
>> shared, host wide, tag set for commands. It also means that the total
>> queue depth for the host is can_queue. However, it looks like you are 
>> allocating
>> max_requests events for each sub crq, which means you are over allocating 
>> memory.
> 
> With the shared tagset yes the queue depth for the host is can_queue, but this
> also implies that the max queue depth for each hw queue is also can_queue. So,
> in the worst case that all commands are queued down the same hw queue we need 
> an
> event pool with can_queue commands.
> 
>>
>> Looking at this closer, we might have bigger problems. There is a host wide
>> max number of commands that the VFC host supports, which gets returned on
>> NPIV Login. This value can change across a live migration event.
> 
> From what I understand the max commands can only become less.
> 
>>
>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>> to dynamically change the host queue depth once the tag set is setup. 
>>
>> Unless I'm missing something, our best options appear to either be to 
>> implement
>> our own host wide busy reference counting, which doesn't sound very good, or
>> we need to add some API to block / scsi that allows us to dynamically change
>> can_queue.
> 
> Changing can_queue won't do use any good with the shared tagset becasue each
> queue still needs to be able to queue can_queue number of commands in the 
> worst
> case.

The issue I'm trying to highlight here is the following scenario:

1. We set shost->can_queue, then call scsi_add_host, which allocates the tag 
set.

2. On our NPIV login response from the VIOS, we might get a lower value than we
initially set in shost->can_queue, so we update it, but nobody ever looks at it
again, and we don't have any protection against sending too many commands to 
the host.


Basically, we no longer have any code that ensures we don't send more
commands to the VIOS than we are told it supports. According to the 
architecture,
if we actually do this, the VIOS will do an h_free_crq, which would be a bit
of a bug on our part.

I don't think it was ever clearly defined in the API that a driver can
change shost->can_queue after calling scsi_add_host, but up until
commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
it doesn't. 

I started looking through drivers that do this, and so far, it looks like the
following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely 
others...

We probably need an API that lets us change shost->can_queue dynamically.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 18/21] ibmvfc: send Cancel MAD down each hw scsi channel

2021-01-12 Thread Brian King

On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index b0b0212344f3..24e1278acfeb 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2418,18 +2418,79 @@ static struct ibmvfc_event *ibmvfc_init_tmf(struct 
> ibmvfc_queue *queue,
>   return evt;
>  }
>  
> -/**
> - * ibmvfc_cancel_all - Cancel all outstanding commands to the device
> - * @sdev:scsi device to cancel commands
> - * @type:type of error recovery being performed
> - *
> - * This sends a cancel to the VIOS for the specified device. This does
> - * NOT send any abort to the actual device. That must be done separately.
> - *
> - * Returns:
> - *   0 on success / other on failure
> - **/
> -static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
> +static int ibmvfc_cancel_all_mq(struct scsi_device *sdev, int type)
> +{
> + struct ibmvfc_host *vhost = shost_priv(sdev->host);
> + struct ibmvfc_event *evt, *found_evt, *temp;
> + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs;
> + unsigned long flags;
> + int num_hwq, i;
> + LIST_HEAD(cancelq);
> + u16 status;
> +
> + ENTER;
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + num_hwq = vhost->scsi_scrqs.active_queues;
> + for (i = 0; i < num_hwq; i++) {
> + spin_lock(queues[i].q_lock);
> + spin_lock([i].l_lock);
> + found_evt = NULL;
> + list_for_each_entry(evt, [i].sent, queue_list) {
> + if (evt->cmnd && evt->cmnd->device == sdev) {
> + found_evt = evt;
> + break;
> + }
> + }
> + spin_unlock([i].l_lock);
> +

I really don't like the way the first for loop grabs all the q_locks, and then
there is a second for loop that drops all these locks... I think this code would
be cleaner if it looked like:

if (found_evt && vhost->logged_in) {
evt = ibmvfc_init_tmf([i], sdev, type);
evt->sync_iu = [i].cancel_rsp;
ibmvfc_send_event(evt, vhost, default_timeout);
list_add_tail(>cancel, );
}

spin_unlock(queues[i].q_lock);

}

> + if (!found_evt)
> + continue;
> +
> + if (vhost->logged_in) {
> + evt = ibmvfc_init_tmf([i], sdev, type);
> + evt->sync_iu = [i].cancel_rsp;
> + ibmvfc_send_event(evt, vhost, default_timeout);
> + list_add_tail(>cancel, );
> + }
> + }
> +
> + for (i = 0; i < num_hwq; i++)
> + spin_unlock(queues[i].q_lock);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
> + if (list_empty()) {
> + if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
> + sdev_printk(KERN_INFO, sdev, "No events found to 
> cancel\n");
> + return 0;
> + }
> +
> + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n");
> +
> + list_for_each_entry_safe(evt, temp, , cancel) {
> + wait_for_completion(>comp);
> + status = be16_to_cpu(evt->queue->cancel_rsp.mad_common.status);

You probably want a list_del(>cancel) here.

> + ibmvfc_free_event(evt);
> +
> + if (status != IBMVFC_MAD_SUCCESS) {
> + sdev_printk(KERN_WARNING, sdev, "Cancel failed with 
> rc=%x\n", status);
> + switch (status) {
> + case IBMVFC_MAD_DRIVER_FAILED:
> + case IBMVFC_MAD_CRQ_ERROR:
> + /* Host adapter most likely going through reset, return 
> success to
> +  * the caller will wait for the command being cancelled 
> to get returned
> +  */
> + break;
> + default:
> + break;

I think this default break needs to be a return -EIO.

> + }
> + }
> + }
> +
> + sdev_printk(KERN_INFO, sdev, "Successfully cancelled outstanding 
> commands\n");
> + return 0;
> +}
> +
> +static int ibmvfc_cancel_all_sq(struct scsi_device *sdev, int type)
>  {
>   struct ibmvfc_host *vhost = shost_priv(sdev->host);
>   struct ibmvfc_event *evt, *found_evt;
> @@ -2498,6 +2559,27 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, 
> int type)
>   return 0;
>  }
>  



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement

2021-01-12 Thread Brian King

On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 
>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +
>  2 files changed, 17 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ba95438a8912..9200fe49c57e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>   .max_sectors = IBMVFC_MAX_SECTORS,
>   .shost_attrs = ibmvfc_attrs,
>   .track_queue_depth = 1,
> + .host_tagset = 1,

This doesn't seem right. You are setting host_tagset, which means you want a
shared, host wide, tag set for commands. It also means that the total
queue depth for the host is can_queue. However, it looks like you are allocating
max_requests events for each sub crq, which means you are over allocating 
memory.

Looking at this closer, we might have bigger problems. There is a host wide
max number of commands that the VFC host supports, which gets returned on
NPIV Login. This value can change across a live migration event. 

The ibmvfc driver, which does the same thing the lpfc driver does, modifies
can_queue on the scsi_host *after* the tag set has been allocated. This looks
to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
we look at can_queue once the tag set is setup, and I'm not seeing a good way
to dynamically change the host queue depth once the tag set is setup. 

Unless I'm missing something, our best options appear to either be to implement
our own host wide busy reference counting, which doesn't sound very good, or
we need to add some API to block / scsi that allows us to dynamically change
can_queue.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v3 18/18] ibmvfc: drop host lock when completing commands in CRQ

2020-12-04 Thread Brian King

On 12/2/20 8:08 PM, Tyrel Datwyler wrote:
> The legacy CRQ holds the host lock the even while completing commands.
> This presents a problem when in legacy single queue mode and
> nr_hw_queues is greater than one since calling scsi_done() introduces
> the potential for deadlock.
> 
> If nr_hw_queues is greater than one drop the hostlock in the legacy CRQ
> path when completing a command.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index e499599662ec..e2200bdff2be 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2969,6 +2969,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
> struct ibmvfc_host *vhost)
>  {
>   long rc;
>   struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
>  
>   switch (crq->valid) {
>   case IBMVFC_CRQ_INIT_RSP:
> @@ -3039,7 +3040,12 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, 
> struct ibmvfc_host *vhost)
>   del_timer(>timer);
>   list_del(>queue);
>   ibmvfc_trc_end(evt);
> - evt->done(evt);
> + if (nr_scsi_hw_queues > 1) {
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + } else
> + evt->done(evt);

Similar comment here as previously. The flags parameter is an output for
spin_lock_irqsave but an input for spin_unlock_irqrestore. You'll need
to rethink the locking here. You could just do a spin_unlock_irq / spin_lock_irq
here and that would probably be OK, but probably isn't the best. 

>  }
>  
>  /**
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v3 17/18] ibmvfc: provide modules parameters for MQ settings

2020-12-04 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v3 15/18] ibmvfc: send Cancel MAD down each hw scsi channel

2020-12-04 Thread Brian King

  }
>  
> - if (vhost->logged_in) {
> - evt = ibmvfc_init_tmf(vhost, sdev, type);
> - evt->sync_iu = 
> - rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout);
> - }
> -
> - spin_unlock_irqrestore(vhost->host->host_lock, flags);
> -
>   if (rsp_rc != 0) {
>   sdev_printk(KERN_ERR, sdev, "Failed to send cancel event. 
> rc=%d\n", rsp_rc);
>   /* If failure is received, the host adapter is most likely going
>through reset, return success so the caller will wait for the 
> command
>being cancelled to get returned */
> - return 0;
> + goto free_events;
>   }
>  
>   sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n");
>  
> - wait_for_completion(>comp);
> - status = be16_to_cpu(rsp.mad_common.status);
> - spin_lock_irqsave(vhost->host->host_lock, flags);
> - ibmvfc_free_event(evt);
> - spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + for (i = 0; i < num_hwq; i++) {
> + if (!scrqs[i].cancel_event)
> + continue;
>  
> - if (status != IBMVFC_MAD_SUCCESS) {
> - sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", 
> status);
> - switch (status) {
> - case IBMVFC_MAD_DRIVER_FAILED:
> - case IBMVFC_MAD_CRQ_ERROR:
> - /* Host adapter most likely going through reset, return 
> success to
> -  the caller will wait for the command being cancelled 
> to get returned */
> - return 0;
> - default:
> - return -EIO;
> - };
> + wait_for_completion([i].cancel_event->comp);
> + status = be16_to_cpu(scrqs[i].cancel_rsp.mad_common.status);
> +
> + if (status != IBMVFC_MAD_SUCCESS) {
> + sdev_printk(KERN_WARNING, sdev, "Cancel failed with 
> rc=%x\n", status);
> + switch (status) {
> + case IBMVFC_MAD_DRIVER_FAILED:
> + case IBMVFC_MAD_CRQ_ERROR:
> + /* Host adapter most likely going through 
> reset, return success to
> +  the caller will wait for the command being 
> cancelled to get returned */
> + goto free_events;

Similar comment here... What about the rest of the outstanding cancel commands? 
Do you need
to wait for those to complete before freeing them?

> + default:
> + ret = -EIO;
> + goto free_events;
> + };
> + }
>   }
>  
>   sdev_printk(KERN_INFO, sdev, "Successfully cancelled outstanding 
> commands\n");
> - return 0;
> +free_events:
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + for (i = 0; i < num_hwq; i++)
> + if (scrqs[i].cancel_event)
> + ibmvfc_free_event(scrqs[i].cancel_event);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
> + return ret;
>  }
>  
>  /**



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v3 06/18] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-04 Thread Brian King

On 12/2/20 8:07 PM, Tyrel Datwyler wrote:
> The logic for iterating over the Sub-CRQ responses is similiar to that
> of the primary CRQ. Add the necessary handlers for processing those
> responses.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 80 ++
>  1 file changed, 80 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index e082935f56cf..b61ae1df21e5 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3381,6 +3381,86 @@ static int ibmvfc_toggle_scrq_irq(struct 
> ibmvfc_sub_queue *scrq, int enable)
>   return rc;
>  }
>  
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", 
> crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(>pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(>free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + del_timer(>timer);
> + list_del(>queue);
> + ibmvfc_trc_end(evt);> + spin_unlock_irqrestore(vhost->host->host_lock, 
> flags);

You can't do this here... You are grabbing the host lock in ibmvfc_drain_sub_crq
and saving the irqflags to a local in that function, then doing a 
spin_unlock_irqrestore
and restoring irqflags using an uninitialized local in this function...

I'm assuming this will get sorted out with the locking changes we've been 
discussing off-list...


> + evt->done(evt);
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> +}
> +
> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
> +{
> + struct ibmvfc_crq *crq;
> +
> + crq = >msgs[scrq->cur].crq;
> + if (crq->valid & 0x80) {
> + if (++scrq->cur == scrq->size)
> + scrq->cur = 0;
> + rmb();
> + } else
> + crq = NULL;
> +
> + return crq;
> +}
> +
> +static void ibmvfc_drain_sub_crq(struct ibmvfc_sub_queue *scrq)
> +{
> + struct ibmvfc_crq *crq;
> + unsigned long flags;
> + int done = 0;
> +
> + spin_lock_irqsave(scrq->vhost->host->host_lock, flags);
> + while (!done) {
> + while ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
> + ibmvfc_handle_scrq(crq, scrq->vhost);
> + crq->valid = 0;
> + wmb();
> + }
> +
> + ibmvfc_toggle_scrq_irq(scrq, 1);
> + if ((crq = ibmvfc_next_scrq(scrq)) != NULL) {
> + ibmvfc_toggle_scrq_irq(scrq, 0);
> +     ibmvfc_handle_scrq(crq, scrq->vhost);
> + crq->valid = 0;
> + wmb();
> + } else
> + done = 1;
> + }
> + spin_unlock_irqrestore(scrq->vhost->host->host_lock, flags);
> +}
> +
>  /**
>   * ibmvfc_init_tgt - Set the next init job step for the target
>   * @tgt: ibmvfc target struct
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v3 04/18] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-12-04 Thread Brian King

On 12/2/20 8:07 PM, Tyrel Datwyler wrote:
> @@ -4983,6 +4993,118 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
>   return retrc;
>  }
>  
> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
> +   int index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = >scsi_scrqs.scrqs[index];
> + int rc = -ENOMEM;
> +
> + ENTER;
> +
> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
> + if (!scrq->msgs)
> + return rc;
> +
> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
> +  DMA_BIDIRECTIONAL);
> +
> + if (dma_mapping_error(dev, scrq->msg_token))
> + goto dma_map_failed;
> +
> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
> +>cookie, >hw_irq);
> +
> + if (rc) {
> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
> + if (rc == H_PARAMETER)
> + dev_warn_once(dev, "Firmware may not support MQ\n");
> + goto reg_failed;
> + }
> +
> + scrq->hwq_id = index;
> + scrq->vhost = vhost;
> +
> + LEAVE;
> + return 0;
> +
> +reg_failed:
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +dma_map_failed:
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> + return rc;
> +}
> +
> +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int 
> index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = >scsi_scrqs.scrqs[index];
> + long rc;
> +
> + ENTER;
> +
> + do {
> + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
> + scrq->cookie);
> + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
> +
> + if (rc)
> + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
> +
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> +}
> +
> +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
> +{
> + int i, j;
> +
> + ENTER;
> +
> + vhost->scsi_scrqs.scrqs = kcalloc(IBMVFC_SCSI_HW_QUEUES,
> +   sizeof(*vhost->scsi_scrqs.scrqs),
> +   GFP_KERNEL);
> + if (!vhost->scsi_scrqs.scrqs)
> + return -1;
> +
> + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++) {
> + if (ibmvfc_register_scsi_channel(vhost, i)) {
> + for (j = i; j > 0; j--)
> + ibmvfc_deregister_scsi_channel(vhost, j - 1);
> + kfree(vhost->scsi_scrqs.scrqs);
> + vhost->scsi_scrqs.scrqs = NULL;
> + vhost->scsi_scrqs.active_queues = 0;
> + LEAVE;
> + return -1;
> + }
> + }
> +
> + LEAVE;
> + return 0;
> +}
> +
> +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
> +{
> + int i;
> +
> + ENTER;
> + if (!vhost->scsi_scrqs.scrqs)
> + return;
> +
> + for (i = 0; i < IBMVFC_SCSI_HW_QUEUES; i++)
> + ibmvfc_deregister_scsi_channel(vhost, i);
> +
> + kfree(vhost->scsi_scrqs.scrqs);
> + vhost->scsi_scrqs.scrqs = NULL;
> + vhost->scsi_scrqs.active_queues = 0;
> + LEAVE;
> +}
> +
>  /**
>   * ibmvfc_free_mem - Free memory for vhost
>   * @vhost:   ibmvfc host struct
> @@ -5239,6 +5361,12 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
> struct vio_device_id *id)
>   goto remove_shost;
>   }
>  
> + if (vhost->mq_enabled) {
> + rc = ibmvfc_init_sub_crqs(vhost);
> + if (rc)
> + dev_warn(dev, "Failed to allocate Sub-CRQs. rc=%d\n", 
> rc);

So, I think if you end up down this path, you will have:

vhost->scsi_scrqs.scrqs == NULL
vhost->scsi_scrqs.active_queues == 0

And you proceed with discovery. You will proceed with enquiry and channel setup.
Then, I think you could end up in queuecommand doing this:

evt->hwq = hwq % vhost->scsi_scrqs.active_queues;

And that is a divide by zero...

I wonder if it would be better in this scenario where registering the sub crqs 
fails,
if you just did:

vhost->do_enquiry = 0;
vhost->mq_enabled = 0;
vhost->using_channels = 0;

It looks like you only try to allocate the subcrqs in probe, so if that fails, 
we'd
never end up using mq, so just disabling in this case seems reasonable.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-04 Thread Brian King

On 12/2/20 11:27 AM, Tyrel Datwyler wrote:
> On 12/2/20 7:14 AM, Brian King wrote:
>> On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler 
>>> ---
>>>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
>>>  2 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index 42e4d35e0d35..f1d677a7423d 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
>>> struct vio_device_id *id)
>>> }
>>>  
>>> shost->transportt = ibmvfc_transport_template;
>>> -   shost->can_queue = max_requests;
>>> +   shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);
>>
>> This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
>> queue depth.
> 
> Our max_requests is the total number commands allowed across all queues. From
> what I understand is can_queue is the total number of commands in flight 
> allowed
> for each hw queue.
> 
> /*
>  * In scsi-mq mode, the number of hardware queues supported by the 
> LLD.
>  *
>  * Note: it is assumed that each hardware queue has a queue depth of
>  * can_queue. In other words, the total queue depth per host
>  * is nr_hw_queues * can_queue. However, for when host_tagset is set,
>  * the total queue depth is can_queue.
>  */
> 
> We currently don't use the host wide shared tagset.

Ok. I missed that bit... In that case, since we allocate by default only 100
event structs. If we slice that across IBMVFC_SCSI_HW_QUEUES (16) queues, then
we end up with only about 6 commands that can be outstanding per queue,
which is going to really hurt performance... I'd suggest bumping up
IBMVFC_MAX_REQUESTS_DEFAULT from 100 to 1000 as a starting point.

Thanks,

Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 17/17] ibmvfc: provide modules parameters for MQ settings

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +module_param_named(mig_channels_only, mig_channels_only, uint, S_IRUGO | 
> S_IWUSR);
> +MODULE_PARM_DESC(mig_channels_only, "Prevent migration to non-channelized 
> system. "
> +  "[Default=" __stringify(IBMVFC_MIG_NO_SUB_TO_CRQ) "]");
> +module_param_named(mig_no_less_channels, mig_no_less_channels, uint, S_IRUGO 
> | S_IWUSR);
> +MODULE_PARM_DESC(mig_no_less_channels, "Prevent migration to system with 
> less channels. "
> +  "[Default=" __stringify(IBMVFC_MIG_NO_N_TO_M) "]");

Both of these are writeable, but it doesn't look like you do any re-negotiation
with the VIOS for these changed settings to take effect if someone changes
them at runtime.

> +
>  module_param_named(init_timeout, init_timeout, uint, S_IRUGO | S_IWUSR);
>  MODULE_PARM_DESC(init_timeout, "Initialization timeout in seconds. "
>"[Default=" __stringify(IBMVFC_INIT_TIMEOUT) "]");

> @@ -3228,6 +3250,36 @@ static ssize_t ibmvfc_store_log_level(struct device 
> *dev,
>   return strlen(buf);
>  }
>  
> +static ssize_t ibmvfc_show_scsi_channels(struct device *dev,
> +  struct device_attribute *attr, char 
> *buf)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + int len;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + len = snprintf(buf, PAGE_SIZE, "%d\n", vhost->client_scsi_channels);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return len;
> +}
> +
> +static ssize_t ibmvfc_store_scsi_channels(struct device *dev,
> +  struct device_attribute *attr,
> +  const char *buf, size_t count)
> +{
> + struct Scsi_Host *shost = class_to_shost(dev);
> + struct ibmvfc_host *vhost = shost_priv(shost);
> + unsigned long flags = 0;
> + unsigned int channels;
> +
> + spin_lock_irqsave(shost->host_lock, flags);
> + channels = simple_strtoul(buf, NULL, 10);
> + vhost->client_scsi_channels = min(channels, nr_scsi_hw_queues);

Don't we need to do a LIP here for this new setting to go into effect?

> + spin_unlock_irqrestore(shost->host_lock, flags);
> + return strlen(buf);
> +}
> +
>  static DEVICE_ATTR(partition_name, S_IRUGO, ibmvfc_show_host_partition_name, 
> NULL);
>  static DEVICE_ATTR(device_name, S_IRUGO, ibmvfc_show_host_device_name, NULL);
>  static DEVICE_ATTR(port_loc_code, S_IRUGO, ibmvfc_show_host_loc_code, NULL);



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 16/17] ibmvfc: enable MQ and set reasonable defaults

2020-12-02 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 15/17] ibmvfc: send Cancel MAD down each hw scsi channel

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> In general the client needs to send Cancel MADs and task management
> commands down the same channel as the command(s) intended to cancel or
> abort. The client assigns cancel keys per LUN and thus must send a
> Cancel down each channel commands were submitted for that LUN. Further,
> the client then must wait for those cancel completions prior to
> submitting a LUN RESET or ABORT TASK SET.
> 
> Allocate event pointers for each possible scsi channel and assign an
> event for each channel that requires a cancel. Wait for completion each
> submitted cancel.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 106 +
>  1 file changed, 68 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 0b6284020f06..97e8eed04b01 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2339,32 +2339,52 @@ static int ibmvfc_cancel_all(struct scsi_device 
> *sdev, int type)
>  {
>   struct ibmvfc_host *vhost = shost_priv(sdev->host);
>   struct ibmvfc_event *evt, *found_evt;
> - union ibmvfc_iu rsp;
> - int rsp_rc = -EBUSY;
> + struct ibmvfc_event **evt_list;
> + union ibmvfc_iu *rsp;
> + int rsp_rc = 0;
>   unsigned long flags;
>   u16 status;
> + int num_hwq = 1;
> + int i;
> + int ret = 0;
>  
>   ENTER;
>   spin_lock_irqsave(vhost->host->host_lock, flags);
> - found_evt = NULL;
> - list_for_each_entry(evt, >sent, queue) {
> - if (evt->cmnd && evt->cmnd->device == sdev) {
> - found_evt = evt;
> - break;
> + if (vhost->using_channels && vhost->scsi_scrqs.active_queues)
> + num_hwq = vhost->scsi_scrqs.active_queues;
> +
> + evt_list = kcalloc(num_hwq, sizeof(*evt_list), GFP_KERNE> + rsp = 
> kcalloc(num_hwq, sizeof(*rsp), GFP_KERNEL);

Can't this just go on the stack? We don't want to be allocating memory
during error recovery. Or, alternatively, you could put this in the
vhost structure and protect it with a mutex. We only have enough events
to single thread these anyway.

> +
> + for (i = 0; i < num_hwq; i++) {
> + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands 
> on queue %d.\n", i);

Prior to this patch, if there was nothing outstanding to the device and 
cancel_all was called,
no messages would get printed. This is changing that behavior. Is that 
intentional? Additionally,
it looks like this will get a lot more vebose, logging a message for each hw 
queue, regardless
of whether there was anything outstanding. Perhaps you want to move this down 
to after the check
for !found_evt?

> +
> + found_evt = NULL;
> + list_for_each_entry(evt, >sent, queue) {
> +     if (evt->cmnd && evt->cmnd->device == sdev && evt->hwq 
> == i) {
> + found_evt = evt;
> + break;
> + }
>   }
> - }
>  



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 14/17] ibmvfc: add cancel mad initialization helper

2020-12-02 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", 
> crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(>pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(>free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(>timer);
> + list_del(>queue);
> + ibmvfc_trc_end(evt);

Another thought here... If you are going through ibmvfc_purge_requests at the 
same time
as this code, you could check the free bit above, then have 
ibmvfc_purge_requests
put the event on the free queue and call scsi_done, then you come down and get 
the host
lock here, remove the command from the free list, and call the done function 
again,
which could result in a double completion to the scsi layer.

I think you need to grab the host lock before you check the free bit to avoid 
this race.

> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 09/17] ibmvfc: implement channel enquiry and setup commands

2020-12-02 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 06/17] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> The logic for iterating over the Sub-CRQ responses is similiar to that
> of the primary CRQ. Add the necessary handlers for processing those
> responses.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 77 ++
>  1 file changed, 77 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 97f00fefa809..e9da3f60c793 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3381,6 +3381,83 @@ static int ibmvfc_toggle_scrq_irq(struct 
> ibmvfc_sub_queue *scrq, int enable)
>   return rc;
>  }
>  
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> + unsigned long flags;
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + case IBMVFC_CRQ_XPORT_EVENT:
> + return;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", 
> crq->valid);
> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(>pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(>free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> + crq->ioba);
> + return;
> + }
> +
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + del_timer(>timer);
> + list_del(>queue);
> + ibmvfc_trc_end(evt);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> + evt->done(evt);
> +}
> +
> +static struct ibmvfc_crq *ibmvfc_next_scrq(struct ibmvfc_sub_queue *scrq)
> +{
> + struct ibmvfc_crq *crq;
> +
> + crq = >msgs[scrq->cur].crq;
> + if (crq->valid & 0x80) {
> + if (++scrq->cur == scrq->size)

You are incrementing the cur pointer without any locks held. Although
unlikely, could you also be in ibmvfc_reset_crq in another thread?
If so, you'd have a subtle race condition here where the cur pointer could
be read, then ibmvfc_reset_crq writes it to zero, then this thread
writes it to a non zero value, which would then cause you to be out of
sync with the VIOS as to where the cur pointer is.

> + scrq->cur = 0;
> + rmb();
> + } else
> + crq = NULL;
> +
> + return crq;
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 04/17] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
> +   int index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = >scsi_scrqs.scrqs[index];
> + int rc = -ENOMEM;
> +
> + ENTER;
> +
> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
> + if (!scrq->msgs)
> + return rc;
> +
> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
> +  DMA_BIDIRECTIONAL);
> +
> + if (dma_mapping_error(dev, scrq->msg_token))
> + goto dma_map_failed;
> +
> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
> +>cookie, >hw_irq);
> +
> + if (rc) {
> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
> + dev_warn(dev, "Firmware may not support MQ\n");

Will this now get logged everywhere this new driver runs if the firmware
does not support sub CRQs? Is there something better that could be done
here to only log this for a true error and not just because a new driver
is running with an older firmware release?

> + goto reg_failed;
> + }
> +
> + scrq->hwq_id = index;
> + scrq->vhost = vhost;
> +
> + LEAVE;
> + return 0;
> +
> +reg_failed:
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +dma_map_failed:
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> + return rc;
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH v2 01/17] ibmvfc: add vhost fields and defaults for MQ enablement

2020-12-02 Thread Brian King

On 12/1/20 6:53 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c |  9 -
>  drivers/scsi/ibmvscsi/ibmvfc.h | 13 +++--
>  2 files changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 42e4d35e0d35..f1d677a7423d 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -5161,12 +5161,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const 
> struct vio_device_id *id)
>   }
>  
>   shost->transportt = ibmvfc_transport_template;
> - shost->can_queue = max_requests;
> + shost->can_queue = (max_requests / IBMVFC_SCSI_HW_QUEUES);

This doesn't look right. can_queue is the SCSI host queue depth, not the MQ 
queue depth.

>   shost->max_lun = max_lun;
>   shost->max_id = max_targets;
>   shost->max_sectors = IBMVFC_MAX_SECTORS;
>   shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>   shost->unique_id = shost->host_no;
> + shost->nr_hw_queues = IBMVFC_SCSI_HW_QUEUES;
>  
>   vhost = shost_priv(shost);
>   INIT_LIST_HEAD(>sent);



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 13/13] ibmvfc: register Sub-CRQ handles with VIOS during channel setup

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 12/13] ibmvfc: send commands down HW Sub-CRQ when channelized

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 10/13] ibmvfc: advertise client support for using hardware channels

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 11/13] ibmvfc: set and track hw queue in ibmvfc_event struct

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 09/13] ibmvfc: implement channel enquiry and setup commands

2020-11-27 Thread Brian King

On 11/25/20 7:48 PM, Tyrel Datwyler wrote:
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c

> @@ -4462,6 +4464,118 @@ static void ibmvfc_discover_targets(struct 
> ibmvfc_host *vhost)
>   ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
>  }
>  
> +static void ibmvfc_channel_setup_done(struct ibmvfc_event *evt)
> +{
> + struct ibmvfc_host *vhost = evt->vhost;
> + u32 mad_status = be16_to_cpu(evt->xfer_iu->channel_setup.common.status);
> + int level = IBMVFC_DEFAULT_LOG_LEVEL;
> +
> + ibmvfc_free_event(evt);
> +
> + switch (mad_status) {
> + case IBMVFC_MAD_SUCCESS:
> + ibmvfc_dbg(vhost, "Channel Setup succeded\n");
> + vhost->do_enquiry = 0;
> + break;
> + case IBMVFC_MAD_FAILED:
> + level += ibmvfc_retry_host_init(vhost);
> + ibmvfc_log(vhost, level, "Channel Setup failed\n");
> + fallthrough;
> + case IBMVFC_MAD_DRIVER_FAILED:
> + return;
> + default:
> + dev_err(vhost->dev, "Invalid Channel Setup response: 0x%x\n",
> + mad_status);
> + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
> + return;
> + }
> +
> + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);
> + wake_up(>work_wait_q);
> +}
> +
> +static void ibmvfc_channel_setup(struct ibmvfc_host *vhost)
> +{
> + struct ibmvfc_channel_setup_mad *mad;
> + struct ibmvfc_channel_setup *setup_buf = vhost->channel_setup_buf;
> + struct ibmvfc_event *evt = ibmvfc_get_event(vhost);
> +
> + memset(setup_buf, 0, sizeof(*setup_buf));
> + setup_buf->flags = cpu_to_be32(IBMVFC_CANCEL_CHANNELS);
> +
> + ibmvfc_init_event(evt, ibmvfc_channel_setup_done, IBMVFC_MAD_FORMAT);
> + mad = >iu.channel_setup;
> + memset(mad, 0, sizeof(*mad));
> + mad->common.version = cpu_to_be32(1);
> + mad->common.opcode = cpu_to_be32(IBMVFC_CHANNEL_SETUP);
> + mad->common.length = cpu_to_be16(sizeof(*mad));
> + mad->buffer.va = cpu_to_be64(vhost->channel_setup_dma);
> + mad->buffer.len = cpu_to_be32(sizeof(*vhost->channel_setup_buf));
> +
> + ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_INIT_WAIT);
> +
> + if (!ibmvfc_send_event(evt, vhost, default_timeout))
> + ibmvfc_dbg(vhost, "Sent channel setup\n");
> + else
> + ibmvfc_link_down(vhost, IBMVFC_LINK_DOWN);
> +}
> +
> +static void ibmvfc_channel_enquiry_done(struct ibmvfc_event *evt)
> +{
> + struct ibmvfc_host *vhost = evt->vhost;
> + struct ibmvfc_channel_enquiry *rsp = >xfer_iu->channel_enquiry;
> + u32 mad_status = be16_to_cpu(rsp->common.status);
> + int level = IBMVFC_DEFAULT_LOG_LEVEL;
> +
> + switch (mad_status) {
> + case IBMVFC_MAD_SUCCESS:
> + ibmvfc_dbg(vhost, "Channel Enquiry succeeded\n");
> + vhost->max_vios_scsi_channels = 
> be32_to_cpu(rsp->num_scsi_subq_channels);

You need a ibmvfc_free_event(evt) here so you don't leak events.

> + break;
> + case IBMVFC_MAD_FAILED:
> + level += ibmvfc_retry_host_init(vhost);
> + ibmvfc_log(vhost, level, "Channel Enquiry failed\n");
> + ibmvfc_free_event(evt);

Looks like you are freeing this event twice due to the fallthrough...

> + fallthrough;
> + case IBMVFC_MAD_DRIVER_FAILED:
> +     ibmvfc_free_event(evt);
> + return;
> + default:
> + dev_err(vhost->dev, "Invalid Channel Enquiry response: 0x%x\n",
> + mad_status);
> + ibmvfc_link_down(vhost, IBMVFC_LINK_DEAD);
> + ibmvfc_free_event(evt);
> + return;
> + }
> +
> + ibmvfc_channel_setup(vhost);
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 08/13] ibmvfc: map/request irq and register Sub-CRQ interrupt handler

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 07/13] ibmvfc: define Sub-CRQ interrupt handler routine

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 06/13] ibmvfc: add handlers to drain and complete Sub-CRQ responses

2020-11-27 Thread Brian King

On 11/25/20 7:48 PM, Tyrel Datwyler wrote:
> The logic for iterating over the Sub-CRQ responses is similiar to that
> of the primary CRQ. Add the necessary handlers for processing those
> responses.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 72 ++
>  1 file changed, 72 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 6eaedda4917a..a8730522920e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3371,6 +3371,78 @@ static int ibmvfc_toggle_scrq_irq(struct 
> ibmvfc_sub_queue *scrq, int enable)
>   return rc;
>  }
>  
> +static void ibmvfc_handle_scrq(struct ibmvfc_crq *crq, struct ibmvfc_host 
> *vhost)
> +{
> + struct ibmvfc_event *evt = (struct ibmvfc_event 
> *)be64_to_cpu(crq->ioba);
> +
> + switch (crq->valid) {
> + case IBMVFC_CRQ_CMD_RSP:
> + break;
> + default:
> + dev_err(vhost->dev, "Got and invalid message type 0x%02x\n", 
> crq->valid);

Is this correct? Can't we get transport events here as well?

> + return;
> + }
> +
> + /* The only kind of payload CRQs we should get are responses to
> +  * things we send. Make sure this response is to something we
> +  * actually sent
> +  */
> + if (unlikely(!ibmvfc_valid_event(>pool, evt))) {
> + dev_err(vhost->dev, "Returned correlation_token 0x%08llx is 
> invalid!\n",
> + crq->ioba);
> + return;
> + }
> +
> + if (unlikely(atomic_read(>free))) {
> + dev_err(vhost->dev, "Received duplicate correlation_token 
> 0x%08llx!\n",
> +     crq->ioba);
> + return;
> + }
> +
> + del_timer(>timer);
> + list_del(>queue);
> + ibmvfc_trc_end(evt);
> + evt->done(evt);
> +}
> +



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 05/13] ibmvfc: add Sub-CRQ IRQ enable/disable routine

2020-11-27 Thread Brian King

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 04/13] ibmvfc: add alloc/dealloc routines for SCSI Sub-CRQ Channels

2020-11-27 Thread Brian King

On 11/25/20 7:48 PM, Tyrel Datwyler wrote:
> Allocate a set of Sub-CRQs in advance. During channel setup the client
> and VIOS negotiate the number of queues the VIOS supports and the number
> that the client desires to request. Its possible that the final channel
> resources allocated is less than requested, but the client is still
> responsible for sending handles for every queue it is hoping for.
> 
> Also, provide deallocation cleanup routines.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 115 +
>  drivers/scsi/ibmvscsi/ibmvfc.h |   1 +
>  2 files changed, 116 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index 260b82e3cc01..571abdb48384 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -4983,6 +4983,114 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
>   return retrc;
>  }
>  
> +static int ibmvfc_register_scsi_channel(struct ibmvfc_host *vhost,
> +   int index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = >scsi_scrqs.scrqs[index];
> + int rc = -ENOMEM;
> +
> + ENTER;
> +
> + scrq->msgs = (struct ibmvfc_sub_crq *)get_zeroed_page(GFP_KERNEL);
> + if (!scrq->msgs)
> + return rc;
> +
> + scrq->size = PAGE_SIZE / sizeof(*scrq->msgs);
> + scrq->msg_token = dma_map_single(dev, scrq->msgs, PAGE_SIZE,
> +  DMA_BIDIRECTIONAL);
> +
> + if (dma_mapping_error(dev, scrq->msg_token))
> + goto dma_map_failed;
> +
> + rc = h_reg_sub_crq(vdev->unit_address, scrq->msg_token, PAGE_SIZE,
> +>cookie, >hw_irq);
> +
> + if (rc) {
> + dev_warn(dev, "Error registering sub-crq: %d\n", rc);
> + dev_warn(dev, "Firmware may not support MQ\n");
> + goto reg_failed;
> + }
> +
> + scrq->hwq_id = index;
> + scrq->vhost = vhost;
> +
> + LEAVE;
> + return 0;
> +
> +reg_failed:
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> +dma_map_failed:
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> + return rc;
> +}
> +
> +static void ibmvfc_deregister_scsi_channel(struct ibmvfc_host *vhost, int 
> index)
> +{
> + struct device *dev = vhost->dev;
> + struct vio_dev *vdev = to_vio_dev(dev);
> + struct ibmvfc_sub_queue *scrq = >scsi_scrqs.scrqs[index];
> + long rc;
> +
> + ENTER;
> +
> + do {
> + rc = plpar_hcall_norets(H_FREE_SUB_CRQ, vdev->unit_address,
> + scrq->cookie);
> + } while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
> +
> + if (rc)
> + dev_err(dev, "Failed to free sub-crq[%d]: rc=%ld\n", index, rc);
> +
> + dma_unmap_single(dev, scrq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
> + free_page((unsigned long)scrq->msgs);
> + LEAVE;
> +}
> +
> +static int ibmvfc_init_sub_crqs(struct ibmvfc_host *vhost)
> +{
> + int i, j;
> +
> + ENTER;
> +
> + vhost->scsi_scrqs.scrqs = kcalloc(vhost->client_scsi_channels,
> +   sizeof(*vhost->scsi_scrqs.scrqs),
> +   GFP_KERNEL);
> + if (!vhost->scsi_scrqs.scrqs)
> + return -1;
> +
> + for (i = 0; i < vhost->client_scsi_channels; i++) {
> + if (ibmvfc_register_scsi_channel(vhost, i)) {
> + for (j = i; j > 0; j--)
> + ibmvfc_deregister_scsi_channel(vhost, j - 1);
> + kfree(vhost->scsi_scrqs.scrqs);
> + LEAVE;
> + return -1;
> + }
> + }
> +
> + LEAVE;
> + return 0;
> +}
> +
> +static void ibmvfc_release_sub_crqs(struct ibmvfc_host *vhost)
> +{
> + int i;
> +
> + ENTER;
> + if (!vhost->scsi_scrqs.scrqs)
> + return;
> +
> + for (i = 0; i < vhost->client_scsi_channels; i++)
> + ibmvfc_deregister_scsi_channel(vhost, i);
> +
> + vhost->scsi_scrqs.active_queues = 0;
> + kfree(vhost->scsi_scrqs.scrqs);

Do you want to NULL this out after you free it do you don't keep
a reference to a freed page around?

> + LEAVE;
> +}
> +
>  /**
>   * ibmvfc_free_mem - Free memory for vhost
>   * @vhost:   ibmvfc host struct



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 03/13] ibmvfc: add Subordinate CRQ definitions

2020-11-27 Thread Brian King

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 02/13] ibmvfc: define hcall wrapper for registering a Sub-CRQ

2020-11-27 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 01/13] ibmvfc: add vhost fields and defaults for MQ enablement

2020-11-27 Thread Brian King

On 11/25/20 7:48 PM, Tyrel Datwyler wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
> index 9d58cfd774d3..8225bdbb127e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.h
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.h
> @@ -41,6 +41,11 @@
>  #define IBMVFC_DEFAULT_LOG_LEVEL 2
>  #define IBMVFC_MAX_CDB_LEN   16
>  #define IBMVFC_CLS3_ERROR0
> +#define IBMVFC_MQ0

Given that IBMVFC_MQ is getting set to 0 here, that means mq_enabled is also
always zero, so am I correct that a lot of this code being added is not
yet capable of being executed?

> +#define IBMVFC_SCSI_CHANNELS 0

Similar comment here...

> +#define IBMVFC_SCSI_HW_QUEUES1

I don't see any subsequent patches in this series that would ever result
in nr_hw_queues getting set to anything other than 1. Is that future work
planned or am I missing something?

> +#define IBMVFC_MIG_NO_SUB_TO_CRQ 0
> +#define IBMVFC_MIG_NO_N_TO_M 0
>  
>  /*
>   * Ensure we have resources for ERP and initialization:
> @@ -826,6 +831,10 @@ struct ibmvfc_host {
>   int delay_init;
>   int scan_complete;
>   int logged_in;
> + int mq_enabled;
> + int using_channels;
> + int do_enquiry;
> + int client_scsi_channels;
>   int aborting_passthru;
>   int events_to_log;
>  #define IBMVFC_AE_LINKUP 0x0001
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 4/6] ibmvfc: add FC payload retrieval routines for versioned vfcFrames

2020-11-17 Thread Brian King

On 11/11/20 7:04 PM, Tyrel Datwyler wrote:
> The FC iu and response payloads are located at different offsets
> depending on the ibmvfc_cmd version. This is a result of the version 2
> vfcFrame definition adding an extra 64bytes of reserved space to the
> structure prior to the payloads.
> 
> Add helper routines to determine the current vfcFrame version and
> returning pointers to the proper iu or response structures within that
> ibmvfc_cmd.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvfc.c | 76 --
>  1 file changed, 53 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index aa3445bec42c..5e666f7c9266 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -138,6 +138,22 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target 
> *);
>  
>  static const char *unknown_error = "unknown error";
>  
> +static struct ibmvfc_fcp_cmd_iu *ibmvfc_get_fcp_iu(struct ibmvfc_host 
> *vhost, struct ibmvfc_cmd *vfc_cmd)
> +{
> + if (be64_to_cpu(vhost->login_buf->resp.capabilities) & 
> IBMVFC_HANDLE_VF_WWPN)

Suggest adding a flag to the vhost structure that you setup after login in 
order to
simplify this check and avoid chasing multiple pointers along with a byte swap.

Maybe something like:

vhost->is_v2

> + return _cmd->v2.iu;
> + else
> + return _cmd->v1.iu;
> +}
> +
> +static struct ibmvfc_fcp_rsp *ibmvfc_get_fcp_rsp(struct ibmvfc_host *vhost, 
> struct ibmvfc_cmd *vfc_cmd)
> +{
> + if (be64_to_cpu(vhost->login_buf->resp.capabilities) & 
> IBMVFC_HANDLE_VF_WWPN)

Same here

> + return _cmd->v2.rsp;
> +     else
> + return _cmd->v1.rsp;
> +}
> +
>  #ifdef CONFIG_SCSI_IBMVFC_TRACE
>  /**
>   * ibmvfc_trc_start - Log a start trace entry



-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 6/6] ibmvfc: advertise client support for targetWWPN using v2 commands

2020-11-17 Thread Brian King

Everything else in this series looks OK to me.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 4/6] ibmvfc: add FC payload retrieval routines for versioned vfcFrames

2020-11-17 Thread Brian King

On 11/17/20 4:14 PM, Brian King wrote:
> On 11/11/20 7:04 PM, Tyrel Datwyler wrote:
>> The FC iu and response payloads are located at different offsets
>> depending on the ibmvfc_cmd version. This is a result of the version 2
>> vfcFrame definition adding an extra 64bytes of reserved space to the
>> structure prior to the payloads.
>>
>> Add helper routines to determine the current vfcFrame version and
>> returning pointers to the proper iu or response structures within that
>> ibmvfc_cmd.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvfc.c | 76 --
>>  1 file changed, 53 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index aa3445bec42c..5e666f7c9266 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -138,6 +138,22 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target 
>> *);
>>  
>>  static const char *unknown_error = "unknown error";
>>  
>> +static struct ibmvfc_fcp_cmd_iu *ibmvfc_get_fcp_iu(struct ibmvfc_host 
>> *vhost, struct ibmvfc_cmd *vfc_cmd)
>> +{
>> +if (be64_to_cpu(vhost->login_buf->resp.capabilities) & 
>> IBMVFC_HANDLE_VF_WWPN)
> 
> Suggest adding a flag to the vhost structure that you setup after login in 
> order to
> simplify this check and avoid chasing multiple pointers along with a byte 
> swap.
> 
> Maybe something like:
> 
> vhost->is_v2

Even better might be vhost->version which you'd set to 1 or 2 and then you 
could directly
use that to set the field in the command structures later.


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 3/6] ibmvfc: add new fields for version 2 of several MADs

2020-11-17 Thread Brian King

On 11/11/20 7:04 PM, Tyrel Datwyler wrote:
> @@ -211,7 +214,9 @@ struct ibmvfc_npiv_login_resp {
>   __be64 capabilities;
>  #define IBMVFC_CAN_FLUSH_ON_HALT 0x08
>  #define IBMVFC_CAN_SUPPRESS_ABTS 0x10
> -#define IBMVFC_CAN_SUPPORT_CHANNELS  0x20
> +#define IBMVFC_MAD_VERSION_CAP   0x20
> +#define IBMVFC_HANDLE_VF_WWPN0x40
> +#define IBMVFC_CAN_SUPPORT_CHANNELS  0x80
>   __be32 max_cmds;
>   __be32 scsi_id_sz;
>   __be64 max_dma_len;
> @@ -293,6 +298,7 @@ struct ibmvfc_port_login {
>   __be32 reserved2;
>   struct ibmvfc_service_parms service_parms;
>   struct ibmvfc_service_parms service_parms_change;
> + __be64 targetWWPN;

For consistency, can you make this target_wwpn?

>   __be64 reserved3[2];
>  } __packed __aligned(8);
>  
> @@ -344,6 +350,7 @@ struct ibmvfc_process_login {
>   __be16 status;
>   __be16 error;   /* also fc_reason */
>   __be32 reserved2;
> + __be64 targetWWPN;

For consistency, can you make this target_wwpn?

>   __be64 reserved3[2];
>  } __packed __aligned(8);
>  
> @@ -378,6 +385,8 @@ struct ibmvfc_tmf {
>   __be32 cancel_key;
>   __be32 my_cancel_key;
>   __be32 pad;
> + __be64 targetWWPN;

For consistency, can you make this target_wwpn?

> + __be64 taskTag;

and make this task_tag. 

>   __be64 reserved[2];
>  } __packed __aligned(8);
>  
> @@ -474,9 +483,19 @@ struct ibmvfc_cmd {
>   __be64 correlation;
>   __be64 tgt_scsi_id;
>   __be64 tag;
> - __be64 reserved3[2];
> - struct ibmvfc_fcp_cmd_iu iu;
> - struct ibmvfc_fcp_rsp rsp;
> + __be64 targetWWPN;

For consistency, can you make this target_wwpn?

> + __be64 reserved3;
> + union {
> + struct {
> + struct ibmvfc_fcp_cmd_iu iu;
> + struct ibmvfc_fcp_rsp rsp;
> + } v1;
> + struct {
> + __be64 reserved4;
> + struct ibmvfc_fcp_cmd_iu iu;
> + struct ibmvfc_fcp_rsp rsp;
> + } v2;
> + };
>  } __packed __aligned(8);
>  
>  struct ibmvfc_passthru_fc_iu {
> @@ -503,6 +522,7 @@ struct ibmvfc_passthru_iu {
>   __be64 correlation;
>   __be64 scsi_id;
>   __be64 tag;
> + __be64 targetWWPN;

For consistency, can you make this target_wwpn?

>   __be64 reserved2[2];
>  } __packed __aligned(8);
>  
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH net-next 02/12] ibmvnic: Introduce indirect subordinate Command Response Queue buffer

2020-11-13 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH net-next 01/12] ibmvnic: Ensure that subCRQ entry reads are ordered

2020-11-13 Thread Brian King

Reviewed-by: Brian King 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

[PATCH] ibmvfc: Protect vhost->task_set increment by the host lock

2020-09-16 Thread Brian King

In the discovery thread, ibmvfc does a vhost->task_set++ without
any lock held. This could result in two targets getting the same
cancel key, which could have strange effects in error recovery.
The actual probability of this occurring should be extremely
small, since this should all be done in a single threaded loop
from the discovery thread, but let's fix it up anyway to be safe.

Signed-off-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 322bb30..b393587 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4169,11 +4169,11 @@ static int ibmvfc_alloc_target(struct ibmvfc_host 
*vhost,struct ibmvfc_discover_
tgt->wwpn = wwpn;
tgt->vhost = vhost;
tgt->need_login = 1;
-   tgt->cancel_key = vhost->task_set++;
timer_setup(>timer, ibmvfc_adisc_timeout, 0);
kref_init(>kref);
ibmvfc_init_tgt(tgt, ibmvfc_tgt_implicit_logout);
spin_lock_irqsave(vhost->host->host_lock, flags);
+   tgt->cancel_key = vhost->task_set++;
list_add_tail(>queue, >targets);
 
 unlock_out:
-- 
1.8.3.1

Re: [PATCH] ibmvfc: Avoid link down on FS9100 canister reboot

2020-09-16 Thread Brian King

On 9/15/20 7:49 PM, Martin K. Petersen wrote:
> 
> Brian,
> 
>> When a canister on a FS9100, or similar storage, running in NPIV mode,
>> is rebooted, its WWPNs will fail over to another canister.
> 
> [...]
> 
> Applied to 5.10/scsi-staging, thanks! I fixed a bunch of checkpatch
> warnings.

Sorry about the checkpatch issues. Thanks for pulling this in.

-Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

[PATCH] ibmvfc: Avoid link down on FS9100 canister reboot

2020-09-11 Thread Brian King

When a canister on a FS9100, or similar storage, running in NPIV mode,
is rebooted, its WWPNs will fail over to another canister. When this
occurs, we see a WWPN going away from the fabric at one N-Port ID,
and, a short time later, the same WWPN appears at a different N-Port ID.
When the canister is fully operational again, the WWPNs fail back to
the original canister. If there is any I/O outstanding to the target
when this occurs, it will result in the implicit logout the ibmvfc driver
issues before removing the rport to fail. When the WWPN then shows up at a
different N-Port ID, and we issue a PLOGI to it, the VIOS will
see that it still has a login for this WWPN at the old N-Port ID,
which results in the VIOS simulating a link down / link up sequence
to the client, in order to get the VIOS and client LPAR in sync.

The patch below improves the way we handle this scenario so as to
avoid the link bounce, which affects all targets under the virtual
host adapter. The change is to utilize the Move Login MAD, which
will work even when I/O is outstanding to the target. The change
only alters the target state machine for the case where the implicit
logout fails prior to deleting the rport. If this implicit logout fails,
we defer deleting the ibmvfc_target object after calling
fc_remote_port_delete. This enables us to later retry the implicit logout
after terminate_rport_io occurs, or to issue the Move Login request if
a WWPN shows up at a new N-Port ID prior to this occurring.

This has been tested by IBM's storage interoperability team
on a FS9100, forcing the failover to occur. With debug tracing enabled
in the ibmvfc driver, we confirmed the move login was sent
in this scenario and confirmed the link bounce no longer occurred.

Signed-off-by: Brian King 
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 211 ++---
 drivers/scsi/ibmvscsi/ibmvfc.h |  38 +++-
 2 files changed, 232 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 175a165..322bb30 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -134,6 +134,7 @@
 static void ibmvfc_tgt_query_target(struct ibmvfc_target *);
 static void ibmvfc_npiv_logout(struct ibmvfc_host *);
 static void ibmvfc_tgt_implicit_logout_and_del(struct ibmvfc_target *);
+static void ibmvfc_tgt_move_login(struct ibmvfc_target *);
 
 static const char *unknown_error = "unknown error";
 
@@ -431,7 +432,20 @@ static int ibmvfc_set_tgt_action(struct ibmvfc_target *tgt,
}
break;
case IBMVFC_TGT_ACTION_LOGOUT_RPORT_WAIT:
-   if (action == IBMVFC_TGT_ACTION_DEL_RPORT) {
+   if (action == IBMVFC_TGT_ACTION_DEL_RPORT ||
+   action == IBMVFC_TGT_ACTION_DEL_AND_LOGOUT_RPORT) {
+   tgt->action = action;
+   rc = 0;
+   }
+   break;
+   case IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT:
+   if (action == IBMVFC_TGT_ACTION_LOGOUT_RPORT) {
+   tgt->action = action;
+   rc = 0;
+   }
+   break;
+   case IBMVFC_TGT_ACTION_DEL_AND_LOGOUT_RPORT:
+   if (action == IBMVFC_TGT_ACTION_LOGOUT_DELETED_RPORT) {
tgt->action = action;
rc = 0;
}
@@ -441,16 +455,18 @@ static int ibmvfc_set_tgt_action(struct ibmvfc_target 
*tgt,
tgt->action = action;
rc = 0;
}
+   break;
case IBMVFC_TGT_ACTION_DELETED_RPORT:
break;
default:
-   if (action >= IBMVFC_TGT_ACTION_LOGOUT_RPORT)
-   tgt->add_rport = 0;
tgt->action = action;
rc = 0;
break;
}
 
+   if (action >= IBMVFC_TGT_ACTION_LOGOUT_RPORT)
+   tgt->add_rport = 0;
+
return rc;
 }
 
@@ -548,7 +564,8 @@ static void ibmvfc_set_host_action(struct ibmvfc_host 
*vhost,
  **/
 static void ibmvfc_reinit_host(struct ibmvfc_host *vhost)
 {
-   if (vhost->action == IBMVFC_HOST_ACTION_NONE) {
+   if (vhost->action == IBMVFC_HOST_ACTION_NONE &&
+   vhost->state == IBMVFC_ACTIVE) {
if (!ibmvfc_set_host_state(vhost, IBMVFC_INITIALIZING)) {
scsi_block_requests(vhost->host);
ibmvfc_set_host_action(vhost, IBMVFC_HOST_ACTION_QUERY);
@@ -2574,7 +2591,9 @@ static void ibmvfc_terminate_rport_io(struct fc_rport 
*rport)
struct ibmvfc_host *vhost = shost_priv(shost);
struct fc_rport *dev_rport;
struct scsi_device *sdev;
-   unsigned long rc;
+   struct ibmvfc_target *tgt;
+   unsigned long rc, flags;
+   unsigned int found;
 
ENTER;
shost_for_each

Re: [PATCH v4 5/7] powerpc/iommu: Move iommu_table cleaning routine to iommu_table_clean

2020-07-21 Thread Brian King

On 7/21/20 5:13 PM, Leonardo Bras wrote:
> On Tue, 2020-07-21 at 14:59 +1000, Alexey Kardashevskiy wrote:
>>
>> On 16/07/2020 17:16, Leonardo Bras wrote:
>>> Move the part of iommu_table_free() that does struct iommu_table cleaning
>>> into iommu_table_clean, so we can invoke it separately.
>>>
>>> This new function is useful for cleaning struct iommu_table before
>>> initializing it again with a new DMA window, without having it freed and
>>> allocated again.
>>>
>>> Signed-off-by: Leonardo Bras 
>>> ---
>>>  arch/powerpc/kernel/iommu.c | 30 ++
>>>  1 file changed, 18 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>>> index 9704f3f76e63..c3242253a4e7 100644
>>> --- a/arch/powerpc/kernel/iommu.c
>>> +++ b/arch/powerpc/kernel/iommu.c
>>> @@ -735,21 +735,10 @@ struct iommu_table *iommu_init_table(struct 
>>> iommu_table *tbl, int nid,
>>> return tbl;
>>>  }
>>>  
>>> -static void iommu_table_free(struct kref *kref)
>>> +static void iommu_table_clean(struct iommu_table *tbl)
>>
>> iommu_table_free() + iommu_init_table() + set_iommu_table_base() should
>> work too, why new helper?
> 
> iommu_table_free() also frees the tbl, which would need allocate it
> again (new address) and to fill it up again, unnecessarily. 
> I think it's a better approach to only change what is needed.
> 
>> There is also iommu_table_clear() which does a different thing so you
>> need a better name.
> 
> I agree.
> I had not noticed this other function before sending the patchset. What
> would be a better name though? __iommu_table_free()? 
> 
>> Second, iommu_table_free
>> use and it would be ok as we would only see this when hot-unplugging a
>> PE because we always kept the default window.
>> Btw you must be seeing these warnings now every time you create DDW with
>> these patches as at least the first page is reserved, do not you?
> 
> It does not print a warning.
> I noticed other warnings, but not this one from iommu_table_free():
> /* verify that table contains no entries */
> if (!bitmap_empty(tbl->it_ma
> p, tbl->it_size))
>   pr_warn("%s: Unexpected TCEs\n", __func__);
> 
> Before that, iommu_table_release_pages(tbl) is supposed to clear the 
> bitmap, so this only tests for a tce that is created in this short period.
> 
>> Since we are replacing a table for a device which is still in the
>> system, we should not try messing with its DMA if it already has
>> mappings so the warning should become an error preventing DDW. It is
>> rather hard to trigger in practice but I could hack a driver to ask for
>> 32bit DMA mask first, map few pages and then ask for 64bit DMA mask, it
>> is not illegal, I think. So this needs a new helper - "bool
>> iommu_table_in_use(tbl)" - to use in enable_ddw(). Or I am overthinking
>> this?... Thanks,
> 
> As of today, there seems to be nothing like that happening in the
> driver I am testing. 
> I spoke to Brian King on slack, and he mentioned that at the point DDW
> is created there should be no allocations in place.

I think there are a couple of scenarios here. One is where there is a DMA
allocation prior to a call to set the DMA mask. Second scenario is if the
driver makes multiple calls to set the DMA mask. I would argue that a properly
written driver should tell the IOMMU subsystem what DMA mask it supports prior
to allocating DMA memroy. Documentation/core-api/dma-api-howto.rst should
describe what is legal and what is not.

It might be reasonable to declare its not allowed to allocate DMA memory
and then later change the DMA mask and clearly call this out in the 
documentation
if its not already.

-Brian


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 1/3] ibmvscsi: Wire up host_reset() in the drivers scsi_host_template

2019-05-02 Thread Brian King

On 5/1/19 7:47 PM, Tyrel Datwyler wrote:
> From: Tyrel Datwyler 
> 
> Wire up the host_reset function in our driver_template to allow a user
> requested adpater reset via the host_reset sysfs attribute.
> 
> Example:
> 
> echo "adapter" > /sys/class/scsi_host/host0/host_reset
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 8cec5230fe31..1c37244f16a0 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -2050,6 +2050,18 @@ static struct device_attribute ibmvscsi_host_config = {
>   .show = show_host_config,
>  };
>  
> +static int ibmvscsi_host_reset(struct Scsi_Host *shost, int reset_type)
> +{
> + struct ibmvscsi_host_data *hostdata = shost_priv(shost);
> +
> + vio_disable_interrupts(to_vio_dev(hostdata->dev));
> + dev_info(hostdata->dev, "Initiating adapter reset!\n");
> + ibmvscsi_reset_host(hostdata);
> + vio_enable_interrupts(to_vio_dev(hostdata->dev));

Is it necessary to disable / enable interrupts around the call to 
ibmvscsi_reset_host?
I don't know why we'd need to do that before calling the reset as we have other
cases, like ibmvscsi_timeout where we don't bother doing this. Also, at the end
of the reset we look to be already enabling interrupts.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH 2/3] ibmvscsi: redo driver work thread to use enum action states

2019-05-02 Thread Brian King

On 5/1/19 7:47 PM, Tyrel Datwyler wrote:
> From: Tyrel Datwyler 
> 
> The current implemenation relies on two flags in the drivers private host
> structure to signal the need for a host reset or to reenable the CRQ after a
> LPAR migration. This patch does away with those flags and introduces a single
> action flag and defined enums for the supported kthread work actions. Lastly,
> the if/else logic is replaced with a switch statement.
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 57 +---
>  drivers/scsi/ibmvscsi/ibmvscsi.h |  9 +++--
>  2 files changed, 45 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 1c37244f16a0..683139e6c63f 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -828,7 +828,7 @@ static void ibmvscsi_reset_host(struct ibmvscsi_host_data 
> *hostdata)
>   atomic_set(>request_limit, 0);
>  
>   purge_requests(hostdata, DID_ERROR);
> - hostdata->reset_crq = 1;
> + hostdata->action = IBMVSCSI_HOST_ACTION_RESET;
>   wake_up(>work_wait_q);
>  }
>  
> @@ -1797,7 +1797,7 @@ static void ibmvscsi_handle_crq(struct viosrp_crq *crq,
>   /* We need to re-setup the interpartition connection */
>   dev_info(hostdata->dev, "Re-enabling adapter!\n");
>   hostdata->client_migrated = 1;
> - hostdata->reenable_crq = 1;
> + hostdata->action = IBMVSCSI_HOST_ACTION_REENABLE;
>   purge_requests(hostdata, DID_REQUEUE);
>   wake_up(>work_wait_q);
>   } else {
> @@ -2118,26 +2118,32 @@ static unsigned long ibmvscsi_get_desired_dma(struct 
> vio_dev *vdev)
>  
>  static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>  {
> + unsigned long flags;
>   int rc;
>   char *action = "reset";
>  
> - if (hostdata->reset_crq) {
> - smp_rmb();
> - hostdata->reset_crq = 0;
> -
> + spin_lock_irqsave(hostdata->host->host_lock, flags);
> + switch (hostdata->action) {
> + case IBMVSCSI_HOST_ACTION_NONE:
> + break;
> + case IBMVSCSI_HOST_ACTION_RESET:
>   rc = ibmvscsi_reset_crq_queue(>queue, hostdata);

Looks like you are now calling ibmvscsi_reset_crq_queue with the host_lock held.
However, ibmvscsi_reset_crq_queue can call msleep.

This had been implemented as separate reset_crq and reenable_crq fields
so that it could run lockless. I'm not opposed to changing this to a single
field in general, we just need to be careful where we are adding locking.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH] ibmvscsi: use GFP_KERNEL with dma_alloc_coherent in initialize_event_pool

2019-01-10 Thread Brian King

On 01/09/2019 08:59 PM, Tyrel Datwyler wrote:
> During driver probe we allocate a dma region for our event pool.
> Currently, zero is passed for the gfp_flags parameter. Driver probe
> callbacks run in process context and we hold no locks so we can sleep
> here if necessary.
> 
> Fix by passing GFP_KERNEL explicitly to dma_alloc_coherent().
> 
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index cb8535e..10d5e77 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -465,7 +465,7 @@ static int initialize_event_pool(struct event_pool *pool,
>   pool->iu_storage =
>   dma_alloc_coherent(hostdata->dev,
>  pool->size * sizeof(*pool->iu_storage),
> ->iu_token, 0);
> +>iu_token, GFP_KERNEL);
>   if (!pool->iu_storage) {
>       kfree(pool->events);
>   return -ENOMEM;
> 

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

Re: [PATCH] ibmvscsi: use GFP_ATOMIC with dma_alloc_coherent in map_sg_data

2019-01-10 Thread Brian King

On 01/09/2019 08:58 PM, Tyrel Datwyler wrote:
> While mapping DMA for scatter list when a scsi command is queued the
> existing call to dma_alloc_coherent() in our map_sg_data() function
> passes zero for the gfp_flags parameter. We are most definitly in atomic
> context at this point as queue_command() is called in softirq context
> and further we have a spinlock holding the scsi host lock.
> 
> Fix this by passing GFP_ATOMIC to dma_alloc_coherent() to prevent any
> sort of sleeping in atomic context deadlock.
> 
> Fixes: 4dddbc26c389 ("[SCSI] ibmvscsi: handle large scatter/gather lists")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Tyrel Datwyler 
> ---
>  drivers/scsi/ibmvscsi/ibmvscsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
> b/drivers/scsi/ibmvscsi/ibmvscsi.c
> index 1135e74..cb8535e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
> @@ -731,7 +731,7 @@ static int map_sg_data(struct scsi_cmnd *cmd,
>   evt_struct->ext_list = (struct srp_direct_buf *)
>   dma_alloc_coherent(dev,
>  SG_ALL * sizeof(struct 
> srp_direct_buf),
> -_struct->ext_list_token, 0);
> +_struct->ext_list_token, 
> GFP_ATOMIC);
>   if (!evt_struct->ext_list) {
>   if (!firmware_has_feature(FW_FEATURE_CMO))
>   sdev_printk(KERN_ERR, cmd->device,
> 

Reviewed-by: Brian King 

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center

1 2 3 4 >

1 - 100 of 306 matches

Mail list logo