Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-11-05 Thread Todd Gill
On 09/02/2015 04:48 AM, Hannes Reinecke wrote:
> On 09/02/2015 08:39 AM, Christoph Hellwig wrote:
>> On Tue, Sep 01, 2015 at 02:57:57PM +0200, Hannes Reinecke wrote:
>>
>> It might be a good idea to prioritize that.  Todd has been asking
>> for multipath monitoring APIs and suggested adding D-BUS APIs to
>> multipathd.  I'd much prefer them directly talking to the kernel
>> instead of cementing multipathd, which is a bit of a wart.

I have a working prototype of a D-Bus API implemented as a new thread of
multipathd.  The good news is it would be very easy to move to another
process.

>>
> Precisely my idea.
> I'm fine with having a device-mapper D-Bus API, so that any application
> can retrieve the device-mapper layout via D-Bus.
> (It might as well be using sysfs directly, but who am I to argue here)
> Path events, however, out of necessity are instantiated within the
> kernel, and should be send out from the kernel via uevents.
> 
> D-Bus can be added on top of that, but multipathd should not generate
> path events for D-Bus.
> 

Where should D-Bus events be layered on top of the uevents if not inside
multipathd?

Would putting it inside multipathd be a reasonable first step?  We could
maintain the same public D-Bus API and move it to a different process.

Thanks,
Todd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-23 Thread Ewan Milne
On Tue, 2015-09-22 at 22:15 +0200, Hannes Reinecke wrote:
> On 09/22/2015 09:49 PM, Ewan Milne wrote:
> > On Thu, 2015-08-27 at 14:41 +0200, Hannes Reinecke wrote:
> >> The current ALUA device_handler has two drawbacks:
> >> - We're sending a 'SET TARGET PORT GROUP' command to every LUN,
> >>   disregarding the fact that several LUNs might be in a port group
> >>   and will be automatically switched whenever _any_ LUN within
> >>   that port group receives the command.
> >> - Whenever a LUN is in 'transitioning' mode we cannot block I/O
> >>   to that LUN, instead the controller has to abort the command.
> >>   This leads to increased traffic across the wire and heavy load
> >>   on the controller during switchover.
> > 
> > I'm not sure I understand what this means - why couldn't we block I/O?
> > and what does 'heavy load' mean?  Aborting commands is 'heavy load'?
> > 
> If we're getting a sense code indicating that the LUN is in
> transitioning _and_ we're blocking I/O we never ever send down I/Os to
> that driver anymore, so we cannot receive any sense codes indicating the
> transitioning is done.
> At the same time, every I/O we're sending down will be returned by the
> storage I/O with a sense code, requiring us to retry the command.
> Hence we're constantly retrying I/O.

Ah, OK.  Perhaps including this explanation either in the comments with
patch 22/23 which adds the TEST UNIT READY commands to poll for the
status, or in the patch description somewhere would be helpful.

> 
> [ .. ]
> >> @@ -811,10 +1088,17 @@ failed:
> >>  static void alua_bus_detach(struct scsi_device *sdev)
> >>  {
> >>struct alua_dh_data *h = sdev->handler_data;
> >> +  struct alua_port_group *pg;
> >>  
> >> -  if (h->pg) {
> >> -  kref_put(>pg->kref, release_port_group);
> >> -  h->pg = NULL;
> >> +  spin_lock(>pg_lock);
> >> +  pg = h->pg;
> >> +  rcu_assign_pointer(h->pg, NULL);
> >> +  spin_unlock(>pg_lock);
> >> +  synchronize_rcu();
> >> +  if (pg) {
> >> +  if (pg->rtpg_sdev)
> >> +  flush_workqueue(pg->work_q);
> >> +  kref_put(>kref, release_port_group);
> >>}
> >>sdev->handler_data = NULL;
> >>kfree(h);
> > 
> > So, you've already had a bit of discussion with Christoph about this,
> > the main portion of your ALUA rewrite, and I won't go over all of that,
> > except to say that I'd have to agree that having separate work queues
> > for the different RTPG/STPG functions and having them manipulate each
> > other's flags seems like we'd be better off having just one work
> > function that did everything.  Less messy and easier to maintain.
> > 
> > Also, it seems like wrong ordering of kref_get() vs. scsi_device_get(),
> > in alua_rtpg_queue() since they are released as kref_put() then
> > scsi_device_put()?
> > 
> Yeah, I've reworked the reference counting.
> And reverted the workqueue handling to use the original model.
> 
> Cheers,
> 
> Hannes


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-22 Thread Ewan Milne
On Thu, 2015-08-27 at 14:41 +0200, Hannes Reinecke wrote:
> The current ALUA device_handler has two drawbacks:
> - We're sending a 'SET TARGET PORT GROUP' command to every LUN,
>   disregarding the fact that several LUNs might be in a port group
>   and will be automatically switched whenever _any_ LUN within
>   that port group receives the command.
> - Whenever a LUN is in 'transitioning' mode we cannot block I/O
>   to that LUN, instead the controller has to abort the command.
>   This leads to increased traffic across the wire and heavy load
>   on the controller during switchover.

I'm not sure I understand what this means - why couldn't we block I/O?
and what does 'heavy load' mean?  Aborting commands is 'heavy load'?

> 
> With this patch the RTPG handling is moved to a per-portgroup
> workqueue. This reduces the number of 'REPORT TARGET PORT GROUP'
> and 'SET TARGET PORT GROUPS' sent to the controller as we're sending
> them now per port group, and not per device as previously.
> It also allows us to block I/O to any LUN / port group found to be
> in 'transitioning' ALUA mode, as the workqueue item will be requeued
> until the controller moves out of transitioning.
> 
> Signed-off-by: Hannes Reinecke 
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 414 
> -
>  1 file changed, 349 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
> b/drivers/scsi/device_handler/scsi_dh_alua.c
> index b52db8b..85fd597 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -22,6 +22,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -59,10 +61,15 @@
>  #define ALUA_RTPG_SIZE   128
>  #define ALUA_FAILOVER_TIMEOUT60
>  #define ALUA_FAILOVER_RETRIES5
> +#define ALUA_RTPG_DELAY_MSECS5
>  
>  /* device handler flags */
> -#define ALUA_OPTIMIZE_STPG   1
> -#define ALUA_RTPG_EXT_HDR_UNSUPP 2
> +#define ALUA_OPTIMIZE_STPG   0x01
> +#define ALUA_RTPG_EXT_HDR_UNSUPP 0x02
> +/* State machine flags */
> +#define ALUA_PG_RUN_RTPG 0x10
> +#define ALUA_PG_RUN_STPG 0x20
> +
>  
>  static LIST_HEAD(port_group_list);
>  static DEFINE_SPINLOCK(port_group_lock);
> @@ -78,13 +85,28 @@ struct alua_port_group {
>   int pref;
>   unsignedflags; /* used for optimizing STPG */
>   unsigned char   transition_tmo;
> + unsigned long   expiry;
> + unsigned long   interval;
> + struct workqueue_struct *work_q;
> + struct delayed_work rtpg_work;
> + struct delayed_work stpg_work;
> + struct delayed_work qdata_work;
> + spinlock_t  rtpg_lock;
> + struct list_headrtpg_list;
> + struct scsi_device  *rtpg_sdev;
>  };
>  
>  struct alua_dh_data {
>   struct alua_port_group  *pg;
> + spinlock_t  pg_lock;
>   int rel_port;
>   int tpgs;
> - struct scsi_device  *sdev;
> + int init_error;
> + struct mutexinit_mutex;
> +};
> +
> +struct alua_queue_data {
> + struct list_headentry;
>   activate_complete   callback_fn;
>   void*callback_data;
>  };
> @@ -93,6 +115,10 @@ struct alua_dh_data {
>  #define ALUA_POLICY_SWITCH_ALL   1
>  
>  static char print_alua_state(int);
> +static void alua_rtpg_work(struct work_struct *work);
> +static void alua_stpg_work(struct work_struct *work);
> +static void alua_qdata_work(struct work_struct *work);
> +static void alua_check(struct scsi_device *sdev);
>  
>  static void release_port_group(struct kref *kref)
>  {
> @@ -103,6 +129,9 @@ static void release_port_group(struct kref *kref)
>   spin_lock(_group_lock);
>   list_del(>node);
>   spin_unlock(_group_lock);
> + WARN_ON(pg->rtpg_sdev);
> + if (pg->work_q)
> + destroy_workqueue(pg->work_q);
>   kfree(pg);
>  }
>  
> @@ -296,13 +325,17 @@ static int alua_check_vpd(struct scsi_device *sdev, 
> struct alua_dh_data *h)
>   if (strncmp(tmp_pg->device_id_str, device_id_str,
>   device_id_size))
>   continue;
> - h->pg = tmp_pg;
>   kref_get(_pg->kref);
> + spin_lock(>pg_lock);
> + rcu_assign_pointer(h->pg, tmp_pg);
> + spin_unlock(>pg_lock);
>   break;
>   }
>   spin_unlock(_group_lock);
> - if (h->pg)
> + if (h->pg) {
> + synchronize_rcu();
>   return SCSI_DH_OK;
> + }
>  
>   pg = kzalloc(sizeof(struct alua_port_group), GFP_KERNEL);
>   if (!pg) {
> @@ -322,6 +355,19 @@ static int alua_check_vpd(struct scsi_device *sdev, 
> 

Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-22 Thread Hannes Reinecke
On 09/22/2015 09:49 PM, Ewan Milne wrote:
> On Thu, 2015-08-27 at 14:41 +0200, Hannes Reinecke wrote:
>> The current ALUA device_handler has two drawbacks:
>> - We're sending a 'SET TARGET PORT GROUP' command to every LUN,
>>   disregarding the fact that several LUNs might be in a port group
>>   and will be automatically switched whenever _any_ LUN within
>>   that port group receives the command.
>> - Whenever a LUN is in 'transitioning' mode we cannot block I/O
>>   to that LUN, instead the controller has to abort the command.
>>   This leads to increased traffic across the wire and heavy load
>>   on the controller during switchover.
> 
> I'm not sure I understand what this means - why couldn't we block I/O?
> and what does 'heavy load' mean?  Aborting commands is 'heavy load'?
> 
If we're getting a sense code indicating that the LUN is in
transitioning _and_ we're blocking I/O we never ever send down I/Os to
that driver anymore, so we cannot receive any sense codes indicating the
transitioning is done.
At the same time, every I/O we're sending down will be returned by the
storage I/O with a sense code, requiring us to retry the command.
Hence we're constantly retrying I/O.

[ .. ]
>> @@ -811,10 +1088,17 @@ failed:
>>  static void alua_bus_detach(struct scsi_device *sdev)
>>  {
>>  struct alua_dh_data *h = sdev->handler_data;
>> +struct alua_port_group *pg;
>>  
>> -if (h->pg) {
>> -kref_put(>pg->kref, release_port_group);
>> -h->pg = NULL;
>> +spin_lock(>pg_lock);
>> +pg = h->pg;
>> +rcu_assign_pointer(h->pg, NULL);
>> +spin_unlock(>pg_lock);
>> +synchronize_rcu();
>> +if (pg) {
>> +if (pg->rtpg_sdev)
>> +flush_workqueue(pg->work_q);
>> +kref_put(>kref, release_port_group);
>>  }
>>  sdev->handler_data = NULL;
>>  kfree(h);
> 
> So, you've already had a bit of discussion with Christoph about this,
> the main portion of your ALUA rewrite, and I won't go over all of that,
> except to say that I'd have to agree that having separate work queues
> for the different RTPG/STPG functions and having them manipulate each
> other's flags seems like we'd be better off having just one work
> function that did everything.  Less messy and easier to maintain.
> 
> Also, it seems like wrong ordering of kref_get() vs. scsi_device_get(),
> in alua_rtpg_queue() since they are released as kref_put() then
> scsi_device_put()?
> 
Yeah, I've reworked the reference counting.
And reverted the workqueue handling to use the original model.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-02 Thread Christoph Hellwig
On Tue, Sep 01, 2015 at 02:57:57PM +0200, Hannes Reinecke wrote:
> That is what I'm eventually planning to do.
> My final goal is to move the multipath path monitoring stuff
> into the kernel (via the block device polling mechanism), and sending
> block events for path failure and re-establishment.

It might be a good idea to prioritize that.  Todd has been asking
for multipath monitoring APIs and suggested adding D-BUS APIs to
multipathd.  I'd much prefer them directly talking to the kernel
instead of cementing multipathd, which is a bit of a wart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-02 Thread Hannes Reinecke
On 09/02/2015 08:39 AM, Christoph Hellwig wrote:
> On Tue, Sep 01, 2015 at 02:57:57PM +0200, Hannes Reinecke wrote:
>> That is what I'm eventually planning to do.
>> My final goal is to move the multipath path monitoring stuff
>> into the kernel (via the block device polling mechanism), and sending
>> block events for path failure and re-establishment.
> 
> It might be a good idea to prioritize that.  Todd has been asking
> for multipath monitoring APIs and suggested adding D-BUS APIs to
> multipathd.  I'd much prefer them directly talking to the kernel
> instead of cementing multipathd, which is a bit of a wart.
>
Precisely my idea.
I'm fine with having a device-mapper D-Bus API, so that any application
can retrieve the device-mapper layout via D-Bus.
(It might as well be using sysfs directly, but who am I to argue here)
Path events, however, out of necessity are instantiated within the
kernel, and should be send out from the kernel via uevents.

D-Bus can be added on top of that, but multipathd should not generate
path events for D-Bus.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-01 Thread Christoph Hellwig
> + unsigned long   expiry;
> + unsigned long   interval;
> + struct workqueue_struct *work_q;
> + struct delayed_work rtpg_work;
> + struct delayed_work stpg_work;
> + struct delayed_work qdata_work;
> + spinlock_t  rtpg_lock;

This one also protects ->flag.  I'd just call it lock, and
introduce it at the time struct alua_port_group is introduced,
so that we can have coherent locking from the very beginning.

> +struct alua_queue_data {
> + struct list_headentry;
>   activate_complete   callback_fn;
>   void*callback_data;

I've been looking over the active callback for a while, and I
think the model of it is fundamentally broken.  Yes, as long as
multipathing isn't in SCSI we'll need path change notifications
for dm-mpath, but I don't see how tying them into ->activate
makes any sense.

I guess for this series we're stuck with it, but in the long
run we should have a callback in the request_queue which the
consumer that has bd_claim()ed the device can set and get path
change notifications instead.

That being said after spending a lot of time reading this code I think
my orginal advice to use different work_structs for the different
kinds of events was wrong, and I'd like to apologize for making you
go down that direction.

STPG/RTPG is just too dependent to sensibly split them up, and the
qdata bit while broken can be shoe horned in for now.  So I'd suggest
to revert back to the original model with a single work_struct per
port group, and a global workqueue.

> - if (h->pg)
> + if (h->pg) {
> + synchronize_rcu();
>   return SCSI_DH_OK;
> + }

What is this synchronize_rcu supposed to help with?

> - h->pg = tmp_pg;
>   kref_get(_pg->kref);
> + spin_lock(>pg_lock);
> + rcu_assign_pointer(h->pg, tmp_pg);
> + spin_unlock(>pg_lock);

I think the assignment to h->ph should have been past the kref_get
from the very beginning, please fold this into the original patch.

> + struct alua_port_group *pg = NULL;
> + int error;
> +
> + mutex_lock(>init_mutex);
> + error = alua_check_tpgs(sdev, h);
> + if (error == SCSI_DH_OK) {
> + error = alua_check_vpd(sdev, h);
> + rcu_read_lock();
> + pg = rcu_dereference(h->pg);
> + if (!pg) {
> + rcu_read_unlock();
> + h->tpgs = TPGS_MODE_NONE;
> + error = SCSI_DH_DEV_UNSUPP;
> + } else {
> + WARN_ON(error != SCSI_DH_OK);
> + kref_get(>kref);
> + rcu_read_unlock();
> + }
> + }

Eww. Please grab the reference to the pg inside the replacement for
alua_check_vpd, and ensure that we never return from that without a
valid h->pg.  Together with the previously suggested removal of h->tpgs,
and moving the call to alua_rtpg_queue into the alua_check_vpd replacement
that would keep alua_initialize nice and simple.

> @@ -679,6 +895,10 @@ static int alua_set_params(struct scsi_device *sdev, 
> const char *params)
>   unsigned int optimize = 0, argc;
>   const char *p = params;
>   int result = SCSI_DH_OK;
> + unsigned long flags;
> +
> + if (!h)
> + return -ENXIO;

How could that happen?  Why does it beling into this patch?

> @@ -723,24 +944,46 @@ static int alua_activate(struct scsi_device *sdev,
>  {
>   struct alua_dh_data *h = sdev->handler_data;
>   int err = SCSI_DH_OK;
> + struct alua_queue_data *qdata;
> + struct alua_port_group *pg;
>  
> - if (!h->pg)
> + if (!h) {
> + err = SCSI_DH_NOSYS;
>   goto out;
> + }

Same here.

> + qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
> + if (!qdata) {
> + err = SCSI_DH_RES_TEMP_UNAVAIL;
> + goto out;
> + }
> + qdata->callback_fn = fn;
> + qdata->callback_data = data;
>  
> + mutex_lock(>init_mutex);
> + rcu_read_lock();
> + pg = rcu_dereference(h->pg);
> + if (!pg) {
> + rcu_read_unlock();
> + kfree(qdata);
> + err = h->init_error;
> + mutex_unlock(>init_mutex);
>   goto out;
>   }
> + mutex_unlock(>init_mutex);
> + fn = NULL;
> + kref_get(>kref);
> + rcu_read_unlock();
> +
> + if (optimize_stpg) {
> + unsigned long flags;
> +
> + spin_lock_irqsave(>rtpg_lock, flags);
> + pg->flags |= ALUA_OPTIMIZE_STPG;
> + spin_unlock_irqrestore(>rtpg_lock, flags);
> + }

The optimize_stpg should have been moved towards the alua_port_group
allocation earlier in the series, please fold that in there.

> + alua_rtpg_queue(pg, sdev, qdata);
> + kref_put(>kref, release_port_group);

Note that all alua_rtpg_queue callers already have a reference to the
PG, so I 

Re: [PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-09-01 Thread Hannes Reinecke
On 09/01/2015 01:15 PM, Christoph Hellwig wrote:
>> +unsigned long   expiry;
>> +unsigned long   interval;
>> +struct workqueue_struct *work_q;
>> +struct delayed_work rtpg_work;
>> +struct delayed_work stpg_work;
>> +struct delayed_work qdata_work;
>> +spinlock_t  rtpg_lock;
> 
> This one also protects ->flag.  I'd just call it lock, and
> introduce it at the time struct alua_port_group is introduced,
> so that we can have coherent locking from the very beginning.
> 
Okay.

>> +struct alua_queue_data {
>> +struct list_headentry;
>>  activate_complete   callback_fn;
>>  void*callback_data;
> 
> I've been looking over the active callback for a while, and I
> think the model of it is fundamentally broken.  Yes, as long as
> multipathing isn't in SCSI we'll need path change notifications
> for dm-mpath, but I don't see how tying them into ->activate
> makes any sense.
> 
> I guess for this series we're stuck with it, but in the long
> run we should have a callback in the request_queue which the
> consumer that has bd_claim()ed the device can set and get path
> change notifications instead.
> 
That is what I'm eventually planning to do.
My final goal is to move the multipath path monitoring stuff
into the kernel (via the block device polling mechanism), and sending
block events for path failure and re-establishment.

But that'll be subject to further patches :-)

> That being said after spending a lot of time reading this code I think
> my orginal advice to use different work_structs for the different
> kinds of events was wrong, and I'd like to apologize for making you
> go down that direction.
> 
> STPG/RTPG is just too dependent to sensibly split them up, and the
> qdata bit while broken can be shoe horned in for now.  So I'd suggest
> to revert back to the original model with a single work_struct per
> port group, and a global workqueue.
> 
Weelll ... there was a reason why I did that initially :-)
But anyway, no harm done. I still have the original series around.

>> -if (h->pg)
>> +if (h->pg) {
>> +synchronize_rcu();
>>  return SCSI_DH_OK;
>> +}
> 
> What is this synchronize_rcu supposed to help with?
> 
>> -h->pg = tmp_pg;
>>  kref_get(_pg->kref);
>> +spin_lock(>pg_lock);
>> +rcu_assign_pointer(h->pg, tmp_pg);
>> +spin_unlock(>pg_lock);
> 
> I think the assignment to h->ph should have been past the kref_get
> from the very beginning, please fold this into the original patch.
> 
Okay.

>> +struct alua_port_group *pg = NULL;
>> +int error;
>> +
>> +mutex_lock(>init_mutex);
>> +error = alua_check_tpgs(sdev, h);
>> +if (error == SCSI_DH_OK) {
>> +error = alua_check_vpd(sdev, h);
>> +rcu_read_lock();
>> +pg = rcu_dereference(h->pg);
>> +if (!pg) {
>> +rcu_read_unlock();
>> +h->tpgs = TPGS_MODE_NONE;
>> +error = SCSI_DH_DEV_UNSUPP;
>> +} else {
>> +WARN_ON(error != SCSI_DH_OK);
>> +kref_get(>kref);
>> +rcu_read_unlock();
>> +}
>> +}
> 
> Eww. Please grab the reference to the pg inside the replacement for
> alua_check_vpd, and ensure that we never return from that without a
> valid h->pg.  Together with the previously suggested removal of h->tpgs,
> and moving the call to alua_rtpg_queue into the alua_check_vpd replacement
> that would keep alua_initialize nice and simple.
> 
You are right in that the interface for alua_check_tpgs isn't very
clear. Currently we need to check both the return code _and_ 'h->pg',
even though both are interrelated.
I guess it should rather be 'h->pg = alua_check_vpd()' or even making
alua_check_vpd() a void function, which would eliminate the need for the
separate return value.
I'll have to think about it.

>> @@ -679,6 +895,10 @@ static int alua_set_params(struct scsi_device *sdev, 
>> const char *params)
>>  unsigned int optimize = 0, argc;
>>  const char *p = params;
>>  int result = SCSI_DH_OK;
>> +unsigned long flags;
>> +
>> +if (!h)
>> +return -ENXIO;
> 
> How could that happen?  Why does it beling into this patch?
> 
Leftover from the original patch, which didn't have your scsi_dh
patchset. I'll remove it.

>> @@ -723,24 +944,46 @@ static int alua_activate(struct scsi_device *sdev,
>>  {
>>  struct alua_dh_data *h = sdev->handler_data;
>>  int err = SCSI_DH_OK;
>> +struct alua_queue_data *qdata;
>> +struct alua_port_group *pg;
>>  
>> -if (!h->pg)
>> +if (!h) {
>> +err = SCSI_DH_NOSYS;
>>  goto out;
>> +}
> 
> Same here.
> 
>> +qdata = kzalloc(sizeof(*qdata), GFP_KERNEL);
>> +if (!qdata) {
>> +err = SCSI_DH_RES_TEMP_UNAVAIL;
>> +goto out;
>> 

[PATCH 19/23] scsi_dh_alua: Use workqueue for RTPG

2015-08-27 Thread Hannes Reinecke
The current ALUA device_handler has two drawbacks:
- We're sending a 'SET TARGET PORT GROUP' command to every LUN,
  disregarding the fact that several LUNs might be in a port group
  and will be automatically switched whenever _any_ LUN within
  that port group receives the command.
- Whenever a LUN is in 'transitioning' mode we cannot block I/O
  to that LUN, instead the controller has to abort the command.
  This leads to increased traffic across the wire and heavy load
  on the controller during switchover.

With this patch the RTPG handling is moved to a per-portgroup
workqueue. This reduces the number of 'REPORT TARGET PORT GROUP'
and 'SET TARGET PORT GROUPS' sent to the controller as we're sending
them now per port group, and not per device as previously.
It also allows us to block I/O to any LUN / port group found to be
in 'transitioning' ALUA mode, as the workqueue item will be requeued
until the controller moves out of transitioning.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 414 -
 1 file changed, 349 insertions(+), 65 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
b/drivers/scsi/device_handler/scsi_dh_alua.c
index b52db8b..85fd597 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -22,6 +22,8 @@
 #include linux/slab.h
 #include linux/delay.h
 #include linux/module.h
+#include linux/workqueue.h
+#include linux/rcupdate.h
 #include asm/unaligned.h
 #include scsi/scsi.h
 #include scsi/scsi_dbg.h
@@ -59,10 +61,15 @@
 #define ALUA_RTPG_SIZE 128
 #define ALUA_FAILOVER_TIMEOUT  60
 #define ALUA_FAILOVER_RETRIES  5
+#define ALUA_RTPG_DELAY_MSECS  5
 
 /* device handler flags */
-#define ALUA_OPTIMIZE_STPG 1
-#define ALUA_RTPG_EXT_HDR_UNSUPP   2
+#define ALUA_OPTIMIZE_STPG 0x01
+#define ALUA_RTPG_EXT_HDR_UNSUPP   0x02
+/* State machine flags */
+#define ALUA_PG_RUN_RTPG   0x10
+#define ALUA_PG_RUN_STPG   0x20
+
 
 static LIST_HEAD(port_group_list);
 static DEFINE_SPINLOCK(port_group_lock);
@@ -78,13 +85,28 @@ struct alua_port_group {
int pref;
unsignedflags; /* used for optimizing STPG */
unsigned char   transition_tmo;
+   unsigned long   expiry;
+   unsigned long   interval;
+   struct workqueue_struct *work_q;
+   struct delayed_work rtpg_work;
+   struct delayed_work stpg_work;
+   struct delayed_work qdata_work;
+   spinlock_t  rtpg_lock;
+   struct list_headrtpg_list;
+   struct scsi_device  *rtpg_sdev;
 };
 
 struct alua_dh_data {
struct alua_port_group  *pg;
+   spinlock_t  pg_lock;
int rel_port;
int tpgs;
-   struct scsi_device  *sdev;
+   int init_error;
+   struct mutexinit_mutex;
+};
+
+struct alua_queue_data {
+   struct list_headentry;
activate_complete   callback_fn;
void*callback_data;
 };
@@ -93,6 +115,10 @@ struct alua_dh_data {
 #define ALUA_POLICY_SWITCH_ALL 1
 
 static char print_alua_state(int);
+static void alua_rtpg_work(struct work_struct *work);
+static void alua_stpg_work(struct work_struct *work);
+static void alua_qdata_work(struct work_struct *work);
+static void alua_check(struct scsi_device *sdev);
 
 static void release_port_group(struct kref *kref)
 {
@@ -103,6 +129,9 @@ static void release_port_group(struct kref *kref)
spin_lock(port_group_lock);
list_del(pg-node);
spin_unlock(port_group_lock);
+   WARN_ON(pg-rtpg_sdev);
+   if (pg-work_q)
+   destroy_workqueue(pg-work_q);
kfree(pg);
 }
 
@@ -296,13 +325,17 @@ static int alua_check_vpd(struct scsi_device *sdev, 
struct alua_dh_data *h)
if (strncmp(tmp_pg-device_id_str, device_id_str,
device_id_size))
continue;
-   h-pg = tmp_pg;
kref_get(tmp_pg-kref);
+   spin_lock(h-pg_lock);
+   rcu_assign_pointer(h-pg, tmp_pg);
+   spin_unlock(h-pg_lock);
break;
}
spin_unlock(port_group_lock);
-   if (h-pg)
+   if (h-pg) {
+   synchronize_rcu();
return SCSI_DH_OK;
+   }
 
pg = kzalloc(sizeof(struct alua_port_group), GFP_KERNEL);
if (!pg) {
@@ -322,6 +355,19 @@ static int alua_check_vpd(struct scsi_device *sdev, struct 
alua_dh_data *h)
pg-tpgs = h-tpgs;
pg-state = TPGS_STATE_OPTIMIZED;
kref_init(pg-kref);
+   INIT_DELAYED_WORK(pg-rtpg_work, alua_rtpg_work);
+   INIT_DELAYED_WORK(pg-stpg_work, alua_stpg_work);
+