Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning

2013-10-18 Thread Nicholas A. Bellinger
On Thu, 2013-10-17 at 08:52 +0200, Hannes Reinecke wrote:
 On 10/17/2013 12:06 AM, Nicholas A. Bellinger wrote:
  On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote:
  Use a workqueue for processing ALUA state transitions; this allows
  us to process implicit delay properly.
 
  Signed-off-by: Hannes Reinecke h...@suse.de
  ---
   drivers/target/target_core_alua.c | 174 
  +++---
   include/target/target_core_base.h |   4 +
   2 files changed, 128 insertions(+), 50 deletions(-)
 
  
  SNIP
  
  +static int core_alua_do_transition_tg_pt(
  +  struct t10_alua_tg_pt_gp *tg_pt_gp,
  +  int new_state,
  +  int explicit)
  +{
  +  struct se_device *dev = tg_pt_gp-tg_pt_gp_dev;
  +  DECLARE_COMPLETION_ONSTACK(wait);
  +
  +  /* Nothing to be done here */
  +  if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state)
  +  return 0;
  +
  +  if (new_state == ALUA_ACCESS_STATE_TRANSITION)
  +  return -EAGAIN;
  +
  +  /*
  +   * Flush any pending transitions
  +   */
  +  if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs 
  +  atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) ==
  +  ALUA_ACCESS_STATE_TRANSITION) {
  +  /* Just in case */
  +  tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
  +  tg_pt_gp-tg_pt_gp_transition_complete = wait;
  +  flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work);
  +  wait_for_completion(wait);
  +  tg_pt_gp-tg_pt_gp_transition_complete = NULL;
  +  return 0;
  +  }
  +
  +  /*
  +   * Save the old primary ALUA access state, and set the current state
  +   * to ALUA_ACCESS_STATE_TRANSITION.
  +   */
  +  tg_pt_gp-tg_pt_gp_alua_previous_state =
  +  atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state);
  +  tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
  +
  +  atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state,
  +  ALUA_ACCESS_STATE_TRANSITION);
  +  tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ?
  +  ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG :
  +  ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA;
  +
  +  /*
  +   * Check for the optional ALUA primary state transition delay
  +   */
  +  if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0)
  +  msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs);
  +
  +  /*
  +   * Take a reference for workqueue item
  +   */
  +  spin_lock(dev-t10_alua.tg_pt_gps_lock);
  +  atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt);
  +  smp_mb__after_atomic_inc();
  +  spin_unlock(dev-t10_alua.tg_pt_gps_lock);
  +
  +  if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs) {
  +  unsigned long transition_tmo;
  +
  +  transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ;
  +  queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
  + tg_pt_gp-tg_pt_gp_transition_work,
  + transition_tmo);
  +  } else {
  +  tg_pt_gp-tg_pt_gp_transition_complete = wait;
  +  queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
  + tg_pt_gp-tg_pt_gp_transition_work, 0);
  +  wait_for_completion(wait);
  +  tg_pt_gp-tg_pt_gp_transition_complete = NULL;
  +  }
  
  Mmm, the trans_delay_msecs delay seems a bit out of place now..
  
  How about dropping it's usage with msleep_interruptible() above, and
  instead combining it with the delay passed into queue_delayed_work()..?
  
 Yeah, we could.
 Actually I was thinking of opening up the implicit transition time
 to be a general transitioning time, to be used for both implicit and
 explicit.

Sounds reasonable to me..

 Reasoning here is that one could kick off a userland tool once
 'transitioning' mode has been requested.
 That tool would then go ahead and do whatever is required to switch
 paths around etc, and could write the final ALUA state into configfs.
 That would then terminate the workqueue and the system would
 continue with the new state.
 If the userland tool fails to execute in time the system would
 revert to the requested state as it does now.

Also sounds reasonable.  This timeout, along with the explicit +
implicit timeout probably needs to be limited to a value less than the
SCSI abort timeout on the host side..

 
 The only problem with that approach is that it's currently
 impossible to have an atomic implicit transition for several tpgs.
 
 You can to an atomic _explicit_ transition, as SET TARGET PORT
 GROUPS _can_ carry information about all target port groups.
 (Mind you, scsi_dh_alua doesn't use it that way, it only sends
 information for the active/optimized target port group).
 But you cannot achieve a similar operation via configfs; there you
 can only set one tpg at a time, and each of this will potentially
 trigger a move to transitioning.
 
 While this is not a violation of the spec, it is certainly
 confusing. I'd rather have a way to set the new state for _all_ tpgs
 at once 

Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning

2013-10-17 Thread Hannes Reinecke
On 10/17/2013 12:06 AM, Nicholas A. Bellinger wrote:
 On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote:
 Use a workqueue for processing ALUA state transitions; this allows
 us to process implicit delay properly.

 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/target/target_core_alua.c | 174 
 +++---
  include/target/target_core_base.h |   4 +
  2 files changed, 128 insertions(+), 50 deletions(-)

 
 SNIP
 
 +static int core_alua_do_transition_tg_pt(
 +struct t10_alua_tg_pt_gp *tg_pt_gp,
 +int new_state,
 +int explicit)
 +{
 +struct se_device *dev = tg_pt_gp-tg_pt_gp_dev;
 +DECLARE_COMPLETION_ONSTACK(wait);
 +
 +/* Nothing to be done here */
 +if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state)
 +return 0;
 +
 +if (new_state == ALUA_ACCESS_STATE_TRANSITION)
 +return -EAGAIN;
 +
 +/*
 + * Flush any pending transitions
 + */
 +if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs 
 +atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) ==
 +ALUA_ACCESS_STATE_TRANSITION) {
 +/* Just in case */
 +tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
 +tg_pt_gp-tg_pt_gp_transition_complete = wait;
 +flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work);
 +wait_for_completion(wait);
 +tg_pt_gp-tg_pt_gp_transition_complete = NULL;
 +return 0;
 +}
 +
 +/*
 + * Save the old primary ALUA access state, and set the current state
 + * to ALUA_ACCESS_STATE_TRANSITION.
 + */
 +tg_pt_gp-tg_pt_gp_alua_previous_state =
 +atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state);
 +tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
 +
 +atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state,
 +ALUA_ACCESS_STATE_TRANSITION);
 +tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ?
 +ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG :
 +ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA;
 +
 +/*
 + * Check for the optional ALUA primary state transition delay
 + */
 +if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0)
 +msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs);
 +
 +/*
 + * Take a reference for workqueue item
 + */
 +spin_lock(dev-t10_alua.tg_pt_gps_lock);
 +atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt);
 +smp_mb__after_atomic_inc();
 +spin_unlock(dev-t10_alua.tg_pt_gps_lock);
 +
 +if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs) {
 +unsigned long transition_tmo;
 +
 +transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ;
 +queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
 +   tg_pt_gp-tg_pt_gp_transition_work,
 +   transition_tmo);
 +} else {
 +tg_pt_gp-tg_pt_gp_transition_complete = wait;
 +queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
 +   tg_pt_gp-tg_pt_gp_transition_work, 0);
 +wait_for_completion(wait);
 +tg_pt_gp-tg_pt_gp_transition_complete = NULL;
 +}
 
 Mmm, the trans_delay_msecs delay seems a bit out of place now..
 
 How about dropping it's usage with msleep_interruptible() above, and
 instead combining it with the delay passed into queue_delayed_work()..?
 
Yeah, we could.
Actually I was thinking of opening up the implicit transition time
to be a general transitioning time, to be used for both implicit and
explicit.
Reasoning here is that one could kick off a userland tool once
'transitioning' mode has been requested.
That tool would then go ahead and do whatever is required to switch
paths around etc, and could write the final ALUA state into configfs.
That would then terminate the workqueue and the system would
continue with the new state.
If the userland tool fails to execute in time the system would
revert to the requested state as it does now.

The only problem with that approach is that it's currently
impossible to have an atomic implicit transition for several tpgs.

You can to an atomic _explicit_ transition, as SET TARGET PORT
GROUPS _can_ carry information about all target port groups.
(Mind you, scsi_dh_alua doesn't use it that way, it only sends
information for the active/optimized target port group).
But you cannot achieve a similar operation via configfs; there you
can only set one tpg at a time, and each of this will potentially
trigger a move to transitioning.

While this is not a violation of the spec, it is certainly
confusing. I'd rather have a way to set the new state for _all_ tpgs
at once and only then kick off the transitioning mechanism.

Any ideas how this could be achieved?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 

[PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning

2013-10-16 Thread Hannes Reinecke
Use a workqueue for processing ALUA state transitions; this allows
us to process implicit delay properly.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 drivers/target/target_core_alua.c | 174 +++---
 include/target/target_core_base.h |   4 +
 2 files changed, 128 insertions(+), 50 deletions(-)

diff --git a/drivers/target/target_core_alua.c 
b/drivers/target/target_core_alua.c
index 33e3f23..166bee6 100644
--- a/drivers/target/target_core_alua.c
+++ b/drivers/target/target_core_alua.c
@@ -761,8 +761,7 @@ static int core_alua_write_tpg_metadata(
  * Called with tg_pt_gp-tg_pt_gp_md_mutex held
  */
 static int core_alua_update_tpg_primary_metadata(
-   struct t10_alua_tg_pt_gp *tg_pt_gp,
-   int primary_state)
+   struct t10_alua_tg_pt_gp *tg_pt_gp)
 {
unsigned char *md_buf;
struct t10_wwn *wwn = tg_pt_gp-tg_pt_gp_dev-t10_wwn;
@@ -781,7 +780,8 @@ static int core_alua_update_tpg_primary_metadata(
tg_pt_gp_id=%hu\n
alua_access_state=0x%02x\n
alua_access_status=0x%02x\n,
-   tg_pt_gp-tg_pt_gp_id, primary_state,
+   tg_pt_gp-tg_pt_gp_id,
+   tg_pt_gp-tg_pt_gp_alua_pending_state,
tg_pt_gp-tg_pt_gp_alua_access_status);
 
snprintf(path, ALUA_METADATA_PATH_LEN,
@@ -793,36 +793,17 @@ static int core_alua_update_tpg_primary_metadata(
return rc;
 }
 
-static int core_alua_do_transition_tg_pt(
-   struct t10_alua_tg_pt_gp *tg_pt_gp,
-   struct se_port *l_port,
-   struct se_node_acl *nacl,
-   int new_state,
-   int explicit)
+static void core_alua_do_transition_tg_pt_work(struct work_struct *work)
 {
+   struct t10_alua_tg_pt_gp *tg_pt_gp = container_of(work,
+   struct t10_alua_tg_pt_gp, tg_pt_gp_transition_work.work);
+   struct se_device *dev = tg_pt_gp-tg_pt_gp_dev;
struct se_dev_entry *se_deve;
struct se_lun_acl *lacl;
struct se_port *port;
struct t10_alua_tg_pt_gp_member *mem;
-
-   /*
-* Save the old primary ALUA access state, and set the current state
-* to ALUA_ACCESS_STATE_TRANSITION.
-*/
-   tg_pt_gp-tg_pt_gp_alua_previous_state =
-   atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state);
-   tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
-
-   atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state,
-   ALUA_ACCESS_STATE_TRANSITION);
-   tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ?
-   ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG :
-   ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA;
-   /*
-* Check for the optional ALUA primary state transition delay
-*/
-   if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0)
-   msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs);
+   bool explicit = (tg_pt_gp-tg_pt_gp_alua_access_status ==
+ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG);
 
spin_lock(tg_pt_gp-tg_pt_gp_lock);
list_for_each_entry(mem, tg_pt_gp-tg_pt_gp_mem_list,
@@ -857,9 +838,12 @@ static int core_alua_do_transition_tg_pt(
if (!lacl)
continue;
 
-   if (explicit 
-  (nacl != NULL)  (nacl == lacl-se_lun_nacl) 
-  (l_port != NULL)  (l_port == port))
+   if ((tg_pt_gp-tg_pt_gp_alua_access_status ==
+ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG) 
+  (tg_pt_gp-tg_pt_gp_alua_nacl != NULL) 
+   (tg_pt_gp-tg_pt_gp_alua_nacl == lacl-se_lun_nacl) 

+  (tg_pt_gp-tg_pt_gp_alua_port != NULL) 
+   (tg_pt_gp-tg_pt_gp_alua_port == port))
continue;
 
core_scsi3_ua_allocate(lacl-se_lun_nacl,
@@ -887,7 +871,7 @@ static int core_alua_do_transition_tg_pt(
 */
if (tg_pt_gp-tg_pt_gp_write_metadata) {
mutex_lock(tg_pt_gp-tg_pt_gp_md_mutex);
-   core_alua_update_tpg_primary_metadata(tg_pt_gp, new_state);
+   core_alua_update_tpg_primary_metadata(tg_pt_gp);
mutex_unlock(tg_pt_gp-tg_pt_gp_md_mutex);
}
/*
@@ -902,6 +886,87 @@ static int core_alua_do_transition_tg_pt(
tg_pt_gp-tg_pt_gp_id,
core_alua_dump_state(tg_pt_gp-tg_pt_gp_alua_previous_state),
core_alua_dump_state(tg_pt_gp-tg_pt_gp_alua_pending_state));
+   spin_lock(dev-t10_alua.tg_pt_gps_lock);
+   atomic_dec(tg_pt_gp-tg_pt_gp_ref_cnt);
+   smp_mb__after_atomic_dec();
+   spin_unlock(dev-t10_alua.tg_pt_gps_lock);
+
+   if (tg_pt_gp-tg_pt_gp_transition_complete)
+   complete(tg_pt_gp-tg_pt_gp_transition_complete);
+}
+

Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning

2013-10-16 Thread Nicholas A. Bellinger
On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote:
 Use a workqueue for processing ALUA state transitions; this allows
 us to process implicit delay properly.
 
 Signed-off-by: Hannes Reinecke h...@suse.de
 ---
  drivers/target/target_core_alua.c | 174 
 +++---
  include/target/target_core_base.h |   4 +
  2 files changed, 128 insertions(+), 50 deletions(-)
 

SNIP

 +static int core_alua_do_transition_tg_pt(
 + struct t10_alua_tg_pt_gp *tg_pt_gp,
 + int new_state,
 + int explicit)
 +{
 + struct se_device *dev = tg_pt_gp-tg_pt_gp_dev;
 + DECLARE_COMPLETION_ONSTACK(wait);
 +
 + /* Nothing to be done here */
 + if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state)
 + return 0;
 +
 + if (new_state == ALUA_ACCESS_STATE_TRANSITION)
 + return -EAGAIN;
 +
 + /*
 +  * Flush any pending transitions
 +  */
 + if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs 
 + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) ==
 + ALUA_ACCESS_STATE_TRANSITION) {
 + /* Just in case */
 + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
 + tg_pt_gp-tg_pt_gp_transition_complete = wait;
 + flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work);
 + wait_for_completion(wait);
 + tg_pt_gp-tg_pt_gp_transition_complete = NULL;
 + return 0;
 + }
 +
 + /*
 +  * Save the old primary ALUA access state, and set the current state
 +  * to ALUA_ACCESS_STATE_TRANSITION.
 +  */
 + tg_pt_gp-tg_pt_gp_alua_previous_state =
 + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state);
 + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state;
 +
 + atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state,
 + ALUA_ACCESS_STATE_TRANSITION);
 + tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ?
 + ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG :
 + ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA;
 +
 + /*
 +  * Check for the optional ALUA primary state transition delay
 +  */
 + if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0)
 + msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs);
 +
 + /*
 +  * Take a reference for workqueue item
 +  */
 + spin_lock(dev-t10_alua.tg_pt_gps_lock);
 + atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt);
 + smp_mb__after_atomic_inc();
 + spin_unlock(dev-t10_alua.tg_pt_gps_lock);
 +
 + if (!explicit  tg_pt_gp-tg_pt_gp_implicit_trans_secs) {
 + unsigned long transition_tmo;
 +
 + transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ;
 + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
 +tg_pt_gp-tg_pt_gp_transition_work,
 +transition_tmo);
 + } else {
 + tg_pt_gp-tg_pt_gp_transition_complete = wait;
 + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq,
 +tg_pt_gp-tg_pt_gp_transition_work, 0);
 + wait_for_completion(wait);
 + tg_pt_gp-tg_pt_gp_transition_complete = NULL;
 + }

Mmm, the trans_delay_msecs delay seems a bit out of place now..

How about dropping it's usage with msleep_interruptible() above, and
instead combining it with the delay passed into queue_delayed_work()..?

--nab


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html