Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning
On Thu, 2013-10-17 at 08:52 +0200, Hannes Reinecke wrote: On 10/17/2013 12:06 AM, Nicholas A. Bellinger wrote: On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote: Use a workqueue for processing ALUA state transitions; this allows us to process implicit delay properly. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/target/target_core_alua.c | 174 +++--- include/target/target_core_base.h | 4 + 2 files changed, 128 insertions(+), 50 deletions(-) SNIP +static int core_alua_do_transition_tg_pt( + struct t10_alua_tg_pt_gp *tg_pt_gp, + int new_state, + int explicit) +{ + struct se_device *dev = tg_pt_gp-tg_pt_gp_dev; + DECLARE_COMPLETION_ONSTACK(wait); + + /* Nothing to be done here */ + if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state) + return 0; + + if (new_state == ALUA_ACCESS_STATE_TRANSITION) + return -EAGAIN; + + /* + * Flush any pending transitions + */ + if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == + ALUA_ACCESS_STATE_TRANSITION) { + /* Just in case */ + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; + tg_pt_gp-tg_pt_gp_transition_complete = wait; + flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work); + wait_for_completion(wait); + tg_pt_gp-tg_pt_gp_transition_complete = NULL; + return 0; + } + + /* + * Save the old primary ALUA access state, and set the current state + * to ALUA_ACCESS_STATE_TRANSITION. + */ + tg_pt_gp-tg_pt_gp_alua_previous_state = + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state); + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; + + atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state, + ALUA_ACCESS_STATE_TRANSITION); + tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ? + ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG : + ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA; + + /* + * Check for the optional ALUA primary state transition delay + */ + if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0) + msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs); + + /* + * Take a reference for workqueue item + */ + spin_lock(dev-t10_alua.tg_pt_gps_lock); + atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt); + smp_mb__after_atomic_inc(); + spin_unlock(dev-t10_alua.tg_pt_gps_lock); + + if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs) { + unsigned long transition_tmo; + + transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ; + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, + tg_pt_gp-tg_pt_gp_transition_work, + transition_tmo); + } else { + tg_pt_gp-tg_pt_gp_transition_complete = wait; + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, + tg_pt_gp-tg_pt_gp_transition_work, 0); + wait_for_completion(wait); + tg_pt_gp-tg_pt_gp_transition_complete = NULL; + } Mmm, the trans_delay_msecs delay seems a bit out of place now.. How about dropping it's usage with msleep_interruptible() above, and instead combining it with the delay passed into queue_delayed_work()..? Yeah, we could. Actually I was thinking of opening up the implicit transition time to be a general transitioning time, to be used for both implicit and explicit. Sounds reasonable to me.. Reasoning here is that one could kick off a userland tool once 'transitioning' mode has been requested. That tool would then go ahead and do whatever is required to switch paths around etc, and could write the final ALUA state into configfs. That would then terminate the workqueue and the system would continue with the new state. If the userland tool fails to execute in time the system would revert to the requested state as it does now. Also sounds reasonable. This timeout, along with the explicit + implicit timeout probably needs to be limited to a value less than the SCSI abort timeout on the host side.. The only problem with that approach is that it's currently impossible to have an atomic implicit transition for several tpgs. You can to an atomic _explicit_ transition, as SET TARGET PORT GROUPS _can_ carry information about all target port groups. (Mind you, scsi_dh_alua doesn't use it that way, it only sends information for the active/optimized target port group). But you cannot achieve a similar operation via configfs; there you can only set one tpg at a time, and each of this will potentially trigger a move to transitioning. While this is not a violation of the spec, it is certainly confusing. I'd rather have a way to set the new state for _all_ tpgs at once
Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning
On 10/17/2013 12:06 AM, Nicholas A. Bellinger wrote: On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote: Use a workqueue for processing ALUA state transitions; this allows us to process implicit delay properly. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/target/target_core_alua.c | 174 +++--- include/target/target_core_base.h | 4 + 2 files changed, 128 insertions(+), 50 deletions(-) SNIP +static int core_alua_do_transition_tg_pt( +struct t10_alua_tg_pt_gp *tg_pt_gp, +int new_state, +int explicit) +{ +struct se_device *dev = tg_pt_gp-tg_pt_gp_dev; +DECLARE_COMPLETION_ONSTACK(wait); + +/* Nothing to be done here */ +if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state) +return 0; + +if (new_state == ALUA_ACCESS_STATE_TRANSITION) +return -EAGAIN; + +/* + * Flush any pending transitions + */ +if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs +atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == +ALUA_ACCESS_STATE_TRANSITION) { +/* Just in case */ +tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; +tg_pt_gp-tg_pt_gp_transition_complete = wait; +flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work); +wait_for_completion(wait); +tg_pt_gp-tg_pt_gp_transition_complete = NULL; +return 0; +} + +/* + * Save the old primary ALUA access state, and set the current state + * to ALUA_ACCESS_STATE_TRANSITION. + */ +tg_pt_gp-tg_pt_gp_alua_previous_state = +atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state); +tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; + +atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state, +ALUA_ACCESS_STATE_TRANSITION); +tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ? +ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG : +ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA; + +/* + * Check for the optional ALUA primary state transition delay + */ +if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0) +msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs); + +/* + * Take a reference for workqueue item + */ +spin_lock(dev-t10_alua.tg_pt_gps_lock); +atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt); +smp_mb__after_atomic_inc(); +spin_unlock(dev-t10_alua.tg_pt_gps_lock); + +if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs) { +unsigned long transition_tmo; + +transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ; +queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, + tg_pt_gp-tg_pt_gp_transition_work, + transition_tmo); +} else { +tg_pt_gp-tg_pt_gp_transition_complete = wait; +queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, + tg_pt_gp-tg_pt_gp_transition_work, 0); +wait_for_completion(wait); +tg_pt_gp-tg_pt_gp_transition_complete = NULL; +} Mmm, the trans_delay_msecs delay seems a bit out of place now.. How about dropping it's usage with msleep_interruptible() above, and instead combining it with the delay passed into queue_delayed_work()..? Yeah, we could. Actually I was thinking of opening up the implicit transition time to be a general transitioning time, to be used for both implicit and explicit. Reasoning here is that one could kick off a userland tool once 'transitioning' mode has been requested. That tool would then go ahead and do whatever is required to switch paths around etc, and could write the final ALUA state into configfs. That would then terminate the workqueue and the system would continue with the new state. If the userland tool fails to execute in time the system would revert to the requested state as it does now. The only problem with that approach is that it's currently impossible to have an atomic implicit transition for several tpgs. You can to an atomic _explicit_ transition, as SET TARGET PORT GROUPS _can_ carry information about all target port groups. (Mind you, scsi_dh_alua doesn't use it that way, it only sends information for the active/optimized target port group). But you cannot achieve a similar operation via configfs; there you can only set one tpg at a time, and each of this will potentially trigger a move to transitioning. While this is not a violation of the spec, it is certainly confusing. I'd rather have a way to set the new state for _all_ tpgs at once and only then kick off the transitioning mechanism. Any ideas how this could be achieved? Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr.
[PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning
Use a workqueue for processing ALUA state transitions; this allows us to process implicit delay properly. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/target/target_core_alua.c | 174 +++--- include/target/target_core_base.h | 4 + 2 files changed, 128 insertions(+), 50 deletions(-) diff --git a/drivers/target/target_core_alua.c b/drivers/target/target_core_alua.c index 33e3f23..166bee6 100644 --- a/drivers/target/target_core_alua.c +++ b/drivers/target/target_core_alua.c @@ -761,8 +761,7 @@ static int core_alua_write_tpg_metadata( * Called with tg_pt_gp-tg_pt_gp_md_mutex held */ static int core_alua_update_tpg_primary_metadata( - struct t10_alua_tg_pt_gp *tg_pt_gp, - int primary_state) + struct t10_alua_tg_pt_gp *tg_pt_gp) { unsigned char *md_buf; struct t10_wwn *wwn = tg_pt_gp-tg_pt_gp_dev-t10_wwn; @@ -781,7 +780,8 @@ static int core_alua_update_tpg_primary_metadata( tg_pt_gp_id=%hu\n alua_access_state=0x%02x\n alua_access_status=0x%02x\n, - tg_pt_gp-tg_pt_gp_id, primary_state, + tg_pt_gp-tg_pt_gp_id, + tg_pt_gp-tg_pt_gp_alua_pending_state, tg_pt_gp-tg_pt_gp_alua_access_status); snprintf(path, ALUA_METADATA_PATH_LEN, @@ -793,36 +793,17 @@ static int core_alua_update_tpg_primary_metadata( return rc; } -static int core_alua_do_transition_tg_pt( - struct t10_alua_tg_pt_gp *tg_pt_gp, - struct se_port *l_port, - struct se_node_acl *nacl, - int new_state, - int explicit) +static void core_alua_do_transition_tg_pt_work(struct work_struct *work) { + struct t10_alua_tg_pt_gp *tg_pt_gp = container_of(work, + struct t10_alua_tg_pt_gp, tg_pt_gp_transition_work.work); + struct se_device *dev = tg_pt_gp-tg_pt_gp_dev; struct se_dev_entry *se_deve; struct se_lun_acl *lacl; struct se_port *port; struct t10_alua_tg_pt_gp_member *mem; - - /* -* Save the old primary ALUA access state, and set the current state -* to ALUA_ACCESS_STATE_TRANSITION. -*/ - tg_pt_gp-tg_pt_gp_alua_previous_state = - atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state); - tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; - - atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state, - ALUA_ACCESS_STATE_TRANSITION); - tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ? - ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG : - ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA; - /* -* Check for the optional ALUA primary state transition delay -*/ - if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0) - msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs); + bool explicit = (tg_pt_gp-tg_pt_gp_alua_access_status == +ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG); spin_lock(tg_pt_gp-tg_pt_gp_lock); list_for_each_entry(mem, tg_pt_gp-tg_pt_gp_mem_list, @@ -857,9 +838,12 @@ static int core_alua_do_transition_tg_pt( if (!lacl) continue; - if (explicit - (nacl != NULL) (nacl == lacl-se_lun_nacl) - (l_port != NULL) (l_port == port)) + if ((tg_pt_gp-tg_pt_gp_alua_access_status == +ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG) + (tg_pt_gp-tg_pt_gp_alua_nacl != NULL) + (tg_pt_gp-tg_pt_gp_alua_nacl == lacl-se_lun_nacl) + (tg_pt_gp-tg_pt_gp_alua_port != NULL) + (tg_pt_gp-tg_pt_gp_alua_port == port)) continue; core_scsi3_ua_allocate(lacl-se_lun_nacl, @@ -887,7 +871,7 @@ static int core_alua_do_transition_tg_pt( */ if (tg_pt_gp-tg_pt_gp_write_metadata) { mutex_lock(tg_pt_gp-tg_pt_gp_md_mutex); - core_alua_update_tpg_primary_metadata(tg_pt_gp, new_state); + core_alua_update_tpg_primary_metadata(tg_pt_gp); mutex_unlock(tg_pt_gp-tg_pt_gp_md_mutex); } /* @@ -902,6 +886,87 @@ static int core_alua_do_transition_tg_pt( tg_pt_gp-tg_pt_gp_id, core_alua_dump_state(tg_pt_gp-tg_pt_gp_alua_previous_state), core_alua_dump_state(tg_pt_gp-tg_pt_gp_alua_pending_state)); + spin_lock(dev-t10_alua.tg_pt_gps_lock); + atomic_dec(tg_pt_gp-tg_pt_gp_ref_cnt); + smp_mb__after_atomic_dec(); + spin_unlock(dev-t10_alua.tg_pt_gps_lock); + + if (tg_pt_gp-tg_pt_gp_transition_complete) + complete(tg_pt_gp-tg_pt_gp_transition_complete); +} +
Re: [PATCH 09/11] target_core_alua: Use workqueue for ALUA transitioning
On Wed, 2013-10-16 at 09:20 +0200, Hannes Reinecke wrote: Use a workqueue for processing ALUA state transitions; this allows us to process implicit delay properly. Signed-off-by: Hannes Reinecke h...@suse.de --- drivers/target/target_core_alua.c | 174 +++--- include/target/target_core_base.h | 4 + 2 files changed, 128 insertions(+), 50 deletions(-) SNIP +static int core_alua_do_transition_tg_pt( + struct t10_alua_tg_pt_gp *tg_pt_gp, + int new_state, + int explicit) +{ + struct se_device *dev = tg_pt_gp-tg_pt_gp_dev; + DECLARE_COMPLETION_ONSTACK(wait); + + /* Nothing to be done here */ + if (atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == new_state) + return 0; + + if (new_state == ALUA_ACCESS_STATE_TRANSITION) + return -EAGAIN; + + /* + * Flush any pending transitions + */ + if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state) == + ALUA_ACCESS_STATE_TRANSITION) { + /* Just in case */ + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; + tg_pt_gp-tg_pt_gp_transition_complete = wait; + flush_delayed_work(tg_pt_gp-tg_pt_gp_transition_work); + wait_for_completion(wait); + tg_pt_gp-tg_pt_gp_transition_complete = NULL; + return 0; + } + + /* + * Save the old primary ALUA access state, and set the current state + * to ALUA_ACCESS_STATE_TRANSITION. + */ + tg_pt_gp-tg_pt_gp_alua_previous_state = + atomic_read(tg_pt_gp-tg_pt_gp_alua_access_state); + tg_pt_gp-tg_pt_gp_alua_pending_state = new_state; + + atomic_set(tg_pt_gp-tg_pt_gp_alua_access_state, + ALUA_ACCESS_STATE_TRANSITION); + tg_pt_gp-tg_pt_gp_alua_access_status = (explicit) ? + ALUA_STATUS_ALTERED_BY_EXPLICIT_STPG : + ALUA_STATUS_ALTERED_BY_IMPLICIT_ALUA; + + /* + * Check for the optional ALUA primary state transition delay + */ + if (tg_pt_gp-tg_pt_gp_trans_delay_msecs != 0) + msleep_interruptible(tg_pt_gp-tg_pt_gp_trans_delay_msecs); + + /* + * Take a reference for workqueue item + */ + spin_lock(dev-t10_alua.tg_pt_gps_lock); + atomic_inc(tg_pt_gp-tg_pt_gp_ref_cnt); + smp_mb__after_atomic_inc(); + spin_unlock(dev-t10_alua.tg_pt_gps_lock); + + if (!explicit tg_pt_gp-tg_pt_gp_implicit_trans_secs) { + unsigned long transition_tmo; + + transition_tmo = tg_pt_gp-tg_pt_gp_implicit_trans_secs * HZ; + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, +tg_pt_gp-tg_pt_gp_transition_work, +transition_tmo); + } else { + tg_pt_gp-tg_pt_gp_transition_complete = wait; + queue_delayed_work(tg_pt_gp-tg_pt_gp_dev-tmr_wq, +tg_pt_gp-tg_pt_gp_transition_work, 0); + wait_for_completion(wait); + tg_pt_gp-tg_pt_gp_transition_complete = NULL; + } Mmm, the trans_delay_msecs delay seems a bit out of place now.. How about dropping it's usage with msleep_interruptible() above, and instead combining it with the delay passed into queue_delayed_work()..? --nab -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html