Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-14 Thread Christoph Hellwig
On Wed, Dec 09, 2015 at 10:07:32PM +, Quinn Tran wrote:
> >Err, no.  Looking into the refcount inside a kref is never the
> >right thing to do.
> 
> QT> even for debug purpose??

No.  Please treat struct kref as opaque.
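
As an illustration of what treating a refcount as opaque means in practice, here is a minimal userspace sketch in C11.  It is not kernel code: `struct demo_ref`, `demo_ref_get()` and `demo_ref_put()` are hypothetical stand-ins modeled loosely on the kernel's `kref_get()`/`kref_put()` pattern.  Callers only take and drop references, and learn from the return value of the put whether the object was released; nothing outside the helpers reads the counter.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct kref. */
struct demo_ref {
	atomic_int count;	/* private: only the demo_ref_* helpers touch it */
};

/* Hypothetical object that embeds the reference count. */
struct demo_cmd {
	struct demo_ref ref;
	bool released;		/* set by the release callback, for illustration */
};

static void demo_ref_init(struct demo_ref *r)
{
	atomic_init(&r->count, 1);
}

static void demo_ref_get(struct demo_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Drops one reference; runs release() and returns 1 if it was the last one. */
static int demo_ref_put(struct demo_ref *r, void (*release)(struct demo_ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Release callback: recovers the containing object from the embedded ref. */
static void demo_cmd_release(struct demo_ref *r)
{
	struct demo_cmd *cmd =
		(struct demo_cmd *)((char *)r - offsetof(struct demo_cmd, ref));

	cmd->released = true;
}
```

If a debug message needs to know whether a put was the final one, the return value of the put carries that information without reading the counter.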

> QT> These bits indicate where the command has traversed in the QLA
> code.  Each bit is set once.  Due to the async nature of the TMR code,
> the QLA driver can be triggered to repeat this specific free path in
> the double-free case.  This BUG_ON allows us to trap it early on.
> 
> In one of the corner cases (below), I need to overload it and add a
> lock for the cleanup process.

Setting bits fundamentally is a read/modify/write cycle.  You either
need to use {set,clear,test}_bit or lock around these manipulations.
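
The read/modify/write point can be sketched in userspace C11.  A statement such as `cmd->cmd_flags |= BIT_20` is a load, an OR and a store, so two contexts can interleave, and a `BUG_ON` check placed before it can miss a racing setter.  The hypothetical `demo_*` helpers below imitate the kernel's `test_and_set_bit()` and `test_bit()` with `atomic_fetch_or()` and `atomic_load()`; they are an approximation for illustration, not the kernel implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag word standing in for cmd->cmd_flags. */
struct demo_flags {
	atomic_ulong bits;
};

/* Atomically sets bit nr and reports whether it was already set. */
static bool demo_test_and_set_bit(unsigned int nr, struct demo_flags *f)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(&f->bits, mask) & mask) != 0;
}

/* Reads bit nr without modifying the word. */
static bool demo_test_bit(unsigned int nr, struct demo_flags *f)
{
	return (atomic_load(&f->bits) & (1UL << nr)) != 0;
}
```

With this shape, the "was it already set?" check and the set happen in one atomic step, so a second pass through the same free path is detected even when both passes race.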

> QT> The cmd->aborted flag is used to track the CMD_T_ABORT flag at TCM
> level.  If the command has been requested to be aborted by TCM or is
> already aborted, we advance it to the "free" state because our hardware
> has already started freeing up resources associated with this
> command/exchange.  In this specific case (above), a XFER RDY was aborted
> by the TMR.  Returning the cmd to TCM to generate SCSI Status would
> generate an erroneous HW error due to the freed resources.

I really think this needs to be updated on top of Bat's changes as a
start and re-reviewed.  The amount of special casing and second guessing
here is simply not sustainable in the long run.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-14 Thread Quinn Tran
Christoph,  Thanks for reviewing.  I'll withdraw this patch.  Will rework
with new code and submit at a later time.

Regards,
Quinn Tran




On 12/14/15, 2:34 AM, "Christoph Hellwig"  wrote:

>On Wed, Dec 09, 2015 at 10:07:32PM +, Quinn Tran wrote:
>> >Err, no.  Looking into the refcount inside a kref is never the
>> >right thing to do.
>> 
>> QT> even for debug purpose??
>
>No.  Please treat struct kref as opaque.
>
>> QT> These bits indicate where the command has traversed in the QLA
>> code.  Each bit is set once.  Due to the async nature of the TMR code,
>> the QLA driver can be triggered to repeat this specific free path in
>> the double-free case.  This BUG_ON allows us to trap it early on.
>> 
>> In one of the corner cases (below), I need to overload it and add a
>> lock for the cleanup process.
>
>Setting bits fundamentally is a read/modify/write cycle.  You either
>need to use {set,clear,test}_bit or lock around these manipulations.
>
>> QT> The cmd->aborted flag is used to track the CMD_T_ABORT flag at TCM
>> level.  If the command has been requested to be aborted by TCM or is
>> already aborted, we advance it to the "free" state because our hardware
>> has already started freeing up resources associated with this
>> command/exchange.  In this specific case (above), a XFER RDY was aborted
>> by the TMR.  Returning the cmd to TCM to generate SCSI Status would
>> generate an erroneous HW error due to the freed resources.
>
>I really think this needs to be updated on top of Bat's changes as a
>start and re-reviewed.  The amount of special casing and second guessing
>here is simply not sustainable in the long run.


Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-09 Thread Quinn Tran

On 12/7/15, 6:37 PM, "target-devel-ow...@vger.kernel.org on behalf of
Christoph Hellwig"  wrote:

>> -void qlt_abort_cmd(struct qla_tgt_cmd *cmd)
>> +int qlt_abort_cmd(struct qla_tgt_cmd *cmd)
>>  {
>>  struct qla_tgt *tgt = cmd->tgt;
>>  struct scsi_qla_host *vha = tgt->vha;
>>  struct se_cmd *se_cmd = &cmd->se_cmd;
>> +unsigned long flags,refcount;
>>  
>>  ql_dbg(ql_dbg_tgt_mgt, vha, 0xf014,
>>  "qla_target(%d): terminating exchange for aborted cmd=%p "
>>  "(se_cmd=%p, tag=%llu)", vha->vp_idx, cmd, &cmd->se_cmd,
>>  se_cmd->tag);
>>  
>> +spin_lock_irqsave(&cmd->cmd_lock, flags);
>> +if (cmd->aborted) {
>> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
>> +
>> +/* It's normal to see 2 calls in this path:
>> + *  1) XFER Rdy completion + CMD_T_ABORT
>> + *  2) TCM TMR - drain_state_list
>> + */
>> +refcount = atomic_read(&cmd->se_cmd.cmd_kref.refcount);
>> +ql_dbg(ql_dbg_tgt_mgt, vha, 0x,
>> +   "multiple abort. %p refcount %lx"
>> +   "transport_state %x, t_state %x, se_cmd_flags %x \n",
>> +   cmd, refcount,cmd->se_cmd.transport_state,
>> +   cmd->se_cmd.t_state,cmd->se_cmd.se_cmd_flags);
>> +
>> +return EIO;
>> +}
>
>Err, no.  Looking into the refcount inside a kref is never the
>right thing to do.

QT> even for debug purpose??

>
>> +typedef enum {
>> +/*
>> + * BIT_0 - Atio Arrival / schedule to work
>> + * BIT_1 - qlt_do_work
>> + * BIT_2 - qlt_do work failed
>> + * BIT_3 - xfer rdy/tcm_qla2xxx_write_pending
>> + * BIT_4 - read respond/tcm_qla2xx_queue_data_in
>> + * BIT_5 - status respond / tcm_qla2xx_queue_status
>> + * BIT_6 - tcm request to abort/Term exchange.
>> + *  pre_xmit_response->qlt_send_term_exchange
>> + * BIT_7 - SRR received (qlt_handle_srr->qlt_xmit_response)
>> + * BIT_8 - SRR received (qlt_handle_srr->qlt_rdy_to_xfer)
>> + * BIT_9 - SRR received (qla_handle_srr->qlt_send_term_exchange)
>> + * BIT_10 - Data in - hanlde_data->tcm_qla2xxx_handle_data
>> +
>> + * BIT_12 - good completion - qlt_ctio_do_completion -->free_cmd
>> + * BIT_13 - Bad completion -
>> + *  qlt_ctio_do_completion --> qlt_term_ctio_exchange
>> + * BIT_14 - Back end data received/sent.
>> + * BIT_15 - SRR prepare ctio
>> + * BIT_16 - complete free
>> + * BIT_17 - flush - qlt_abort_cmd_on_host_reset
>> + * BIT_18 - completion w/abort status
>> + * BIT_19 - completion w/unknown status
>> + * BIT_20 - tcm_qla2xxx_free_cmd
>
>Please use descriptive names for these flags in the source code!

QT> ACK.  We'll change the bits to more descriptive names in a "follow on"
patch.

>
>> +BUG_ON(cmd->cmd_flags & BIT_20);
>> +cmd->cmd_flags |= BIT_20;
>> +
>
>And no craziness like this.  While we're at it: what synchronizes
>access to ->cmd_flags?

QT> These bits indicate where the command has traversed in the QLA
code.  Each bit is set once.  Due to the async nature of the TMR code,
the QLA driver can be triggered to repeat this specific free path in
the double-free case.  This BUG_ON allows us to trap it early on.

In one of the corner cases (below), I need to overload it and add a
lock for the cleanup process.

>
>> @@ -466,13 +484,25 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t
>>*vha, struct qla_tgt_cmd *cmd,
>>  static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
>>  {
>>  struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd,
>>work);
>> +unsigned long flags;
>>  
>>  /*
>>   * Ensure that the complete FCP WRITE payload has been received.
>>   * Otherwise return an exception via CHECK_CONDITION status.
>>   */
>>  cmd->cmd_in_wq = 0;
>> -cmd->cmd_flags |= BIT_11;
>> +
>> +spin_lock_irqsave(&cmd->cmd_lock, flags);
>> +cmd->cmd_flags |= CMD_FLAG_DATA_WORK;
>> +if (cmd->aborted) {
>> +cmd->cmd_flags |= CMD_FLAG_DATA_WORK_FREE;
>> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
>> +
>> +tcm_qla2xxx_free_cmd(cmd);
>> +return;
>> +}
>> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
>
>All these abort flag hacks look very suspicious.  Can you explain the
>exact theory of operation behind them?

QT> The cmd->aborted flag is used to track the CMD_T_ABORT flag at TCM
level.  If the command has been requested to be aborted by TCM or is
already aborted, we advance it to the "free" state because our hardware
has already started freeing up resources associated with this
command/exchange.  In this specific case (above), a XFER RDY was aborted
by the TMR.  Returning the cmd to TCM to generate SCSI Status would
generate an erroneous HW error due to the freed resources.


>

Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-09 Thread Quinn Tran
Hannes,

ACK.  We'll move the flags to bitops in the "follow on" patch to clean it
up.  Those flags were introduced by a different patch.  We will move the few
overloaded flags into the bit field.

However, getting rid of the spin lock would prove tricky because the code
is trying to serialize the cleanup.  Without the lock, we kept hitting
multiple-free problems.
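
To make the multiple-free concern concrete, here is a hedged userspace sketch (C11 atomics, hypothetical `demo_*` names, not the driver's code) of letting exactly one caller claim the cleanup: the first abort request wins and any repeat gets `-EIO`.  It only covers the decision of who frees; whether the real driver needs to protect additional state under `cmd_lock` is not something this sketch can answer.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical command; free_count exists only to make double frees visible. */
struct demo_tgt_cmd {
	atomic_bool aborted;
	int free_count;
};

/* Returns 0 for the first abort request and -EIO for any repeat. */
static int demo_abort_cmd(struct demo_tgt_cmd *cmd)
{
	/* atomic_exchange() returns the previous value, so only one caller
	 * ever observes "false" and goes on to own the cleanup. */
	if (atomic_exchange(&cmd->aborted, true))
		return -EIO;

	cmd->free_count++;	/* stands in for the single free path */
	return 0;
}
```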

Regards,
Quinn Tran




On 12/8/15, 11:01 PM, "target-devel-ow...@vger.kernel.org on behalf of
Hannes Reinecke"  wrote:

>>+
>>  }
>>  
>>  static void tcm_qla2xxx_clear_sess_lookup(struct tcm_qla2xxx_lport *,
>Have you considered moving to bit ops when modifying cmd_flags?
>I guess you can also move the ->aborted bit into the bit field, and
>could get rid of some of the spinlocks ...
>
>Cheers,
>
>Hannes
>-- 
>Dr. Hannes Reinecke  zSeries & Storage
>h...@suse.de +49 911 74053 688
>SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
>GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
>--


Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-08 Thread Hannes Reinecke
On 12/08/2015 01:48 AM, Himanshu Madhani wrote:
> From: Quinn Tran 
> 
> During a LUN reset, the TMR thread from TCM issues aborts
> to the qla driver.  At abort time, each command is in a different
> state.  Depending on the state, qla uses the TMR thread
> to trigger a command free (cmd_kref--) if the command is not
> down at the firmware.
> 
> Signed-off-by: Quinn Tran 
> Signed-off-by: Himanshu Madhani 
> ---
>  drivers/scsi/qla2xxx/qla_target.c  |   60 +
>  drivers/scsi/qla2xxx/qla_target.h  |   59 +
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c |   73 ++-
>  3 files changed, 147 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_target.c 
> b/drivers/scsi/qla2xxx/qla_target.c
> index 638940f..4d42b79 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -105,7 +105,7 @@ static void qlt_response_pkt(struct scsi_qla_host *ha, 
> response_t *pkt);
>  static int qlt_issue_task_mgmt(struct qla_tgt_sess *sess, uint32_t lun,
>   int fn, void *iocb, int flags);
>  static void qlt_send_term_exchange(struct scsi_qla_host *ha, struct 
> qla_tgt_cmd
> - *cmd, struct atio_from_isp *atio, int ha_locked);
> + *cmd, struct atio_from_isp *atio, int ha_locked, int ul_abort);
>  static void qlt_reject_free_srr_imm(struct scsi_qla_host *ha,
>   struct qla_tgt_srr_imm *imm, int ha_lock);
>  static void qlt_abort_cmd_on_host_reset(struct scsi_qla_host *vha,
> @@ -2646,7 +2646,7 @@ int qlt_xmit_response(struct qla_tgt_cmd *cmd, int 
> xmit_type,
>   /* no need to terminate. FW already freed exchange. */
>   qlt_abort_cmd_on_host_reset(cmd->vha, cmd);
>   else
> - qlt_send_term_exchange(vha, cmd, &cmd->atio, 1);
> + qlt_send_term_exchange(vha, cmd, &cmd->atio, 1, 0);
>   spin_unlock_irqrestore(&ha->hardware_lock, flags);
>   return 0;
>   }
> @@ -3154,7 +3154,8 @@ static int __qlt_send_term_exchange(struct 
> scsi_qla_host *vha,
>  }
>  
>  static void qlt_send_term_exchange(struct scsi_qla_host *vha,
> - struct qla_tgt_cmd *cmd, struct atio_from_isp *atio, int ha_locked)
> + struct qla_tgt_cmd *cmd, struct atio_from_isp *atio, int ha_locked,
> + int ul_abort)
>  {
>   unsigned long flags = 0;
>   int rc;
> @@ -3174,8 +3175,7 @@ static void qlt_send_term_exchange(struct scsi_qla_host 
> *vha,
>   qlt_alloc_qfull_cmd(vha, atio, 0, 0);
>  
>  done:
> - if (cmd && (!cmd->aborted ||
> - !cmd->cmd_sent_to_fw)) {
> + if (cmd && !ul_abort && !cmd->aborted) {
>   if (cmd->sg_mapped)
>   qlt_unmap_sg(vha, cmd);
>   vha->hw->tgt.tgt_ops->free_cmd(cmd);
> @@ -3234,21 +3234,43 @@ static void qlt_chk_exch_leak_thresh_hold(struct 
> scsi_qla_host *vha)
>  
>  }
>  
> -void qlt_abort_cmd(struct qla_tgt_cmd *cmd)
> +int qlt_abort_cmd(struct qla_tgt_cmd *cmd)
>  {
>   struct qla_tgt *tgt = cmd->tgt;
>   struct scsi_qla_host *vha = tgt->vha;
>   struct se_cmd *se_cmd = &cmd->se_cmd;
> + unsigned long flags,refcount;
>  
>   ql_dbg(ql_dbg_tgt_mgt, vha, 0xf014,
>   "qla_target(%d): terminating exchange for aborted cmd=%p "
>   "(se_cmd=%p, tag=%llu)", vha->vp_idx, cmd, &cmd->se_cmd,
>   se_cmd->tag);
>  
> +spin_lock_irqsave(&cmd->cmd_lock, flags);
> +if (cmd->aborted) {
> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> +
> +/* It's normal to see 2 calls in this path:
> + *  1) XFER Rdy completion + CMD_T_ABORT
> + *  2) TCM TMR - drain_state_list
> + */
> +refcount = atomic_read(&cmd->se_cmd.cmd_kref.refcount);
> +ql_dbg(ql_dbg_tgt_mgt, vha, 0x,
> +   "multiple abort. %p refcount %lx"
> +   "transport_state %x, t_state %x, se_cmd_flags %x \n",
> +   cmd, refcount,cmd->se_cmd.transport_state,
> +   cmd->se_cmd.t_state,cmd->se_cmd.se_cmd_flags);
> +
> +return EIO;
> +}
> +
>   cmd->aborted = 1;
>   cmd->cmd_flags |= BIT_6;
> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> +
> + qlt_send_term_exchange(vha, cmd, &cmd->atio, 0, 1);
>  
> - qlt_send_term_exchange(vha, cmd, &cmd->atio, 0);
> + return 0;
>  }
>  EXPORT_SYMBOL(qlt_abort_cmd);
>  
> @@ -3263,6 +3285,9 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
>  
>   BUG_ON(cmd->cmd_in_wq);
>  
> + if (cmd->sg_mapped)
> + qlt_unmap_sg(cmd->vha, cmd);
> +
>   if (!cmd->q_full)
>   qlt_decr_num_pend_cmds(cmd->vha);
>  
> @@ -3380,7 +3405,7 @@ static int qlt_term_ctio_exchange(struct scsi_qla_host 
> *vha, void *ctio,
>   term = 1;
>  
>   if (term)
> - qlt_send_term_exchange(vha, cmd, &cmd->atio, 1);
> + qlt_send_term_exchange(vha, cmd, &cmd->atio, 1, 0);
>  
> 

Re: [PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-07 Thread Christoph Hellwig
> -void qlt_abort_cmd(struct qla_tgt_cmd *cmd)
> +int qlt_abort_cmd(struct qla_tgt_cmd *cmd)
>  {
>   struct qla_tgt *tgt = cmd->tgt;
>   struct scsi_qla_host *vha = tgt->vha;
>   struct se_cmd *se_cmd = &cmd->se_cmd;
> + unsigned long flags,refcount;
>  
>   ql_dbg(ql_dbg_tgt_mgt, vha, 0xf014,
>   "qla_target(%d): terminating exchange for aborted cmd=%p "
>   "(se_cmd=%p, tag=%llu)", vha->vp_idx, cmd, &cmd->se_cmd,
>   se_cmd->tag);
>  
> +spin_lock_irqsave(&cmd->cmd_lock, flags);
> +if (cmd->aborted) {
> +spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> +
> +/* It's normal to see 2 calls in this path:
> + *  1) XFER Rdy completion + CMD_T_ABORT
> + *  2) TCM TMR - drain_state_list
> + */
> +refcount = atomic_read(&cmd->se_cmd.cmd_kref.refcount);
> +ql_dbg(ql_dbg_tgt_mgt, vha, 0x,
> +   "multiple abort. %p refcount %lx"
> +   "transport_state %x, t_state %x, se_cmd_flags %x \n",
> +   cmd, refcount,cmd->se_cmd.transport_state,
> +   cmd->se_cmd.t_state,cmd->se_cmd.se_cmd_flags);
> +
> +return EIO;
> +}

Err, no.  Looking into the refcount inside a kref is never the
right thing to do.

> +typedef enum {
> + /*
> +  * BIT_0 - Atio Arrival / schedule to work
> +  * BIT_1 - qlt_do_work
> +  * BIT_2 - qlt_do work failed
> +  * BIT_3 - xfer rdy/tcm_qla2xxx_write_pending
> +  * BIT_4 - read respond/tcm_qla2xx_queue_data_in
> +  * BIT_5 - status respond / tcm_qla2xx_queue_status
> +  * BIT_6 - tcm request to abort/Term exchange.
> +  *  pre_xmit_response->qlt_send_term_exchange
> +  * BIT_7 - SRR received (qlt_handle_srr->qlt_xmit_response)
> +  * BIT_8 - SRR received (qlt_handle_srr->qlt_rdy_to_xfer)
> +  * BIT_9 - SRR received (qla_handle_srr->qlt_send_term_exchange)
> +  * BIT_10 - Data in - hanlde_data->tcm_qla2xxx_handle_data
> +
> +  * BIT_12 - good completion - qlt_ctio_do_completion -->free_cmd
> +  * BIT_13 - Bad completion -
> +  *  qlt_ctio_do_completion --> qlt_term_ctio_exchange
> +  * BIT_14 - Back end data received/sent.
> +  * BIT_15 - SRR prepare ctio
> +  * BIT_16 - complete free
> +  * BIT_17 - flush - qlt_abort_cmd_on_host_reset
> +  * BIT_18 - completion w/abort status
> +  * BIT_19 - completion w/unknown status
> +  * BIT_20 - tcm_qla2xxx_free_cmd

Please use descriptive names for these flags in the source code!
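
A sketch of what named flags could look like, in plain C.  The names are hypothetical (the patch itself only introduces `CMD_FLAG_DATA_WORK` and `CMD_FLAG_DATA_WORK_FREE`); the bit positions follow the comment quoted above, and only a handful of bits are shown.

```c
/* Hypothetical descriptive names for a few of the trace bits. */
enum demo_cmd_flag {
	DEMO_CMD_FLAG_ATIO_ARRIVED  = 1u << 0,	/* was BIT_0: ATIO arrival */
	DEMO_CMD_FLAG_XFER_RDY      = 1u << 3,	/* was BIT_3: xfer rdy */
	DEMO_CMD_FLAG_TCM_ABORT     = 1u << 6,	/* was BIT_6: TCM abort request */
	DEMO_CMD_FLAG_COMPLETE_FREE = 1u << 16,	/* was BIT_16: complete free */
	DEMO_CMD_FLAG_TCM_FREE_CMD  = 1u << 20	/* was BIT_20: tcm_qla2xxx_free_cmd */
};

/* Returns nonzero when the given flag is present in the flag word. */
static int demo_flag_is_set(unsigned int flags, enum demo_cmd_flag flag)
{
	return (flags & (unsigned int)flag) != 0;
}
```

A check such as `demo_flag_is_set(flags, DEMO_CMD_FLAG_TCM_FREE_CMD)` then documents itself at the call site, which a bare `BIT_20` does not.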

> + BUG_ON(cmd->cmd_flags & BIT_20);
> + cmd->cmd_flags |= BIT_20;
> +

And no craziness like this.  While we're at it: what synchronizes
access to ->cmd_flags?

> @@ -466,13 +484,25 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t *vha, 
> struct qla_tgt_cmd *cmd,
>  static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
>  {
>   struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd, work);
> + unsigned long flags;
>  
>   /*
>* Ensure that the complete FCP WRITE payload has been received.
>* Otherwise return an exception via CHECK_CONDITION status.
>*/
>   cmd->cmd_in_wq = 0;
> - cmd->cmd_flags |= BIT_11;
> +
> + spin_lock_irqsave(&cmd->cmd_lock, flags);
> + cmd->cmd_flags |= CMD_FLAG_DATA_WORK;
> + if (cmd->aborted) {
> + cmd->cmd_flags |= CMD_FLAG_DATA_WORK_FREE;
> + spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> +
> + tcm_qla2xxx_free_cmd(cmd);
> + return;
> + }
> + spin_unlock_irqrestore(&cmd->cmd_lock, flags);

All these abort flag hacks look very suspicious.  Can you explain the
exact theory of operation behind them?



[PATCH 10/20] qla2xxx: Fix interaction issue between qla2xxx and Target Core Module

2015-12-07 Thread Himanshu Madhani
From: Quinn Tran 

During a LUN reset, the TMR thread from TCM issues aborts
to the qla driver.  At abort time, each command is in a different
state.  Depending on the state, qla uses the TMR thread
to trigger a command free (cmd_kref--) if the command is not
down at the firmware.

Signed-off-by: Quinn Tran 
Signed-off-by: Himanshu Madhani 
---
 drivers/scsi/qla2xxx/qla_target.c  |   60 +
 drivers/scsi/qla2xxx/qla_target.h  |   59 +
 drivers/scsi/qla2xxx/tcm_qla2xxx.c |   73 ++-
 3 files changed, 147 insertions(+), 45 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_target.c 
b/drivers/scsi/qla2xxx/qla_target.c
index 638940f..4d42b79 100644
--- a/drivers/scsi/qla2xxx/qla_target.c
+++ b/drivers/scsi/qla2xxx/qla_target.c
@@ -105,7 +105,7 @@ static void qlt_response_pkt(struct scsi_qla_host *ha, 
response_t *pkt);
 static int qlt_issue_task_mgmt(struct qla_tgt_sess *sess, uint32_t lun,
int fn, void *iocb, int flags);
 static void qlt_send_term_exchange(struct scsi_qla_host *ha, struct qla_tgt_cmd
-   *cmd, struct atio_from_isp *atio, int ha_locked);
+   *cmd, struct atio_from_isp *atio, int ha_locked, int ul_abort);
 static void qlt_reject_free_srr_imm(struct scsi_qla_host *ha,
struct qla_tgt_srr_imm *imm, int ha_lock);
 static void qlt_abort_cmd_on_host_reset(struct scsi_qla_host *vha,
@@ -2646,7 +2646,7 @@ int qlt_xmit_response(struct qla_tgt_cmd *cmd, int 
xmit_type,
/* no need to terminate. FW already freed exchange. */
qlt_abort_cmd_on_host_reset(cmd->vha, cmd);
else
-   qlt_send_term_exchange(vha, cmd, &cmd->atio, 1);
+   qlt_send_term_exchange(vha, cmd, &cmd->atio, 1, 0);
spin_unlock_irqrestore(&ha->hardware_lock, flags);
return 0;
}
@@ -3154,7 +3154,8 @@ static int __qlt_send_term_exchange(struct scsi_qla_host 
*vha,
 }
 
 static void qlt_send_term_exchange(struct scsi_qla_host *vha,
-   struct qla_tgt_cmd *cmd, struct atio_from_isp *atio, int ha_locked)
+   struct qla_tgt_cmd *cmd, struct atio_from_isp *atio, int ha_locked,
+   int ul_abort)
 {
unsigned long flags = 0;
int rc;
@@ -3174,8 +3175,7 @@ static void qlt_send_term_exchange(struct scsi_qla_host 
*vha,
qlt_alloc_qfull_cmd(vha, atio, 0, 0);
 
 done:
-   if (cmd && (!cmd->aborted ||
-   !cmd->cmd_sent_to_fw)) {
+   if (cmd && !ul_abort && !cmd->aborted) {
if (cmd->sg_mapped)
qlt_unmap_sg(vha, cmd);
vha->hw->tgt.tgt_ops->free_cmd(cmd);
@@ -3234,21 +3234,43 @@ static void qlt_chk_exch_leak_thresh_hold(struct 
scsi_qla_host *vha)
 
 }
 
-void qlt_abort_cmd(struct qla_tgt_cmd *cmd)
+int qlt_abort_cmd(struct qla_tgt_cmd *cmd)
 {
struct qla_tgt *tgt = cmd->tgt;
struct scsi_qla_host *vha = tgt->vha;
struct se_cmd *se_cmd = &cmd->se_cmd;
+   unsigned long flags,refcount;
 
ql_dbg(ql_dbg_tgt_mgt, vha, 0xf014,
"qla_target(%d): terminating exchange for aborted cmd=%p "
"(se_cmd=%p, tag=%llu)", vha->vp_idx, cmd, &cmd->se_cmd,
se_cmd->tag);
 
+spin_lock_irqsave(&cmd->cmd_lock, flags);
+if (cmd->aborted) {
+spin_unlock_irqrestore(&cmd->cmd_lock, flags);
+
+/* It's normal to see 2 calls in this path:
+ *  1) XFER Rdy completion + CMD_T_ABORT
+ *  2) TCM TMR - drain_state_list
+ */
+refcount = atomic_read(&cmd->se_cmd.cmd_kref.refcount);
+ql_dbg(ql_dbg_tgt_mgt, vha, 0x,
+   "multiple abort. %p refcount %lx"
+   "transport_state %x, t_state %x, se_cmd_flags %x \n",
+   cmd, refcount,cmd->se_cmd.transport_state,
+   cmd->se_cmd.t_state,cmd->se_cmd.se_cmd_flags);
+
+return EIO;
+}
+
cmd->aborted = 1;
cmd->cmd_flags |= BIT_6;
+spin_unlock_irqrestore(&cmd->cmd_lock, flags);
+
+   qlt_send_term_exchange(vha, cmd, &cmd->atio, 0, 1);
 
-   qlt_send_term_exchange(vha, cmd, &cmd->atio, 0);
+   return 0;
 }
 EXPORT_SYMBOL(qlt_abort_cmd);
 
@@ -3263,6 +3285,9 @@ void qlt_free_cmd(struct qla_tgt_cmd *cmd)
 
BUG_ON(cmd->cmd_in_wq);
 
+   if (cmd->sg_mapped)
+   qlt_unmap_sg(cmd->vha, cmd);
+
if (!cmd->q_full)
qlt_decr_num_pend_cmds(cmd->vha);
 
@@ -3380,7 +3405,7 @@ static int qlt_term_ctio_exchange(struct scsi_qla_host 
*vha, void *ctio,
term = 1;
 
if (term)
-   qlt_send_term_exchange(vha, cmd, &cmd->atio, 1);
+   qlt_send_term_exchange(vha, cmd, &cmd->atio, 1, 0);
 
return term;
 }
@@ -3735,6 +3760,7 @@ static void __qlt_do_work(struct qla_tgt_cmd *cmd)
goto out_term;
}
 
+   spin_lock_init(&cmd->cmd_lock);
cdb = &atio->u.isp24.fcp_cmnd.cdb[0];