Re: Add better detection of if a command is not going to complete

Mike Christie Tue, 19 May 2009 07:29:29 -0700

Mike Christie wrote:
> Hey Hannes,
> 
> This will not fix any hangs after the scsi eh or iscsi eh has fired, but 
> I think this patch will help prevent the scsi eh from firing when we do 
> not need it to, like you have seen in some bugzillas. The patch was made 
> over my iscsi tree. It should also apply to scsi-rc-fixes with the 
> patches I sent the other day.
> 
> I modified our command timeout handler so that if a command has made 
> some progress since the last timeout, or if it is just getting started 
> (it has been put on the wire but we have not yet received anything for 
> it), then we will ask for some more time to run it.
> 
> This is helping here for these problems:
> 1. sending more IO than the disk/target can handle
> 2. using a shorter scsi cmd timeout with a slower link
> 
> I am going to combine this with those change_queue_depth patches if they 
> are ok upstream, and in the end also add lpfc/qla2xxx's rampup/rampdown 
> code to scsi-ml. So basically, if we determine that we are sending too 
> many IOs, then we can call some helper rampdown code to drop the queue 
> depth for the user. If however it was a transient problem, the common 
> rampup code will detect that and increase the queue depth again.
> 
> I think with the combo rampup/rampdown and the modified 
> iscsi_eh_cmd_timed_out in this patch, it should fix a lot of problems we 
> see where the scsi-eh runs when it should not.
> 
> > 
>



@@ -1361,6 +1361,9 @@ static inline struct iscsi_task *iscsi_alloc_task(struct iscsi_conn *conn,
        task->state = ISCSI_TASK_PENDING;
        task->conn = conn;
        task->sc = sc;
+       task->have_checked_conn = 0;
+       task->last_timeout = jiffies;
+       task->last_recv = jiffies;
        INIT_LIST_HEAD(&task->running);
        return task;
  }


+        * If we have processed a PDU for the command since the last
+        * timeout then ask for more time.
+        */
+       if (time_after_eq(task->last_recv, task->last_timeout)) {


Oh yeah, for the case where the command has been sent but we have not 
got anything for it yet, we just give it one more cmd timeout's worth of 
time. That is where the eq part of the test above is commonly hit (when 
the task is allocated, last_recv and last_timeout are set to the same 
value). If the command times out again and we still have not got any 
data, then we will let the scsi eh have it. In the future I was thinking 
that when we first detect this (before we have scsi-ml reset the timer), 
we should decrease the queue_depth using the rampdown code I want to add 
to scsi-ml (the code posted yesterday only ramps down the depth when 
seeing a QUEUE_FULL, so I would add a helper for us to call).
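
To make the eq case concrete, here is a minimal userspace sketch of the 
decision, not the actual libiscsi code. The names task_model, 
task_init, task_timed_out, RESET_TIMER and RUN_SCSI_EH are made up for 
illustration, TIME_AFTER_EQ is a local stand-in for the kernel's 
time_after_eq(), and I am assuming the handler records last_timeout on 
every timeout (that part is not in the hunks quoted above).

```c
/* Wrap-safe "a is at or after b", modeled on the kernel's time_after_eq(). */
#define TIME_AFTER_EQ(a, b) ((long)((a) - (b)) >= 0)

/* Hypothetical return values: ask for more time, or hand off to the scsi eh. */
enum timeout_action { RESET_TIMER, RUN_SCSI_EH };

/* Hypothetical stand-in for the two timestamps the patch adds to iscsi_task. */
struct task_model {
	unsigned long last_recv;    /* tick when we last processed a PDU for the task */
	unsigned long last_timeout; /* tick when the command last timed out */
};

/* Mirrors the alloc hunk: both timestamps start at the same value. */
static void task_init(struct task_model *t, unsigned long now)
{
	t->last_recv = now;
	t->last_timeout = now;
}

/*
 * Called when the command timer fires. If a PDU was processed at or after
 * the previous timeout, ask for more time; otherwise let the scsi eh run.
 * Assumption: last_timeout is updated on every call.
 */
static enum timeout_action task_timed_out(struct task_model *t,
					  unsigned long now)
{
	enum timeout_action action = RUN_SCSI_EH;

	if (TIME_AFTER_EQ(t->last_recv, t->last_timeout))
		action = RESET_TIMER;

	t->last_timeout = now;
	return action;
}
```

With this model, a task initialized at tick 100 that times out at 130 
with no data gets RESET_TIMER (the eq case), and a second timeout at 160 
with still no data gets RUN_SCSI_EH. If last_recv moves to 140 in 
between, the timeout at 160 is reset again.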


+               ISCSI_DBG_CONN(conn, "Command making progress. Asking "
+                              "scsi-ml for more time to complete. "
+                              "Last data recv at %lu. Last timeout was at "
+                              "%lu.\n", task->last_recv,


