Erez Zilber wrote:
>
> I enabled open-iscsi logging + added some printk calls when the abort
> handler returns.
> Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
> there's no abort.
> May 17 11:00:06 kpc36 kernel: session1: iscsi_eh_cmd_timed_out scsi cmd ffff8101e30efe40 timedout
> May 17 11:00:06 kpc36 kernel: session1: iscsi_eh_cmd_timed_out return timer reset

As you can see in iscsi_eh_cmd_timed_out, if the session is down then there is no point in letting the scsi eh run, since we have to relogin and restart commands. So we return reset timer, which prevents the scsi eh from running.

And then there is code in there to check if we are in the middle of checking the connection. If we are, then we ask for some more time with the command, and that will prevent the scsi eh from running. This looks like it can be a problem, because we would get a response to our nop, which would update the last_recv field. If there was no progress being made for the scsi command, we would still ask to reset the timer, and we could end up in that loop forever, since the scsi layer does not cap the number of times you can reset the timer. I will send a patch to fix that.

However, that probably will not fix your problem. For your specific setup, it looks like we hit iscsi_eh_cmd_timed_out and reset the scsi command timer because we are in the middle of checking the connection with the nop/ping, but then the nop/ping does not return in time and so we drop the session:

connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4526718494, last ping 4526723494, now 4526728494

That is why on the target you see it cleaning up commands. On the initiator you can see us cleaning up:

May 17 11:00:07 kpc36 kernel: session1: iscsi_start_session_recovery blocking session
May 17 11:00:07 kpc36 kernel: session1: fail_scsi_tasks failing sc ffff8101e30efe40 itt 0x13 state 3

And then later in the logs you will see us start the commands again when we are logged in again.

So you probably need to continue replying to nops when the r2t is dropped.
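To make the timeout decision described above easier to follow, here is a rough model of it in plain C. It is only a sketch: the enum, the struct and the function name are hypothetical and are not taken from the libiscsi source, and the real handler looks at more state than these two flags.

```c
#include <stdbool.h>

/* Hypothetical names throughout; this is a model, not the libiscsi code. */
enum eh_decision {
    EH_LET_SCSI_EH_RUN, /* let the scsi eh run (abort and so on) */
    EH_RESET_TIMER      /* give the command more time; scsi eh does not run */
};

struct conn_model {
    bool session_logged_in; /* false while the session is down or in recovery */
    bool ping_in_flight;    /* true while a nop/ping is checking the connection */
};

/* Decide what to do when the scsi command timer fires. */
static enum eh_decision cmd_timed_out(const struct conn_model *c)
{
    /* Session down: commands get restarted after relogin, so the scsi eh
     * has nothing useful to do. */
    if (!c->session_logged_in)
        return EH_RESET_TIMER;

    /* In the middle of checking the connection: ask for more time. Nothing
     * here looks at whether the command itself made progress, so this
     * branch can repeat without limit. */
    if (c->ping_in_flight)
        return EH_RESET_TIMER;

    return EH_LET_SCSI_EH_RUN;
}
```

In this model the abort handler is only reached when the session is logged in and no ping is outstanding, which matches the "return timer reset" seen in the log.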
I will fix it on the initiator side to detect if we are not getting IO for a specific command and then let the scsi eh run.
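One possible shape for that check, as a sketch only and not the actual patch: keep a per-command timestamp of the last data transfer, and stop extending the timer once the command itself has been idle for a full timeout period, even if nops are still being answered. All names below (cmd_progress, last_data_xfer, timed_out_with_progress_check) are assumptions made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical names; a sketch of the idea, not the real fix. */
enum eh_verdict { EH_RUN_SCSI_EH, EH_EXTEND_TIMER };

struct cmd_progress {
    unsigned long last_data_xfer; /* tick when this command last moved data */
};

/* Wraparound-safe "a is at or before b", in the spirit of the kernel's
 * time_before_eq(). */
static bool tick_before_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) <= 0;
}

static enum eh_verdict
timed_out_with_progress_check(const struct cmd_progress *cmd,
                              bool ping_in_flight,
                              unsigned long now,
                              unsigned long cmd_timeout_ticks)
{
    /* No IO for this specific command for a whole timeout period:
     * stop asking for more time and let the scsi eh run. */
    if (tick_before_eq(cmd->last_data_xfer + cmd_timeout_ticks, now))
        return EH_RUN_SCSI_EH;

    /* The command is still moving data and we are checking the
     * connection, so extending the timer is reasonable. */
    if (ping_in_flight)
        return EH_EXTEND_TIMER;

    return EH_RUN_SCSI_EH;
}
```

With a check like this, nop replies that keep updating last_recv would no longer be enough to hold off the scsi eh for a command that is not getting any IO.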
