Mike Christie wrote:
> v42bis wrote:
>>
>> On Aug 20, 1:39 am, Mike Christie <[EMAIL PROTECTED]> wrote:
>>> v42bis wrote:
>>>> Thanks for the reply, Mike.
>>> No problem.
>>>
>>>> The iscsi connections failed about 1m13s after my iscsi target went
>>>> down (the timestamps that follow are synced from the same ntp master,
>>>> however clock skew may account for a few seconds difference [1m45sec
>>>> seems very conspicuous - a multiple of the default 15sec timers?]).
>>>> The target went down at Aug 19 13:33:33.
>>> Actually this looks like a different problem. What version of open-iscsi
>>> are you using? Run "iscsiadm -P 3". The top part should dump the
>>> iscsiadm version.
>> `iscsiadm -P 3` just spits out the usage/help information - no
>> version. I know it is version open-iscsi-2.0-865.15, though.
>
> Ah, older versions had a private info argument for debugging. It later
> became stable as -P. Try "iscsiadm -m --info"
>
>
>>>> Aug 19 13:36:42 ak1-vz2 kernel: iscsi: scsi conn_destroy(): host_busy 0 host_failed 0
>>> This means that userspace decided to kill the iscsi session/connection,
>>> which means that we ignore the recovery/replacement timeout and just
>>> kill everything, which forces IO errors. We only did this for fatal
>>> errors, but we should not do that anymore.
>> What userspace process would have done that?
>
> iscsid, the iscsi userspace daemon that handles iscsi errors and does
> the login/relogin and session/connection management.
>
>
>>>> The above did not affect normal operation of my open-iscsi initiators.
>>> That is weirder. In this setup do you have multiple
>>> sessions/connections? When you checked the machine, were all the
>>> sessions/connections running? There should have been two sessions that
>>> were destroyed.
>> Only one session per connection. One connection to each iscsi target.
>>
>> All of the filesystems and iscsi connections seemed fine, as far as I
>> could tell.
>>
>>> In older open-iscsi userspace tools there were certain errors the target
>>> could send us that iscsid would consider fatal, and it would then kill
>>> the sessions like above. For example, if a target was shutting down it
>>> could tell us that it was not coming back, so we would kill the
>>> session. There was also a case where iscsid got confused, thought it
>>> was a fatal error and killed the session. To avoid problems like this,
>>> we now just retry forever, or until the user kills the session manually.
>> To confirm: open-iscsi version 2.0-869.2 and above will never kill
>> iscsi sessions unless the user explicitly tells iscsid to logout/kill
>
> Right.
>
>> the session? I want to make sure my open-iscsi initiators never return
>> errors until replacement_timeout is reached. I'd rather have any
>> processes accessing filesystems on iscsi hang forever than have the
>> connections lost and journals aborted.
>>
>> Looking at the code, is there any problem with setting such a high
>> replacement_timeout?
>
> With the kernel time code, or the iscsi code that handles the timer? As a
> quick test, try setting the timer to 10 days and the nop times to 5
> seconds. Unplug the cable and in about 10 seconds you will see the ping
> timeout message. Then shortly after that (within minutes instead of
> days) you should see the recovery/replacement timed out message.
>
Actually, that is a waste of time. It looks like not everyone is hitting it, and it may depend on the kernel config and on having the right timing.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---
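For reference, a rough sketch of the quick test Mike describes (replacement timer of 10 days, nop timers of 5 seconds) as settings in iscsid.conf, typically /etc/iscsi/iscsid.conf. This assumes a release whose iscsid.conf uses these setting names; check them against the file shipped with your version. 10 days is 10 * 24 * 3600 = 864000 seconds, and with a 5 second NOP-Out interval and a 5 second NOP-Out timeout the ping timeout should show up within roughly 5 + 5 = 10 seconds of the cable being pulled, which matches the estimate in the thread. These are test values, not recommended defaults.

```
# Test values only (assumed setting names; verify against your iscsid.conf).
# Recovery/replacement timeout: 10 days = 10 * 24 * 3600 = 864000 seconds.
node.session.timeo.replacement_timeout = 864000
# Time to wait between sending NOP-Out pings.
node.conn[0].timeo.noop_out_interval = 5
# Time to wait for a NOP-Out response before failing the connection.
node.conn[0].timeo.noop_out_timeout = 5
```

Values in iscsid.conf are generally copied into node records when they are created (for example at discovery). An existing record can be changed with something along the lines of `iscsiadm -m node -T <target> -p <portal> -o update -n node.session.timeo.replacement_timeout -v 864000`, after which the session needs to be logged out and back in. As noted above, whether the timer fires early may depend on the kernel config and timing.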
