v42bis wrote:
> 
> 
> On Aug 20, 1:39 am, Mike Christie <[EMAIL PROTECTED]> wrote:
>> v42bis wrote:
>>> Thanks for the reply, Mike.
>> No problem.
>>
>>> The iscsi connections failed about 1m13s after my iscsi target went
>>> down (timestamps that follow are synced from same ntp master, however
>>> clock skew may account for a few seconds difference [1m45sec seems
>>> very conspicuous - a multiplier of default 15sec timers?]). The target
>>> went down at Aug 19 13:33:33.
>> Actually this looks like a different problem. What version of open-iscsi
>> are you using? Do a "iscsiadm -P 3". The top part should dump the
>> iscsiadm version.
> 
> `iscsiadm -P 3` just spits out the usage/help information - no
> version. I know it is version open-iscsi-2.0-865.15, though.

Ah, older versions had a private info argument for debugging. It later
became stable as -P. Try "iscsiadm -m --info"


> 
>>> Aug 19 13:36:42 ak1-vz2 kernel: iscsi: scsi conn_destroy(): host_busy
>>> 0 host_failed 0
>> This means that userspace decided to kill the iscsi session/connection
>> which means that we ignore the recovery/replacement timeout and just
>> kill everything which forces IO errors. We only did this for fatal
>> errors, but we should not do that anymore.
> 
> What userspace process would have done that?

The iscsi userspace daemon that handles iscsi errors and does the 
login/relogin and session/connection management, iscsid.


> 
>>> The above did not affect normal operation of my open-iscsi initiators.
>> That is weirder. In this setup do you have multiple
>> sessions/connections? When you checked the machine were all the
>> session/connections running? There should have been two sessions that
>> were destroyed.
> 
> Only one session per connection. One connection to each iscsi target.
> 
> All of the filesystems and iscsi connections seemed fine, as far as I
> could tell.
> 
>> In older open-iscsi userspace tools there were certain errors the target
>> could send us and iscsid would consider it a fatal error and it would
>> kill the sessions like above. For example if a target was shutting down
>> it could tell us that it was not coming back, so we would kill the
>> session. There was also a case where iscsid got confused and thought it
>> was a fatal error and would kill the session. We now just retry forever
>> or until the user kills the session manually to avoid problems like this.
> 
> To confirm: open-iscsi version 2.0-869.2 and above will never kill
> iscsi sessions unless the user explicitly tells iscsid to logout/kill

Right.

> the session? I want to make sure my open-iscsi initiators never return
> errors until replacement_timeout is reached. I'd rather have any
> processes accessing filesystems on iscsi hang forever than have the
> connections lost and journals aborted.
> 
> Looking at the code, there is no problem with setting such a high
> replacement_timeout?

With the kernel time code or iscsi code that handles the timer? As a
quick test, try setting the timer to 10 days and the nop times to 5
seconds. Unplug the cable and in about 10 seconds you will see the ping
timeout message. Then shortly after that (within minutes instead of
days) you should see the recovery/replacement timed out message.
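
For reference, the test setup might look roughly like this in
iscsid.conf. This is only a sketch: it assumes the setting names used
in the sample iscsid.conf that ships with open-iscsi, so check them
against the file for your version. The values are for the test only.

```
# Sketch of the test settings; names assumed from the sample iscsid.conf.

# Replacement/recovery timeout in seconds: 10 days = 10 * 24 * 3600
node.session.timeo.replacement_timeout = 864000

# Send a nop (ping) every 5 seconds and wait 5 seconds for the reply,
# so a pulled cable should be noticed after roughly 10 seconds.
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
```

Values in iscsid.conf are normally copied into the node records at
discovery time, so for targets that were already discovered you will
probably need to update the node records and log the session out and
back in before the new timers take effect.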


> 
>> Please tell me you were using an older version than open-iscsi-2.0-869.2
>> :) If you were using open-iscsi-2.0-869.2 then we have a different
>> problem :(
> 
> I am definitely running 2.0-865.15. I will upgrade to 2.0-869.2.
> 
> It would be *very* convenient if the Changelog would include changes
> in every version and not just the current release. :)
> 

Will start that on the next release.

