Mike Christie wrote:
> v42bis wrote:
>>
>> On Aug 20, 1:39 am, Mike Christie <[EMAIL PROTECTED]> wrote:
>>> v42bis wrote:
>>>> Thanks for the reply, Mike.
>>> No problem.
>>>
>>>> The iscsi connections failed about 1m13s after my iscsi target went
>>>> down (timestamps that follow are synced from same ntp master, however
>>>> clock skew may account for a few seconds difference [1m45sec seems
>>>> very conspicuous - a multiplier of default 15sec timers?]). The target
>>>> went down at Aug 19 13:33:33.
>>> Actually this looks like a different problem. What version of open-iscsi
>>> are you using? Do a "iscsiadm -P 3". The top part should dump the
>>> iscsiadm version.
>> `iscsiadm -P 3` just spits out the usage/help information - no
>> version. I know it is version open-iscsi-2.0-865.15, though.
> 
> Ah, older versions had a private info argument for debugging. It later 
> became stable as -P. Try "iscsiadm -m --info"
> 
> 
>>>> Aug 19 13:36:42 ak1-vz2 kernel: iscsi: scsi conn_destroy(): host_busy
>>>> 0 host_failed 0
>>> This means that userspace decided to kill the iscsi session/connection
>>> which means that we ignore the recovery/replacement timeout and just
>>> kill everything which forces IO errors. We only did this for fatal
>>> errors, but we should not do that anymore.
>> What userspace process would have done that?
> 
> The iscsi userspace daemon that handles iscsi errors and does the 
> login/relogin and session/connection management, iscsid.
> 
> 
>>>> The above did not affect normal operation of my open-iscsi initiators.
>>> That is weirder. In this setup do you have multiple
>>> sessions/connections? When you checked the machine were all the
>>> session/connections running? There should have been two sessions that
>>> were destroyed.
>> Only one session per connection. One connection to each iscsi target.
>>
>> All of the filesystems and iscsi connections seemed fine, as far as I
>> could tell.
>>
>>> In older open-iscsi userspace tools there were certain errors the target
>>> could send us and iscsid would consider it a fatal error and it would
>>> kill the sessions like above. For example if a target was shutting down
>>> it could tell us that it was not coming back, so we would kill the
>>> session. There was also a case where iscsid got confused and thought it
>>> was a fatal error and would kill the session. We now just retry forever
>>> or until the user kills the session manually to avoid problems like this.
>> To confirm: open-iscsi version 2.0-869.2 and above will never kill
>> iscsi sessions unless the user explicitly tells iscsid to logout/kill
> 
> Right.
> 
>> the session? I want to make sure my open-iscsi initiators never return
>> errors until replacement_timeout is reached. I'd rather have any
>> processes accessing filesystems on iscsi hang forever than have the
>> connections lost and journals aborted.
>>
>> Looking at the code, there is no problem with setting such a high
>> replacement_timeout?
> 
> With the kernel time code or iscsi code that handles the timer? As a 
> quick test try setting the timer to 10 days and set the nop times to 5 
> seconds. Unplug the cable and in about 10 seconds you will see the ping 
> timeout message. Then shortly after that (within minutes instead of days) 
> you should see the recovery/replacement timed out message.
>

Actually, that is a waste of time. It looks like not everyone is hitting 
it, and it may be due to the kernel config and having the right timing.
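
For anyone who still wants to run that quick test, here is a minimal sketch of
the settings involved. It assumes the stock iscsid.conf parameter names and
that both nop values are meant to be 5 seconds; 864000 is simply 10 days
expressed in seconds. Check the names against the iscsid.conf shipped with
your version before relying on them.

```
# Sketch only: values for the quick test, not recommended production settings.
# 10 days * 24 h * 3600 s = 864000 s
node.session.timeo.replacement_timeout = 864000
# Send a nop-out ping every 5 seconds and give up on it after 5 seconds,
# so a pulled cable should be noticed in roughly 10 seconds.
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
```

As far as I know, values in iscsid.conf are only picked up when a node record
is created at discovery time, so for an already discovered target the node
record has to be updated (or the target rediscovered) and the session logged
in again before the new values apply.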
