Re: iscsi diagnosis help

Mike Christie Tue, 17 Nov 2009 19:06:24 -0800

Mike Christie wrote:
> Hoot, Joseph wrote:
>> more INLINE below...
>>
>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>
>>> Pasi Kärkkäinen wrote:
>>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>>>>
>>>>>> thanks.  That helps.  So I know that with the EqualLogic targets, there 
>>>>>> is a "Group IP" which, I believe, responds with an iscsi login_redirect. 
>>>>>>
>>>>>> 1) Could the "Login authentication failed" message be the response 
>>>>>> because of a login redirect message from the EQL redirect?
>>>>>>
>>>>>> and then my next question is more for curiosity's sake:
>>>>>>
>>>>>> 2) Are there plans in the future to have more than one connection per 
>>>>>> session?  and I guess in addition to that, would that mean multiple 
>>>>>> connections to a single volume over the same nic?
>>>>>>
>>>>>>
>>>>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
>>>>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>>>>> volumes.  In this type of scenario, would you expect to see timeouts like 
>>>>> this once in a while?  If so, do you think increasing my NOOP timeouts 
>>>>> would assist so we don't get these?  maybe set it to 15 seconds instead 
>>>>> of 10?
>>>>>
>>>> Equallogic does active loadbalancing (redirects) during operation..
>>>> dunno about the errors though.
>>>>
>>> Oh yeah, forgot about that. Thanks Pasi!
>>>
>>> Joseph, look in the EQL target logs for something about the EQL box 
>>> doing load balancing. I think normally we handle the load balancing more 
>>> gracefully, but we might be messing up. I think if EQL was load 
>>> balancing, we would see something in the open-iscsi logs about getting 
>>> an async iscsi pdu from the target that asks us to log out. Then when we 
>>> log back in, the target would redirect us to the optimal path.
>>
>> There are two things that the EQL does, I believe-- one thing is async 
>> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
>> side we don't see any errors related to this.  It's my understanding, 
>> however, that when a login is initially attempted to the EQL, it hits the 
>> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
>> all the interfaces on the EQL and decides, based on some algorithm, which 
>> EQL nic the session should connect to.  It then sends the initiator that 
>> made the request a login_redirect, which I thought is basically a "logout 
>> and reconnect" pdu.  It would say, for example, "you can't log into the 
>> group IP, however, you can log into this IP (a real nic) that it would 
>> prefer you be logged into."
>>
>> I'm thinking that the "failed login" is actually the result of that attempt 
>> to log into the group IP and it sending a login redirect pdu back to it.
>>
> 
> If the target was load balancing us it would:
> 
> - Send an async logout pdu.
> - We then send a logout pdu.
> - When we get the logout response pdu we kill the tcp/ip connection.
> - We then create a new tcp connection
> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
> one in the DB that you see when you run iscsiadm -m node, which is 
> probably what you call the group IP). For this process we send a login 
> pdu. It then sends a login response pdu with the login redirect 
> response. In this response we also get the new IP to log into.
> - We see that response and kill the tcp connection, and create a new tcp 
> connection to the portal we are being redirected to.
> - We then log into the portal we were redirected to. We again do this by 
> sending a login pdu. This time the login response pdu should be ok and 
> we are done.
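
The redirect part of the sequence above can be sketched as a toy model in Python. This is not open-iscsi code: `FakePortal`, `login_with_redirect`, the status strings, the `max_redirects` limit and the IP addresses are all made up for illustration. The real initiator does this over TCP with iSCSI login pdus.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FakePortal:
    """Hypothetical stand-in for a target portal; not an open-iscsi type."""
    redirect_to: Optional[str] = None  # None means the login is accepted here

    def login(self) -> Tuple[str, Optional[str]]:
        # Models the login response pdu: either success, or a redirect
        # that carries the new address to log into.
        if self.redirect_to is None:
            return ("ok", None)
        return ("redirect", self.redirect_to)


def login_with_redirect(portals: Dict[str, FakePortal], start: str,
                        max_redirects: int = 3) -> Tuple[str, List[str]]:
    """Follow login redirects from `start`; return the final portal and a trace.

    The redirect limit is an assumption of this sketch, not something
    stated in the thread.
    """
    trace: List[str] = []
    addr = start
    for _ in range(max_redirects + 1):
        trace.append(f"connect {addr}")            # create a new tcp connection
        status, new_addr = portals[addr].login()   # send login pdu, read response
        if status == "ok":
            trace.append(f"logged in {addr}")
            return addr, trace
        # Redirect response: drop this connection and try the new portal.
        trace.append(f"redirected to {new_addr}; close {addr}")
        addr = new_addr
    raise RuntimeError("too many redirects")


# Assumed addresses: 10.0.0.10 plays the group IP, 10.0.0.11 a real nic.
portals = {
    "10.0.0.10": FakePortal(redirect_to="10.0.0.11"),
    "10.0.0.11": FakePortal(),
}
final, trace = login_with_redirect(portals, "10.0.0.10")
print(final)
for line in trace:
    print(line)
```

In this model the first login attempt against the group IP never succeeds by design; it only returns the address of the portal to use.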


Oh yeah, I meant to also say that this is pretty much the same process 
that happens when we do the first login, and when we have to relogin 
because of a connection problem like the nop/ping timeout. The only 
difference in those cases is that we do not get the async logout and we 
do not send a logout pdu. We start at the step where we kill the tcp/ip 
connection.

So even if we are not getting load balanced we would be in the same 
place in the open-iscsi code when we are getting the login failed errors.
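
A minimal Python sketch of that point, using the steps from the list above: the load-balancing case only adds a logout prefix, and everything after it is shared. The names `COMMON_STEPS`, `ASYNC_LOGOUT_STEPS` and `relogin_steps` are hypothetical and the step strings are paraphrases, not open-iscsi log messages.

```python
from typing import List

# Steps shared by the first login, a relogin after a nop/ping timeout,
# and a relogin after the target asked us to log out.
COMMON_STEPS: List[str] = [
    "kill tcp connection",
    "create tcp connection to configured portal",
    "send login pdu",
    "receive login response (redirect)",
    "kill tcp connection",
    "create tcp connection to redirected portal",
    "send login pdu",
    "receive login response (ok)",
]

# Extra steps that only occur when the target is load balancing us.
ASYNC_LOGOUT_STEPS: List[str] = [
    "receive async logout pdu",
    "send logout pdu",
    "receive logout response pdu",
]


def relogin_steps(target_requested_logout: bool) -> List[str]:
    """Return the ordered steps for a (re)login in this simplified model."""
    prefix = ASYNC_LOGOUT_STEPS if target_requested_logout else []
    return prefix + COMMON_STEPS


for step in relogin_steps(target_requested_logout=False):
    print(step)
```

Because the tail of both sequences is identical, a login failure in the shared part does not by itself say whether load balancing was involved.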


To get back on track with why we get the nop timeouts: if we are not 
seeing load balancing messages or async logout messages, it could be 
the open-iscsi bug I mentioned in the other mail. If you can send the 
open-iscsi and kernel info I asked for in the other mail, we can start 
down that path.
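
For gathering that kind of information, here is a small Python sketch. What exactly was requested in the other mail is not restated here, so collecting the `iscsiadm --version` output and the running kernel release is an assumption, and `collect_versions` is a hypothetical helper name.

```python
import platform
import subprocess
from typing import Dict


def collect_versions() -> Dict[str, str]:
    """Collect the kernel release and the iscsiadm version string."""
    info = {"kernel": platform.release()}  # same value as `uname -r` on Linux
    try:
        out = subprocess.run(
            ["iscsiadm", "--version"],
            capture_output=True, text=True, check=False,
        )
        # Some tools print the version on stderr, so fall back to it.
        info["iscsiadm"] = (out.stdout or out.stderr).strip()
    except OSError:
        # iscsiadm is not installed or not on PATH.
        info["iscsiadm"] = "iscsiadm not found"
    return info


info = collect_versions()
for name, value in info.items():
    print(f"{name}: {value}")
```

The distribution's package version of open-iscsi (from the package manager) may also be useful, since it can differ from the upstream version string.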
