Re: iscsi diagnosis help

Hoot, Joseph Wed, 18 Nov 2009 04:16:08 -0800

Sure.  It's the OVM 2.2 environment that I talked with you about a couple of 
days ago.  Here is the info:


[r...@oim6102506 ~]# uname -r
2.6.18-128.2.1.4.9.el5xen
[r...@oim6102506 ~]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.871-0.7.el5

I have a tcpdump that I had sent to Don Williams.  I'll pull that up shortly.

On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:

> Mike Christie wrote:
>> Hoot, Joseph wrote:
>>> more INLINE below...
>>> 
>>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>> 
>>>> Pasi Kärkkäinen wrote:
>>>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>>>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>>>>> 
>>>>>>> thanks.  That helps.  So I know that with the EqualLogic targets, there 
>>>>>>> is a "Group IP" which, I believe, responds with an iscsi 
>>>>>>> login_redirect. 
>>>>>>> 
>>>>>>> 1) Could the "Login authentication failed" message be the response 
>>>>>>> because of a login redirect message from the EQL redirect?
>>>>>>> 
>>>>>>> and then my next question is more for curiosity's sake:
>>>>>>> 
>>>>>>> 2) Are there plans in the future to have more than one connection per 
>>>>>>> session?  and I guess in addition to that, would that mean multiple 
>>>>>>> connections to a single volume over the same nic?
>>>>>>> 
>>>>>>> 
>>>>>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
>>>>>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>>>>>> volumes.  In this type of scenario, would you expect to see timeouts 
>>>>>> like this once in a while?  If so, do you think increasing my NOOP 
>>>>>> timeouts would assist so we don't get these?  maybe set it to 15 seconds 
>>>>>> instead of 10?
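
For reference, a minimal sketch of how the NOOP timeout change asked about above could be applied to an existing node record. It only builds the iscsiadm argument list and does not run anything. The setting name node.conn[0].timeo.noop_out_timeout is, as far as I know, the open-iscsi node setting for the NOOP-Out timeout; the target name and portal in the usage line are placeholders, not values from this thread.

```python
def noop_timeout_update_argv(target, portal, seconds):
    """Build an iscsiadm command line that updates the NOOP-Out timeout.

    target and portal are caller-supplied placeholders; nothing is executed.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return [
        "iscsiadm", "-m", "node",
        "-T", target,                       # target IQN (placeholder)
        "-p", portal,                       # portal ip:port (placeholder)
        "-o", "update",
        "-n", "node.conn[0].timeo.noop_out_timeout",
        "-v", str(seconds),
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    argv = noop_timeout_update_argv("iqn.example:volume1", "192.0.2.10:3260", 15)
    print(" ".join(argv))
```

A change made this way would normally only take effect on the next login of that session, so a logout and login of the node may be needed after updating the record.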
>>>>>> 
>>>>> Equallogic does active loadbalancing (redirects) during operation..
>>>>> dunno about the errors though.
>>>>> 
>>>> Oh yeah, forgot about that. Thanks Pasi!
>>>> 
>>>> Joseph, look in the EQL target logs for something about the EQL box 
>>>> doing load balancing. I think normally we handle the load balancing more 
>>>> gracefully, but we might be messing up. I think if EQL was load 
>>>> balancing in the open-iscsi logs we would see something about getting a 
>>>> an async iscsi pdu from the target that asks us to logout. Then when we 
>>>> relogin the target would redirect us to the optimal path.
>>> 
>>> There are two things that the EQL does, I believe-- one thing is async 
>>> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
>>> side we don't see any errors related to this.  It's my understanding, 
>>> however, that when a login is initially attempted to the EQL, it hits the 
>>> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
>>> all the interfaces on the EQL and decides, based on some algorithm, which 
>>> EQL nic the session should connect to.  It then sends the initiator that 
>>> made the request a login_redirect, which I thought is basically a "logout 
>>> and reconnect" pdu.  It would say, for example, "you can't log into the 
>>> group IP, however, you can log into this IP (a real nic) that it would 
>>> prefer you be logged into."
>>> 
>>> I'm thinking that the "failed login" is actually the result of that attempt 
>>> to log into the group IP and it sending a login redirect pdu back to it.
>>> 
>> 
>> If the target was load balancing us it would:
>> 
>> - Send an async logout pdu.
>> - We then send a logout pdu.
>> - When we get the logout response pdu we kill the tcp ip connection
>> - We then create a new tcp connection
>> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
>> one in the DB that you see when you run iscsiadm -m node, which is 
>> probably what you call the group IP). For this process we send a login 
>> pdu. It then sends a login response pdu with the login redirect 
>> response. In this response we also get the new IP to log into.
>> - We see that response and kill the tcp connection, and create a new tcp 
>> connection to the portal we are being redirected to.
>> - We then log into the portal we were redirected to. We again do this by 
>> sending a login pdu. This time the login response pdu should be ok and 
>> we are done.
> 
> Oh yeah, I meant to also say that this is pretty much the same process 
> that happens when we do the first login, and if we have to relogin because of 
> a connection problem like the nop/ping timeout. The only difference in 
> those cases is that we do not get the async logout and we do not do a 
> logout by sending a logout pdu. We start at the killing tcp ip 
> connection step.
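
The sequence described above can be written out as a small model, which may help when matching it against a tcpdump. This is only a sketch of the steps as listed in this thread; the step strings and function name are made up for illustration and are not open-iscsi identifiers.

```python
# Steps that only occur when the target initiates load balancing.
LOAD_BALANCE_PREFIX = [
    "target sends async logout pdu",
    "initiator sends logout pdu",
    "initiator receives logout response pdu",
]

# Steps shared by load balancing, first login, and relogin after a
# connection problem such as a nop/ping timeout.
COMMON_STEPS = [
    "kill tcp connection",
    "create tcp connection to configured portal (group IP)",
    "send login pdu",
    "receive login response pdu with redirect and new IP",
    "kill tcp connection",
    "create tcp connection to redirected portal",
    "send login pdu",
    "receive login response pdu: ok",
]


def relogin_steps(reason):
    """Return the ordered steps for a given (re)login reason."""
    if reason == "load_balance":
        return LOAD_BALANCE_PREFIX + COMMON_STEPS
    if reason in ("first_login", "nop_timeout"):
        # No async logout and no logout pdu in these cases.
        return list(COMMON_STEPS)
    raise ValueError("unknown reason: %r" % (reason,))


if __name__ == "__main__":
    for step in relogin_steps("nop_timeout"):
        print(step)
```

Either way, the initiator ends up in the same login path, which is why the "login failed" message alone does not tell us whether load balancing was involved.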
> 
> So even if we are not getting load balanced we would be in the same 
> place in the open-iscsi code when we are getting the login failed errors.
> 
> 
> To get back on track with solving why we get the nop timeouts: if we are 
> not seeing load balancing messages or async logout messages, it could 
> be the open-iscsi bug I mentioned in the other mail. If you can send the 
> open-iscsi and kernel info I asked for in the other mail, we can start 
> down that path.
> 
> --
> 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/open-iscsi?hl=.
> 
> 

===========================
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
[email protected]
GPG KEY:   7145F633
===========================
