Re: iscsi diagnosis help

Mike Christie Wed, 18 Nov 2009 16:45:21 -0800

ccing Guru Anbalagane.

Hoot, Joseph wrote:
> Sure.  It's the OVM 2.2 environment that I talked with you about a couple of 
> days ago.  Here is the info:
>


Doh! I forgot.

Guru, I am trying to make a patch for your Oracle VM kernel. I want to 
port the patch I told you about
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551
and this one:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551
if you do not have it (could not remember if you did or not).

I found the src rpm:
http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896

The problem is that I keep getting RPM build errors when I try to build 
the project:

rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm
rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec

... a bunch of stuff then

make[1]: *** [nonint_oldconfig] Error 15
make: *** [nonint_oldconfig] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)


RPM build errors:
     Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
(I have also tried without the --target arg and with different archs, and 
it always fails.)

This is how I set up the RHEL kernel source rpm. Are the commands for the 
Oracle VM kernel different?



> [r...@oim6102506 ~]# uname -r
> 2.6.18-128.2.1.4.9.el5xen
> [r...@oim6102506 ~]# rpm -qa | grep iscsi
> iscsi-initiator-utils-6.2.0.871-0.7.el5
> 
> I have a tcpdump that I had sent to Don Williams.  I'll pull that up shortly.
> 
> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:
> 
>> Mike Christie wrote:
>>> Hoot, Joseph wrote:
>>>> more INLINE below...
>>>>
>>>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>>>
>>>>> Pasi Kärkkäinen wrote:
>>>>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>>>>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>>>>>>
>>>>>>>> thanks.  That helps.  So I know that with the EqualLogic targets, 
>>>>>>>> there is a "Group IP" which, I believe, responds with an iscsi 
>>>>>>>> login_redirect. 
>>>>>>>>
>>>>>>>> 1) Could the "Login authentication failed" message be the response 
>>>>>>>> because of a login redirect message from the EQL redirect?
>>>>>>>>
>>>>>>>> and then my next question is more for curiosity's sake:
>>>>>>>>
>>>>>>>> 2) Are there plans in the future to have more than one connection per 
>>>>>>>> session?  and I guess in addition to that, would that mean multiple 
>>>>>>>> connections to a single volume over the same nic?
>>>>>>>>
>>>>>>>>
>>>>>>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
>>>>>>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>>>>>>> volumes.  In this type of scenario, would you expect to see timeouts 
>>>>>>> like this once in a while?  If so, do you think increasing my NOOP 
>>>>>>> timeouts would assist so we don't get these?  maybe set it to 15 
>>>>>>> seconds instead of 10?
>>>>>>>
>>>>>> Equallogic does active loadbalancing (redirects) during operation..
>>>>>> dunno about the errors though.
>>>>>>
>>>>> Oh yeah, forgot about that. Thanks Pasi!
>>>>>
>>>>> Joseph, look in the EQL target logs for something about the EQL box 
>>>>> doing load balancing. I think normally we handle the load balancing more 
>>>>> gracefully, but we might be messing up. I think if EQL was load 
>>>>> balancing, we would see something in the open-iscsi logs about getting 
>>>>> an async iscsi pdu from the target that asks us to logout. Then when we 
>>>>> relogin the target would redirect us to the optimal path.
>>>> There are two things that the EQL does, I believe-- one thing is async 
>>>> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
>>>> side we don't see any errors related to this.  It's my understanding, 
>>>> however, that when a login is initially attempted to the EQL, it hits the 
>>>> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
>>>> all the interfaces on the EQL and decides, based on some algorithm, which 
>>>> EQL nic the session should connect to.  It then sends the initiator that 
>>>> made the request a login_redirect, which I thought is basically a "logout 
>>>> and reconnect" pdu.  It would say, for example, "you can't log into the 
>>>> group IP, however, you can log into this IP (a real nic) that it would 
>>>> prefer you be logged into."
>>>>
>>>> I'm thinking that the "failed login" is actually the result of that 
>>>> attempt to log into the group IP and it sending a login redirect pdu back 
>>>> to it.
>>>>
>>> If the target was load balancing us it would:
>>>
>>> - Send an async logout pdu.
>>> - We then send a logout pdu.
>>> - When we get the logout response pdu we kill the tcp ip connection
>>> - We then create a new tcp connection
>>> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
>>> one in the DB that you see when you run iscsiadm -m node, which is 
>>> probably what you call the group IP). For this process we send a login 
>>> pdu. It then sends a login response pdu with the login redirect 
>>> response. In this response we also get the new IP to log into.
>>> - We see that response and kill the tcp connection, and create a new tcp 
>>> connection to the portal we are being redirected to.
>>> - We then log into the portal we were redirected to. We again do this by 
>>> sending a login pdu. This time the login response pdu should be ok and 
>>> we are done.
>> Oh yeah, I meant to also say that this is pretty much the same process 
>> that happens when we do the first login, and if we have to relogin because of 
>> a connection problem like the nop/ping timeout. The only difference in 
>> those cases is that we do not get the async logout and we do not do a 
>> logout by sending a logout pdu. We start at the killing tcp ip 
>> connection step.
>>
>> So even if we are not getting load balanced we would be in the same 
>> place in the open-iscsi code when we are getting the login failed errors.
>>
>>
>> To get back on track solving why we get the nop timeouts: if we are 
>> not seeing load balancing messages or async logout messages, it could 
>> be the open-iscsi bug I mentioned in the other mail. If you can send the 
>> open-iscsi and kernel info I asked for in the other mail, we can start 
>> down that path.
>>
>> --
>>
>> You received this message because you are subscribed to the Google Groups 
>> "open-iscsi" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/open-iscsi?hl=.
>>
>>
> 
> ===========================
> Joseph R. Hoot
> Lead System Programmer/Analyst
> (w) 716-878-4832
> (c) 716-759-HOOT
> [email protected]
> GPG KEY:   7145F633
> ===========================
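
The relogin sequence quoted above (async logout, logout, reconnect to the
portal from the node DB, login redirect, login to the redirected portal) can
be sketched as a toy model. This is only an illustration of the ordering of
steps, not open-iscsi code: the class names, the portal addresses, and the
redirect limit are all made up for the example.

```python
from dataclasses import dataclass, field

# Hypothetical addresses; neither comes from the thread.
GROUP_PORTAL = "10.0.0.10:3260"   # assumed "group IP" stored in the node DB
MEMBER_PORTAL = "10.0.0.11:3260"  # assumed real NIC the target redirects to


@dataclass
class FakeTarget:
    """Stand-in for the array: portals listed in `redirects` answer a
    login with a redirect, every other portal accepts the login."""
    redirects: dict

    def login(self, portal):
        if portal in self.redirects:
            return ("redirect", self.redirects[portal])
        return ("ok", portal)


@dataclass
class Initiator:
    """Toy initiator that records each step of a relogin in `log`."""
    target: FakeTarget
    node_portal: str
    log: list = field(default_factory=list)

    def relogin(self, async_logout):
        # Only the load-balancing case starts with the async logout
        # and the explicit logout exchange.
        if async_logout:
            self.log.append("recv async logout pdu")
            self.log.append("send logout pdu")
            self.log.append("recv logout response pdu")
        # A nop timeout (or other connection problem) starts here.
        self.log.append("close tcp connection")
        portal = self.node_portal
        for _ in range(4):  # arbitrary cap on redirects for the sketch
            self.log.append(f"open tcp connection to {portal}")
            self.log.append(f"send login pdu to {portal}")
            status, where = self.target.login(portal)
            if status == "ok":
                self.log.append(f"login ok on {portal}")
                return portal
            self.log.append(f"login redirect to {where}")
            self.log.append("close tcp connection")
            portal = where
        raise RuntimeError("too many redirects")
```

Calling relogin(async_logout=True) and relogin(async_logout=False) on two
such initiators gives logs that differ only in the first three entries,
which is the point made above: from the connection teardown onward, the
load-balancing case and the nop-timeout case go through the same steps.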
