Re: iscsi diagnosis help

Guru Anbalagane Thu, 19 Nov 2009 11:40:15 -0800

Hi Mike,

Thanks.
Can you please try --target=i686?


On the patches, yes, I will include it in our next VM kernel.
regards
Guru
Mike Christie wrote:
> ccing Guru Anbalagane.
>
> Hoot, Joseph wrote:
>> Sure.  It's the OVM 2.2 environment that I talked with you about a 
>> couple of days ago.  Here is the info:
>>
>
> Doh! I forgot.
>
> Guru, I am trying to make a patch for your Oracle VM kernel. I want to 
> port the patch I told you about
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551
>  
>
> and this one:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551
>  
>
> if you do not have it (could not remember if you did or not).
>
> I found the src rpm:
> http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 
>
>
> The problem is that I keep getting RPM build errors when I try to 
> build the project:
>
> rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm
> rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec
>
> ... a bunch of stuff then
>
> make[1]: *** [nonint_oldconfig] Error 15
> make: *** [nonint_oldconfig] Error 2
> error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
>
>
> RPM build errors:
>     Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
> (have also tried without the --target arg and with different archs and 
> always fails)
>
> This is how I set up the RHEL kernel source rpm. Are the commands for 
> the Oracle VM kernel different?
>
>
>
>> [r...@oim6102506 ~]# uname -r
>> 2.6.18-128.2.1.4.9.el5xen
>> [r...@oim6102506 ~]# rpm -qa | grep iscsi
>> iscsi-initiator-utils-6.2.0.871-0.7.el5
>>
>> I have a tcpdump that I had sent to Don Williams.  I'll pull that up 
>> shortly.
>>
>> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:
>>
>>> Mike Christie wrote:
>>>> Hoot, Joseph wrote:
>>>>> more INLINE below...
>>>>>
>>>>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>>>>
>>>>>> Pasi Kärkkäinen wrote:
>>>>>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>>>>>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>>>>>>>
>>>>>>>>> thanks.  That helps.  So I know that with the EqualLogic 
>>>>>>>>> targets, there is a "Group IP" which, I believe, responds with 
>>>>>>>>> an iscsi login_redirect.
>>>>>>>>> 1) Could the "Login authentication failed" message be the 
>>>>>>>>> response to a login redirect message from the EQL 
>>>>>>>>> group IP?
>>>>>>>>>
>>>>>>>>> and then my next question is more for curiosity's sake:
>>>>>>>>>
>>>>>>>>> 2) Are there plans in the future to have more than one 
>>>>>>>>> connection per session?  and I guess in addition to that, 
>>>>>>>>> would that mean multiple connections to a single volume over 
>>>>>>>>> the same nic?
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Also Mike, I'm seeing one or two of these every 30-40 minutes 
>>>>>>>> if I slam our EqualLogic with roughly 7-15k IOPS (reads and 
>>>>>>>> writes) non stop on 3 volumes.  In this type of scenario, would 
>>>>>>>> you expect to see timeouts like this once in a while?  If so, do 
>>>>>>>> you think increasing my NOOP timeouts would assist so we don't 
>>>>>>>> get these?  maybe set it to 15 seconds instead of 10?
>>>>>>>>
>>>>>>> EqualLogic does active load balancing (redirects) during operation.
>>>>>>> dunno about the errors though.
>>>>>>>
>>>>>> Oh yeah, forgot about that. Thanks Pasi!
>>>>>>
>>>>>> Joseph, look in the EQL target logs for something about the EQL 
>>>>>> box doing load balancing. I think normally we handle the load 
>>>>>> balancing more gracefully, but we might be messing up. I think if 
>>>>>> EQL was load balancing in the open-iscsi logs we would see 
>>>>>> something about getting an async iscsi pdu from the target that 
>>>>>> asks us to logout. Then when we relogin the target would redirect 
>>>>>> us to the optimal path.
>>>>> There are two things that the EQL does, I believe-- one thing is 
>>>>> async logout, the other is login_redirect.   Unfortunately, from 
>>>>> the EQL syslog side we don't see any errors related to this.  It's 
>>>>> my understanding, however, that when a login is initially 
>>>>> attempted to the EQL, it hits the "group ip" or an alias'd IP 
>>>>> sitting on a real nic.  The group IP looks at all the interfaces 
>>>>> on the EQL and decides, based on some algorithm, which EQL nic the 
>>>>> session should connect to.  It then sends the initiator that made 
>>>>> the request a login_redirect, which I thought is basically a 
>>>>> "logout and reconnect" pdu.  It would say, for example, "you 
>>>>> can't log into the group IP, however, you can log into this IP (a 
>>>>> real nic) that it would prefer you be logged into."
>>>>>
>>>>> I'm thinking that the "failed login" is actually the result of 
>>>>> that attempt to log into the group IP and it sending a login 
>>>>> redirect pdu back to it.
>>>>>
>>>> If the target was load balancing us it would:
>>>>
>>>> - Send an async logout pdu.
>>>> - We then send a logout pdu.
>>>> - When we get the logout response pdu we kill the tcp ip connection
>>>> - We then create a new tcp connection
>>>> - We then log in to the portal that was passed into iscsiadm/iscsid 
>>>> (the one in the DB that you see when you run iscsiadm -m node, 
>>>> which is probably what you call the group IP). For this process we 
>>>> send a login pdu. It then sends a login response pdu with the login 
>>>> redirect response. In this response we also get the new IP to log 
>>>> into.
>>>> - We see that response and kill the tcp connection, and create a 
>>>> new tcp connection to the portal we are being redirected to.
>>>> - We then log into the portal we were redirected to. We again do 
>>>> this by sending a login pdu. This time the login response pdu 
>>>> should be ok and we are done.
>>> Oh yeah, I meant to also say that this is pretty much the same 
>>> process that happens when we do the first login, and if we have to 
>>> relogin because of a connection problem like the nop/ping timeout. 
>>> The only difference in those cases is that we do not get the async 
>>> logout and we do not do a logout by sending a logout pdu. We start 
>>> at the killing tcp ip connection step.
>>>
>>> So even if we are not getting load balanced we would be in the same 
>>> place in the open-iscsi code when we are getting the login failed 
>>> errors.
>>>
>>>
>>> To get back on track solving why we get the nop timeouts: if we 
>>> are not seeing load balancing messages or async logout messages, it 
>>> could be the open-iscsi bug I mentioned in the other mail. If you 
>>> can send the open-iscsi and kernel info I asked for in the other 
>>> mail, we can start down that path.
>>>
>>> -- 
>>>
>>> You received this message because you are subscribed to the Google 
>>> Groups "open-iscsi" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at 
>>> http://groups.google.com/group/open-iscsi?hl=.
>>>
>>>
>>
>> ===========================
>> Joseph R. Hoot
>> Lead System Programmer/Analyst
>> (w) 716-878-4832
>> (c) 716-759-HOOT
>> [email protected]
>> GPG KEY:   7145F633
>> ===========================
>
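
The relogin sequence quoted above (async logout, logout, new TCP
connection, login to the configured portal, login redirect, login to the
redirected portal) can be sketched roughly as below. This is a toy model
only, not open-iscsi code: FakeTarget, LoginResponse, relogin, the
max_redirects cap, and the portal addresses are all made up for
illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class LoginResponse:
    """Simplified login response pdu: success, or a redirect to another portal."""
    redirect_to: Optional[str] = None  # None means the login was accepted


class FakeTarget:
    """Hypothetical stand-in for a target with a group IP and real NIC portals."""

    def __init__(self, redirects: Dict[str, str]) -> None:
        # Maps a portal that answers with a login redirect to the new portal.
        self.redirects = redirects

    def login(self, portal: str) -> LoginResponse:
        return LoginResponse(redirect_to=self.redirects.get(portal))


def relogin(target: FakeTarget, configured_portal: str, async_logout: bool,
            max_redirects: int = 3) -> Tuple[str, List[str]]:
    """Return the portal finally logged into and the ordered list of steps.

    async_logout=True models target-driven load balancing. With
    async_logout=False (first login, or relogin after a nop timeout) the
    sequence starts at the "close tcp connection" step.
    max_redirects is an assumption added so the sketch cannot loop forever.
    """
    steps: List[str] = []
    if async_logout:
        steps.append("recv async logout pdu")
        steps.append("send logout pdu")
        steps.append("recv logout response pdu")
    steps.append("close tcp connection")

    portal = configured_portal
    for _ in range(max_redirects + 1):
        steps.append(f"open tcp connection to {portal}")
        steps.append(f"send login pdu to {portal}")
        response = target.login(portal)
        if response.redirect_to is None:
            steps.append("login ok")
            return portal, steps
        # The login response carries the new address to log into.
        steps.append(f"recv login redirect to {response.redirect_to}")
        steps.append("close tcp connection")
        portal = response.redirect_to
    raise RuntimeError("too many login redirects")


if __name__ == "__main__":
    # Hypothetical addresses: .10 plays the group IP, .11 a real NIC.
    eql = FakeTarget({"10.0.0.10:3260": "10.0.0.11:3260"})
    final_portal, trace = relogin(eql, "10.0.0.10:3260", async_logout=True)
    for line in trace:
        print(line)
    print("logged into", final_portal)
```

Calling relogin with async_logout=False gives the same trace minus the
first three steps, which matches the point made in the thread: the
initiator ends up in the same login path whether or not it was being
load balanced.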

