Re: iscsi diagnosis help

2009-11-19 Thread Guru Anbalagane
Hi Mike,

Thanks.
Can you please try  --target=i686.

On the patches, yes, I will include it in our next VM kernel.
regards
Guru
Mike Christie wrote:
> ccing Guru Anbalagane.
>
> Hoot, Joseph wrote:
>> Sure.  Its the OVM 2.2 environment that I talked with you about a 
>> couple of days ago.  Here is the info:
>>
>
> Doh! I forgot.
>
> Guru, I am trying to make a patch for your Oracle VM kernel. I want to 
> port the patch I told you about
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551
>  
>
> and this one:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551
>  
>
> if you do not have it (could not remember if you did or not).
>
> I found the src rpm:
> http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 
>
>
> The problem is that I keep getting RPM build errors when I try to 
> build the project:
>
> rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm
> rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec
>
> ... a bunch of stuff then
>
> make[1]: *** [nonint_oldconfig] Error 15
> make: *** [nonint_oldconfig] Error 2
> error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
>
>
> RPM build errors:
> Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
> (have also tried without the --target arg and with different archs and 
> always fails)
>
> This is how I setup the RHEL kernel source rpm. Are the commands for 
> the Oracle VM kernel different?
>
>
>
>> [r...@oim6102506 ~]# uname -r
>> 2.6.18-128.2.1.4.9.el5xen
>> [r...@oim6102506 ~]# rpm -qa | grep iscsi
>> iscsi-initiator-utils-6.2.0.871-0.7.el5
>>
>> I have a tcpdump that I had sent to Don WIlliams.  I'll pull that up 
>> shortly.
>>
>> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:
>>
>>> Mike Christie wrote:
 Hoot, Joseph wrote:
> more INLINE below...
>
> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>
>> Pasi Kärkkäinen wrote:
>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
 On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:

> thanks.  That helps.  So I know that with the EqualLogic 
> targets, there is a "Group IP" which, I believe, responds with 
> an iscsi login_redirect.
> 1) Could the "Login authentication failed" message be the 
> response because of a login redirect messages from the EQL 
> redirect?
>
> and then my next question is more for curiosity sake:
>
> 2) Are there plans in the future to have more than one 
> connection per session?  and I guess in addition to that, 
> would that mean multiple connections to a single volume over 
> the same nic?
>
>
 Also Mike, I'm seeing one or two of these every 30-40 minutes 
 if I slam our EqualLogic with roughly 7-15k IOPS (reads and 
 writes) non stop on 3 volumes.  In this type of scenario, would 
 you expect to see timeouts like this once in awhile?  If so, do 
 you think increasing my NOOP timeouts would assist so we don't 
 get these?  maybe set it to 15 seconds instead of 10?

>>> Equallogic does active loadbalancing (redirects) during operation..
>>> dunno about the errors though.
>>>
>> Oh yeah, forgot about that. Thanks Pasi!
>>
>> Joseph, look in the EQL target logs for something about the EQL 
>> box doing load balancing. I think normally we handle the load 
>> balancing more gracefully, but we might be messing up. I think if 
>> EQL was load balancing in the open-iscsi logs we would see 
>> something about getting a async iscsi pdu from the target that 
>> asks us to logout. Then when we relogin the target would redirect 
>> us to the optimal path.
> There are two things that the EQL does, I believe-- one thing is 
> async logout, the other is login_redirect.   Unfortunately, from 
> the EQL syslog side we don't see any errors related to this.  It's 
> my understanding, however, that when a login is initially 
> attempted to the EQL, it hits the "group ip" or an alias'd IP 
> sitting on a real nic.  The group IP looks at all the interfaces 
> on the EQL and decides, based on some algorithm, which EQL nic the 
> session should connect to.  It then sends the initiator that made 
> the request a login_redirect, which I thought is basically a 
> "logout and reconnect" pdu.  It would say, for example, "you're 
> can't log into the group IP, however, you can log into this IP (a 
> real nic) that it would prefer you be logged into."
>
> I'm thinking that the "failed login" is actually the result of 
> that attempt to log into the group IP and it sending a login 
> redirect pdu back to it.
>
 If the target 

Re: iscsi diagnosis help

2009-11-18 Thread Mike Christie
Guru Anbalagane wrote:
> Hi Mike,
> 
> Thanks.
> Can you please try  --target=i686.
> 

Thanks Guru.

> On the patches, yes, I will include it in our next VM kernel.


Joseph, attached is a patch that has a couple fixes from upstream in 
that code path. If it does not fix your issue then I will have to send a 
patch with some more debugging in it (the net layer is not waking us up 
when send space is opening or it really is taking a long time to send 
the data).

Patch is only compile tested.


> regards
> Guru
> Mike Christie wrote:
>> ccing Guru Anbalagane.
>>
>> Hoot, Joseph wrote:
>>> Sure.  Its the OVM 2.2 environment that I talked with you about a 
>>> couple of days ago.  Here is the info:
>>>
>>
>> Doh! I forgot.
>>
>> Guru, I am trying to make a patch for your Oracle VM kernel. I want to 
>> port the patch I told you about
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551
>>  
>>
>> and this one:
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551
>>  
>>
>> if you do not have it (could not remember if you did or not).
>>
>> I found the src rpm:
>> http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 
>>
>>
>> The problem is that I keep getting RPM build errors when I try to 
>> build the project:
>>
>> rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm
>> rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec
>>
>> ... a bunch of stuff then
>>
>> make[1]: *** [nonint_oldconfig] Error 15
>> make: *** [nonint_oldconfig] Error 2
>> error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
>>
>>
>> RPM build errors:
>> Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
>> (have also tried without the --target arg and with different archs and 
>> always fails)
>>
>> This is how I setup the RHEL kernel source rpm. Are the commands for 
>> the Oracle VM kernel different?
>>
>>
>>
>>> [r...@oim6102506 ~]# uname -r
>>> 2.6.18-128.2.1.4.9.el5xen
>>> [r...@oim6102506 ~]# rpm -qa | grep iscsi
>>> iscsi-initiator-utils-6.2.0.871-0.7.el5
>>>
>>> I have a tcpdump that I had sent to Don WIlliams.  I'll pull that up 
>>> shortly.
>>>
>>> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:
>>>
 Mike Christie wrote:
> Hoot, Joseph wrote:
>> more INLINE below...
>>
>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>
>>> Pasi Kärkkäinen wrote:
 On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>
>> thanks.  That helps.  So I know that with the EqualLogic 
>> targets, there is a "Group IP" which, I believe, responds with 
>> an iscsi login_redirect.
>> 1) Could the "Login authentication failed" message be the 
>> response because of a login redirect messages from the EQL 
>> redirect?
>>
>> and then my next question is more for curiosity sake:
>>
>> 2) Are there plans in the future to have more than one 
>> connection per session?  and I guess in addition to that, 
>> would that mean multiple connections to a single volume over 
>> the same nic?
>>
>>
> Also Mike, I'm seeing one or two of these every 30-40 minutes 
> if I slam our EqualLogic with roughly 7-15k IOPS (reads and 
> writes) non stop on 3 volumes.  In this type of scenario, would 
> you expect to see timeouts like this once in awhile?  If so, do 
> you think increasing my NOOP timeouts would assist so we don't 
> get these?  maybe set it to 15 seconds instead of 10?
>
 Equallogic does active loadbalancing (redirects) during operation..
 dunno about the errors though.

>>> Oh yeah, forgot about that. Thanks Pasi!
>>>
>>> Joseph, look in the EQL target logs for something about the EQL 
>>> box doing load balancing. I think normally we handle the load 
>>> balancing more gracefully, but we might be messing up. I think if 
>>> EQL was load balancing in the open-iscsi logs we would see 
>>> something about getting a async iscsi pdu from the target that 
>>> asks us to logout. Then when we relogin the target would redirect 
>>> us to the optimal path.
>> There are two things that the EQL does, I believe-- one thing is 
>> async logout, the other is login_redirect.   Unfortunately, from 
>> the EQL syslog side we don't see any errors related to this.  It's 
>> my understanding, however, that when a login is initially 
>> attempted to the EQL, it hits the "group ip" or an alias'd IP 
>> sitting on a real nic.  The group IP looks at all the interfaces 
>> on the EQL and decides, based on some algorithm, which EQL nic the 
>> session should connect to.  It then se

Re: iscsi diagnosis help

2009-11-18 Thread Mike Christie
ccing Guru Anbalagane.

Hoot, Joseph wrote:
> Sure.  Its the OVM 2.2 environment that I talked with you about a couple of 
> days ago.  Here is the info:
> 

Doh! I forgot.

Guru, I am trying to make a patch for your Oracle VM kernel. I want to 
port the patch I told you about
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551
and this one:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551
if you do not have it (could not remember if you did or not).

I found the src rpm:
http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896

The problem is that I keep getting RPM build errors when I try to build 
the project:

rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm
rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec

... a bunch of stuff then

make[1]: *** [nonint_oldconfig] Error 15
make: *** [nonint_oldconfig] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)


RPM build errors:
 Bad exit status from /var/tmp/rpm-tmp.94270 (%prep)
(have also tried without the --target arg and with different archs and 
always fails)

This is how I setup the RHEL kernel source rpm. Are the commands for the 
Oracle VM kernel different?



> [r...@oim6102506 ~]# uname -r
> 2.6.18-128.2.1.4.9.el5xen
> [r...@oim6102506 ~]# rpm -qa | grep iscsi
> iscsi-initiator-utils-6.2.0.871-0.7.el5
> 
> I have a tcpdump that I had sent to Don WIlliams.  I'll pull that up shortly.
> 
> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:
> 
>> Mike Christie wrote:
>>> Hoot, Joseph wrote:
 more INLINE below...

 On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:

> Pasi Kärkkäinen wrote:
>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>>
 thanks.  That helps.  So I know that with the EqualLogic targets, 
 there is a "Group IP" which, I believe, responds with an iscsi 
 login_redirect. 

 1) Could the "Login authentication failed" message be the response 
 because of a login redirect messages from the EQL redirect?

 and then my next question is more for curiosity sake:

 2) Are there plans in the future to have more than one connection per 
 session?  and I guess in addition to that, would that mean multiple 
 connections to a single volume over the same nic?


>>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
>>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>>> volumes.  In this type of scenario, would you expect to see timeouts 
>>> like this once in awhile?  If so, do you think increasing my NOOP 
>>> timeouts would assist so we don't get these?  maybe set it to 15 
>>> seconds instead of 10?
>>>
>> Equallogic does active loadbalancing (redirects) during operation..
>> dunno about the errors though.
>>
> Oh yeah, forgot about that. Thanks Pasi!
>
> Joseph, look in the EQL target logs for something about the EQL box 
> doing load balancing. I think normally we handle the load balancing more 
> gracefully, but we might be messing up. I think if EQL was load 
> balancing in the open-iscsi logs we would see something about getting a 
> async iscsi pdu from the target that asks us to logout. Then when we 
> relogin the target would redirect us to the optimal path.
 There are two things that the EQL does, I believe-- one thing is async 
 logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
 side we don't see any errors related to this.  It's my understanding, 
 however, that when a login is initially attempted to the EQL, it hits the 
 "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
 all the interfaces on the EQL and decides, based on some algorithm, which 
 EQL nic the session should connect to.  It then sends the initiator that 
 made the request a login_redirect, which I thought is basically a "logout 
 and reconnect" pdu.  It would say, for example, "you're can't log into the 
 group IP, however, you can log into this IP (a real nic) that it would 
 prefer you be logged into."

 I'm thinking that the "failed login" is actually the result of that 
 attempt to log into the group IP and it sending a login redirect pdu back 
 to it.

>>> If the target was load balancing us it would:
>>>
>>> - Send a async logout pdu.
>>> - We then send a logout pdu.
>>> - When we get the logout response pdu we kill the tcp ip connection
>>> - We then create a new tcp connection
>>> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
>>> one in the DB that you see wh

Re: iscsi diagnosis help

2009-11-18 Thread Hoot, Joseph
Sure.  Its the OVM 2.2 environment that I talked with you about a couple of 
days ago.  Here is the info:

[r...@oim6102506 ~]# uname -r
2.6.18-128.2.1.4.9.el5xen
[r...@oim6102506 ~]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.871-0.7.el5

I have a tcpdump that I had sent to Don WIlliams.  I'll pull that up shortly.

On Nov 17, 2009, at 10:06 PM, Mike Christie wrote:

> Mike Christie wrote:
>> Hoot, Joseph wrote:
>>> more INLINE below...
>>> 
>>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>> 
 Pasi Kärkkäinen wrote:
> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>> 
>>> thanks.  That helps.  So I know that with the EqualLogic targets, there 
>>> is a "Group IP" which, I believe, responds with an iscsi 
>>> login_redirect. 
>>> 
>>> 1) Could the "Login authentication failed" message be the response 
>>> because of a login redirect messages from the EQL redirect?
>>> 
>>> and then my next question is more for curiosity sake:
>>> 
>>> 2) Are there plans in the future to have more than one connection per 
>>> session?  and I guess in addition to that, would that mean multiple 
>>> connections to a single volume over the same nic?
>>> 
>>> 
>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>> volumes.  In this type of scenario, would you expect to see timeouts 
>> like this once in awhile?  If so, do you think increasing my NOOP 
>> timeouts would assist so we don't get these?  maybe set it to 15 seconds 
>> instead of 10?
>> 
> Equallogic does active loadbalancing (redirects) during operation..
> dunno about the errors though.
> 
 Oh yeah, forgot about that. Thanks Pasi!
 
 Joseph, look in the EQL target logs for something about the EQL box 
 doing load balancing. I think normally we handle the load balancing more 
 gracefully, but we might be messing up. I think if EQL was load 
 balancing in the open-iscsi logs we would see something about getting a 
 async iscsi pdu from the target that asks us to logout. Then when we 
 relogin the target would redirect us to the optimal path.
>>> 
>>> There are two things that the EQL does, I believe-- one thing is async 
>>> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
>>> side we don't see any errors related to this.  It's my understanding, 
>>> however, that when a login is initially attempted to the EQL, it hits the 
>>> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
>>> all the interfaces on the EQL and decides, based on some algorithm, which 
>>> EQL nic the session should connect to.  It then sends the initiator that 
>>> made the request a login_redirect, which I thought is basically a "logout 
>>> and reconnect" pdu.  It would say, for example, "you're can't log into the 
>>> group IP, however, you can log into this IP (a real nic) that it would 
>>> prefer you be logged into."
>>> 
>>> I'm thinking that the "failed login" is actually the result of that attempt 
>>> to log into the group IP and it sending a login redirect pdu back to it.
>>> 
>> 
>> If the target was load balancing us it would:
>> 
>> - Send a async logout pdu.
>> - We then send a logout pdu.
>> - When we get the logout response pdu we kill the tcp ip connection
>> - We then create a new tcp connection
>> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
>> one in the DB that you see when you run iscsiadm -m node, which is 
>> probably what you call the group IP). For this process we send a login 
>> pdu. It then sends a login response pdu with the login redirect 
>> response. In this response we also get the new IP to log into.
>> - We see that response and kill the tcp connection, and create a new tcp 
>> connection to the portal we are being redirected to.
>> - We then log into the portal we were redirected to. We again do this by 
>> sending a login pdu. This time the login response pdu should be ok and 
>> we are done.
> 
> Oh yeah, I meant to also say that this is pretty much the same process 
> that happens we do the first login, and if we have to relogin because of 
> a connection problem like the nop/ping timeout. The only difference in 
> those cases is that we do not get the async logout and we do not do a 
> logout by sending a logout pdu. We start at the killing tcp ip 
> connection step.
> 
> So even if we are not getting load balanced we would be in the same 
> place in the open-iscsi code when we are getting the login failed errors.
> 
> 
> To get back on track solving why we get the nop timeouts then if we are 
> not seeing load balancing messages or async logout messages,  it could 
> be the open-iscsi bug I mentioned in the other mail. If you can send the 
> open-iscsi and ke

Re: iscsi diagnosis help

2009-11-17 Thread Mike Christie
Mike Christie wrote:
> Hoot, Joseph wrote:
>> more INLINE below...
>>
>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
>>
>>> Pasi Kärkkäinen wrote:
 On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>
>> thanks.  That helps.  So I know that with the EqualLogic targets, there 
>> is a "Group IP" which, I believe, responds with an iscsi login_redirect. 
>>
>> 1) Could the "Login authentication failed" message be the response 
>> because of a login redirect messages from the EQL redirect?
>>
>> and then my next question is more for curiosity sake:
>>
>> 2) Are there plans in the future to have more than one connection per 
>> session?  and I guess in addition to that, would that mean multiple 
>> connections to a single volume over the same nic?
>>
>>
> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
> volumes.  In this type of scenario, would you expect to see timeouts like 
> this once in awhile?  If so, do you think increasing my NOOP timeouts 
> would assist so we don't get these?  maybe set it to 15 seconds instead 
> of 10?
>
 Equallogic does active loadbalancing (redirects) during operation..
 dunno about the errors though.

>>> Oh yeah, forgot about that. Thanks Pasi!
>>>
>>> Joseph, look in the EQL target logs for something about the EQL box 
>>> doing load balancing. I think normally we handle the load balancing more 
>>> gracefully, but we might be messing up. I think if EQL was load 
>>> balancing in the open-iscsi logs we would see something about getting a 
>>> async iscsi pdu from the target that asks us to logout. Then when we 
>>> relogin the target would redirect us to the optimal path.
>>
>> There are two things that the EQL does, I believe-- one thing is async 
>> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
>> side we don't see any errors related to this.  It's my understanding, 
>> however, that when a login is initially attempted to the EQL, it hits the 
>> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at 
>> all the interfaces on the EQL and decides, based on some algorithm, which 
>> EQL nic the session should connect to.  It then sends the initiator that 
>> made the request a login_redirect, which I thought is basically a "logout 
>> and reconnect" pdu.  It would say, for example, "you're can't log into the 
>> group IP, however, you can log into this IP (a real nic) that it would 
>> prefer you be logged into."
>>
>> I'm thinking that the "failed login" is actually the result of that attempt 
>> to log into the group IP and it sending a login redirect pdu back to it.
>>
> 
> If the target was load balancing us it would:
> 
> - Send a async logout pdu.
> - We then send a logout pdu.
> - When we get the logout response pdu we kill the tcp ip connection
> - We then create a new tcp connection
> - We then log in to the portal that was passed into iscsiadm/iscsid (the 
> one in the DB that you see when you run iscsiadm -m node, which is 
> probably what you call the group IP). For this process we send a login 
> pdu. It then sends a login response pdu with the login redirect 
> response. In this response we also get the new IP to log into.
> - We see that response and kill the tcp connection, and create a new tcp 
> connection to the portal we are being redirected to.
> - We then log into the portal we were redirected to. We again do this by 
> sending a login pdu. This time the login response pdu should be ok and 
> we are done.

Oh yeah, I meant to also say that this is pretty much the same process 
that happens we do the first login, and if we have to relogin because of 
a connection problem like the nop/ping timeout. The only difference in 
those cases is that we do not get the async logout and we do not do a 
logout by sending a logout pdu. We start at the killing tcp ip 
connection step.

So even if we are not getting load balanced we would be in the same 
place in the open-iscsi code when we are getting the login failed errors.


To get back on track solving why we get the nop timeouts then if we are 
not seeing load balancing messages or async logout messages,  it could 
be the open-iscsi bug I mentioned in the other mail. If you can send the 
open-iscsi and kernel info I asked for in the other mail, we can start 
down that path.

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.




Re: iscsi diagnosis help

2009-11-17 Thread Mike Christie
Hoot, Joseph wrote:
> more INLINE below...
> 
> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:
> 
>> Pasi Kärkkäinen wrote:
>>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
 On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:

> thanks.  That helps.  So I know that with the EqualLogic targets, there 
> is a "Group IP" which, I believe, responds with an iscsi login_redirect. 
>
> 1) Could the "Login authentication failed" message be the response 
> because of a login redirect messages from the EQL redirect?
>
> and then my next question is more for curiosity sake:
>
> 2) Are there plans in the future to have more than one connection per 
> session?  and I guess in addition to that, would that mean multiple 
> connections to a single volume over the same nic?
>
>
 Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam 
 our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
 volumes.  In this type of scenario, would you expect to see timeouts like 
 this once in awhile?  If so, do you think increasing my NOOP timeouts 
 would assist so we don't get these?  maybe set it to 15 seconds instead of 
 10?

>>> Equallogic does active loadbalancing (redirects) during operation..
>>> dunno about the errors though.
>>>
>> Oh yeah, forgot about that. Thanks Pasi!
>>
>> Joseph, look in the EQL target logs for something about the EQL box 
>> doing load balancing. I think normally we handle the load balancing more 
>> gracefully, but we might be messing up. I think if EQL was load 
>> balancing in the open-iscsi logs we would see something about getting a 
>> async iscsi pdu from the target that asks us to logout. Then when we 
>> relogin the target would redirect us to the optimal path.
> 
> 
> There are two things that the EQL does, I believe-- one thing is async 
> logout, the other is login_redirect.   Unfortunately, from the EQL syslog 
> side we don't see any errors related to this.  It's my understanding, 
> however, that when a login is initially attempted to the EQL, it hits the 
> "group ip" or an alias'd IP sitting on a real nic.  The group IP looks at all 
> the interfaces on the EQL and decides, based on some algorithm, which EQL nic 
> the session should connect to.  It then sends the initiator that made the 
> request a login_redirect, which I thought is basically a "logout and 
> reconnect" pdu.  It would say, for example, "you're can't log into the group 
> IP, however, you can log into this IP (a real nic) that it would prefer you 
> be logged into."
> 
> I'm thinking that the "failed login" is actually the result of that attempt 
> to log into the group IP and it sending a login redirect pdu back to it.
> 

If the target was load balancing us it would:

- Send a async logout pdu.
- We then send a logout pdu.
- When we get the logout response pdu we kill the tcp ip connection
- We then create a new tcp connection
- We then log in to the portal that was passed into iscsiadm/iscsid (the 
one in the DB that you see when you run iscsiadm -m node, which is 
probably what you call the group IP). For this process we send a login 
pdu. It then sends a login response pdu with the login redirect 
response. In this response we also get the new IP to log into.
- We see that response and kill the tcp connection, and create a new tcp 
connection to the portal we are being redirected to.
- We then log into the portal we were redirected to. We again do this by 
sending a login pdu. This time the login response pdu should be ok and 
we are done.


We do not know which login pdu failed, right now. You would need a 
wireshark trace or iscsid debugging. iscsid could hit the "Login 
authentication failed" path for either of the login pdus sent.

You should not see that message normally even when we are being 
redirected. We do the login redirect login when we initially log into 
the target (like when you do iscsiadm -m node -l or service iscsi 
start), and if you look in your logs you should not see a login failed 
message for that. If you do it might be a clue.

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.




Re: iscsi diagnosis help

2009-11-17 Thread Hoot, Joseph
more INLINE below...

On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:

> Pasi Kärkkäinen wrote:
>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>> 
 thanks.  That helps.  So I know that with the EqualLogic targets, there is 
 a "Group IP" which, I believe, responds with an iscsi login_redirect. 
 
 1) Could the "Login authentication failed" message be the response because 
 of a login redirect messages from the EQL redirect?
 
 and then my next question is more for curiosity sake:
 
 2) Are there plans in the future to have more than one connection per 
 session?  and I guess in addition to that, would that mean multiple 
 connections to a single volume over the same nic?
 
 
>>> 
>>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our 
>>> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 
>>> volumes.  In this type of scenario, would you expect to see timeouts like 
>>> this once in awhile?  If so, do you think increasing my NOOP timeouts would 
>>> assist so we don't get these?  maybe set it to 15 seconds instead of 10?
>>> 
>> 
>> Equallogic does active loadbalancing (redirects) during operation..
>> dunno about the errors though.
>> 
> 
> Oh yeah, forgot about that. Thanks Pasi!
> 
> Joseph, look in the EQL target logs for something about the EQL box 
> doing load balancing. I think normally we handle the load balancing more 
> gracefully, but we might be messing up. I think if EQL was load 
> balancing in the open-iscsi logs we would see something about getting a 
> async iscsi pdu from the target that asks us to logout. Then when we 
> relogin the target would redirect us to the optimal path.


There are two things that the EQL does, I believe-- one thing is async logout, 
the other is login_redirect.   Unfortunately, from the EQL syslog side we don't 
see any errors related to this.  It's my understanding, however, that when a 
login is initially attempted to the EQL, it hits the "group ip" or an alias'd 
IP sitting on a real nic.  The group IP looks at all the interfaces on the EQL 
and decides, based on some algorithm, which EQL nic the session should connect 
to.  It then sends the initiator that made the request a login_redirect, which 
I thought is basically a "logout and reconnect" pdu.  It would say, for 
example, "you're can't log into the group IP, however, you can log into this IP 
(a real nic) that it would prefer you be logged into."

I'm thinking that the "failed login" is actually the result of that attempt to 
log into the group IP and it sending a login redirect pdu back to it.

Don, does this seem like normal EQL traffic to an OiS initiator?

> 
> --
> 
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To post to this group, send email to open-is...@googlegroups.com.
> To unsubscribe from this group, send email to 
> open-iscsi+unsubscr...@googlegroups.com.
> For more options, visit this group at 
> http://groups.google.com/group/open-iscsi?hl=.
> 
> 

===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.




Re: iscsi diagnosis help

2009-11-17 Thread Mike Christie
Pasi Kärkkäinen wrote:
> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>
>>> thanks.  That helps.  So I know that with the EqualLogic targets, there is 
>>> a "Group IP" which, I believe, responds with an iscsi login_redirect. 
>>>
>>> 1) Could the "Login authentication failed" message be the response because 
>>> of a login redirect messages from the EQL redirect?
>>>
>>> and then my next question is more for curiosity sake:
>>>
>>> 2) Are there plans in the future to have more than one connection per 
>>> session?  and I guess in addition to that, would that mean multiple 
>>> connections to a single volume over the same nic?
>>>
>>>
>>
>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our 
>> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. 
>>  In this type of scenario, would you expect to see timeouts like this once 
>> in awhile?  If so, do you think increasing my NOOP timeouts would assist so 
>> we don't get these?  maybe set it to 15 seconds instead of 10?
>>
> 
> Equallogic does active loadbalancing (redirects) during operation..
> dunno about the errors though.
> 

Oh yeah, forgot about that. Thanks Pasi!

Joseph, look in the EQL target logs for something about the EQL box 
doing load balancing. I think normally we handle the load balancing more 
gracefully, but we might be messing up. I think if EQL was load 
balancing in the open-iscsi logs we would see something about getting a 
async iscsi pdu from the target that asks us to logout. Then when we 
relogin the target would redirect us to the optimal path.

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.




Re: iscsi diagnosis help

2009-11-17 Thread Mike Christie
Hoot, Joseph wrote:
> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
> 
>> thanks.  That helps.  So I know that with the EqualLogic targets, there is a 
>> "Group IP" which, I believe, responds with an iscsi login_redirect. 
>>
>> 1) Could the "Login authentication failed" message be the response because 
>> of a login redirect messages from the EQL redirect?
>>
>> and then my next question is more for curiosity sake:
>>
>> 2) Are there plans in the future to have more than one connection per 
>> session?  and I guess in addition to that, would that mean multiple 
>> connections to a single volume over the same nic?
>>
>>
> 
> 
> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our 
> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes.  
> In this type of scenario, would you expect to see timeouts like this once in 
> awhile?  If so, do you think increasing my NOOP timeouts would assist so we 
> don't get these?  maybe set it to 15 seconds instead of 10?
> 

It might be a bug.

What version of open-iscsi are you using? What kernel? Is it a distro or 
kernel.org one? And are you using the open-iscsi kernel modules that 
come with a open-iscsi.org tarball or the kernel modules that come with 
your kernel?

--

You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=.




Re: iscsi diagnosis help

2009-11-17 Thread Mike Christie
Hoot, Joseph wrote:
> thanks.  That helps.  So I know that with the EqualLogic targets, there is a 
> "Group IP" which, I believe, responds with an iscsi login_redirect. 
> 
> 1) Could the "Login authentication failed" message be the response because of 
> a login redirect messages from the EQL redirect?
> 

It could be, but then when we retry we end up handling the redirect ok. 
I do not know why the first redirect would fail.

Take a wireshark trace or run iscsid by hand

iscsid -d 8

send the log output.

That will let us know where we failed, but it would not tell us why. 
Normally EQL targets would leave something in their logs about why.


> and then my next question is more for curiosity sake:
> 
> 2) Are there plans in the future to have more than one connection per 
> session?  and I guess in addition to that, would that mean multiple 
> connections to a single volume over the same nic?
> 

No plans for MC/s. You can do multiple sessions to the same volume 
though. You can have multiple sessions over the same nic or over 
difference nics or some combo.

> 
> 
> On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
> 
>> Hoot, Joseph wrote:
>>> Hi all,
>>>
>>> I'm trying to understand what I'm seeing in my /var/log/messages.  Here's 
>>> what I have:
>>>
>>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 secs 
>>> expired, last rx 191838122, last ping 191839372, now 191841872
>>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
>>> (1011)
>>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
>>> error (1011) state (3)
>>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target 
>>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
>>> recovery (1 attempts)
>>>
>>> the first line, what is "connection5:0"?  is that referenced from iscsiadm 
>>> somewhere? I only ask because I'm seeing iscsid messages and kernel 
>>> messages.  I also have dm-multipath running, which usually shows up as 
>>> dm-multipath or something like that.  I understand that iscsid is the 
>>> process that is logging in and out.  But is the "kernel:" message just an 
>>> iscsi modules that is loaded into the kernel, which is why it is being 
>>> logged as "kernel:"?
>>>
>> It is the session id and connection id.
>>
>> connection$SESSION_ID:$CONNECTION_ID
>>
>> If you run iscsiadm -m session -P 1 or -P 3
>>
>> You will see
>>
>> #iscsiadm -m session -P 1
>> Target: iqn.1992-08.com.netapp:sn.33615311
>> Current Portal: 10.15.85.19:3260,3
>> Persistent Portal: 10.15.85.19:3260,3
>> Iface Transport: tcp
>> Iface IPaddress: 10.11.14.37
>> Iface HWaddress: default
>> Iface Netdev: default
>> SID: 7
>> iSCSI Connection State: LOGGED IN
>> Internal iscsid Session State: NO CHANGE
>>
>>
>> Session number is the SID value.
>>
>> If you run
>> iscsiadm -m session
>> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
>>
>> the session number/SID is the value in brackets.
>>
>>
>> If you run iscsiadm in session mode (iscsiadm -m session) then you can 
>> use the -R argument and pass in a SID to do an opertaion like
>>
>> iscsiadm -m session -R 2 --rescan
>>
>> would rescan that session.
>>
>> Connection number is currently always zero.
>>
>>
>> For the second question, iscsid handles login and logout, and error 
>> handling, and the kernel basically passes iscsi packets around.
>>
>>
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
>> secs expired, last rx 191838122, last ping 191839372, now 191841872
>>
>>
>> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval 
>> seconds, and if we do not get a response withing noop_timeout seconds it 
>> will fire off a connection error.
>>
>>
>>
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
>> (1011)
>>
>>
>> Here is the kernel code notifying userspace of the problem.
>>
>>
>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
>> error (1011) state (3)
>>
>>
>> And there iscsid is accepting the error (probably no need for the error 
>> to be logged twice).
>>
>>
>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target
>>
>>
>> And then here iscsid handled the error by killing the tcp/ip connection, 
>> reconnection the tcp/ip connection, and then re-logging into the iscsi 
>> target. But for some reason we could not log back in right away.
>>
>>
>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
>> recovery (1 attempts)
>>
>> But it looks like we tried again and we got back in.
>>
>>
>>
>>> Maybe one of these modules?
>>> iscsi_tcp  19785  46 
>>> libiscsi_tcp  

Re: iscsi diagnosis help

2009-11-17 Thread Yangkook Kim
> 2) Are there plans in the future to have more than one connection per session?

I dont't think so. If you want to know the reason, read the thread
titled "MC/S support
" in open-iscsi mailing list.

Kim

2009/11/17, Pasi Kärkkäinen :
> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
>>
>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>>
>> > thanks.  That helps.  So I know that with the EqualLogic targets, there
>> > is a "Group IP" which, I believe, responds with an iscsi login_redirect.
>> >
>> >
>> > 1) Could the "Login authentication failed" message be the response
>> > because of a login redirect messages from the EQL redirect?
>> >
>> > and then my next question is more for curiosity sake:
>> >
>> > 2) Are there plans in the future to have more than one connection per
>> > session?  and I guess in addition to that, would that mean multiple
>> > connections to a single volume over the same nic?
>> >
>> >
>>
>>
>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam
>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3
>> volumes.  In this type of scenario, would you expect to see timeouts like
>> this once in awhile?  If so, do you think increasing my NOOP timeouts
>> would assist so we don't get these?  maybe set it to 15 seconds instead of
>> 10?
>>
>
> Equallogic does active loadbalancing (redirects) during operation..
> dunno about the errors though.
>
> -- Pasi
>
>>
>> >
>> > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
>> >
>> >> Hoot, Joseph wrote:
>> >>> Hi all,
>> >>>
>> >>> I'm trying to understand what I'm seeing in my /var/log/messages.
>> >>> Here's what I have:
>> >>>
>> >>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10
>> >>> secs expired, last rx 191838122, last ping 191839372, now 191841872
>> >>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error
>> >>> (1011)
>> >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection
>> >>> 5:0 error (1011) state (3)
>> >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with
>> >>> target
>> >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>> >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after
>> >>> recovery (1 attempts)
>> >>>
>> >>> the first line, what is "connection5:0"?  is that referenced from
>> >>> iscsiadm somewhere? I only ask because I'm seeing iscsid messages and
>> >>> kernel messages.  I also have dm-multipath running, which usually
>> >>> shows up as dm-multipath or something like that.  I understand that
>> >>> iscsid is the process that is logging in and out.  But is the
>> >>> "kernel:" message just an iscsi modules that is loaded into the
>> >>> kernel, which is why it is being logged as "kernel:"?
>> >>>
>> >>
>> >> It is the session id and connection id.
>> >>
>> >> connection$SESSION_ID:$CONNECTION_ID
>> >>
>> >> If you run iscsiadm -m session -P 1 or -P 3
>> >>
>> >> You will see
>> >>
>> >> #iscsiadm -m session -P 1
>> >> Target: iqn.1992-08.com.netapp:sn.33615311
>> >>Current Portal: 10.15.85.19:3260,3
>> >>Persistent Portal: 10.15.85.19:3260,3
>> >>Iface Transport: tcp
>> >>Iface IPaddress: 10.11.14.37
>> >>Iface HWaddress: default
>> >>Iface Netdev: default
>> >>SID: 7
>> >>iSCSI Connection State: LOGGED IN
>> >>Internal iscsid Session State: NO CHANGE
>> >>
>> >>
>> >> Session number is the SID value.
>> >>
>> >> If you run
>> >> iscsiadm -m session
>> >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
>> >>
>> >> the session number/SID is the value in brackets.
>> >>
>> >>
>> >> If you run iscsiadm in session mode (iscsiadm -m session) then you can
>> >> use the -R argument and pass in a SID to do an opertaion like
>> >>
>> >> iscsiadm -m session -R 2 --rescan
>> >>
>> >> would rescan that session.
>> >>
>> >> Connection number is currently always zero.
>> >>
>> >>
>> >> For the second question, iscsid handles login and logout, and error
>> >> handling, and the kernel basically passes iscsi packets around.
>> >>
>> >>
>> >> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10
>> >> secs expired, last rx 191838122, last ping 191839372, now 191841872
>> >>
>> >>
>> >> so here the iscsi kernel code sends a iscsi ping/nop every
>> >> noop_interval
>> >> seconds, and if we do not get a response withing noop_timeout seconds
>> >> it
>> >> will fire off a connection error.
>> >>
>> >>
>> >>
>> >> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error
>> >> (1011)
>> >>
>> >>
>> >> Here is the kernel code notifying userspace of the problem.
>> >>
>> >>
>> >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0
>> >>
>> >> error (1011) state (3)
>> >>
>> >>
>> >> And there iscsid is accepting the error (probably no need for the error
>> >>
>> >> to b

Re: iscsi diagnosis help

2009-11-17 Thread Pasi Kärkkäinen
On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:
> 
> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
> 
> > thanks.  That helps.  So I know that with the EqualLogic targets, there is 
> > a "Group IP" which, I believe, responds with an iscsi login_redirect. 
> > 
> > 1) Could the "Login authentication failed" message be the response because 
> > of a login redirect messages from the EQL redirect?
> > 
> > and then my next question is more for curiosity sake:
> > 
> > 2) Are there plans in the future to have more than one connection per 
> > session?  and I guess in addition to that, would that mean multiple 
> > connections to a single volume over the same nic?
> > 
> > 
> 
> 
> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our 
> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes.  
> In this type of scenario, would you expect to see timeouts like this once in 
> awhile?  If so, do you think increasing my NOOP timeouts would assist so we 
> don't get these?  maybe set it to 15 seconds instead of 10?
> 

Equallogic does active loadbalancing (redirects) during operation..
dunno about the errors though.

-- Pasi

> 
> > 
> > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
> > 
> >> Hoot, Joseph wrote:
> >>> Hi all,
> >>> 
> >>> I'm trying to understand what I'm seeing in my /var/log/messages.  Here's 
> >>> what I have:
> >>> 
> >>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
> >>> secs expired, last rx 191838122, last ping 191839372, now 191841872
> >>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
> >>> (1011)
> >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
> >>> error (1011) state (3)
> >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with 
> >>> target 
> >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
> >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
> >>> recovery (1 attempts)
> >>> 
> >>> the first line, what is "connection5:0"?  is that referenced from 
> >>> iscsiadm somewhere? I only ask because I'm seeing iscsid messages and 
> >>> kernel messages.  I also have dm-multipath running, which usually shows 
> >>> up as dm-multipath or something like that.  I understand that iscsid is 
> >>> the process that is logging in and out.  But is the "kernel:" message 
> >>> just an iscsi modules that is loaded into the kernel, which is why it is 
> >>> being logged as "kernel:"?
> >>> 
> >> 
> >> It is the session id and connection id.
> >> 
> >> connection$SESSION_ID:$CONNECTION_ID
> >> 
> >> If you run iscsiadm -m session -P 1 or -P 3
> >> 
> >> You will see
> >> 
> >> #iscsiadm -m session -P 1
> >> Target: iqn.1992-08.com.netapp:sn.33615311
> >>Current Portal: 10.15.85.19:3260,3
> >>Persistent Portal: 10.15.85.19:3260,3
> >>Iface Transport: tcp
> >>Iface IPaddress: 10.11.14.37
> >>Iface HWaddress: default
> >>Iface Netdev: default
> >>SID: 7
> >>iSCSI Connection State: LOGGED IN
> >>Internal iscsid Session State: NO CHANGE
> >> 
> >> 
> >> Session number is the SID value.
> >> 
> >> If you run
> >> iscsiadm -m session
> >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
> >> 
> >> the session number/SID is the value in brackets.
> >> 
> >> 
> >> If you run iscsiadm in session mode (iscsiadm -m session) then you can 
> >> use the -R argument and pass in a SID to do an opertaion like
> >> 
> >> iscsiadm -m session -R 2 --rescan
> >> 
> >> would rescan that session.
> >> 
> >> Connection number is currently always zero.
> >> 
> >> 
> >> For the second question, iscsid handles login and logout, and error 
> >> handling, and the kernel basically passes iscsi packets around.
> >> 
> >> 
> >> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
> >> secs expired, last rx 191838122, last ping 191839372, now 191841872
> >> 
> >> 
> >> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval 
> >> seconds, and if we do not get a response withing noop_timeout seconds it 
> >> will fire off a connection error.
> >> 
> >> 
> >> 
> >> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
> >> (1011)
> >> 
> >> 
> >> Here is the kernel code notifying userspace of the problem.
> >> 
> >> 
> >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
> >> error (1011) state (3)
> >> 
> >> 
> >> And there iscsid is accepting the error (probably no need for the error 
> >> to be logged twice).
> >> 
> >> 
> >> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target
> >> 
> >> 
> >> And then here iscsid handled the error by killing the tcp/ip connection, 
> >> reconnection the tcp/ip connection, and then re-logging into the iscsi 
> >> target. But for some reason we could not l

Re: iscsi diagnosis help

2009-11-16 Thread Hoot, Joseph

On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:

> thanks.  That helps.  So I know that with the EqualLogic targets, there is a 
> "Group IP" which, I believe, responds with an iscsi login_redirect. 
> 
> 1) Could the "Login authentication failed" message be the response because of 
> a login redirect messages from the EQL redirect?
> 
> and then my next question is more for curiosity sake:
> 
> 2) Are there plans in the future to have more than one connection per 
> session?  and I guess in addition to that, would that mean multiple 
> connections to a single volume over the same nic?
> 
> 


Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our 
EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes.  
In this type of scenario, would you expect to see timeouts like this once in 
awhile?  If so, do you think increasing my NOOP timeouts would assist so we 
don't get these?  maybe set it to 15 seconds instead of 10?


> 
> On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
> 
>> Hoot, Joseph wrote:
>>> Hi all,
>>> 
>>> I'm trying to understand what I'm seeing in my /var/log/messages.  Here's 
>>> what I have:
>>> 
>>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 secs 
>>> expired, last rx 191838122, last ping 191839372, now 191841872
>>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
>>> (1011)
>>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
>>> error (1011) state (3)
>>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target 
>>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
>>> recovery (1 attempts)
>>> 
>>> the first line, what is "connection5:0"?  is that referenced from iscsiadm 
>>> somewhere? I only ask because I'm seeing iscsid messages and kernel 
>>> messages.  I also have dm-multipath running, which usually shows up as 
>>> dm-multipath or something like that.  I understand that iscsid is the 
>>> process that is logging in and out.  But is the "kernel:" message just an 
>>> iscsi modules that is loaded into the kernel, which is why it is being 
>>> logged as "kernel:"?
>>> 
>> 
>> It is the session id and connection id.
>> 
>> connection$SESSION_ID:$CONNECTION_ID
>> 
>> If you run iscsiadm -m session -P 1 or -P 3
>> 
>> You will see
>> 
>> #iscsiadm -m session -P 1
>> Target: iqn.1992-08.com.netapp:sn.33615311
>>Current Portal: 10.15.85.19:3260,3
>>Persistent Portal: 10.15.85.19:3260,3
>>Iface Transport: tcp
>>Iface IPaddress: 10.11.14.37
>>Iface HWaddress: default
>>Iface Netdev: default
>>SID: 7
>>iSCSI Connection State: LOGGED IN
>>Internal iscsid Session State: NO CHANGE
>> 
>> 
>> Session number is the SID value.
>> 
>> If you run
>> iscsiadm -m session
>> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
>> 
>> the session number/SID is the value in brackets.
>> 
>> 
>> If you run iscsiadm in session mode (iscsiadm -m session) then you can 
>> use the -R argument and pass in a SID to do an opertaion like
>> 
>> iscsiadm -m session -R 2 --rescan
>> 
>> would rescan that session.
>> 
>> Connection number is currently always zero.
>> 
>> 
>> For the second question, iscsid handles login and logout, and error 
>> handling, and the kernel basically passes iscsi packets around.
>> 
>> 
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
>> secs expired, last rx 191838122, last ping 191839372, now 191841872
>> 
>> 
>> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval 
>> seconds, and if we do not get a response withing noop_timeout seconds it 
>> will fire off a connection error.
>> 
>> 
>> 
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
>> (1011)
>> 
>> 
>> Here is the kernel code notifying userspace of the problem.
>> 
>> 
>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
>> error (1011) state (3)
>> 
>> 
>> And there iscsid is accepting the error (probably no need for the error 
>> to be logged twice).
>> 
>> 
>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target
>> 
>> 
>> And then here iscsid handled the error by killing the tcp/ip connection, 
>> reconnection the tcp/ip connection, and then re-logging into the iscsi 
>> target. But for some reason we could not log back in right away.
>> 
>> 
>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
>> recovery (1 attempts)
>> 
>> But it looks like we tried again and we got back in.
>> 
>> 
>> 
>>> Maybe one of these modules?
>>> iscsi_tcp  19785  46 
>>> libiscsi_tcp   21829  1 iscsi_tcp
>>> libiscsi2  41285  3 ib_is

Re: iscsi diagnosis help

2009-11-16 Thread Hoot, Joseph
thanks.  That helps.  So I know that with the EqualLogic targets, there is a 
"Group IP" which, I believe, responds with an iscsi login_redirect. 

1) Could the "Login authentication failed" message be the response because of a 
login redirect messages from the EQL redirect?

and then my next question is more for curiosity sake:

2) Are there plans in the future to have more than one connection per session?  
and I guess in addition to that, would that mean multiple connections to a 
single volume over the same nic?



On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:

> Hoot, Joseph wrote:
>> Hi all,
>> 
>> I'm trying to understand what I'm seeing in my /var/log/messages.  Here's 
>> what I have:
>> 
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 secs 
>> expired, last rx 191838122, last ping 191839372, now 191841872
>> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error (1011)
>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
>> error (1011) state (3)
>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target 
>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
>> recovery (1 attempts)
>> 
>> the first line, what is "connection5:0"?  is that referenced from iscsiadm 
>> somewhere? I only ask because I'm seeing iscsid messages and kernel 
>> messages.  I also have dm-multipath running, which usually shows up as 
>> dm-multipath or something like that.  I understand that iscsid is the 
>> process that is logging in and out.  But is the "kernel:" message just an 
>> iscsi modules that is loaded into the kernel, which is why it is being 
>> logged as "kernel:"?
>> 
> 
> It is the session id and connection id.
> 
> connection$SESSION_ID:$CONNECTION_ID
> 
> If you run iscsiadm -m session -P 1 or -P 3
> 
> You will see
> 
> #iscsiadm -m session -P 1
> Target: iqn.1992-08.com.netapp:sn.33615311
> Current Portal: 10.15.85.19:3260,3
> Persistent Portal: 10.15.85.19:3260,3
> Iface Transport: tcp
> Iface IPaddress: 10.11.14.37
> Iface HWaddress: default
> Iface Netdev: default
> SID: 7
> iSCSI Connection State: LOGGED IN
> Internal iscsid Session State: NO CHANGE
> 
> 
> Session number is the SID value.
> 
> If you run
> iscsiadm -m session
> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
> 
> the session number/SID is the value in brackets.
> 
> 
> If you run iscsiadm in session mode (iscsiadm -m session) then you can 
> use the -R argument and pass in a SID to do an opertaion like
> 
> iscsiadm -m session -R 2 --rescan
> 
> would rescan that session.
> 
> Connection number is currently always zero.
> 
> 
> For the second question, iscsid handles login and logout, and error 
> handling, and the kernel basically passes iscsi packets around.
> 
> 
> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
> secs expired, last rx 191838122, last ping 191839372, now 191841872
> 
> 
> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval 
> seconds, and if we do not get a response withing noop_timeout seconds it 
> will fire off a connection error.
> 
> 
> 
> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
> (1011)
> 
> 
> Here is the kernel code notifying userspace of the problem.
> 
> 
> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
> error (1011) state (3)
> 
> 
> And there iscsid is accepting the error (probably no need for the error 
> to be logged twice).
> 
> 
> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target
> 
> 
> And then here iscsid handled the error by killing the tcp/ip connection, 
> reconnection the tcp/ip connection, and then re-logging into the iscsi 
> target. But for some reason we could not log back in right away.
> 
> 
> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
> recovery (1 attempts)
> 
> But it looks like we tried again and we got back in.
> 
> 
> 
>> Maybe one of these modules?
>> iscsi_tcp  19785  46 
>> libiscsi_tcp   21829  1 iscsi_tcp
>> libiscsi2  41285  3 ib_iser,iscsi_tcp,libiscsi_tcp
>> scsi_transport_iscsi237197  5 ib_iser,iscsi_tcp,libiscsi2
>> scsi_transport_iscsi 6085  1 scsi_transport_iscsi2
>> 
>> I'm just trying to make sure that all of my timeout values line up properly 
>> so that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI.  I'd 
>> rather verify that iSCSI reconnections are attempted every 10 seconds  and 
>> then dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds 
>> before it times out (or something along those lines).  So if I understand 
>> the logging a bit

Re: iscsi diagnosis help

2009-11-16 Thread Mike Christie
Hoot, Joseph wrote:
> Hi all,
> 
> I'm trying to understand what I'm seeing in my /var/log/messages.  Here's 
> what I have:
> 
> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 secs 
> expired, last rx 191838122, last ping 191839372, now 191841872
> Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error (1011)
> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 error 
> (1011) state (3)
> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target 
> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
> recovery (1 attempts)
> 
> the first line, what is "connection5:0"?  is that referenced from iscsiadm 
> somewhere? I only ask because I'm seeing iscsid messages and kernel messages. 
>  I also have dm-multipath running, which usually shows up as dm-multipath or 
> something like that.  I understand that iscsid is the process that is logging 
> in and out.  But is the "kernel:" message just an iscsi modules that is 
> loaded into the kernel, which is why it is being logged as "kernel:"?
> 

It is the session id and connection id.

connection$SESSION_ID:$CONNECTION_ID

If you run iscsiadm -m session -P 1 or -P 3

You will see

#iscsiadm -m session -P 1
Target: iqn.1992-08.com.netapp:sn.33615311
 Current Portal: 10.15.85.19:3260,3
 Persistent Portal: 10.15.85.19:3260,3
 Iface Transport: tcp
 Iface IPaddress: 10.11.14.37
 Iface HWaddress: default
 Iface Netdev: default
 SID: 7
 iSCSI Connection State: LOGGED IN
 Internal iscsid Session State: NO CHANGE


Session number is the SID value.

If you run
iscsiadm -m session
tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311

the session number/SID is the value in brackets.


If you run iscsiadm in session mode (iscsiadm -m session) then you can 
use the -R argument and pass in a SID to do an opertaion like

iscsiadm -m session -R 2 --rescan

would rescan that session.

Connection number is currently always zero.


For the second question, iscsid handles login and logout, and error 
handling, and the kernel basically passes iscsi packets around.


Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 
secs expired, last rx 191838122, last ping 191839372, now 191841872


so here the iscsi kernel code sends a iscsi ping/nop every noop_interval 
seconds, and if we do not get a response withing noop_timeout seconds it 
will fire off a connection error.



Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error 
(1011)


Here is the kernel code notifying userspace of the problem.


Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 
error (1011) state (3)


And there iscsid is accepting the error (probably no need for the error 
to be logged twice).


Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target


And then here iscsid handled the error by killing the tcp/ip connection, 
reconnection the tcp/ip connection, and then re-logging into the iscsi 
target. But for some reason we could not log back in right away.


iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after 
recovery (1 attempts)

But it looks like we tried again and we got back in.



> Maybe one of these modules?
> iscsi_tcp  19785  46 
> libiscsi_tcp   21829  1 iscsi_tcp
> libiscsi2  41285  3 ib_iser,iscsi_tcp,libiscsi_tcp
> scsi_transport_iscsi237197  5 ib_iser,iscsi_tcp,libiscsi2
> scsi_transport_iscsi 6085  1 scsi_transport_iscsi2
> 
> I'm just trying to make sure that all of my timeout values line up properly 
> so that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI.  I'd 
> rather verify that iSCSI reconnections are attempted every 10 seconds  and 
> then dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds 
> before it times out (or something along those lines).  So if I understand the 
> logging a bit more, it'll help me.
> 
> Thanks
> Joe
> 
> 
> ===
> Joseph R. Hoot
> Lead System Programmer/Analyst
> (w) 716-878-4832
> (c) 716-759-HOOT
> joe.h...@itec.suny.edu
> GPG KEY:   7145F633
> ===
> 
> 
> --~--~-~--~~~---~--~~
> You received this message because you are subscribed to the Google Groups 
> "open-iscsi" group.
> To post to this group, send email to open-iscsi@googlegroups.com
> To unsubscribe from this group, send email to 
> open-iscsi+unsubscr...@googlegroups.com
> For more options, visit this group at 
> http://groups.google.com/group/open-iscsi
> -~--~~~~--~~--~--~---
> 

--

You received this message because you are subscribed to the Google Grou

iscsi diagnosis help

2009-11-16 Thread Hoot, Joseph

Hi all,

I'm trying to understand what I'm seeing in my /var/log/messages.  Here's what 
I have:

Nov 13 10:49:47 oim6102506 kernel:  connection5:0: ping timeout of 10 secs 
expired, last rx 191838122, last ping 191839372, now 191841872
Nov 13 10:49:47 oim6102506 kernel:  connection5:0: detected conn error (1011)
Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 error 
(1011) state (3)
Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target 
iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after recovery 
(1 attempts)

the first line, what is "connection5:0"?  is that referenced from iscsiadm 
somewhere? I only ask because I'm seeing iscsid messages and kernel messages.  
I also have dm-multipath running, which usually shows up as dm-multipath or 
something like that.  I understand that iscsid is the process that is logging 
in and out.  But is the "kernel:" message just an iscsi modules that is loaded 
into the kernel, which is why it is being logged as "kernel:"?

Maybe one of these modules?
iscsi_tcp  19785  46 
libiscsi_tcp   21829  1 iscsi_tcp
libiscsi2  41285  3 ib_iser,iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi237197  5 ib_iser,iscsi_tcp,libiscsi2
scsi_transport_iscsi 6085  1 scsi_transport_iscsi2

I'm just trying to make sure that all of my timeout values line up properly so 
that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI.  I'd rather 
verify that iSCSI reconnections are attempted every 10 seconds  and then 
dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds before it 
times out (or something along those lines).  So if I understand the logging a 
bit more, it'll help me.

Thanks
Joe


===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---