Re: iscsi diagnosis help
Hi Mike, Thanks. Can you please try --target=i686. On the patches, yes, I will include it in our next VM kernel. regards Guru Mike Christie wrote: > ccing Guru Anbalagane. > > Hoot, Joseph wrote: >> Sure. Its the OVM 2.2 environment that I talked with you about a >> couple of days ago. Here is the info: >> > > Doh! I forgot. > > Guru, I am trying to make a patch for your Oracle VM kernel. I want to > port the patch I told you about > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551 > > > and this one: > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551 > > > if you do not have it (could not remember if you did or not). > > I found the src rpm: > http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 > > > The problem is that I keep getting RPM build errors when I try to > build the project: > > rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm > rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec > > ... a bunch of stuff then > > make[1]: *** [nonint_oldconfig] Error 15 > make: *** [nonint_oldconfig] Error 2 > error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) > > > RPM build errors: > Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) > (have also tried without the --target arg and with different archs and > always fails) > > This is how I setup the RHEL kernel source rpm. Are the commands for > the Oracle VM kernel different? > > > >> [r...@oim6102506 ~]# uname -r >> 2.6.18-128.2.1.4.9.el5xen >> [r...@oim6102506 ~]# rpm -qa | grep iscsi >> iscsi-initiator-utils-6.2.0.871-0.7.el5 >> >> I have a tcpdump that I had sent to Don WIlliams. I'll pull that up >> shortly. >> >> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote: >> >>> Mike Christie wrote: Hoot, Joseph wrote: > more INLINE below... > > On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: > >> Pasi Kärkkäinen wrote: >>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > thanks. That helps. So I know that with the EqualLogic > targets, there is a "Group IP" which, I believe, responds with > an iscsi login_redirect. > 1) Could the "Login authentication failed" message be the > response because of a login redirect messages from the EQL > redirect? > > and then my next question is more for curiosity sake: > > 2) Are there plans in the future to have more than one > connection per session? and I guess in addition to that, > would that mean multiple connections to a single volume over > the same nic? > > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. In this type of scenario, would you expect to see timeouts like this once in awhile? If so, do you think increasing my NOOP timeouts would assist so we don't get these? maybe set it to 15 seconds instead of 10? >>> Equallogic does active loadbalancing (redirects) during operation.. >>> dunno about the errors though. >>> >> Oh yeah, forgot about that. Thanks Pasi! >> >> Joseph, look in the EQL target logs for something about the EQL >> box doing load balancing. I think normally we handle the load >> balancing more gracefully, but we might be messing up. I think if >> EQL was load balancing in the open-iscsi logs we would see >> something about getting a async iscsi pdu from the target that >> asks us to logout. Then when we relogin the target would redirect >> us to the optimal path. > There are two things that the EQL does, I believe-- one thing is > async logout, the other is login_redirect. Unfortunately, from > the EQL syslog side we don't see any errors related to this. It's > my understanding, however, that when a login is initially > attempted to the EQL, it hits the "group ip" or an alias'd IP > sitting on a real nic. The group IP looks at all the interfaces > on the EQL and decides, based on some algorithm, which EQL nic the > session should connect to. It then sends the initiator that made > the request a login_redirect, which I thought is basically a > "logout and reconnect" pdu. It would say, for example, "you're > can't log into the group IP, however, you can log into this IP (a > real nic) that it would prefer you be logged into." > > I'm thinking that the "failed login" is actually the result of > that attempt to log into the group IP and it sending a login > redirect pdu back to it. > If the target
Re: iscsi diagnosis help
Guru Anbalagane wrote: > Hi Mike, > > Thanks. > Can you please try --target=i686. > Thanks Guru. > On the patches, yes, I will include it in our next VM kernel. Joseph, attached is a patch that has a couple fixes from upstream in that code path. If it does not fix your issue then I will have to send a patch with some more debugging in it (the net layer is not waking us up when send space is opening or it really is taking a long time to send the data). Patch is only compile tested. > regards > Guru > Mike Christie wrote: >> ccing Guru Anbalagane. >> >> Hoot, Joseph wrote: >>> Sure. Its the OVM 2.2 environment that I talked with you about a >>> couple of days ago. Here is the info: >>> >> >> Doh! I forgot. >> >> Guru, I am trying to make a patch for your Oracle VM kernel. I want to >> port the patch I told you about >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551 >> >> >> and this one: >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551 >> >> >> if you do not have it (could not remember if you did or not). >> >> I found the src rpm: >> http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 >> >> >> The problem is that I keep getting RPM build errors when I try to >> build the project: >> >> rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm >> rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec >> >> ... a bunch of stuff then >> >> make[1]: *** [nonint_oldconfig] Error 15 >> make: *** [nonint_oldconfig] Error 2 >> error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) >> >> >> RPM build errors: >> Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) >> (have also tried without the --target arg and with different archs and >> always fails) >> >> This is how I setup the RHEL kernel source rpm. Are the commands for >> the Oracle VM kernel different? >> >> >> >>> [r...@oim6102506 ~]# uname -r >>> 2.6.18-128.2.1.4.9.el5xen >>> [r...@oim6102506 ~]# rpm -qa | grep iscsi >>> iscsi-initiator-utils-6.2.0.871-0.7.el5 >>> >>> I have a tcpdump that I had sent to Don WIlliams. I'll pull that up >>> shortly. >>> >>> On Nov 17, 2009, at 10:06 PM, Mike Christie wrote: >>> Mike Christie wrote: > Hoot, Joseph wrote: >> more INLINE below... >> >> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: >> >>> Pasi Kärkkäinen wrote: On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: > On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > >> thanks. That helps. So I know that with the EqualLogic >> targets, there is a "Group IP" which, I believe, responds with >> an iscsi login_redirect. >> 1) Could the "Login authentication failed" message be the >> response because of a login redirect messages from the EQL >> redirect? >> >> and then my next question is more for curiosity sake: >> >> 2) Are there plans in the future to have more than one >> connection per session? and I guess in addition to that, >> would that mean multiple connections to a single volume over >> the same nic? >> >> > Also Mike, I'm seeing one or two of these every 30-40 minutes > if I slam our EqualLogic with roughly 7-15k IOPS (reads and > writes) non stop on 3 volumes. In this type of scenario, would > you expect to see timeouts like this once in awhile? If so, do > you think increasing my NOOP timeouts would assist so we don't > get these? maybe set it to 15 seconds instead of 10? > Equallogic does active loadbalancing (redirects) during operation.. dunno about the errors though. >>> Oh yeah, forgot about that. Thanks Pasi! >>> >>> Joseph, look in the EQL target logs for something about the EQL >>> box doing load balancing. I think normally we handle the load >>> balancing more gracefully, but we might be messing up. I think if >>> EQL was load balancing in the open-iscsi logs we would see >>> something about getting a async iscsi pdu from the target that >>> asks us to logout. Then when we relogin the target would redirect >>> us to the optimal path. >> There are two things that the EQL does, I believe-- one thing is >> async logout, the other is login_redirect. Unfortunately, from >> the EQL syslog side we don't see any errors related to this. It's >> my understanding, however, that when a login is initially >> attempted to the EQL, it hits the "group ip" or an alias'd IP >> sitting on a real nic. The group IP looks at all the interfaces >> on the EQL and decides, based on some algorithm, which EQL nic the >> session should connect to. It then se
Re: iscsi diagnosis help
ccing Guru Anbalagane. Hoot, Joseph wrote: > Sure. Its the OVM 2.2 environment that I talked with you about a couple of > days ago. Here is the info: > Doh! I forgot. Guru, I am trying to make a patch for your Oracle VM kernel. I want to port the patch I told you about http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4c48a82935f833d94fcf44c2b0c5d2922acfc77a;hp=d1acfae514425d680912907c6554852f1e258551 and this one: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1acfae514425d680912907c6554852f1e258551 if you do not have it (could not remember if you did or not). I found the src rpm: http://edelivery.oracle.com/EPD/Download/get_form?egroup_aru_number=11874896 The problem is that I keep getting RPM build errors when I try to build the project: rpm -ivh kernel-2.6.18-128.2.1.4.9.el5.src.rpm rpmbuild -bp --target=noarch /usr/src/redhat/SPECS/kernel-2.6.spec ... a bunch of stuff then make[1]: *** [nonint_oldconfig] Error 15 make: *** [nonint_oldconfig] Error 2 error: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) RPM build errors: Bad exit status from /var/tmp/rpm-tmp.94270 (%prep) (have also tried without the --target arg and with different archs and always fails) This is how I setup the RHEL kernel source rpm. Are the commands for the Oracle VM kernel different? > [r...@oim6102506 ~]# uname -r > 2.6.18-128.2.1.4.9.el5xen > [r...@oim6102506 ~]# rpm -qa | grep iscsi > iscsi-initiator-utils-6.2.0.871-0.7.el5 > > I have a tcpdump that I had sent to Don WIlliams. I'll pull that up shortly. > > On Nov 17, 2009, at 10:06 PM, Mike Christie wrote: > >> Mike Christie wrote: >>> Hoot, Joseph wrote: more INLINE below... On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: > Pasi Kärkkäinen wrote: >> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: >>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: >>> thanks. That helps. So I know that with the EqualLogic targets, there is a "Group IP" which, I believe, responds with an iscsi login_redirect. 1) Could the "Login authentication failed" message be the response because of a login redirect messages from the EQL redirect? and then my next question is more for curiosity sake: 2) Are there plans in the future to have more than one connection per session? and I guess in addition to that, would that mean multiple connections to a single volume over the same nic? >>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam >>> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 >>> volumes. In this type of scenario, would you expect to see timeouts >>> like this once in awhile? If so, do you think increasing my NOOP >>> timeouts would assist so we don't get these? maybe set it to 15 >>> seconds instead of 10? >>> >> Equallogic does active loadbalancing (redirects) during operation.. >> dunno about the errors though. >> > Oh yeah, forgot about that. Thanks Pasi! > > Joseph, look in the EQL target logs for something about the EQL box > doing load balancing. I think normally we handle the load balancing more > gracefully, but we might be messing up. I think if EQL was load > balancing in the open-iscsi logs we would see something about getting a > async iscsi pdu from the target that asks us to logout. Then when we > relogin the target would redirect us to the optimal path. There are two things that the EQL does, I believe-- one thing is async logout, the other is login_redirect. Unfortunately, from the EQL syslog side we don't see any errors related to this. It's my understanding, however, that when a login is initially attempted to the EQL, it hits the "group ip" or an alias'd IP sitting on a real nic. The group IP looks at all the interfaces on the EQL and decides, based on some algorithm, which EQL nic the session should connect to. It then sends the initiator that made the request a login_redirect, which I thought is basically a "logout and reconnect" pdu. It would say, for example, "you're can't log into the group IP, however, you can log into this IP (a real nic) that it would prefer you be logged into." I'm thinking that the "failed login" is actually the result of that attempt to log into the group IP and it sending a login redirect pdu back to it. >>> If the target was load balancing us it would: >>> >>> - Send a async logout pdu. >>> - We then send a logout pdu. >>> - When we get the logout response pdu we kill the tcp ip connection >>> - We then create a new tcp connection >>> - We then log in to the portal that was passed into iscsiadm/iscsid (the >>> one in the DB that you see wh
Re: iscsi diagnosis help
Sure. Its the OVM 2.2 environment that I talked with you about a couple of days ago. Here is the info: [r...@oim6102506 ~]# uname -r 2.6.18-128.2.1.4.9.el5xen [r...@oim6102506 ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.871-0.7.el5 I have a tcpdump that I had sent to Don WIlliams. I'll pull that up shortly. On Nov 17, 2009, at 10:06 PM, Mike Christie wrote: > Mike Christie wrote: >> Hoot, Joseph wrote: >>> more INLINE below... >>> >>> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: >>> Pasi Kärkkäinen wrote: > On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: >> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: >> >>> thanks. That helps. So I know that with the EqualLogic targets, there >>> is a "Group IP" which, I believe, responds with an iscsi >>> login_redirect. >>> >>> 1) Could the "Login authentication failed" message be the response >>> because of a login redirect messages from the EQL redirect? >>> >>> and then my next question is more for curiosity sake: >>> >>> 2) Are there plans in the future to have more than one connection per >>> session? and I guess in addition to that, would that mean multiple >>> connections to a single volume over the same nic? >>> >>> >> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam >> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 >> volumes. In this type of scenario, would you expect to see timeouts >> like this once in awhile? If so, do you think increasing my NOOP >> timeouts would assist so we don't get these? maybe set it to 15 seconds >> instead of 10? >> > Equallogic does active loadbalancing (redirects) during operation.. > dunno about the errors though. > Oh yeah, forgot about that. Thanks Pasi! Joseph, look in the EQL target logs for something about the EQL box doing load balancing. I think normally we handle the load balancing more gracefully, but we might be messing up. I think if EQL was load balancing in the open-iscsi logs we would see something about getting a async iscsi pdu from the target that asks us to logout. Then when we relogin the target would redirect us to the optimal path. >>> >>> There are two things that the EQL does, I believe-- one thing is async >>> logout, the other is login_redirect. Unfortunately, from the EQL syslog >>> side we don't see any errors related to this. It's my understanding, >>> however, that when a login is initially attempted to the EQL, it hits the >>> "group ip" or an alias'd IP sitting on a real nic. The group IP looks at >>> all the interfaces on the EQL and decides, based on some algorithm, which >>> EQL nic the session should connect to. It then sends the initiator that >>> made the request a login_redirect, which I thought is basically a "logout >>> and reconnect" pdu. It would say, for example, "you're can't log into the >>> group IP, however, you can log into this IP (a real nic) that it would >>> prefer you be logged into." >>> >>> I'm thinking that the "failed login" is actually the result of that attempt >>> to log into the group IP and it sending a login redirect pdu back to it. >>> >> >> If the target was load balancing us it would: >> >> - Send a async logout pdu. >> - We then send a logout pdu. >> - When we get the logout response pdu we kill the tcp ip connection >> - We then create a new tcp connection >> - We then log in to the portal that was passed into iscsiadm/iscsid (the >> one in the DB that you see when you run iscsiadm -m node, which is >> probably what you call the group IP). For this process we send a login >> pdu. It then sends a login response pdu with the login redirect >> response. In this response we also get the new IP to log into. >> - We see that response and kill the tcp connection, and create a new tcp >> connection to the portal we are being redirected to. >> - We then log into the portal we were redirected to. We again do this by >> sending a login pdu. This time the login response pdu should be ok and >> we are done. > > Oh yeah, I meant to also say that this is pretty much the same process > that happens we do the first login, and if we have to relogin because of > a connection problem like the nop/ping timeout. The only difference in > those cases is that we do not get the async logout and we do not do a > logout by sending a logout pdu. We start at the killing tcp ip > connection step. > > So even if we are not getting load balanced we would be in the same > place in the open-iscsi code when we are getting the login failed errors. > > > To get back on track solving why we get the nop timeouts then if we are > not seeing load balancing messages or async logout messages, it could > be the open-iscsi bug I mentioned in the other mail. If you can send the > open-iscsi and ke
Re: iscsi diagnosis help
Mike Christie wrote: > Hoot, Joseph wrote: >> more INLINE below... >> >> On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: >> >>> Pasi Kärkkäinen wrote: On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: > On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > >> thanks. That helps. So I know that with the EqualLogic targets, there >> is a "Group IP" which, I believe, responds with an iscsi login_redirect. >> >> 1) Could the "Login authentication failed" message be the response >> because of a login redirect messages from the EQL redirect? >> >> and then my next question is more for curiosity sake: >> >> 2) Are there plans in the future to have more than one connection per >> session? and I guess in addition to that, would that mean multiple >> connections to a single volume over the same nic? >> >> > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam > our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 > volumes. In this type of scenario, would you expect to see timeouts like > this once in awhile? If so, do you think increasing my NOOP timeouts > would assist so we don't get these? maybe set it to 15 seconds instead > of 10? > Equallogic does active loadbalancing (redirects) during operation.. dunno about the errors though. >>> Oh yeah, forgot about that. Thanks Pasi! >>> >>> Joseph, look in the EQL target logs for something about the EQL box >>> doing load balancing. I think normally we handle the load balancing more >>> gracefully, but we might be messing up. I think if EQL was load >>> balancing in the open-iscsi logs we would see something about getting a >>> async iscsi pdu from the target that asks us to logout. Then when we >>> relogin the target would redirect us to the optimal path. >> >> There are two things that the EQL does, I believe-- one thing is async >> logout, the other is login_redirect. Unfortunately, from the EQL syslog >> side we don't see any errors related to this. It's my understanding, >> however, that when a login is initially attempted to the EQL, it hits the >> "group ip" or an alias'd IP sitting on a real nic. The group IP looks at >> all the interfaces on the EQL and decides, based on some algorithm, which >> EQL nic the session should connect to. It then sends the initiator that >> made the request a login_redirect, which I thought is basically a "logout >> and reconnect" pdu. It would say, for example, "you're can't log into the >> group IP, however, you can log into this IP (a real nic) that it would >> prefer you be logged into." >> >> I'm thinking that the "failed login" is actually the result of that attempt >> to log into the group IP and it sending a login redirect pdu back to it. >> > > If the target was load balancing us it would: > > - Send a async logout pdu. > - We then send a logout pdu. > - When we get the logout response pdu we kill the tcp ip connection > - We then create a new tcp connection > - We then log in to the portal that was passed into iscsiadm/iscsid (the > one in the DB that you see when you run iscsiadm -m node, which is > probably what you call the group IP). For this process we send a login > pdu. It then sends a login response pdu with the login redirect > response. In this response we also get the new IP to log into. > - We see that response and kill the tcp connection, and create a new tcp > connection to the portal we are being redirected to. > - We then log into the portal we were redirected to. We again do this by > sending a login pdu. This time the login response pdu should be ok and > we are done. Oh yeah, I meant to also say that this is pretty much the same process that happens we do the first login, and if we have to relogin because of a connection problem like the nop/ping timeout. The only difference in those cases is that we do not get the async logout and we do not do a logout by sending a logout pdu. We start at the killing tcp ip connection step. So even if we are not getting load balanced we would be in the same place in the open-iscsi code when we are getting the login failed errors. To get back on track solving why we get the nop timeouts then if we are not seeing load balancing messages or async logout messages, it could be the open-iscsi bug I mentioned in the other mail. If you can send the open-iscsi and kernel info I asked for in the other mail, we can start down that path. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=.
Re: iscsi diagnosis help
Hoot, Joseph wrote: > more INLINE below... > > On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: > >> Pasi Kärkkäinen wrote: >>> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > thanks. That helps. So I know that with the EqualLogic targets, there > is a "Group IP" which, I believe, responds with an iscsi login_redirect. > > 1) Could the "Login authentication failed" message be the response > because of a login redirect messages from the EQL redirect? > > and then my next question is more for curiosity sake: > > 2) Are there plans in the future to have more than one connection per > session? and I guess in addition to that, would that mean multiple > connections to a single volume over the same nic? > > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. In this type of scenario, would you expect to see timeouts like this once in awhile? If so, do you think increasing my NOOP timeouts would assist so we don't get these? maybe set it to 15 seconds instead of 10? >>> Equallogic does active loadbalancing (redirects) during operation.. >>> dunno about the errors though. >>> >> Oh yeah, forgot about that. Thanks Pasi! >> >> Joseph, look in the EQL target logs for something about the EQL box >> doing load balancing. I think normally we handle the load balancing more >> gracefully, but we might be messing up. I think if EQL was load >> balancing in the open-iscsi logs we would see something about getting a >> async iscsi pdu from the target that asks us to logout. Then when we >> relogin the target would redirect us to the optimal path. > > > There are two things that the EQL does, I believe-- one thing is async > logout, the other is login_redirect. Unfortunately, from the EQL syslog > side we don't see any errors related to this. It's my understanding, > however, that when a login is initially attempted to the EQL, it hits the > "group ip" or an alias'd IP sitting on a real nic. The group IP looks at all > the interfaces on the EQL and decides, based on some algorithm, which EQL nic > the session should connect to. It then sends the initiator that made the > request a login_redirect, which I thought is basically a "logout and > reconnect" pdu. It would say, for example, "you're can't log into the group > IP, however, you can log into this IP (a real nic) that it would prefer you > be logged into." > > I'm thinking that the "failed login" is actually the result of that attempt > to log into the group IP and it sending a login redirect pdu back to it. > If the target was load balancing us it would: - Send a async logout pdu. - We then send a logout pdu. - When we get the logout response pdu we kill the tcp ip connection - We then create a new tcp connection - We then log in to the portal that was passed into iscsiadm/iscsid (the one in the DB that you see when you run iscsiadm -m node, which is probably what you call the group IP). For this process we send a login pdu. It then sends a login response pdu with the login redirect response. In this response we also get the new IP to log into. - We see that response and kill the tcp connection, and create a new tcp connection to the portal we are being redirected to. - We then log into the portal we were redirected to. We again do this by sending a login pdu. This time the login response pdu should be ok and we are done. We do not know which login pdu failed, right now. You would need a wireshark trace or iscsid debugging. iscsid could hit the "Login authentication failed" path for either of the login pdus sent. You should not see that message normally even when we are being redirected. We do the login redirect login when we initially log into the target (like when you do iscsiadm -m node -l or service iscsi start), and if you look in your logs you should not see a login failed message for that. If you do it might be a clue. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=.
Re: iscsi diagnosis help
more INLINE below... On Nov 17, 2009, at 7:27 PM, Mike Christie wrote: > Pasi Kärkkäinen wrote: >> On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: >>> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: >>> thanks. That helps. So I know that with the EqualLogic targets, there is a "Group IP" which, I believe, responds with an iscsi login_redirect. 1) Could the "Login authentication failed" message be the response because of a login redirect messages from the EQL redirect? and then my next question is more for curiosity sake: 2) Are there plans in the future to have more than one connection per session? and I guess in addition to that, would that mean multiple connections to a single volume over the same nic? >>> >>> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our >>> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 >>> volumes. In this type of scenario, would you expect to see timeouts like >>> this once in awhile? If so, do you think increasing my NOOP timeouts would >>> assist so we don't get these? maybe set it to 15 seconds instead of 10? >>> >> >> Equallogic does active loadbalancing (redirects) during operation.. >> dunno about the errors though. >> > > Oh yeah, forgot about that. Thanks Pasi! > > Joseph, look in the EQL target logs for something about the EQL box > doing load balancing. I think normally we handle the load balancing more > gracefully, but we might be messing up. I think if EQL was load > balancing in the open-iscsi logs we would see something about getting a > async iscsi pdu from the target that asks us to logout. Then when we > relogin the target would redirect us to the optimal path. There are two things that the EQL does, I believe-- one thing is async logout, the other is login_redirect. Unfortunately, from the EQL syslog side we don't see any errors related to this. It's my understanding, however, that when a login is initially attempted to the EQL, it hits the "group ip" or an alias'd IP sitting on a real nic. The group IP looks at all the interfaces on the EQL and decides, based on some algorithm, which EQL nic the session should connect to. It then sends the initiator that made the request a login_redirect, which I thought is basically a "logout and reconnect" pdu. It would say, for example, "you're can't log into the group IP, however, you can log into this IP (a real nic) that it would prefer you be logged into." I'm thinking that the "failed login" is actually the result of that attempt to log into the group IP and it sending a login redirect pdu back to it. Don, does this seem like normal EQL traffic to an OiS initiator? > > -- > > You received this message because you are subscribed to the Google Groups > "open-iscsi" group. > To post to this group, send email to open-is...@googlegroups.com. > To unsubscribe from this group, send email to > open-iscsi+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/open-iscsi?hl=. > > === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 === -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=.
Re: iscsi diagnosis help
Pasi Kärkkäinen wrote: > On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: >> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: >> >>> thanks. That helps. So I know that with the EqualLogic targets, there is >>> a "Group IP" which, I believe, responds with an iscsi login_redirect. >>> >>> 1) Could the "Login authentication failed" message be the response because >>> of a login redirect messages from the EQL redirect? >>> >>> and then my next question is more for curiosity sake: >>> >>> 2) Are there plans in the future to have more than one connection per >>> session? and I guess in addition to that, would that mean multiple >>> connections to a single volume over the same nic? >>> >>> >> >> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our >> EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. >> In this type of scenario, would you expect to see timeouts like this once >> in awhile? If so, do you think increasing my NOOP timeouts would assist so >> we don't get these? maybe set it to 15 seconds instead of 10? >> > > Equallogic does active loadbalancing (redirects) during operation.. > dunno about the errors though. > Oh yeah, forgot about that. Thanks Pasi! Joseph, look in the EQL target logs for something about the EQL box doing load balancing. I think normally we handle the load balancing more gracefully, but we might be messing up. I think if EQL was load balancing in the open-iscsi logs we would see something about getting a async iscsi pdu from the target that asks us to logout. Then when we relogin the target would redirect us to the optimal path. -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=.
Re: iscsi diagnosis help
Hoot, Joseph wrote: > On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > >> thanks. That helps. So I know that with the EqualLogic targets, there is a >> "Group IP" which, I believe, responds with an iscsi login_redirect. >> >> 1) Could the "Login authentication failed" message be the response because >> of a login redirect messages from the EQL redirect? >> >> and then my next question is more for curiosity sake: >> >> 2) Are there plans in the future to have more than one connection per >> session? and I guess in addition to that, would that mean multiple >> connections to a single volume over the same nic? >> >> > > > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our > EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. > In this type of scenario, would you expect to see timeouts like this once in > awhile? If so, do you think increasing my NOOP timeouts would assist so we > don't get these? maybe set it to 15 seconds instead of 10? > It might be a bug. What version of open-iscsi are you using? What kernel? Is it a distro or kernel.org one? And are you using the open-iscsi kernel modules that come with a open-iscsi.org tarball or the kernel modules that come with your kernel? -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=.
Re: iscsi diagnosis help
Hoot, Joseph wrote: > thanks. That helps. So I know that with the EqualLogic targets, there is a > "Group IP" which, I believe, responds with an iscsi login_redirect. > > 1) Could the "Login authentication failed" message be the response because of > a login redirect messages from the EQL redirect? > It could be, but then when we retry we end up handling the redirect ok. I do not know why the first redirect would fail. Take a wireshark trace or run iscsid by hand iscsid -d 8 send the log output. That will let us know where we failed, but it would not tell us why. Normally EQL targets would leave something in their logs about why. > and then my next question is more for curiosity sake: > > 2) Are there plans in the future to have more than one connection per > session? and I guess in addition to that, would that mean multiple > connections to a single volume over the same nic? > No plans for MC/s. You can do multiple sessions to the same volume though. You can have multiple sessions over the same nic or over difference nics or some combo. > > > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote: > >> Hoot, Joseph wrote: >>> Hi all, >>> >>> I'm trying to understand what I'm seeing in my /var/log/messages. Here's >>> what I have: >>> >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs >>> expired, last rx 191838122, last ping 191839372, now 191841872 >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >>> (1011) >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >>> error (1011) state (3) >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >>> recovery (1 attempts) >>> >>> the first line, what is "connection5:0"? is that referenced from iscsiadm >>> somewhere? I only ask because I'm seeing iscsid messages and kernel >>> messages. I also have dm-multipath running, which usually shows up as >>> dm-multipath or something like that. I understand that iscsid is the >>> process that is logging in and out. But is the "kernel:" message just an >>> iscsi modules that is loaded into the kernel, which is why it is being >>> logged as "kernel:"? >>> >> It is the session id and connection id. >> >> connection$SESSION_ID:$CONNECTION_ID >> >> If you run iscsiadm -m session -P 1 or -P 3 >> >> You will see >> >> #iscsiadm -m session -P 1 >> Target: iqn.1992-08.com.netapp:sn.33615311 >> Current Portal: 10.15.85.19:3260,3 >> Persistent Portal: 10.15.85.19:3260,3 >> Iface Transport: tcp >> Iface IPaddress: 10.11.14.37 >> Iface HWaddress: default >> Iface Netdev: default >> SID: 7 >> iSCSI Connection State: LOGGED IN >> Internal iscsid Session State: NO CHANGE >> >> >> Session number is the SID value. >> >> If you run >> iscsiadm -m session >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 >> >> the session number/SID is the value in brackets. >> >> >> If you run iscsiadm in session mode (iscsiadm -m session) then you can >> use the -R argument and pass in a SID to do an opertaion like >> >> iscsiadm -m session -R 2 --rescan >> >> would rescan that session. >> >> Connection number is currently always zero. >> >> >> For the second question, iscsid handles login and logout, and error >> handling, and the kernel basically passes iscsi packets around. >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 >> secs expired, last rx 191838122, last ping 191839372, now 191841872 >> >> >> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval >> seconds, and if we do not get a response withing noop_timeout seconds it >> will fire off a connection error. >> >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >> (1011) >> >> >> Here is the kernel code notifying userspace of the problem. >> >> >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >> error (1011) state (3) >> >> >> And there iscsid is accepting the error (probably no need for the error >> to be logged twice). >> >> >> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target >> >> >> And then here iscsid handled the error by killing the tcp/ip connection, >> reconnection the tcp/ip connection, and then re-logging into the iscsi >> target. But for some reason we could not log back in right away. >> >> >> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >> recovery (1 attempts) >> >> But it looks like we tried again and we got back in. >> >> >> >>> Maybe one of these modules? >>> iscsi_tcp 19785 46 >>> libiscsi_tcp
Re: iscsi diagnosis help
> 2) Are there plans in the future to have more than one connection per session? I dont't think so. If you want to know the reason, read the thread titled "MC/S support " in open-iscsi mailing list. Kim 2009/11/17, Pasi Kärkkäinen : > On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: >> >> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: >> >> > thanks. That helps. So I know that with the EqualLogic targets, there >> > is a "Group IP" which, I believe, responds with an iscsi login_redirect. >> > >> > >> > 1) Could the "Login authentication failed" message be the response >> > because of a login redirect messages from the EQL redirect? >> > >> > and then my next question is more for curiosity sake: >> > >> > 2) Are there plans in the future to have more than one connection per >> > session? and I guess in addition to that, would that mean multiple >> > connections to a single volume over the same nic? >> > >> > >> >> >> Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam >> our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 >> volumes. In this type of scenario, would you expect to see timeouts like >> this once in awhile? If so, do you think increasing my NOOP timeouts >> would assist so we don't get these? maybe set it to 15 seconds instead of >> 10? >> > > Equallogic does active loadbalancing (redirects) during operation.. > dunno about the errors though. > > -- Pasi > >> >> > >> > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote: >> > >> >> Hoot, Joseph wrote: >> >>> Hi all, >> >>> >> >>> I'm trying to understand what I'm seeing in my /var/log/messages. >> >>> Here's what I have: >> >>> >> >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 >> >>> secs expired, last rx 191838122, last ping 191839372, now 191841872 >> >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >> >>> (1011) >> >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection >> >>> 5:0 error (1011) state (3) >> >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with >> >>> target >> >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >> >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >> >>> recovery (1 attempts) >> >>> >> >>> the first line, what is "connection5:0"? is that referenced from >> >>> iscsiadm somewhere? I only ask because I'm seeing iscsid messages and >> >>> kernel messages. I also have dm-multipath running, which usually >> >>> shows up as dm-multipath or something like that. I understand that >> >>> iscsid is the process that is logging in and out. But is the >> >>> "kernel:" message just an iscsi modules that is loaded into the >> >>> kernel, which is why it is being logged as "kernel:"? >> >>> >> >> >> >> It is the session id and connection id. >> >> >> >> connection$SESSION_ID:$CONNECTION_ID >> >> >> >> If you run iscsiadm -m session -P 1 or -P 3 >> >> >> >> You will see >> >> >> >> #iscsiadm -m session -P 1 >> >> Target: iqn.1992-08.com.netapp:sn.33615311 >> >>Current Portal: 10.15.85.19:3260,3 >> >>Persistent Portal: 10.15.85.19:3260,3 >> >>Iface Transport: tcp >> >>Iface IPaddress: 10.11.14.37 >> >>Iface HWaddress: default >> >>Iface Netdev: default >> >>SID: 7 >> >>iSCSI Connection State: LOGGED IN >> >>Internal iscsid Session State: NO CHANGE >> >> >> >> >> >> Session number is the SID value. >> >> >> >> If you run >> >> iscsiadm -m session >> >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 >> >> >> >> the session number/SID is the value in brackets. >> >> >> >> >> >> If you run iscsiadm in session mode (iscsiadm -m session) then you can >> >> use the -R argument and pass in a SID to do an opertaion like >> >> >> >> iscsiadm -m session -R 2 --rescan >> >> >> >> would rescan that session. >> >> >> >> Connection number is currently always zero. >> >> >> >> >> >> For the second question, iscsid handles login and logout, and error >> >> handling, and the kernel basically passes iscsi packets around. >> >> >> >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 >> >> secs expired, last rx 191838122, last ping 191839372, now 191841872 >> >> >> >> >> >> so here the iscsi kernel code sends a iscsi ping/nop every >> >> noop_interval >> >> seconds, and if we do not get a response withing noop_timeout seconds >> >> it >> >> will fire off a connection error. >> >> >> >> >> >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >> >> (1011) >> >> >> >> >> >> Here is the kernel code notifying userspace of the problem. >> >> >> >> >> >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >> >> >> >> error (1011) state (3) >> >> >> >> >> >> And there iscsid is accepting the error (probably no need for the error >> >> >> >> to b
Re: iscsi diagnosis help
On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote: > > On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > > > thanks. That helps. So I know that with the EqualLogic targets, there is > > a "Group IP" which, I believe, responds with an iscsi login_redirect. > > > > 1) Could the "Login authentication failed" message be the response because > > of a login redirect messages from the EQL redirect? > > > > and then my next question is more for curiosity sake: > > > > 2) Are there plans in the future to have more than one connection per > > session? and I guess in addition to that, would that mean multiple > > connections to a single volume over the same nic? > > > > > > > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our > EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. > In this type of scenario, would you expect to see timeouts like this once in > awhile? If so, do you think increasing my NOOP timeouts would assist so we > don't get these? maybe set it to 15 seconds instead of 10? > Equallogic does active loadbalancing (redirects) during operation.. dunno about the errors though. -- Pasi > > > > > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote: > > > >> Hoot, Joseph wrote: > >>> Hi all, > >>> > >>> I'm trying to understand what I'm seeing in my /var/log/messages. Here's > >>> what I have: > >>> > >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 > >>> secs expired, last rx 191838122, last ping 191839372, now 191841872 > >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error > >>> (1011) > >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 > >>> error (1011) state (3) > >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with > >>> target > >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 > >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after > >>> recovery (1 attempts) > >>> > >>> the first line, what is "connection5:0"? is that referenced from > >>> iscsiadm somewhere? I only ask because I'm seeing iscsid messages and > >>> kernel messages. I also have dm-multipath running, which usually shows > >>> up as dm-multipath or something like that. I understand that iscsid is > >>> the process that is logging in and out. But is the "kernel:" message > >>> just an iscsi modules that is loaded into the kernel, which is why it is > >>> being logged as "kernel:"? > >>> > >> > >> It is the session id and connection id. > >> > >> connection$SESSION_ID:$CONNECTION_ID > >> > >> If you run iscsiadm -m session -P 1 or -P 3 > >> > >> You will see > >> > >> #iscsiadm -m session -P 1 > >> Target: iqn.1992-08.com.netapp:sn.33615311 > >>Current Portal: 10.15.85.19:3260,3 > >>Persistent Portal: 10.15.85.19:3260,3 > >>Iface Transport: tcp > >>Iface IPaddress: 10.11.14.37 > >>Iface HWaddress: default > >>Iface Netdev: default > >>SID: 7 > >>iSCSI Connection State: LOGGED IN > >>Internal iscsid Session State: NO CHANGE > >> > >> > >> Session number is the SID value. > >> > >> If you run > >> iscsiadm -m session > >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 > >> > >> the session number/SID is the value in brackets. > >> > >> > >> If you run iscsiadm in session mode (iscsiadm -m session) then you can > >> use the -R argument and pass in a SID to do an opertaion like > >> > >> iscsiadm -m session -R 2 --rescan > >> > >> would rescan that session. > >> > >> Connection number is currently always zero. > >> > >> > >> For the second question, iscsid handles login and logout, and error > >> handling, and the kernel basically passes iscsi packets around. > >> > >> > >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 > >> secs expired, last rx 191838122, last ping 191839372, now 191841872 > >> > >> > >> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval > >> seconds, and if we do not get a response withing noop_timeout seconds it > >> will fire off a connection error. > >> > >> > >> > >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error > >> (1011) > >> > >> > >> Here is the kernel code notifying userspace of the problem. > >> > >> > >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 > >> error (1011) state (3) > >> > >> > >> And there iscsid is accepting the error (probably no need for the error > >> to be logged twice). > >> > >> > >> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target > >> > >> > >> And then here iscsid handled the error by killing the tcp/ip connection, > >> reconnection the tcp/ip connection, and then re-logging into the iscsi > >> target. But for some reason we could not l
Re: iscsi diagnosis help
On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote: > thanks. That helps. So I know that with the EqualLogic targets, there is a > "Group IP" which, I believe, responds with an iscsi login_redirect. > > 1) Could the "Login authentication failed" message be the response because of > a login redirect messages from the EQL redirect? > > and then my next question is more for curiosity sake: > > 2) Are there plans in the future to have more than one connection per > session? and I guess in addition to that, would that mean multiple > connections to a single volume over the same nic? > > Also Mike, I'm seeing one or two of these every 30-40 minutes if I slam our EqualLogic with roughly 7-15k IOPS (reads and writes) non stop on 3 volumes. In this type of scenario, would you expect to see timeouts like this once in awhile? If so, do you think increasing my NOOP timeouts would assist so we don't get these? maybe set it to 15 seconds instead of 10? > > On Nov 16, 2009, at 7:18 PM, Mike Christie wrote: > >> Hoot, Joseph wrote: >>> Hi all, >>> >>> I'm trying to understand what I'm seeing in my /var/log/messages. Here's >>> what I have: >>> >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs >>> expired, last rx 191838122, last ping 191839372, now 191841872 >>> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >>> (1011) >>> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >>> error (1011) state (3) >>> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target >>> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >>> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >>> recovery (1 attempts) >>> >>> the first line, what is "connection5:0"? is that referenced from iscsiadm >>> somewhere? I only ask because I'm seeing iscsid messages and kernel >>> messages. I also have dm-multipath running, which usually shows up as >>> dm-multipath or something like that. I understand that iscsid is the >>> process that is logging in and out. But is the "kernel:" message just an >>> iscsi modules that is loaded into the kernel, which is why it is being >>> logged as "kernel:"? >>> >> >> It is the session id and connection id. >> >> connection$SESSION_ID:$CONNECTION_ID >> >> If you run iscsiadm -m session -P 1 or -P 3 >> >> You will see >> >> #iscsiadm -m session -P 1 >> Target: iqn.1992-08.com.netapp:sn.33615311 >>Current Portal: 10.15.85.19:3260,3 >>Persistent Portal: 10.15.85.19:3260,3 >>Iface Transport: tcp >>Iface IPaddress: 10.11.14.37 >>Iface HWaddress: default >>Iface Netdev: default >>SID: 7 >>iSCSI Connection State: LOGGED IN >>Internal iscsid Session State: NO CHANGE >> >> >> Session number is the SID value. >> >> If you run >> iscsiadm -m session >> tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 >> >> the session number/SID is the value in brackets. >> >> >> If you run iscsiadm in session mode (iscsiadm -m session) then you can >> use the -R argument and pass in a SID to do an opertaion like >> >> iscsiadm -m session -R 2 --rescan >> >> would rescan that session. >> >> Connection number is currently always zero. >> >> >> For the second question, iscsid handles login and logout, and error >> handling, and the kernel basically passes iscsi packets around. >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 >> secs expired, last rx 191838122, last ping 191839372, now 191841872 >> >> >> so here the iscsi kernel code sends a iscsi ping/nop every noop_interval >> seconds, and if we do not get a response withing noop_timeout seconds it >> will fire off a connection error. >> >> >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error >> (1011) >> >> >> Here is the kernel code notifying userspace of the problem. >> >> >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >> error (1011) state (3) >> >> >> And there iscsid is accepting the error (probably no need for the error >> to be logged twice). >> >> >> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target >> >> >> And then here iscsid handled the error by killing the tcp/ip connection, >> reconnection the tcp/ip connection, and then re-logging into the iscsi >> target. But for some reason we could not log back in right away. >> >> >> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >> recovery (1 attempts) >> >> But it looks like we tried again and we got back in. >> >> >> >>> Maybe one of these modules? >>> iscsi_tcp 19785 46 >>> libiscsi_tcp 21829 1 iscsi_tcp >>> libiscsi2 41285 3 ib_is
Re: iscsi diagnosis help
thanks. That helps. So I know that with the EqualLogic targets, there is a "Group IP" which, I believe, responds with an iscsi login_redirect. 1) Could the "Login authentication failed" message be the response because of a login redirect messages from the EQL redirect? and then my next question is more for curiosity sake: 2) Are there plans in the future to have more than one connection per session? and I guess in addition to that, would that mean multiple connections to a single volume over the same nic? On Nov 16, 2009, at 7:18 PM, Mike Christie wrote: > Hoot, Joseph wrote: >> Hi all, >> >> I'm trying to understand what I'm seeing in my /var/log/messages. Here's >> what I have: >> >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs >> expired, last rx 191838122, last ping 191839372, now 191841872 >> Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error (1011) >> Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 >> error (1011) state (3) >> Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target >> iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 >> Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after >> recovery (1 attempts) >> >> the first line, what is "connection5:0"? is that referenced from iscsiadm >> somewhere? I only ask because I'm seeing iscsid messages and kernel >> messages. I also have dm-multipath running, which usually shows up as >> dm-multipath or something like that. I understand that iscsid is the >> process that is logging in and out. But is the "kernel:" message just an >> iscsi modules that is loaded into the kernel, which is why it is being >> logged as "kernel:"? >> > > It is the session id and connection id. > > connection$SESSION_ID:$CONNECTION_ID > > If you run iscsiadm -m session -P 1 or -P 3 > > You will see > > #iscsiadm -m session -P 1 > Target: iqn.1992-08.com.netapp:sn.33615311 > Current Portal: 10.15.85.19:3260,3 > Persistent Portal: 10.15.85.19:3260,3 > Iface Transport: tcp > Iface IPaddress: 10.11.14.37 > Iface HWaddress: default > Iface Netdev: default > SID: 7 > iSCSI Connection State: LOGGED IN > Internal iscsid Session State: NO CHANGE > > > Session number is the SID value. > > If you run > iscsiadm -m session > tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 > > the session number/SID is the value in brackets. > > > If you run iscsiadm in session mode (iscsiadm -m session) then you can > use the -R argument and pass in a SID to do an opertaion like > > iscsiadm -m session -R 2 --rescan > > would rescan that session. > > Connection number is currently always zero. > > > For the second question, iscsid handles login and logout, and error > handling, and the kernel basically passes iscsi packets around. > > > Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 > secs expired, last rx 191838122, last ping 191839372, now 191841872 > > > so here the iscsi kernel code sends a iscsi ping/nop every noop_interval > seconds, and if we do not get a response withing noop_timeout seconds it > will fire off a connection error. > > > > Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error > (1011) > > > Here is the kernel code notifying userspace of the problem. > > > Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 > error (1011) state (3) > > > And there iscsid is accepting the error (probably no need for the error > to be logged twice). > > > Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target > > > And then here iscsid handled the error by killing the tcp/ip connection, > reconnection the tcp/ip connection, and then re-logging into the iscsi > target. But for some reason we could not log back in right away. > > > iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 > Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after > recovery (1 attempts) > > But it looks like we tried again and we got back in. > > > >> Maybe one of these modules? >> iscsi_tcp 19785 46 >> libiscsi_tcp 21829 1 iscsi_tcp >> libiscsi2 41285 3 ib_iser,iscsi_tcp,libiscsi_tcp >> scsi_transport_iscsi237197 5 ib_iser,iscsi_tcp,libiscsi2 >> scsi_transport_iscsi 6085 1 scsi_transport_iscsi2 >> >> I'm just trying to make sure that all of my timeout values line up properly >> so that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI. I'd >> rather verify that iSCSI reconnections are attempted every 10 seconds and >> then dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds >> before it times out (or something along those lines). So if I understand >> the logging a bit
Re: iscsi diagnosis help
Hoot, Joseph wrote: > Hi all, > > I'm trying to understand what I'm seeing in my /var/log/messages. Here's > what I have: > > Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs > expired, last rx 191838122, last ping 191839372, now 191841872 > Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error (1011) > Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 error > (1011) state (3) > Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target > iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 > Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after > recovery (1 attempts) > > the first line, what is "connection5:0"? is that referenced from iscsiadm > somewhere? I only ask because I'm seeing iscsid messages and kernel messages. > I also have dm-multipath running, which usually shows up as dm-multipath or > something like that. I understand that iscsid is the process that is logging > in and out. But is the "kernel:" message just an iscsi modules that is > loaded into the kernel, which is why it is being logged as "kernel:"? > It is the session id and connection id. connection$SESSION_ID:$CONNECTION_ID If you run iscsiadm -m session -P 1 or -P 3 You will see #iscsiadm -m session -P 1 Target: iqn.1992-08.com.netapp:sn.33615311 Current Portal: 10.15.85.19:3260,3 Persistent Portal: 10.15.85.19:3260,3 Iface Transport: tcp Iface IPaddress: 10.11.14.37 Iface HWaddress: default Iface Netdev: default SID: 7 iSCSI Connection State: LOGGED IN Internal iscsid Session State: NO CHANGE Session number is the SID value. If you run iscsiadm -m session tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311 the session number/SID is the value in brackets. If you run iscsiadm in session mode (iscsiadm -m session) then you can use the -R argument and pass in a SID to do an opertaion like iscsiadm -m session -R 2 --rescan would rescan that session. Connection number is currently always zero. For the second question, iscsid handles login and logout, and error handling, and the kernel basically passes iscsi packets around. Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs expired, last rx 191838122, last ping 191839372, now 191841872 so here the iscsi kernel code sends a iscsi ping/nop every noop_interval seconds, and if we do not get a response withing noop_timeout seconds it will fire off a connection error. Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error (1011) Here is the kernel code notifying userspace of the problem. Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 error (1011) state (3) And there iscsid is accepting the error (probably no need for the error to be logged twice). Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target And then here iscsid handled the error by killing the tcp/ip connection, reconnection the tcp/ip connection, and then re-logging into the iscsi target. But for some reason we could not log back in right away. iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after recovery (1 attempts) But it looks like we tried again and we got back in. > Maybe one of these modules? > iscsi_tcp 19785 46 > libiscsi_tcp 21829 1 iscsi_tcp > libiscsi2 41285 3 ib_iser,iscsi_tcp,libiscsi_tcp > scsi_transport_iscsi237197 5 ib_iser,iscsi_tcp,libiscsi2 > scsi_transport_iscsi 6085 1 scsi_transport_iscsi2 > > I'm just trying to make sure that all of my timeout values line up properly > so that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI. I'd > rather verify that iSCSI reconnections are attempted every 10 seconds and > then dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds > before it times out (or something along those lines). So if I understand the > logging a bit more, it'll help me. > > Thanks > Joe > > > === > Joseph R. Hoot > Lead System Programmer/Analyst > (w) 716-878-4832 > (c) 716-759-HOOT > joe.h...@itec.suny.edu > GPG KEY: 7145F633 > === > > > --~--~-~--~~~---~--~~ > You received this message because you are subscribed to the Google Groups > "open-iscsi" group. > To post to this group, send email to open-iscsi@googlegroups.com > To unsubscribe from this group, send email to > open-iscsi+unsubscr...@googlegroups.com > For more options, visit this group at > http://groups.google.com/group/open-iscsi > -~--~~~~--~~--~--~--- > -- You received this message because you are subscribed to the Google Grou
iscsi diagnosis help
Hi all, I'm trying to understand what I'm seeing in my /var/log/messages. Here's what I have: Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10 secs expired, last rx 191838122, last ping 191839372, now 191841872 Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error (1011) Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection 5:0 error (1011) state (3) Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with target iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03 Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after recovery (1 attempts) the first line, what is "connection5:0"? is that referenced from iscsiadm somewhere? I only ask because I'm seeing iscsid messages and kernel messages. I also have dm-multipath running, which usually shows up as dm-multipath or something like that. I understand that iscsid is the process that is logging in and out. But is the "kernel:" message just an iscsi modules that is loaded into the kernel, which is why it is being logged as "kernel:"? Maybe one of these modules? iscsi_tcp 19785 46 libiscsi_tcp 21829 1 iscsi_tcp libiscsi2 41285 3 ib_iser,iscsi_tcp,libiscsi_tcp scsi_transport_iscsi237197 5 ib_iser,iscsi_tcp,libiscsi2 scsi_transport_iscsi 6085 1 scsi_transport_iscsi2 I'm just trying to make sure that all of my timeout values line up properly so that, for example, OCFS2 filesystem doesn't timeout prior to iSCSI. I'd rather verify that iSCSI reconnections are attempted every 10 seconds and then dm-multipath has 30 seconds to timeout and then OCFS2 has 60 seconds before it times out (or something along those lines). So if I understand the logging a bit more, it'll help me. Thanks Joe === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 === --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups "open-iscsi" group. To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---