Re: open-iscsi init script on suse
Eli Dorfman wrote:
>> For discovery though it seems like you can just do discovery and tell
>> iscsiadm not to overwrite the existing db (just add new ones) and that
>> would solve some of the issues with iser records getting whacked.
>>

Sorry for the late response. I was on vacation.

> How do we tell iscsiadm not to overwrite the existing db?

iscsiadm -m discovery -t st -p ip -o new

will just add new records for portals that are not in the db.

iscsiadm -m discovery -t st -p ip -o delete

would just remove ones that are no longer returned. You can then pass them both or also pass in -o update. See the README.

> Also, maybe I missed something, but how does re-discovery (in initd.suse)
> help when login returns EHOSTUNREACH?

It added an extra timeout.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
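A minimal sketch of how the two discovery invocations from the reply above could be wrapped in a script. The function name `discover_keep_db`, the `ISCSIADM` override and the example portal address are hypothetical; the sketch also assumes that "pass them both" means repeating `-o` once per operation, so check the README before relying on it.

```shell
#!/bin/sh
# ISCSIADM can be overridden for a dry run; the default is the real tool.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# Run sendtargets discovery against one portal with the given -o operations.
# $1 = portal address, remaining arguments = operations (new, delete, update)
discover_keep_db() {
    portal="$1"
    shift
    ops=""
    for op in "$@"; do
        ops="$ops -o $op"
    done
    # $ops is intentionally unquoted so each "-o op" pair is split into words
    $ISCSIADM -m discovery -t st -p "$portal" $ops
}

# Usage (not executed here):
#   discover_keep_db 192.0.2.10 new          # only add records for unknown portals
#   discover_keep_db 192.0.2.10 new delete   # add new records, drop stale ones
```

Nothing is run when the file is sourced; the function only calls iscsiadm when invoked, so existing records are touched only by the operations passed in.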
Re: open-iscsi init script on suse
> Yes, this looks okay. However, I would _really_ like to test it against
> the STP scenario.
> Hmm. I'll see if I can pull it in for SLES11. Care to open a bugzilla?
>

On which bugzilla do you want me to open the bug?
It would be great if you manage to fix it.

Thanks,
Doron
Re: open-iscsi init script on suse
Doron Shoham wrote:
>> If you promise to test the script with the STP fixes for me, I'll be willing
>> to add it. Sadly I don't have time currently to do any decent testing here,
>> but I'm always open to patches :-)
>>
>> Cheers,
>>
>> Hannes
>
> Hi Hannes,
>
> Unfortunately I don't have any setup on which I could test a script with
> STP fixes.
> As far as I understood, due to Mike's patch, there is no need to re-discover
> all nodes at startup.
> So I suggest removing the re-discovery and logging out from all nodes, not
> only from the automatic nodes.
> Please tell me what your opinion is.
>
> Thanks,
> Doron
>
>
> revert some of the changes from commit
> 2146208ccd8c6579fa1accbe3dbe7181b46539b3.
> logout to all nodes when stopping open-iscsi.
> do not try to re-discover nodes on startup.
>
> Signed-off-by: Doron Shoham <[EMAIL PROTECTED]>
> ---
>  etc/initd/initd.suse | 40 ++--
>  1 files changed, 2 insertions(+), 38 deletions(-)
>
> diff --git a/etc/initd/initd.suse b/etc/initd/initd.suse
> index 23bbac0..4bf216c 100644
> --- a/etc/initd/initd.suse
> +++ b/etc/initd/initd.suse
> @@ -39,8 +39,8 @@ iscsi_login_all_nodes()
>  iscsi_logout_all_nodes()
>  {
>      echo -n "Closing all iSCSI connections: "
> -    # Logout from all sessions marked automatic
> -    if ! $ISCSIADM -m node --logoutall=automatic 2> /dev/null; then
> +    # Logout from all sessions
> +    if ! $ISCSIADM -m node --logoutall=all 2> /dev/null; then
>          if [ $? == 19 ] ; then
>              RETVAL=6
>          else

No. We cannot do this as it kills root on iSCSI.
We can only log out from the nodes marked 'automatic' and 'manual', not those marked 'onboot'.

> @@ -101,38 +101,6 @@ iscsi_list_all_nodes()
>      done
>  }
>
> -iscsi_discover_all_targets()
> -{
> -    # Strip off any existing ID information
> -    RAW_NODE_LIST=`iscsiadm -m node | sed -nre 's/^(\[[0-9a-f]*\] )?(.*)$/\2/p'`
> -    # Obtain IPv4 list
> -    IPV4_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9]{1,3}(\.[0-9]{1,3}){3}):[^: ]* (.*)$/\1 \3/p'`
> -    # Now obtain IPv6 list
> -    IPV6_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9a-f]{1,4}(:[0-9a-f]{0,4}){6}:[0-9a-f]{1,4}):[^: ]* (.*)$/\1 \3/p'`
> -
> -    DISC_TARGETS=""
> -    while read NODE_ADDR NODE_NAME; do
> -        [ -z "$NODE_ADDR" -a -z "$NODE_NAME" ] && continue
> -        NODE_ATTRS=`iscsiadm -m node -p "$NODE_ADDR" -T "$NODE_NAME"`
> -        NODE_STATUS=`echo "$NODE_ATTRS" | sed -nre 's/^.*node\.conn\[0\]\.startup = ([a-z]*).*$/\1/p'`
> -
> -        if [ "$NODE_STATUS" == 'automatic' ]; then
> -            DISC_TARGETS=`echo "$DISC_TARGETS" | sed -re '/'"$NODE_ADDR"'/!{s/(.*)/\1 '"$NODE_ADDR"'/}'`
> -        fi
> -    done < <(echo "$IPV4_NODE_LIST"; echo "$IPV6_NODE_LIST")
> -
> -    for TARGET_ADDR in $DISC_TARGETS; do
> -        echo -n "Attempting discovery on target at ${TARGET_ADDR}: "
> -        iscsiadm -m discovery -t st -p "$TARGET_ADDR" > /dev/null 2>&1
> -        if [ "$?" -ne 0 ]; then
> -            rc_failed 1
> -            rc_status -v
> -            return 1
> -        fi
> -        rc_status -v
> -    done
> -}
> -
>  case "$1" in
>      start)
>          [ ! -d /var/lib/iscsi ] && mkdir -p /var/lib/iscsi
> @@ -147,10 +115,6 @@ case "$1" in
>              rc_status -v
>          fi
>          if [ "$RETVAL" == "0" ]; then
> -            iscsi_discover_all_targets
> -            RETVAL=$?
> -        fi
> -        if [ "$RETVAL" == "0" ]; then
>              iscsi_login_all_nodes
>          fi
>          ;;

Yes, this looks okay. However, I would _really_ like to test it against the STP scenario.
Hmm. I'll see if I can pull it in for SLES11. Care to open a bugzilla?

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
[EMAIL PROTECTED]                     +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: open-iscsi init script on suse
> If you promise to test the script with the STP fixes for me, I'll be willing
> to add it. Sadly I don't have time currently to do any decent testing here,
> but I'm always open to patches :-)
>
> Cheers,
>
> Hannes

Hi Hannes,

Unfortunately I don't have any setup on which I could test a script with STP fixes.
As far as I understood, due to Mike's patch, there is no need to re-discover all nodes at startup.
So I suggest removing the re-discovery and logging out from all nodes, not only from the automatic nodes.
Please tell me what your opinion is.

Thanks,
Doron


revert some of the changes from commit
2146208ccd8c6579fa1accbe3dbe7181b46539b3.
logout to all nodes when stopping open-iscsi.
do not try to re-discover nodes on startup.

Signed-off-by: Doron Shoham <[EMAIL PROTECTED]>
---
 etc/initd/initd.suse | 40 ++--
 1 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/etc/initd/initd.suse b/etc/initd/initd.suse
index 23bbac0..4bf216c 100644
--- a/etc/initd/initd.suse
+++ b/etc/initd/initd.suse
@@ -39,8 +39,8 @@ iscsi_login_all_nodes()
 iscsi_logout_all_nodes()
 {
     echo -n "Closing all iSCSI connections: "
-    # Logout from all sessions marked automatic
-    if ! $ISCSIADM -m node --logoutall=automatic 2> /dev/null; then
+    # Logout from all sessions
+    if ! $ISCSIADM -m node --logoutall=all 2> /dev/null; then
         if [ $? == 19 ] ; then
             RETVAL=6
         else
@@ -101,38 +101,6 @@ iscsi_list_all_nodes()
     done
 }

-iscsi_discover_all_targets()
-{
-    # Strip off any existing ID information
-    RAW_NODE_LIST=`iscsiadm -m node | sed -nre 's/^(\[[0-9a-f]*\] )?(.*)$/\2/p'`
-    # Obtain IPv4 list
-    IPV4_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9]{1,3}(\.[0-9]{1,3}){3}):[^: ]* (.*)$/\1 \3/p'`
-    # Now obtain IPv6 list
-    IPV6_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9a-f]{1,4}(:[0-9a-f]{0,4}){6}:[0-9a-f]{1,4}):[^: ]* (.*)$/\1 \3/p'`
-
-    DISC_TARGETS=""
-    while read NODE_ADDR NODE_NAME; do
-        [ -z "$NODE_ADDR" -a -z "$NODE_NAME" ] && continue
-        NODE_ATTRS=`iscsiadm -m node -p "$NODE_ADDR" -T "$NODE_NAME"`
-        NODE_STATUS=`echo "$NODE_ATTRS" | sed -nre 's/^.*node\.conn\[0\]\.startup = ([a-z]*).*$/\1/p'`
-
-        if [ "$NODE_STATUS" == 'automatic' ]; then
-            DISC_TARGETS=`echo "$DISC_TARGETS" | sed -re '/'"$NODE_ADDR"'/!{s/(.*)/\1 '"$NODE_ADDR"'/}'`
-        fi
-    done < <(echo "$IPV4_NODE_LIST"; echo "$IPV6_NODE_LIST")
-
-    for TARGET_ADDR in $DISC_TARGETS; do
-        echo -n "Attempting discovery on target at ${TARGET_ADDR}: "
-        iscsiadm -m discovery -t st -p "$TARGET_ADDR" > /dev/null 2>&1
-        if [ "$?" -ne 0 ]; then
-            rc_failed 1
-            rc_status -v
-            return 1
-        fi
-        rc_status -v
-    done
-}
-
 case "$1" in
     start)
         [ ! -d /var/lib/iscsi ] && mkdir -p /var/lib/iscsi
@@ -147,10 +115,6 @@ case "$1" in
             rc_status -v
         fi
         if [ "$RETVAL" == "0" ]; then
-            iscsi_discover_all_targets
-            RETVAL=$?
-        fi
-        if [ "$RETVAL" == "0" ]; then
             iscsi_login_all_nodes
         fi
         ;;
--
1.5.3.8
Re: open-iscsi init script on suse
> For discovery though it seems like you can just do discovery and tell
> iscsiadm not to overwrite the existing db (just add new ones) and that
> would solve some of the issues with iser records getting whacked.
>

How do we tell iscsiadm not to overwrite the existing db?

Also, maybe I missed something, but how does re-discovery (in initd.suse) help
when login returns EHOSTUNREACH?

Eli.
Re: open-iscsi init script on suse
Hi Doron,

Doron Shoham wrote:
>> Actually this was bad. If we have to wait for the login_timeout to fire
>> then initial_login_retry_max = 4 was a nice round number and the max
>> time we had to wait was 1 minute. If I just increase it (tried 45
>> stupidly first), it increases the possible max default wait to 11
>> minutes :(
>>
>> So what I did was make initial_login_retry_max just be the max number of
>> initial iscsi login timeouts we can withstand and then let other initial
>> login failures retry for up to initial_login_retry_max * login_timeout.
>
> Have you changed something in the code?
> I can't see any change in git.
> Can you please explain your calculation again?
>
> I wanted to know if we are going to change back the init script.
> If the problem is waiting for the spanning tree, should increasing
> initial_login_retry_max do the work?
>
> Currently the init script causes other bugs.

If you promise to test the script with the STP fixes for me, I'll be willing to add it. Sadly I don't have time currently to do any decent testing here, but I'm always open to patches :-)

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
[EMAIL PROTECTED]                     +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: open-iscsi init script on suse
Doron Shoham wrote:
>> Actually this was bad. If we have to wait for the login_timeout to fire
>> then initial_login_retry_max = 4 was a nice round number and the max
>> time we had to wait was 1 minute. If I just increase it (tried 45
>> stupidly first), it increases the possible max default wait to 11
>> minutes :(
>>
>> So what I did was make initial_login_retry_max just be the max number of
>> initial iscsi login timeouts we can withstand and then let other initial
>> login failures retry for up to initial_login_retry_max * login_timeout.
>
> Have you changed something in the code?

It is

commit 31c9d428556088c886be3ea89333e9b116bc0a09
Author: Mike Christie <[EMAIL PROTECTED]>
Date:   Wed Sep 24 17:34:47 2008 -0500

    modify initial login retry max

> I can't see any change in git.
> Can you please explain your calculation again?

It is just the same thing we do for scsi commands.

login retry max * login timeout = max time to retry the initial login.

So to put it in scsi command terms of retry and timeout: the EHOSTUNREACH failure we see on the initial login is considered retryable, like scsi-ml's DID_IMM_RETRY value, and does not count against the retry counter. But we will only retry for up to login retry max * login timeout seconds, so it does not retry forever on the first login and hold up the boot process.

>
> I wanted to know if we are going to change back the init script.
> If the problem is waiting for the spanning tree, should increasing
> initial_login_retry_max do the work?

Yes, it should work around the problem - sort of :) We do not know if EHOSTUNREACH is because of the spanning tree problem or because a cable is unplugged. For the first one we want to retry, for the second we probably do not (unless the admin is running to the box and trying to plug it back in :)). So now we set the initial login timeout and retry to values where most people hitting the spanning tree problem will be ok. At least according to the bug reports we are seeing on the list and at Red Hat, if users set those values to retry for at most 2 minutes, that was long enough.

The drawback is that we used to retry for at most 1 minute, so everyone else using the defaults will have to wait an extra minute, which can be a pain. However, everyone can change the value to fail fast or wait longer like before, so hopefully this new value is a good compromise.

>
> Currently the init script causes other bugs.

This is only meant to help Hannes with the spanning tree issue, so he does not have to use the discovery trick to work around it.

For discovery though it seems like you can just do discovery and tell iscsiadm not to overwrite the existing db (just add new ones) and that would solve some of the issues with iser records getting whacked.
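The arithmetic above can be sketched as a small shell helper. The function name `max_login_wait` is hypothetical, and the 15-second `login_timeout` is an assumption inferred from "4 retries = 1 minute" in this thread, not a value stated in it. Likewise, 8 is only an illustrative retry count that yields a 2-minute window under that assumption.

```shell
#!/bin/sh
# Maximum time (seconds) spent retrying the initial login:
#   initial_login_retry_max * login_timeout
# $1 = initial_login_retry_max, $2 = login_timeout (assumed 15 if omitted)
max_login_wait() {
    retry_max="$1"
    login_timeout="${2:-15}"
    echo $(( retry_max * login_timeout ))
}

max_login_wait 4     # 60 seconds: the old 1-minute worst case
max_login_wait 45    # 675 seconds: roughly the 11 minutes mentioned above
max_login_wait 8     # 120 seconds: a 2-minute window (illustrative value)
```

Passing a second argument lets you plug in whatever login timeout your records actually use.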
Re: open-iscsi init script on suse
> Actually this was bad. If we have to wait for the login_timeout to fire
> then initial_login_retry_max = 4 was a nice round number and the max
> time we had to wait was 1 minute. If I just increase it (tried 45
> stupidly first), it increases the possible max default wait to 11
> minutes :(
>
> So what I did was make initial_login_retry_max just be the max number of
> initial iscsi login timeouts we can withstand and then let other initial
> login failures retry for up to initial_login_retry_max * login_timeout.

Have you changed something in the code?
I can't see any change in git.
Can you please explain your calculation again?

I wanted to know if we are going to change back the init script.
If the problem is waiting for the spanning tree, should increasing initial_login_retry_max do the work?

Currently the init script causes other bugs.
Re: open-iscsi init script on suse
Mike Christie wrote:
> Hannes Reinecke wrote:
>> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> Hi Doron,
>>>> Doron Shoham wrote:
>>>>> Doron Shoham wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>>>>>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>>>>>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>>>>>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>>>>>
>>>> That's what we tried initially. However, certain switches take quite a bit
>>>> of time for the Spanning-Tree Protocol to work out the route, during which
>>>> time any connect() attempt returns with -EHOSTUNREACH.
>>>> If we do an automatic login, the login request is sent from the kernel
>>>> directly. And any connect() failure from the kernel is taken as a terminal
>>>> error, hence the login fails.
>>> Are we talking about the same thing that keeps coming up :)
>>>
>> I know. Main reason here is that I didn't have time to investigate
>
> It is ok. I like repeating what I said in this mail more than fixing
> aic7xxx bugs, so as long as you fix that driver you can do anything here :)
>
>
>> this further, so I'll have to fall back to giving the same answers
>> I had the last time ...
>>
>>> I swear someone from Voltaire asked this before. You gave the same reply.
>>> And then I said you can increase node.session.initial_login_retry_max
>>> so we retry the login for all cases (almost all errors except CHAP or
>>> target-not-there errors). If we get -EHOSTUNREACH we will retry up to
>>> node.session.initial_login_retry_max times (there is a 1 second delay
>>> between retries so it is a delay of node.session.initial_login_retry_max
>>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we
>>> always test for this and always retry so the user does not have to set
>>> node.session.initial_login_retry_max, but I was not sure if there was a case
>>> where we would not want to retry.
>>>
>> Problem is that there are valid cases for which we should _not_ retry an
>> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
>> But increasing the initial_login_retry_max value would really help here.
>> Hmm. Will have to check, but this seems like a viable route.
>>
>> Sorry for not being responsive, but I've been kept really busy recently.
>>
>
> No problem.
>
> I have been having our users try initial_login_retry_max = 60 and they
> have reported success. For iscsistart, which Red Hat and Fedora use for
> the root session in the initramfs, I just set it to 120.
>
> For the default let me up the default to something longer than 4.

Actually this was bad. If we have to wait for the login_timeout to fire then initial_login_retry_max = 4 was a nice round number and the max time we had to wait was 1 minute. If I just increase it (tried 45 stupidly first), it increases the possible max default wait to 11 minutes :(

So what I did was make initial_login_retry_max just be the max number of initial iscsi login timeouts we can withstand and then let other initial login failures retry for up to initial_login_retry_max * login_timeout.
Re: open-iscsi init script on suse
Hannes Reinecke wrote:
> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>> Hannes Reinecke wrote:
>>> Hi Doron,
>>> Doron Shoham wrote:
>>>> Doron Shoham wrote:
>>>>> Hi,
>>>>>
>>>>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>>>>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>>>>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>>>>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>>>>
>>> That's what we tried initially. However, certain switches take quite a bit
>>> of time for the Spanning-Tree Protocol to work out the route, during which
>>> time any connect() attempt returns with -EHOSTUNREACH.
>>> If we do an automatic login, the login request is sent from the kernel
>>> directly. And any connect() failure from the kernel is taken as a terminal
>>> error, hence the login fails.
>> Are we talking about the same thing that keeps coming up :)
>>
> I know. Main reason here is that I didn't have time to investigate

It is ok. I like repeating what I said in this mail more than fixing aic7xxx bugs, so as long as you fix that driver you can do anything here :)

> this further, so I'll have to fall back to giving the same answers
> I had the last time ...
>
>> I swear someone from Voltaire asked this before. You gave the same reply.
>> And then I said you can increase node.session.initial_login_retry_max
>> so we retry the login for all cases (almost all errors except CHAP or
>> target-not-there errors). If we get -EHOSTUNREACH we will retry up to
>> node.session.initial_login_retry_max times (there is a 1 second delay
>> between retries so it is a delay of node.session.initial_login_retry_max
>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we
>> always test for this and always retry so the user does not have to set
>> node.session.initial_login_retry_max, but I was not sure if there was a case
>> where we would not want to retry.
>>
> Problem is that there are valid cases for which we should _not_ retry an
> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
> But increasing the initial_login_retry_max value would really help here.
> Hmm. Will have to check, but this seems like a viable route.
>
> Sorry for not being responsive, but I've been kept really busy recently.
>

No problem.

I have been having our users try initial_login_retry_max = 60 and they have reported success. For iscsistart, which Red Hat and Fedora use for the root session in the initramfs, I just set it to 120.

For the default let me up the default to something longer than 4. Because we do all the logins in parallel we do not have to worry about one login delaying another, so the max wait is just going to be initial_login_retry_max seconds instead of the old worst case of possibly number_of_portals_or_targets_for_eql * initial_login_retry_max seconds.
Re: open-iscsi init script on suse
Eli Dorfman wrote:
> On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>>
>>> Another issue is that the script logs out only from automatic nodes (not from all nodes as in Red Hat).
>>> This causes a bug when iscsi is stopped while a manual node is still logged in (session is active).
>>> The result is that iscsid is down but the session is still alive: iscsiadm -m session shows this stale session.
>>> I suggest that we do the same as Red Hat. Any objections?
>>>
>>>
>>> Also, what is the purpose of the "node.startup" parameter?
>>> When is it used?
>>>
>> node.startup should be renamed record.startup. The possible values are
>> automatic, manual and onboot. When the init scripts start they can run
>> over the db, check which records the user has requested automatic
>> startup for, and log in at that time.
>>
>> onboot is used for the session used for boot/root. It just signals
>> the tools to handle it differently. During shutdown for example we
>> cannot kill that session when the init script stop is done, because it
>> is still needed for root.
>>
>> manual is used because a lot of targets will return all the portals on
>> the target. Some of these portals may be disabled or not even connected
>> to the network. Instead of iscsiadm/iscsid wasting time trying to log in,
>> admins can mark them as manual and the init scripts will not auto start
>> them. Why not just delete them if they cannot be used? I do not know.
>>
> The question is why there are two node.startup fields and what is the
> difference between them (if any):
> node.startup
> AND
> node.conn[0].startup
>

node.conn[0].startup dates from when there was basic MC/s support. Since we only support one connection per session there is no difference. Some distro scripts used to check for one or the other, but now the iscsiadm -m node -L/U command will check for either to support all users.
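For scripts that still parse node records themselves, a minimal sketch of "check for either field" could look like the following. The helper name `record_has_startup` and the sample record are hypothetical; the `name = value` line format is taken from the sed expression in the init script quoted elsewhere in this thread.

```shell
#!/bin/sh
# Read a node record dump on stdin and succeed if either node.startup or
# node.conn[0].startup is set to the requested mode.
# $1 = startup mode to look for (automatic, manual or onboot)
record_has_startup() {
    wanted="$1"
    grep -Eq "^node\.(conn\[0\]\.)?startup = ${wanted}\$"
}

# Hand-written record excerpt standing in for live iscsiadm output
sample_record='node.name = iqn.2008-09.example:disk1
node.startup = manual
node.conn[0].startup = automatic'

if printf '%s\n' "$sample_record" | record_has_startup automatic; then
    echo "record would be started automatically"
fi
```

Matching on either key keeps older records, which may only carry one of the two fields, working with the same check.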
Re: open-iscsi init script on suse
On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>
>> Another issue is that the script logs out only from automatic nodes (not from all nodes as in Red Hat).
>> This causes a bug when iscsi is stopped while a manual node is still logged in (session is active).
>> The result is that iscsid is down but the session is still alive: iscsiadm -m session shows this stale session.
>> I suggest that we do the same as Red Hat. Any objections?
>>
>>
>> Also, what is the purpose of the "node.startup" parameter?
>> When is it used?
>>
>
> node.startup should be renamed record.startup. The possible values are
> automatic, manual and onboot. When the init scripts start they can run
> over the db, check which records the user has requested automatic
> startup for, and log in at that time.
>
> onboot is used for the session used for boot/root. It just signals
> the tools to handle it differently. During shutdown for example we
> cannot kill that session when the init script stop is done, because it
> is still needed for root.
>
> manual is used because a lot of targets will return all the portals on
> the target. Some of these portals may be disabled or not even connected
> to the network. Instead of iscsiadm/iscsid wasting time trying to log in,
> admins can mark them as manual and the init scripts will not auto start
> them. Why not just delete them if they cannot be used? I do not know.
>

The question is why there are two node.startup fields and what is the difference between them (if any):
node.startup
AND
node.conn[0].startup

Thanks,
Eli
Re: open-iscsi init script on suse
On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
> Hannes Reinecke wrote:
>> Hi Doron,
>> Doron Shoham wrote:
>>> Doron Shoham wrote:
>>>> Hi,
>>>>
>>>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>>>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>>>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>>>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>>>
>> That's what we tried initially. However, certain switches take quite a bit
>> of time for the Spanning-Tree Protocol to work out the route, during which
>> time any connect() attempt returns with -EHOSTUNREACH.
>> If we do an automatic login, the login request is sent from the kernel
>> directly. And any connect() failure from the kernel is taken as a terminal
>> error, hence the login fails.
>
> Are we talking about the same thing that keeps coming up :)
>

I know. Main reason here is that I didn't have time to investigate this further, so I'll have to fall back to giving the same answers I had the last time ...

> I swear someone from Voltaire asked this before. You gave the same reply.
> And then I said you can increase node.session.initial_login_retry_max
> so we retry the login for all cases (almost all errors except CHAP or
> target-not-there errors). If we get -EHOSTUNREACH we will retry up to
> node.session.initial_login_retry_max times (there is a 1 second delay
> between retries so it is a delay of node.session.initial_login_retry_max
> seconds). I then said that for -EHOSTUNREACH I can add a check so that we
> always test for this and always retry so the user does not have to set
> node.session.initial_login_retry_max, but I was not sure if there was a case
> where we would not want to retry.
>

Problem is that there are valid cases for which we should _not_ retry an -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
But increasing the initial_login_retry_max value would really help here.
Hmm. Will have to check, but this seems like a viable route.

Sorry for not being responsive, but I've been kept really busy recently.

> I can even increase the default node.session.initial_login_retry_max. It is
> only 4 right now. We do all the logins in parallel now, so the max delay
> would be node.session.initial_login_retry_max seconds basically. Previously
> when we did one portal at a time, we might have to wait
> node.session.initial_login_retry_max for each portal or in cases like
> EQL each device.

Ah. Good to know.

I really hope to get this cleared up in the near future.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   zSeries & Storage
[EMAIL PROTECTED]                     +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: open-iscsi init script on suse
Mike Christie wrote:
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on SUSE re-discover all iSCSI targets which were set to automatic login?
>> To avoid deadlocks on the root fs there is a patch which limits the number of retries on first login.
>> When doing so, it sets all the parameters back to their defaults (overriding any user definitions).
>> I think it should be like in Red Hat: just log in to all the targets which are automatic.
>>
>> Another issue is that the script logs out only from automatic nodes (not from all nodes as in Red Hat).
>> This causes a bug when iscsi is stopped while a manual node is still logged in (session is active).
>> The result is that iscsid is down but the session is still alive: iscsiadm -m session shows this stale session.
>> I suggest that we do the same as Red Hat. Any objections?
>>
>>
>> Also, what is the purpose of the "node.startup" parameter?
>> When is it used?
>>
>
> node.startup should be renamed record.startup. The possible values are
> automatic, manual and onboot. When the init scripts start they can run
> over the db, check which records the user has requested automatic
> startup for, and log in at that time.

For the Red Hat ones, iscsiadm loops over the records when it does

iscsiadm -m node --loginall=automatic

>
> onboot is used for the session used for boot/root. It just signals
> the tools to handle it differently. During shutdown for example we
> cannot kill that session when the init script stop is done, because it
> is still needed for root.

iscsiadm -m node --logoutall=all

does not log out the records marked onboot.
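A minimal sketch of init-script helpers built on the two commands above. It relies on the statement in this message that `--logoutall=all` leaves `onboot` records alone, which is worth verifying against your iscsiadm version before using it on a root-on-iSCSI system. The function names mirror those in the initd.suse patch; the `ISCSIADM` override for dry runs is an assumption of this sketch.

```shell
#!/bin/sh
# ISCSIADM can be overridden (for example with a stub) to dry-run the helpers.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# Log in to every record whose startup mode is "automatic".
iscsi_login_all_nodes() {
    $ISCSIADM -m node --loginall=automatic
}

# Log out of all sessions; records marked "onboot" are expected to be
# skipped by iscsiadm itself, per the message above.
iscsi_logout_all_nodes() {
    $ISCSIADM -m node --logoutall=all
}
```

Neither helper is called when the file is sourced, so it can be loaded from a start/stop case statement without side effects.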
Re: open-iscsi init script on suse
Hannes Reinecke wrote:
> Hi Doron,
>
> Doron Shoham wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discover all iscsi targets which
>>> were set to automatic login?
>>> To avoid deadlocks on the root fs there is a patch which limits the
>>> number of retries on first login.
>>> When doing so, it sets all the parameters back to their defaults
>>> (overriding any user definitions).
>>> I think it should be like in redhat - just log in to all the targets
>>> which are automatic.
>>>
> That's what we tried initially. However, certain switches take quite a
> bit of time for the Spanning-Tree Protocol to work out the route,
> during which time any connect() attempt returns with -EHOSTUNREACH.
> If we do an automatic login, the login request is sent from the kernel
> directly. And any connect() failure from the kernel is taken as a
> terminal error, hence the login fails.

Are we talking about the same thing that keeps coming up :) I swear
someone from Voltaire asked this before. You gave the same reply. And
then I said you can increase node.session.initial_login_retry_max so we
retry the login for all cases (almost all errors other than CHAP
failures or the target not being there).

If we get -EHOSTUNREACH we will retry up to
node.session.initial_login_retry_max times (there is a 1 second delay
between retries, so it is a delay of
node.session.initial_login_retry_max seconds).

I then said that for -EHOSTUNREACH I can add a check so that we always
test for this and always retry, so the user does not have to set
node.session.initial_login_retry_max, but I was not sure if there was a
case where we would not want to retry. I can even increase the default
node.session.initial_login_retry_max. It is only 4 right now.

We do all the logins in parallel now, so the max delay would be
node.session.initial_login_retry_max seconds basically. Previously, when
we did one portal at a time, we might have to wait
node.session.initial_login_retry_max seconds for each portal or, in
cases like EQL, each device.
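The delay arithmetic above can be sketched in shell, together with one way to raise the retry limit on a single record. The helper names are hypothetical, and the one-second spacing between retries is taken from the message above rather than checked against the code. The update command follows the per-record "-o update -n name -v value" form described in the open-iscsi README.

```shell
# Sketch only: helper names are hypothetical.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# Set node.session.initial_login_retry_max on one record.
# $1 = target IQN, $2 = portal (ip[:port]), $3 = number of retries
set_login_retries() {
    $ISCSIADM -m node -T "$1" -p "$2" -o update \
        -n node.session.initial_login_retry_max -v "$3"
}

# Rough upper bound, in seconds, on time spent retrying the first login,
# assuming one second between retries.
# $1 = retry max, $2 = number of portals, $3 = "parallel" or "serial"
login_retry_delay() {
    if [ "$3" = "serial" ]; then
        # one portal at a time: each portal may use up all its retries
        echo $(( $1 * $2 ))
    else
        # logins run in parallel: the retries overlap
        echo "$1"
    fi
}
```

With the default of 4 retries and three portals, the serial case is bounded by roughly 4 * 3 seconds and the parallel case by roughly 4 seconds.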
Re: open-iscsi init script on suse
Doron Shoham wrote:
> Hi,
>
> Why does the init script on suse re-discover all iscsi targets which
> were set to automatic login?
> To avoid deadlocks on the root fs there is a patch which limits the
> number of retries on first login.
> When doing so, it sets all the parameters back to their defaults
> (overriding any user definitions).
> I think it should be like in redhat - just log in to all the targets
> which are automatic.
>
> Another issue is that the script logs out only from automatic nodes
> (not from all nodes as in redhat).
> This causes a bug when iscsi is stopped while a manual node is still
> logged in (session is active).
> The result is that iscsid is down but the session is still alive -
> iscsiadm -m session shows this stale session.
> I suggest that we do the same as redhat, any objections?
>
>
> Also, what is the purpose of the "node.startup" parameter?
> When is it in use?
>

node.startup should be renamed record.startup. The possible values are
automatic, manual and onboot. When the init scripts start they can run
over the db, check which records the user has requested automatic
startup for, and log in at that time.

onboot is used for the session used for boot/root. It just signals
the tools to handle it differently. During shutdown, for example, we
cannot kill that session when the init script stop is done, because it
is still needed for root.

manual is used because a lot of targets will return all the portals on
the target. Some of these portals may be disabled or not even connected
to the network. Instead of iscsiadm/iscsid wasting time trying to log
in, admins can mark them as manual and the init scripts will not auto
start them. Why not just delete them if they cannot be used? I do not
know.
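As an illustration of marking a record manual, here is a minimal sketch that updates node.startup on one record and rejects anything other than the three values listed above. The function name is hypothetical, and the target and portal in the usage line are placeholders.

```shell
# Sketch only: set_node_startup is a hypothetical helper.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# $1 = target IQN, $2 = portal (ip[:port]), $3 = automatic|manual|onboot
set_node_startup() {
    case "$3" in
        automatic|manual|onboot) ;;
        *)
            echo "invalid node.startup value: $3" >&2
            return 1
            ;;
    esac
    $ISCSIADM -m node -T "$1" -p "$2" -o update -n node.startup -v "$3"
}
```

For example, set_node_startup iqn.2008-09.com.example:disk1 192.168.0.4:3260 manual would keep the init script from logging in to that portal automatically.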
Re: open-iscsi init script on suse
> Also, what is the purpose of the "node.startup" parameter?
> When is it in use?
>

Hi Mike,

Can you please explain this?

Thanks,
Doron
Re: open-iscsi init script on suse
On Mon, Sep 22, 2008 at 9:39 AM, Hannes Reinecke <[EMAIL PROTECTED]> wrote:
>
> Hi Doron,
>
> Doron Shoham wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discover all iscsi targets which
>>> were set to automatic login?
>>> To avoid deadlocks on the root fs there is a patch which limits the
>>> number of retries on first login.
>>> When doing so, it sets all the parameters back to their defaults
>>> (overriding any user definitions).
>>> I think it should be like in redhat - just log in to all the targets
>>> which are automatic.
>>>
> That's what we tried initially. However, certain switches take quite a
> bit of time for the Spanning-Tree Protocol to work out the route,
> during which time any connect() attempt returns with -EHOSTUNREACH.
> If we do an automatic login, the login request is sent from the kernel
> directly. And any connect() failure from the kernel is taken as a
> terminal error, hence the login fails.
> The best we can do here is to make this re-discovery conditional,
> which would allow customers not suffering from STP failures to get a
> faster booting time.

The current implementation only partially solves the issue, and it
creates another problem: node parameters are changed.
What if the first login ignored this error and retried anyway? This is
not the cleanest solution, but it would satisfy both requirements.

>
>>> Another issue is that the script logs out only from automatic nodes
>>> (not from all nodes as in redhat).
>>> This causes a bug when iscsi is stopped while a manual node is still
>>> logged in (session is active).
>>> The result is that iscsid is down but the session is still alive -
>>> iscsiadm -m session shows this stale session.
>>> I suggest that we do the same as redhat, any objections?
>>>
> Ouch. You touched a very complicated topic. I've had long discussions
> and patches with NetApp on how to get iscsi shutdown right. It's not
> only that we have stale nodes (which would be ok, given that we're
> shutting down anyway), but it's also well possible that some crucial
> filesystem bits are in fact served by iSCSI, so we definitely
> shouldn't be shutting them down, regardless of any automatic settings.

Having stale nodes is not OK, since we may use "iscsi stop" not only
when the machine shuts down but also to change node parameters (e.g.
node_transport set to iser).
The dependency of the filesystem on iscsi should be resolved
independently by the user. This applies to both automatic and manual
sessions.
What we suggest is to log out of all nodes (and not only the automatic
ones).

>
> There's a bugzilla open to get this right (Novell bug#392080), you're
> welcome to join and get this sorted out.

I could not find this bug, please send a link.

Thanks,
Eli
Re: open-iscsi init script on suse
Hi Doron,

Doron Shoham wrote:
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on suse re-discover all iscsi targets which
>> were set to automatic login?
>> To avoid deadlocks on the root fs there is a patch which limits the
>> number of retries on first login.
>> When doing so, it sets all the parameters back to their defaults
>> (overriding any user definitions).
>> I think it should be like in redhat - just log in to all the targets
>> which are automatic.
>>

That's what we tried initially. However, certain switches take quite a
bit of time for the Spanning-Tree Protocol to work out the route, during
which time any connect() attempt returns with -EHOSTUNREACH.
If we do an automatic login, the login request is sent from the kernel
directly. And any connect() failure from the kernel is taken as a
terminal error, hence the login fails.
The best we can do here is to make this re-discovery conditional, which
would allow customers not suffering from STP failures to get a faster
booting time.

>> Another issue is that the script logs out only from automatic nodes
>> (not from all nodes as in redhat).
>> This causes a bug when iscsi is stopped while a manual node is still
>> logged in (session is active).
>> The result is that iscsid is down but the session is still alive -
>> iscsiadm -m session shows this stale session.
>> I suggest that we do the same as redhat, any objections?
>>

Ouch. You touched a very complicated topic. I've had long discussions
and patches with NetApp on how to get iscsi shutdown right. It's not
only that we have stale nodes (which would be ok, given that we're
shutting down anyway), but it's also well possible that some crucial
filesystem bits are in fact served by iSCSI, so we definitely shouldn't
be shutting them down, regardless of any automatic settings.

There's a bugzilla open to get this right (Novell bug#392080), you're
welcome to join and get this sorted out.

>> Also, what is the purpose of the "node.startup" parameter?
>> When is it in use?
>>

Don't know. Ask Mike, he implemented it. Probably a leftover.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
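The conditional re-discovery suggested in this message could look roughly like the sketch below. ISCSI_REDISCOVER and ISCSI_PORTALS are hypothetical settings invented for illustration; they are not existing options of the suse init script. The discovery call uses "-o new", which, as described elsewhere in this thread, only adds records for portals that are not yet in the db.

```shell
# Sketch only: ISCSI_REDISCOVER and ISCSI_PORTALS are hypothetical
# settings, not options of any existing init script.
ISCSIADM="${ISCSIADM:-iscsiadm}"

iscsi_start_nodes() {
    if [ "${ISCSI_REDISCOVER:-no}" = "yes" ]; then
        # Word splitting on the portal list is intentional here.
        for portal in $ISCSI_PORTALS; do
            # Add records for newly returned portals only.
            $ISCSIADM -m discovery -t st -p "$portal" -o new
        done
    fi
    # Log in to the records marked for automatic startup.
    $ISCSIADM -m node --loginall=automatic
}
```

Sites that do not hit the STP delay would leave ISCSI_REDISCOVER unset and go straight to the login step.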
Re: open-iscsi init script on suse
Doron Shoham wrote:
> Hi,
>
> Why does the init script on suse re-discover all iscsi targets which
> were set to automatic login?
> To avoid deadlocks on the root fs there is a patch which limits the
> number of retries on first login.
> When doing so, it sets all the parameters back to their defaults
> (overriding any user definitions).
> I think it should be like in redhat - just log in to all the targets
> which are automatic.
>
> Another issue is that the script logs out only from automatic nodes
> (not from all nodes as in redhat).
> This causes a bug when iscsi is stopped while a manual node is still
> logged in (session is active).
> The result is that iscsid is down but the session is still alive -
> iscsiadm -m session shows this stale session.
> I suggest that we do the same as redhat, any objections?
>
>
> Also, what is the purpose of the "node.startup" parameter?
> When is it in use?
>
>
> Thanks,
> Doron
>

Hi,

Does my suggestion sound OK?

Doron
open-iscsi init script on suse
Hi,

Why does the init script on suse re-discover all iscsi targets which
were set to automatic login?
To avoid deadlocks on the root fs there is a patch which limits the
number of retries on first login.
When doing so, it sets all the parameters back to their defaults
(overriding any user definitions).
I think it should be like in redhat - just log in to all the targets
which are automatic.

Another issue is that the script logs out only from automatic nodes
(not from all nodes as in redhat).
This causes a bug when iscsi is stopped while a manual node is still
logged in (session is active).
The result is that iscsid is down but the session is still alive -
iscsiadm -m session shows this stale session.
I suggest that we do the same as redhat, any objections?

Also, what is the purpose of the "node.startup" parameter?
When is it in use?

Thanks,
Doron