Re: open-iscsi init script on suse

2008-10-07 Thread Hannes Reinecke

Doron Shoham wrote:
 If you promise to test me the script with the STP fixes I'll be willing
 to add it. Sadly I don't have time currently to do any decent testing here,
 but I'm always open to patches :-)

 Cheers,

 Hannes
 
 Hi Hannes,
 
 Unfortunately I don't have any setup which I could test any script with STP 
 fixes.
 As far as I understood, due to Mike's patch, there is no need to re-discover 
 all nodes
 at startup.
 So I suggest to remove the re-discover and to logout from all nodes and not
 only from the automatic nodes.
 Please tell me what your opinion is.
 
 Thanks,
 Doron
 
 
 revert some of the changes from commit 
 2146208ccd8c6579fa1accbe3dbe7181b46539b3.
 logout to all nodes when stopping open-iscsi.
 do not try to re-discover nodes on startup.
 
 Signed-off-by: Doron Shoham [EMAIL PROTECTED]
 ---
  etc/initd/initd.suse |   40 ++--
  1 files changed, 2 insertions(+), 38 deletions(-)
 
 diff --git a/etc/initd/initd.suse b/etc/initd/initd.suse
 index 23bbac0..4bf216c 100644
 --- a/etc/initd/initd.suse
 +++ b/etc/initd/initd.suse
 @@ -39,8 +39,8 @@ iscsi_login_all_nodes()
  iscsi_logout_all_nodes()
  {
   echo -n Closing all iSCSI connections: 
 - # Logout from all sessions marked automatic
 - if ! $ISCSIADM -m node --logoutall=automatic 2> /dev/null; then
 + # Logout from all sessions
 + if ! $ISCSIADM -m node --logoutall=all 2> /dev/null; then
   if [ $? == 19 ] ; then
   RETVAL=6
   else
No. We cannot do this as it kills root on iSCSI. We can only logout
from the nodes marked 'automatic' and 'manual', not those marked 'onboot'.

 @@ -101,38 +101,6 @@ iscsi_list_all_nodes()
  done
  }
  
 -iscsi_discover_all_targets()
 -{
 - # Strip off any existing ID information
 - RAW_NODE_LIST=`iscsiadm -m node | sed -nre 's/^(\[[0-9a-f]*\] )?(.*)$/\2/p'`
 - # Obtain IPv4 list
 - IPV4_NODE_LIST=`echo $RAW_NODE_LIST | sed -nre 's/^([0-9]{1,3}(\.[0-9]{1,3}){3}):[^: ]* (.*)$/\1 \3/p'`
 - # Now obtain IPv6 list
 - IPV6_NODE_LIST=`echo $RAW_NODE_LIST | sed -nre 's/^([0-9a-f]{1,4}(:[0-9a-f]{0,4}){6}:[0-9a-f]{1,4}):[^: ]* (.*)$/\1 \3/p'`
 -
 - DISC_TARGETS=
 - while read NODE_ADDR NODE_NAME; do
 - [ -z $NODE_ADDR -a -z $NODE_NAME ] && continue
 - NODE_ATTRS=`iscsiadm -m node -p $NODE_ADDR -T $NODE_NAME`
 - NODE_STATUS=`echo $NODE_ATTRS | sed -nre 's/^.*node\.conn\[0\]\.startup = ([a-z]*).*$/\1/p'`
 -
 - if [ $NODE_STATUS == 'automatic' ]; then
 - DISC_TARGETS=`echo $DISC_TARGETS | sed -re '/'$NODE_ADDR'/!{s/(.*)/\1 '$NODE_ADDR'/}'`
 - fi
 - done < <(echo $IPV4_NODE_LIST; echo $IPV6_NODE_LIST)
 -
 - for TARGET_ADDR in $DISC_TARGETS; do
 - echo -n Attempting discovery on target at ${TARGET_ADDR}: 
 - iscsiadm -m discovery -t st -p $TARGET_ADDR > /dev/null 2>&1
 - if [ $? -ne 0 ]; then
 - rc_failed 1
 - rc_status -v
 - return 1
 - fi
 - rc_status -v
 - done
 -}
 -
  case $1 in
  start)
  [ ! -d /var/lib/iscsi ] && mkdir -p /var/lib/iscsi
 @@ -147,10 +115,6 @@ case $1 in
   rc_status -v
   fi
   if [ $RETVAL == 0 ]; then
 - iscsi_discover_all_targets
 - RETVAL=$?
 - fi
 - if [ $RETVAL == 0 ]; then
   iscsi_login_all_nodes
   fi
   ;;

Yes, this looks okay. However, I would _really_ like to test it against the STP 
scenario.
Hmm. I'll see if I can pull it in for SLES11. Care to open a bugzilla?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: open-iscsi init script on suse

2008-10-06 Thread Hannes Reinecke

Hi Doron,

Doron Shoham wrote:
 Actually this was bad. If we have to wait for the login_timeout to fire
 then initial_login_retry_max = 4 was a nice round number and the max
 time we had to wait was 1 minute. If I just increase it (tried 45
 stupidly first), it increases the possible max default wait to 11
 minutes :(

 So what I did was make initial_login_retry_max just be the max number of
 initial iscsi login timeouts we can withstand and then let other initial
 login failures retry for up to initial_login_retry_max * login_timeout.
 
 Have you changed something in the code?
 I can't see any change in the git.
 Can you please explain your calculation again?
 
 I wanted to know if we are going to change back the init script.
 If the problem is waiting for the spanning tree, should increasing 
 initial_login_retry_max do the work?
 
 Currently the init script causes other bugs.

If you promise to test me the script with the STP fixes I'll be willing
to add it. Sadly I don't have time currently to do any decent testing here,
but I'm always open to patches :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-10-02 Thread Mike Christie

Doron Shoham wrote:
 Actually this was bad. If we have to wait for the login_timeout to fire
 then initial_login_retry_max = 4 was a nice round number and the max
 time we had to wait was 1 minute. If I just increase it (tried 45
 stupidly first), it increases the possible max default wait to 11
 minutes :(

 So what I did was make initial_login_retry_max just be the max number of
 initial iscsi login timeouts we can withstand and then let other initial
 login failures retry for up to initial_login_retry_max * login_timeout.
 
 Have you changed something in the code?

It is

commit 31c9d428556088c886be3ea89333e9b116bc0a09
Author: Mike Christie [EMAIL PROTECTED]
Date:   Wed Sep 24 17:34:47 2008 -0500

 modify initial login retry max


 I can't see any change in the git.
 Can you please explain your calculation again?

It is just the same thing we do for scsi commands.

login retry max * login timeout = max time to retry the initial login.

So to put it in scsi command terms of retry and timeout, the login 
failure we see for the initial login of EHOSTUNREACH is considered 
retryable like scsi-ml's DID_IMM_RETRY value, and does not count against 
the retry counter, but we will only retry for up to login retry max * 
login timeout seconds, so it does not retry forever on the first login 
and stop up the boot process.
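
To make the arithmetic concrete, here is a minimal sketch of the formula above in shell. The 15 second login timeout is an assumption, picked because it matches the figures quoted in this thread (4 retries giving 1 minute, 45 retries giving about 11 minutes). The function name is made up for illustration and is not part of open-iscsi.

```shell
# Hypothetical helper, not part of open-iscsi: upper bound (in seconds)
# on how long the initial login is retried.
#   $1 = initial_login_retry_max
#   $2 = login timeout in seconds (15 is an assumed value)
max_initial_login_wait() {
    echo $(( $1 * $2 ))
}

max_initial_login_wait 4 15    # 60 seconds, the old 1 minute worst case
max_initial_login_wait 45 15   # 675 seconds, roughly 11 minutes
```

Under the same assumption, a retry max of 8 would give the 2 minute window mentioned for the spanning tree reports.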


 
 I wanted to know if we are going to change back the init script.
 If the problem is waiting for the spanning tree, should increasing 
 initial_login_retry_max do the work?

Yes it should work around the problem - sort of :) We do not know if 
EHOSTUNREACH is because of the spanning tree problem or because a cable 
is unplugged. For the first one we want to retry, for the second we 
probably do not (unless the admin is running to the box and trying to 
plug it back in :)). So now we set the initial login timeout and retry 
to values where most people hitting the spanning tree problem will be ok. 
At least according to the bug reports we are seeing on the list and at 
Red Hat, if users set up those values to retry for at most 2 minutes that 
was long enough. The drawback is that at most we used to retry for 1 
minute, so everyone else using the defaults will have to wait an extra 
minute, which can be a pain. However, everyone can change the value to 
fail fast or wait longer like before, so hopefully this new value is a 
good compromise.

 
 Currently the init script causes other bugs.

This is only meant to help Hannes in the spanning tree issue, so he does 
not have to use the discovery trick to work around it.

For discovery though, it seems like you can just do discovery and tell 
iscsiadm not to overwrite the existing db (just add new ones), and that 
would solve some of the issues with iser records getting whacked.




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Eli Dorfman wrote:
 On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie [EMAIL PROTECTED] wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which were 
 set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number of 
 retries on first login.
 When doing so, it sets back all the default parameters (overriding any user 
 definitions).
 I think it should be like in redhat - just login to all the targets which 
 are automatic.

 Another issue is that the script logouts only from automatic nodes (not 
 from all nodes as in redhat).
 This causes a bug, when iscsi is stopped while manual node is still 
 logged-in (session is active).
 The result is that iscsid is down but session is still alive - iscsiadm -m 
 session shows this stale session.
 I suggest that we do the same as redhat, any objections?


 Also, what is the purpose of node.startup parameter?
 When is it in use?

 node.startup should be renamed record.startup. The possible values are
 automatic, manual and onboot. When the init scripts start they can run
 over the db, check which records the user has requested
 automatic startup for, and log in at that time.

 onboot is used for the session used for boot/root. It just signals
 the tools to handle it differently. During shutdown, for example, we
 cannot kill that session when the init script stop is done, because it
 is still needed for root.

 manual is used because a lot of targets will return all the portals on
 the target. Some of these portals may be disabled or not even connected
 to the network. Instead of iscsiadm/iscsid wasting time trying to log in,
 admins can mark them as manual and the init scripts will not auto start
 them. Why not just delete them if they cannot be used? I do not know.

 The question is why there are two node.startup fields and what is the
 difference between them (if any):
 node.startup
 AND
 node.conn[0].startup
 

node.conn[0].startup is left over from when there was basic MC/s support. 
Since we only support one connection per session there is no difference. 
Some distro scripts used to check for one or the other, but now the 
iscsiadm -m node -L/U command will check for either to support all users.
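
For a script that still parses records itself, a minimal sketch of "check either field" could look like the following. It assumes the record is printed one "name = value" pair per line, as the sed expression in the SUSE script also expects. The function name is hypothetical.

```shell
# Hypothetical helper: read a node record on stdin (for example the output
# of "iscsiadm -m node -p ADDR -T NAME") and print its startup mode.
# node.startup is preferred; node.conn[0].startup is the fallback.
record_startup_mode() {
    record=$(cat)
    mode=$(printf '%s\n' "$record" |
        sed -n 's/^node\.startup = \([a-z]*\).*$/\1/p' | head -n 1)
    if [ -z "$mode" ]; then
        mode=$(printf '%s\n' "$record" |
            sed -n 's/^node\.conn\[0]\.startup = \([a-z]*\).*$/\1/p' | head -n 1)
    fi
    printf '%s\n' "$mode"
}

# Example with a made-up record:
printf 'node.name = iqn.2008-09.example:disk0\nnode.conn[0].startup = manual\n' |
    record_startup_mode
```

With iscsiadm -m node -L/U doing this check internally, such parsing should only be needed by scripts that do something else with the value.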




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Hannes Reinecke wrote:
 On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
 Hannes Reinecke wrote:
 Hi Doron,
 Doron Shoham wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which 
 were set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number 
 of retries on first login.
 When doing so, it sets back all the default parameters (overriding any 
 user definitions).
 I think it should be like in redhat - just login to all the targets 
 which are automatic.

 That's what we tried initially. However, certain switches take quite a bit 
 of time for the Spanning-Tree
 Protocol to work out the route, during which time any connect() attempt 
 returns with -EHOSTUNREACH.
 If we do an automatic login, the login request is sent from the kernel 
 directly. And any connect()
 failure from the kernel is taken as a terminal error, hence the login 
 fails.
 Are we talking about the same thing that keeps coming up :)

 I know. Main reason here is that I didn't have time to investigate

It is ok. I like repeating what I said in this mail more than fixing 
aic7xxx bugs, so as long as you fix that driver you can do anything here :)


 this further, so I'll have to fall back to answer the same results
 I had the last time ...
 
 I swear someone from Voltaire asked this before. You gave the same reply. 
 And then I said you can increase node.session.initial_login_retry_max
 so we retry the login for all cases (almost all not CHAP or target not 
 there errors). If we get -EHOSTUNREACH we will retry up to 
 node.session.initial_login_retry_max times (there is a 1 second delay 
 between retries so it is a delay of node.session.initial_login_retry_max 
 seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
 always test for this and always retry so the user does not have to set 
 node.session.initial_login_retry_max but I was not sure if there was a case 
 where we would not want to retry.

 Problem is that there are valid cases for which we should _not_ retry an
 -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
 But increasing the initial_login_retry_max value would really help here.
 Hmm. Will have to check, but this seems like a viable route.
 
 Sorry for not being responsive, but I've been kept really busy recently.
 

No problem.

I have been having our users try initial_login_retry_max = 60 and they 
have reported success. For iscsistart, which Red Hat and Fedora use for 
the root session in the initramfs, I just set it to 120.

For the default, let me raise it to something longer than 4. 
Because we do all the logins in parallel we do not have to worry about 
one login delaying another, so the max wait is just going to be 
initial_login_retry_max seconds instead of possibly the old worst case of 
number_of_portals_or_targets_for_eql * initial_login_retry_max seconds.




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Mike Christie wrote:
 Hannes Reinecke wrote:
 On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
 Hannes Reinecke wrote:
 Hi Doron,
 Doron Shoham wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which 
 were set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number 
 of retries on first login.
 When doing so, it sets back all the default parameters (overriding any 
 user definitions).
 I think it should be like in redhat - just login to all the targets 
 which are automatic.

 That's what we tried initially. However, certain switches take quite a bit 
 of time for the Spanning-Tree
 Protocol to work out the route, during which time any connect() attempt 
 returns with -EHOSTUNREACH.
 If we do an automatic login, the login request is sent from the kernel 
 directly. And any connect()
 failure from the kernel is taken as a terminal error, hence the login 
 fails.
 Are we talking about the same thing that keeps coming up :)

 I know. Main reason here is that I didn't have time to investigate
 
 It is ok. I like repeating what I said in this mail more than fixing 
 aic7xxx bugs, so as long as you fix that driver you can do anything here :)
 
 
 this further, so I'll have to fall back to answer the same results
 I had the last time ...

 I swear someone from Voltaire asked this before. You gave the same reply. 
 And then I said you can increase node.session.initial_login_retry_max
 so we retry the login for all cases (almost all not CHAP or target not 
 there errors). If we get -EHOSTUNREACH we will retry up to 
 node.session.initial_login_retry_max times (there is a 1 second delay 
 between retries so it is a delay of node.session.initial_login_retry_max 
 seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
 always test for this and always retry so the user does not have to set 
 node.session.initial_login_retry_max but I was not sure if there was a case 
 where we would not want to retry.

 Problem is that there are valid cases for which we should _not_ retry an
 -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
 But increasing the initial_login_retry_max value would really help here.
 Hmm. Will have to check, but this seems like a viable route.

 Sorry for not being responsive, but I've been kept really busy recently.

 
 No problem.
 
 I have been having our users try initial_login_retry_max = 60 and they 
 have reported success. For iscsistart which red hat and fedora uses for 
 the root session in the initramfs I just set it to 120.
 
 For the default let me up the default to something longer than 4. 


Actually this was bad. If we have to wait for the login_timeout to fire 
then initial_login_retry_max = 4 was a nice round number and the max 
time we had to wait was 1 minute. If I just increase it (tried 45 
stupidly first), it increases the possible max default wait to 11 minutes :(

So what I did was make initial_login_retry_max just be the max number of 
initial iscsi login timeouts we can withstand and then let other initial 
login failures retry for up to initial_login_retry_max * login_timeout.




Re: open-iscsi init script on suse

2008-09-23 Thread Doron Shoham

  Also, what is the purpose of node.startup parameter?
 When is it in use?
 

Hi Mike,
Can you please explain this?

Thanks,
Doron




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Doron Shoham wrote:
 Hi,
 
 Why does the init script on suse re-discovers all iscsi targets which were set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number of 
 retries on first login.
 When doing so, it sets back all the default parameters (overriding any user 
 definitions).
 I think it should be like in redhat - just login to all the targets which are 
 automatic.
 
 Another issue is that the script logouts only from automatic nodes (not from 
 all nodes as in redhat).
 This causes a bug, when iscsi is stopped while manual node is still logged-in 
 (session is active).
 The result is that iscsid is down but session is still alive - iscsiadm -m 
 session shows this stale session.
 I suggest that we do the same as redhat, any objections?
  
 
 Also, what is the purpose of node.startup parameter?
 When is it in use?
 

node.startup should be renamed record.startup. The possible values are 
automatic, manual and onboot. When the init scripts start they can run 
over the db, check which records the user has requested 
automatic startup for, and log in at that time.

onboot is used for the session used for boot/root. It just signals 
the tools to handle it differently. During shutdown, for example, we 
cannot kill that session when the init script stop is done, because it 
is still needed for root.

manual is used because a lot of targets will return all the portals on 
the target. Some of these portals may be disabled or not even connected 
to the network. Instead of iscsiadm/iscsid wasting time trying to log in, 
admins can mark them as manual and the init scripts will not auto start 
them. Why not just delete them if they cannot be used? I do not know.




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Hannes Reinecke wrote:
 Hi Doron,
 
 Doron Shoham wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which 
 were set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the 
 number of retries on first login.
 When doing so, it sets back all the default parameters (overriding 
 any user definitions).
 I think it should be like in redhat - just login to all the targets 
 which are automatic.

 That's what we tried initially. However, certain switches take quite a 
 bit of time for the Spanning-Tree
 Protocol to work out the route, during which time any connect() attempt 
 returns with -EHOSTUNREACH.
 If we do an automatic login, the login request is sent from the kernel 
 directly. And any connect()
 failure from the kernel is taken as a terminal error, hence the login 
 fails.

Are we talking about the same thing that keeps coming up :)

I swear someone from Voltaire asked this before. You gave the same 
reply. And then I said you can increase node.session.initial_login_retry_max
so we retry the login for all cases (almost all errors other than CHAP or 
target-not-there errors). If we get -EHOSTUNREACH we will retry up to 
node.session.initial_login_retry_max times (there is a 1 second delay 
between retries so it is a delay of node.session.initial_login_retry_max 
seconds). I then said that for -EHOSTUNREACH I can add a check so that 
we always test for this and always retry so the user does not have to 
set node.session.initial_login_retry_max but I was not sure if there was 
a case where we would not want to retry.

I can even increase the default node.session.initial_login_retry_max. It 
is only 4 right now. We do all the logins in parallel now, so the max 
delay would be node.session.initial_login_retry_max seconds basically. 
Previously when we did one portal at a time, we might have to wait 
node.session.initial_login_retry_max for each portal or in cases like 
EQL each device.
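
As a rough illustration of the difference, the sketch below compares the two worst cases, taking the 1 second delay between retries described above at face value. The helper name and the portal count are made up for the example.

```shell
# Hypothetical helper: worst-case initial login delay in seconds,
# assuming a 1 second delay between retries.
#   $1 = node.session.initial_login_retry_max
#   $2 = number of portals (or devices, for EQL-style targets)
worst_case_login_delay() {
    retry_max=$1
    portals=$2
    echo "serial=$(( retry_max * portals )) parallel=$retry_max"
}

worst_case_login_delay 4 8    # serial=32 parallel=4
```

With parallel logins the portal count drops out, which is why raising the default becomes affordable.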




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Mike Christie wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which 
 were set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the 
 number of retries on first login.
 When doing so, it sets back all the default parameters (overriding any 
 user definitions).
 I think it should be like in redhat - just login to all the targets 
 which are automatic.

 Another issue is that the script logouts only from automatic nodes 
 (not from all nodes as in redhat).
 This causes a bug, when iscsi is stopped while manual node is still 
 logged-in (session is active).
 The result is that iscsid is down but session is still alive - 
 iscsiadm -m session shows this stale session.
 I suggest that we do the same as redhat, any objections?
  

 Also, what is the purpose of node.startup parameter?
 When is it in use?

 
 node.startup should be renamed record.startup. The possible values are 
 automatic, manual and onboot. When the init scripts start they can run 
 over the db, check which records the user has requested 
 automatic startup for, and log in at that time.

For the redhat ones, iscsiadm loops over the records when it does

iscsiadm -m node --loginall=automatic

 
 onboot is used for the session used for boot/root. It just signals 
 the tools to handle it differently. During shutdown for example we 
 cannot kill that session when the init script stop is done, because it 
 is still needed for root.

iscsiadm -m node --logoutall=all does not log out the records marked onboot.
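
A stop routine that prefers to name the startup modes explicitly, rather than rely on how "all" treats onboot records, could be sketched as below. This is only an illustration, not the SUSE or Red Hat script. The function name is hypothetical, and it assumes --logoutall also accepts "manual" in addition to the "automatic" and "all" values used in this thread.

```shell
# Sketch only: ISCSIADM may be overridden, as in the init script.
ISCSIADM=${ISCSIADM:-iscsiadm}

# Hypothetical helper: log out of every session whose record has one of
# the given startup modes. Returns the last non-zero iscsiadm status, or 0.
iscsi_logout_by_startup() {
    rc=0
    for mode in "$@"; do
        $ISCSIADM -m node --logoutall="$mode" 2> /dev/null || rc=$?
    done
    return $rc
}

# Example (commented out, since it would really log sessions out):
# iscsi_logout_by_startup automatic manual
```

iscsiadm may return non-zero when no session matches a mode, so the caller has to decide whether that counts as a failure.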




Re: open-iscsi init script on suse

2008-09-22 Thread Hannes Reinecke

Hi Doron,

Doron Shoham wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which were 
 set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number of 
 retries on first login.
 When doing so, it sets back all the default parameters (overriding any user 
 definitions).
 I think it should be like in redhat - just login to all the targets which 
 are automatic.

That's what we tried initially. However, certain switches take quite a bit of
time for the Spanning-Tree Protocol to work out the route, during which time
any connect() attempt returns with -EHOSTUNREACH.
If we do an automatic login, the login request is sent from the kernel
directly. And any connect() failure from the kernel is taken as a terminal
error, hence the login fails.
The best we can do here is to make this re-discovery conditional, which would
allow customers not suffering from STP failures to get a faster booting time.

 Another issue is that the script logouts only from automatic nodes (not from 
 all nodes as in redhat).
 This causes a bug, when iscsi is stopped while manual node is still 
 logged-in (session is active).
 The result is that iscsid is down but session is still alive - iscsiadm -m 
 session shows this stale session.
 I suggest that we do the same as redhat, any objections?
  
Ouch. You touched a very complicated topic. I've had long discussions and
patches with NetApp on how to get iscsi shutdown right. It's not only that we
have stale nodes (which would be ok, given that we're shutting down anyway),
but it's also well possible that some crucial filesystem bits are in fact
served by iSCSI, so we definitely shouldn't be shutting them down, regardless
of any automatic settings.

There's a bugzilla open to get this right (Novell bug#392080), you're welcome
to join and get this sorted out.


 Also, what is the purpose of node.startup parameter?
 When is it in use?

Don't know. Ask Mike, he implemented it.
Probably a leftover.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-09-22 Thread Eli Dorfman

On Mon, Sep 22, 2008 at 9:39 AM, Hannes Reinecke [EMAIL PROTECTED] wrote:

 Hi Doron,

 Doron Shoham wrote:
 Doron Shoham wrote:
 Hi,

 Why does the init script on suse re-discovers all iscsi targets which were 
 set
 to automatic login?
 To avoid deadlocks on the root fs there is patch which limits the number of 
 retries on first login.
 When doing so, it sets back all the default parameters (overriding any user 
 definitions).
 I think it should be like in redhat - just login to all the targets which 
 are automatic.

 That's what we tried initially. However, certain switches take quite a bit of 
 time for the Spanning-Tree
 Protocol to work out the route, during which time any connect() attempt 
 returns with -EHOSTUNREACH.
 If we do an automatic login, the login request is sent from the kernel 
 directly. And any connect()
 failure from the kernel is taken as a terminal error, hence the login fails.
 The best we can do here is to make this re-discovery conditional, which would 
 allow customers not
 suffering from STP failures to get a faster booting time.

The current implementation only partially solves the issue, but creates
another problem instead - node parameters are changed.
What if the first login ignores this error and retries anyway? This
is not the cleanest solution but it will satisfy both requirements.


 Another issue is that the script logouts only from automatic nodes (not 
 from all nodes as in redhat).
 This causes a bug, when iscsi is stopped while manual node is still 
 logged-in (session is active).
 The result is that iscsid is down but session is still alive - iscsiadm -m 
 session shows this stale session.
 I suggest that we do the same as redhat, any objections?

 Ouch. You touched a very complicated topic. I've had long discussions and 
 patches with NetApp on
 how to get iscsi shutdown right. It's not only that we have stale nodes 
 (which would be ok, given
 that we're shutting down anyway), but it's also well possible that some 
 crucial filesystem bits
 are in fact served by iSCSI, so we definitely shouldn't be shutting them 
 down, regardless of any
 automatic settings.

Having stale nodes is not ok, since we may use iscsi stop not only
when the machine shuts down but also to change node parameters
(e.g. node_transport set to iser).
The dependency of the filesystem on iscsi should be resolved
independently by the user.
This applies to both automatic and manual sessions.
What we suggest is to log out of all nodes (and not only automatic).

 There's a bugzilla open to get this right (Novell bug#392080), you're welcome 
 to join and get
 this sorted out.
I could not find this bug, please send a link.


Thanks,
Eli
