Re: open-iscsi init script on suse

2008-10-13 Thread Mike Christie

Eli Dorfman wrote:
>> For discovery though it seems like you can just do discovery and tell
>> iscsiadm not to overwrite the existing db (just add new ones) and that
>> would solve some of the issues with iser records getting wacked.
>>

Sorry for the late response. I was on vacation.

> How do we tell iscsiadm not to overwrite the existing db?

iscsiadm -m discovery -t st -p ip -o new

will just add new records for portals that are not in the db.

iscsiadm -m discovery -t st -p ip -o delete

will just remove the records that are no longer returned.

You can also pass both at once, or add -o update as well. See the README.
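To make the combination concrete, here is a minimal sketch of a helper that assembles the discovery command from the operations discussed above. It only prints the command instead of running it, so it can be tried on a box without iscsiadm. The function name and the portal address are made up for illustration; the -o values are limited to the three mentioned in this mail.

```shell
#!/bin/sh
# Hypothetical helper: build (and print, not run) a sendtargets discovery
# command that leaves existing records alone.
# Usage: build_discovery_cmd PORTAL OP [OP...]
build_discovery_cmd() {
    portal=$1
    shift
    cmd="iscsiadm -m discovery -t st -p $portal"
    for op in "$@"; do
        case $op in
            new|delete|update) cmd="$cmd -o $op" ;;
            *) echo "unsupported op: $op" >&2; return 1 ;;
        esac
    done
    echo "$cmd"
}

# Add records for new portals and drop the ones no longer returned.
# The portal address is an example value only.
build_discovery_cmd 192.168.1.10:3260 new delete
```

An init script could eval the printed string, or call iscsiadm directly with the same arguments once the operations have been validated.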

> Also, maybe I missed something, but how does re-discovery (in initd.suse) help
> when login returns EHOSTUNREACH?

It added an extra timeout.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: open-iscsi init script on suse

2008-10-07 Thread Doron Shoham

> Yes, this looks okay. However, I would _really_ like to test it against
> the STP scenario.
> Hmm. I see if I can pull it in for SLES11. Care to open a bugzilla?
> 

Which bugzilla do you want me to open the bug in?
It would be great if you managed to fix it.

Thanks,
Doron





Re: open-iscsi init script on suse

2008-10-07 Thread Hannes Reinecke

Doron Shoham wrote:
>> If you promise to test me the script with the STP fixes I'll be willing
>> to add it. Sadly I don't have time currently to do any decent testing here,
>> but I'm always open to patches :-)
>>
>> Cheers,
>>
>> Hannes
> 
> Hi Hannes,
> 
> Unfortunately I don't have any setup which I could test any script with STP 
> fixes.
> As far as I understood, due to Mike's patch, there is no need to re-discover 
> all nodes
> at startup.
> So I suggest to remove the re-discover and to logout from all nodes and not
> only from the automatic nodes.
> Please tell me if what is your opinion.
> 
> Thanks,
> Doron
> 
> 
> revert some of the changes from commit 
> 2146208ccd8c6579fa1accbe3dbe7181b46539b3.
> logout to all nodes when stopping open-iscsi.
> do not try to re-discover nodes on startup.
> 
> Signed-off-by: Doron Shoham <[EMAIL PROTECTED]>
> ---
>  etc/initd/initd.suse |   40 ++--
>  1 files changed, 2 insertions(+), 38 deletions(-)
> 
> diff --git a/etc/initd/initd.suse b/etc/initd/initd.suse
> index 23bbac0..4bf216c 100644
> --- a/etc/initd/initd.suse
> +++ b/etc/initd/initd.suse
> @@ -39,8 +39,8 @@ iscsi_login_all_nodes()
>  iscsi_logout_all_nodes()
>  {
>   echo -n "Closing all iSCSI connections: "
> - # Logout from all sessions marked automatic
> - if ! $ISCSIADM -m node --logoutall=automatic 2> /dev/null; then
> + # Logout from all sessions
> + if ! $ISCSIADM -m node --logoutall=all 2> /dev/null; then
>   if [ $? == 19 ] ; then
>   RETVAL=6
>   else
No. We cannot do this as it kills root on iSCSI. We can only logout
from the nodes marked 'automatic' and 'manual', not those marked 'onboot'.

> @@ -101,38 +101,6 @@ iscsi_list_all_nodes()
>  done
>  }
>  
> -iscsi_discover_all_targets()
> -{
> - # Strip off any existing ID information
> - RAW_NODE_LIST=`iscsiadm -m node | sed -nre 's/^(\[[0-9a-f]*\] 
> )?(.*)$/\2/p'`
> - # Obtain IPv4 list
> - IPV4_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 
> 's/^([0-9]{1,3}(\.[0-9]{1,3}){3}):[^: ]* (.*)$/\1 \3/p'`
> - # Now obtain IPv6 list
> - IPV6_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 
> 's/^([0-9a-f]{1,4}(:[0-9a-f]{0,4}){6}:[0-9a-f]{1,4}):[^: ]* (.*)$/\1 \3/p'`
> -
> - DISC_TARGETS=""
> - while read NODE_ADDR NODE_NAME; do
> - [ -z "$NODE_ADDR" -a -z "$NODE_NAME" ] && continue
> - NODE_ATTRS=`iscsiadm -m node -p "$NODE_ADDR" -T "$NODE_NAME"`
> - NODE_STATUS=`echo "$NODE_ATTRS" | sed -nre 
> 's/^.*node\.conn\[0\]\.startup = ([a-z]*).*$/\1/p'`
> -
> - if [ "$NODE_STATUS" == 'automatic' ]; then
> - DISC_TARGETS=`echo "$DISC_TARGETS" | sed -re 
> '/'"$NODE_ADDR"'/!{s/(.*)/\1 '"$NODE_ADDR"'/}'`
> - fi
> - done < <(echo "$IPV4_NODE_LIST"; echo "$IPV6_NODE_LIST")
> -
> - for TARGET_ADDR in $DISC_TARGETS; do
> - echo -n "Attempting discovery on target at ${TARGET_ADDR}: "
> - iscsiadm -m discovery -t st -p "$TARGET_ADDR" > /dev/null 2>&1
> - if [ "$?" -ne 0 ]; then
> - rc_failed 1
> - rc_status -v
> - return 1
> - fi
> - rc_status -v
> - done
> -}
> -
>  case "$1" in
>  start)
>   [ ! -d /var/lib/iscsi ] && mkdir -p /var/lib/iscsi
> @@ -147,10 +115,6 @@ case "$1" in
>   rc_status -v
>   fi
>   if [ "$RETVAL" == "0" ]; then
> - iscsi_discover_all_targets
> - RETVAL=$?
> - fi
> - if [ "$RETVAL" == "0" ]; then
>   iscsi_login_all_nodes
>   fi
>   ;;

Yes, this looks okay. However, I would _really_ like to test it against the STP
scenario.
Hmm. I'll see if I can pull it in for SLES11. Care to open a bugzilla?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-10-07 Thread Doron Shoham

> If you promise to test me the script with the STP fixes I'll be willing
> to add it. Sadly I don't have time currently to do any decent testing here,
> but I'm always open to patches :-)
> 
> Cheers,
> 
> Hannes

Hi Hannes,

Unfortunately I don't have a setup on which I could test a script with the STP
fixes.
As far as I understand, thanks to Mike's patch there is no need to re-discover
all nodes at startup.
So I suggest removing the re-discovery and logging out from all nodes, not
only from the automatic ones.
Please tell me what you think.

Thanks,
Doron


revert some of the changes from commit 2146208ccd8c6579fa1accbe3dbe7181b46539b3.
logout to all nodes when stopping open-iscsi.
do not try to re-discover nodes on startup.

Signed-off-by: Doron Shoham <[EMAIL PROTECTED]>
---
 etc/initd/initd.suse |   40 ++--
 1 files changed, 2 insertions(+), 38 deletions(-)

diff --git a/etc/initd/initd.suse b/etc/initd/initd.suse
index 23bbac0..4bf216c 100644
--- a/etc/initd/initd.suse
+++ b/etc/initd/initd.suse
@@ -39,8 +39,8 @@ iscsi_login_all_nodes()
 iscsi_logout_all_nodes()
 {
    echo -n "Closing all iSCSI connections: "
-   # Logout from all sessions marked automatic
-   if ! $ISCSIADM -m node --logoutall=automatic 2> /dev/null; then
+   # Logout from all sessions
+   if ! $ISCSIADM -m node --logoutall=all 2> /dev/null; then
        if [ $? == 19 ] ; then
            RETVAL=6
        else
@@ -101,38 +101,6 @@ iscsi_list_all_nodes()
    done
 }
 
-iscsi_discover_all_targets()
-{
-   # Strip off any existing ID information
-   RAW_NODE_LIST=`iscsiadm -m node | sed -nre 's/^(\[[0-9a-f]*\] )?(.*)$/\2/p'`
-   # Obtain IPv4 list
-   IPV4_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9]{1,3}(\.[0-9]{1,3}){3}):[^: ]* (.*)$/\1 \3/p'`
-   # Now obtain IPv6 list
-   IPV6_NODE_LIST=`echo "$RAW_NODE_LIST" | sed -nre 's/^([0-9a-f]{1,4}(:[0-9a-f]{0,4}){6}:[0-9a-f]{1,4}):[^: ]* (.*)$/\1 \3/p'`
-
-   DISC_TARGETS=""
-   while read NODE_ADDR NODE_NAME; do
-       [ -z "$NODE_ADDR" -a -z "$NODE_NAME" ] && continue
-       NODE_ATTRS=`iscsiadm -m node -p "$NODE_ADDR" -T "$NODE_NAME"`
-       NODE_STATUS=`echo "$NODE_ATTRS" | sed -nre 's/^.*node\.conn\[0\]\.startup = ([a-z]*).*$/\1/p'`
-
-       if [ "$NODE_STATUS" == 'automatic' ]; then
-           DISC_TARGETS=`echo "$DISC_TARGETS" | sed -re '/'"$NODE_ADDR"'/!{s/(.*)/\1 '"$NODE_ADDR"'/}'`
-       fi
-   done < <(echo "$IPV4_NODE_LIST"; echo "$IPV6_NODE_LIST")
-
-   for TARGET_ADDR in $DISC_TARGETS; do
-       echo -n "Attempting discovery on target at ${TARGET_ADDR}: "
-       iscsiadm -m discovery -t st -p "$TARGET_ADDR" > /dev/null 2>&1
-       if [ "$?" -ne 0 ]; then
-           rc_failed 1
-           rc_status -v
-           return 1
-       fi
-       rc_status -v
-   done
-}
-
 case "$1" in
    start)
    [ ! -d /var/lib/iscsi ] && mkdir -p /var/lib/iscsi
@@ -147,10 +115,6 @@ case "$1" in
        rc_status -v
    fi
    if [ "$RETVAL" == "0" ]; then
-       iscsi_discover_all_targets
-       RETVAL=$?
-   fi
-   if [ "$RETVAL" == "0" ]; then
        iscsi_login_all_nodes
    fi
    ;;
-- 
1.5.3.8





Re: open-iscsi init script on suse

2008-10-06 Thread Eli Dorfman

> For discovery though it seems like you can just do discovery and tell
> iscsiadm not to overwrite the existing db (just add new ones) and that
> would solve some of the issues with iser records getting wacked.
>
How do we tell iscsiadm not to overwrite the existing db?
Also, maybe I missed something, but how does re-discovery (in initd.suse) help
when login returns EHOSTUNREACH?

Eli.




Re: open-iscsi init script on suse

2008-10-06 Thread Hannes Reinecke

Hi Doron,

Doron Shoham wrote:
>> Actually this was bad. If we have to wait for the login_timeout to fire
>> then initial_login_retry_max = 4 was a nice round number and the max
>> time we had to wait was 1 minute. If I just increase it (tried 45
>> stupidly first), it increases the possible max default wait to 11
>> minutes :(
>>
>> So what I did was make initial_login_retry_max just be the max number of
>> initial iscsi login timeouts we can withstand and then let other initial
>> login failures retry for up to initial_login_retry_max * login_timeout.
> 
> Have you change something in the code?
> I can't see any change in the git.
> Can you please explain your calculation again?
> 
> I wanted to know if we are going to change back the init script.
> If the problem is to wait for the spanning tree, does increasing the 
> initial_login_retry_max should do the work?
> 
> Currently the init script causes other bugs.

If you promise to test the script with the STP fixes for me, I'll be willing
to add it. Sadly I don't currently have time to do any decent testing here,
but I'm always open to patches :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-10-02 Thread Mike Christie

Doron Shoham wrote:
>> Actually this was bad. If we have to wait for the login_timeout to fire
>> then initial_login_retry_max = 4 was a nice round number and the max
>> time we had to wait was 1 minute. If I just increase it (tried 45
>> stupidly first), it increases the possible max default wait to 11
>> minutes :(
>>
>> So what I did was make initial_login_retry_max just be the max number of
>> initial iscsi login timeouts we can withstand and then let other initial
>> login failures retry for up to initial_login_retry_max * login_timeout.
> 
> Have you change something in the code?

It is

commit 31c9d428556088c886be3ea89333e9b116bc0a09
Author: Mike Christie <[EMAIL PROTECTED]>
Date:   Wed Sep 24 17:34:47 2008 -0500

 modify initial login retry max


> I can't see any change in the git.
> Can you please explain your calculation again?

It is just the same thing we do for scsi commands.

login retry max * login timeout = max time to retry the initial login.

So, to put it in scsi command terms of retry and timeout: the
EHOSTUNREACH failure we see on the initial login is considered
retryable, like scsi-ml's DID_IMM_RETRY value, and does not count against
the retry counter. But we will only retry for up to login retry max *
login timeout seconds, so it does not retry forever on the first login
and hold up the boot process.
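As a rough worked example of that formula, the sketch below multiplies a few retry counts by a login timeout. The 15 second login timeout is an assumption, inferred from the numbers earlier in this thread (4 retries gave 1 minute, 45 gave about 11 minutes). The retry count of 8 is a hypothetical value, included only because it is the count that comes out at 2 minutes under that assumption.

```shell
#!/bin/sh
# max_initial_login_wait RETRY_MAX LOGIN_TIMEOUT
# Prints the upper bound, in seconds, on how long the initial login is retried:
#   login retry max * login timeout
max_initial_login_wait() {
    echo $(( $1 * $2 ))
}

# Assumed login timeout of 15 seconds (not stated explicitly in this thread).
for retry_max in 4 8 45; do
    secs=$(max_initial_login_wait "$retry_max" 15)
    echo "retry_max=$retry_max -> ${secs}s (~$(( secs / 60 )) min)"
done
```

With those inputs the loop reports 60s, 120s and 675s, which lines up with the 1 minute and 11 minute figures quoted above.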


> 
> I wanted to know if we are going to change back the init script.
> If the problem is to wait for the spanning tree, does increasing the 
> initial_login_retry_max should do the work?

Yes, it should work around the problem - sort of :) We do not know if
EHOSTUNREACH is because of the spanning tree problem or because a cable
is unplugged. For the first one we want to retry; for the second we
probably do not (unless the admin is running to the box and trying to
plug it back in :)). So now we set the initial login timeout and retry
to values where most people hitting the spanning tree problem will be ok.
At least according to the bug reports we are seeing on the list and at
Red Hat, retrying for at most 2 minutes was long enough. The drawback is
that we used to retry for at most 1 minute, so everyone else using the
defaults will have to wait an extra minute, which can be a pain. However,
everyone can change the value to fail fast or wait longer like before, so
hopefully this new value is a good compromise.

> 
> Currently the init script causes other bugs.

This is only meant to help Hannes with the spanning tree issue, so he does
not have to use the discovery trick to work around it.

For discovery, though, it seems like you can just do discovery and tell
iscsiadm not to overwrite the existing db (just add new records), and that
would solve some of the issues with iser records getting whacked.




Re: open-iscsi init script on suse

2008-10-02 Thread Doron Shoham

> Actually this was bad. If we have to wait for the login_timeout to fire
> then initial_login_retry_max = 4 was a nice round number and the max
> time we had to wait was 1 minute. If I just increase it (tried 45
> stupidly first), it increases the possible max default wait to 11
> minutes :(
> 
> So what I did was make initial_login_retry_max just be the max number of
> initial iscsi login timeouts we can withstand and then let other initial
> login failures retry for up to initial_login_retry_max * login_timeout.

Have you changed something in the code?
I can't see any change in git.
Can you please explain your calculation again?

I wanted to know if we are going to change the init script back.
If the problem is waiting for the spanning tree, should increasing
initial_login_retry_max do the job?

Currently the init script causes other bugs.




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Mike Christie wrote:
> Hannes Reinecke wrote:
>> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> Hi Doron,
>>>> Doron Shoham wrote:
>>>>> Doron Shoham wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why does the init script on suse re-discovers all iscsi targets which were set
>>>>>> to automatic login?
>>>>>> To avoid deadlocks on the root fs there is patch which limits the number of retries on first login.
>>>>>> When doing so, it sets back all the default parameters (overriding any user definitions).
>>>>>> I think it should be like in redhat - just login to all the targets which are automatic.
>>>>>>
>>>> That's what we tried initially. However, certain switches take quite a bit of time for the Spanning-Tree
>>>> Protocol to work out the route, during which time any connect() attempt returns with -EHOSTUNREACH.
>>>> If we do an automatic login, the login request is sent from the kernel directly. And any connect()
>>>> failure from the kernel is taken as a terminal error, hence the login fails.
>>> Are we talking about the same thing that keeps coming up :)
>>>
>> I know. Main reason here is that I didn't have time to investigate
> 
> It is ok. I like repeating what I said in this mail more than fixing 
> aic7xxx bugs, so as long as you fix that driver you can do anything here :)
> 
> 
>> this further, so I'll have to fall back to answer the same results
>> I had the last time ...
>>
>>> I swear someone from Voltaire asked this before. You gave the same reply. 
>>> And then I said you can increase node.session.initial_login_retry_max
>>> so we retry the login for all cases (almost all not CHAP or target not 
>>> there errors). If we get -EHOSTUNREACH we will retry up to 
>>> node.session.initial_login_retry_max times (there is a 1 second delay 
>>> between retries so it is a delay of node.session.initial_login_retry_max 
>>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
>>> always test for this and always retry so the user does not have to set 
>>> node.session.initial_login_retry_max but I was not sure if there was a case 
>>> where we would not want to retry.
>>>
>> Problem is that there are valid cases for which we should _not_ retry an
>> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
>> But increasing the initial_login_retry_max value would really help here.
>> Hmm. Will have to check, but this seems like a viable route.
>>
>> Sorry for not being responsive, but I've been kept really busy recently.
>>
> 
> No problem.
> 
> I have been having our users try initial_login_retry_max = 60 and they 
> have reported success. For iscsistart which red hat and fedora uses for 
> the root session in the initramfs I just set it to 120.
> 
> For the default let me up the default to something longer than 4. 


Actually this was bad. If we have to wait for the login_timeout to fire 
then initial_login_retry_max = 4 was a nice round number and the max 
time we had to wait was 1 minute. If I just increase it (tried 45 
stupidly first), it increases the possible max default wait to 11 minutes :(

So what I did was make initial_login_retry_max just be the max number of 
initial iscsi login timeouts we can withstand and then let other initial 
login failures retry for up to initial_login_retry_max * login_timeout.




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Hannes Reinecke wrote:
> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>> Hannes Reinecke wrote:
>>> Hi Doron,
>>> Doron Shoham wrote:
>>>> Doron Shoham wrote:
>>>>> Hi,
>>>>>
>>>>> Why does the init script on suse re-discovers all iscsi targets which were set
>>>>> to automatic login?
>>>>> To avoid deadlocks on the root fs there is patch which limits the number of retries on first login.
>>>>> When doing so, it sets back all the default parameters (overriding any user definitions).
>>>>> I think it should be like in redhat - just login to all the targets which are automatic.
>>>>>
>>> That's what we tried initially. However, certain switches take quite a bit 
>>> of time for the Spanning-Tree
>>> Protocol to work out the route, during which time any connect() attempt 
>>> returns with -EHOSTUNREACH.
>>> If we do an automatic login, the login request is sent from the kernel 
>>> directly. And any connect()
>>> failure from the kernel is taken as a terminal error, hence the login 
>>> fails.
>> Are we talking about the same thing that keeps coming up :)
>>
> I know. Main reason here is that I didn't have time to investigate

It is ok. I like repeating what I said in this mail more than fixing 
aic7xxx bugs, so as long as you fix that driver you can do anything here :)


> this further, so I'll have to fall back to answer the same results
> I had the last time ...
> 
>> I swear someone from Voltaire asked this before. You gave the same reply. 
>> And then I said you can increase node.session.initial_login_retry_max
>> so we retry the login for all cases (almost all not CHAP or target not 
>> there errors). If we get -EHOSTUNREACH we will retry up to 
>> node.session.initial_login_retry_max times (there is a 1 second delay 
>> between retries so it is a delay of node.session.initial_login_retry_max 
>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
>> always test for this and always retry so the user does not have to set 
>> node.session.initial_login_retry_max but I was not sure if there was a case 
>> where we would not want to retry.
>>
> Problem is that there are valid cases for which we should _not_ retry an
> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
> But increasing the initial_login_retry_max value would really help here.
> Hmm. Will have to check, but this seems like a viable route.
> 
> Sorry for not being responsive, but I've been kept really busy recently.
> 

No problem.

I have been having our users try initial_login_retry_max = 60 and they
have reported success. For iscsistart, which Red Hat and Fedora use for
the root session in the initramfs, I just set it to 120.

For the default, let me raise it to something longer than 4.
Because we do all the logins in parallel we do not have to worry about
one login delaying another, so the max wait is just going to be
initial_login_retry_max seconds instead of, in the old worst case,
number_of_portals_or_targets_for_eql * initial_login_retry_max seconds.




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Eli Dorfman wrote:
> On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discovers all iscsi targets which were 
>>> set
>>> to automatic login?
>>> To avoid deadlocks on the root fs there is patch which limits the number of 
>>> retries on first login.
>>> When doing so, it sets back all the default parameters (overriding any user 
>>> definitions).
>>> I think it should be like in redhat - just login to all the targets which 
>>> are automatic.
>>>
>>> Another issue is that the script logouts only from automatic nodes (not 
>>> from all nodes as in redhat).
>>> This causes a bug, when iscsi is stopped while manual node is still 
>>> logged-in (session is active).
>>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>>> session shows this stale session.
>>> I suggest that we do the same as redhat, any objections?
>>>
>>>
>>> Also, what is the purpose of "node.startup" parameter?
>>> When is it in use?
>>>
>> node.startup should be renamed record.startup. The possible values are
>> automatic, manual and onboot. When the init scripts start they can run
>> over the the db and check which records that the users has requested
>> autoatmic startup for and login at that time.
>>
>> onboot is used to for the session used for boot/root. It just signals
>> the tools to handle it differently. During shutdown for example we
>> cannot kill that session when the init script stop is done, because it
>> is still needed for root.
>>
>> manual is used because a lot of targets will return all the portals on
>> the target. Some of these portals may be disabled or not even connected
>> to the network. Instead of iscsiadm/iscsid wasting time trying to log in
>> admins can mark them as manual and the init scripts will not auto start
>> them. Why not just delete them of they cannot be used? I do not know.
>>
> The question is why there are two node.startup fields and what is the
> difference between them (if any):
> node.startup
> AND
> node.conn[0].startup
> 

node.conn[0].startup is left over from when there was basic MC/s
(multiple connections per session) support. Since we only support one
connection per session, there is no difference. Some distro scripts used
to check for one or the other, but now the iscsiadm -m node -L/U command
checks for either, to support all users.
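For scripts that still parse the record themselves, a minimal sketch of accepting either field could look like the following. It reads record text on stdin and prints the first startup value it finds. The function name and the sample record (including the iqn) are invented for illustration; the "key = value" line layout is assumed to match what iscsiadm -m node prints for a single record.

```shell
#!/bin/sh
# Hypothetical helper: print the startup mode found in a node record,
# accepting either node.startup or node.conn[0].startup.
record_startup() {
    sed -n -e 's/^node\.startup = \([a-z]*\).*$/\1/p' \
           -e 's/^node\.conn\[0\]\.startup = \([a-z]*\).*$/\1/p' | head -n 1
}

# Sample record text, made up for this example.
printf 'node.name = iqn.2008-10.example:disk1\nnode.conn[0].startup = manual\n' | record_startup
```

In an init script the input would come from something like iscsiadm -m node -p "$NODE_ADDR" -T "$NODE_NAME" piped into record_startup, rather than from printf.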




Re: open-iscsi init script on suse

2008-09-24 Thread Eli Dorfman

On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on suse re-discovers all iscsi targets which were 
>> set
>> to automatic login?
>> To avoid deadlocks on the root fs there is patch which limits the number of 
>> retries on first login.
>> When doing so, it sets back all the default parameters (overriding any user 
>> definitions).
>> I think it should be like in redhat - just login to all the targets which 
>> are automatic.
>>
>> Another issue is that the script logouts only from automatic nodes (not from 
>> all nodes as in redhat).
>> This causes a bug, when iscsi is stopped while manual node is still 
>> logged-in (session is active).
>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>> session shows this stale session.
>> I suggest that we do the same as redhat, any objections?
>>
>>
>> Also, what is the purpose of "node.startup" parameter?
>> When is it in use?
>>
>
> node.startup should be renamed record.startup. The possible values are
> automatic, manual and onboot. When the init scripts start they can run
> over the the db and check which records that the users has requested
> autoatmic startup for and login at that time.
>
> onboot is used to for the session used for boot/root. It just signals
> the tools to handle it differently. During shutdown for example we
> cannot kill that session when the init script stop is done, because it
> is still needed for root.
>
> manual is used because a lot of targets will return all the portals on
> the target. Some of these portals may be disabled or not even connected
> to the network. Instead of iscsiadm/iscsid wasting time trying to log in
> admins can mark them as manual and the init scripts will not auto start
> them. Why not just delete them of they cannot be used? I do not know.
>
The question is why there are two node.startup fields and what is the
difference between them (if any):
node.startup
AND
node.conn[0].startup

Thanks,
Eli




Re: open-iscsi init script on suse

2008-09-24 Thread Hannes Reinecke

On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
> Hannes Reinecke wrote:
>> Hi Doron,
>> Doron Shoham wrote:
>>> Doron Shoham wrote:
>>>> Hi,
>>>>
>>>> Why does the init script on suse re-discovers all iscsi targets which were set
>>>> to automatic login?
>>>> To avoid deadlocks on the root fs there is patch which limits the number of retries on first login.
>>>> When doing so, it sets back all the default parameters (overriding any user definitions).
>>>> I think it should be like in redhat - just login to all the targets which are automatic.
>>>>
>> That's what we tried initially. However, certain switches take quite a bit 
>> of time for the Spanning-Tree
>> Protocol to work out the route, during which time any connect() attempt 
>> returns with -EHOSTUNREACH.
>> If we do an automatic login, the login request is sent from the kernel 
>> directly. And any connect()
>> failure from the kernel is taken as a terminal error, hence the login 
>> fails.
>
> Are we talking about the same thing that keeps coming up :)
>
I know. The main reason is that I didn't have time to investigate
this further, so I'll have to fall back on the same answer
I gave the last time ...

> I swear someone from Voltaire asked this before. You gave the same reply. 
> And then I said you can increase node.session.initial_login_retry_max
> so we retry the login for all cases (almost all not CHAP or target not 
> there errors). If we get -EHOSTUNREACH we will retry up to 
> node.session.initial_login_retry_max times (there is a 1 second delay 
> between retries so it is a delay of node.session.initial_login_retry_max 
> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
> always test for this and always retry so the user does not have to set 
> node.session.initial_login_retry_max but I was not sure if there was a case 
> where we would not want to retry.
>
Problem is that there are valid cases for which we should _not_ retry an
-EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
But increasing the initial_login_retry_max value would really help here.
Hmm. Will have to check, but this seems like a viable route.

Sorry for not being responsive, but I've been kept really busy recently.

> I can even increase the default node.session.initial_login_retry_max. It is 
> only 4 right now. We do all the logins in parallel now, so the max delay 
> would be node.session.initial_login_retry_max seconds basically. Previously 
> when we did one portal at a time, we might have to wait 
> node.session.initial_login_retry_max for each portal or in cases like EQL 
> each device.
Ah. Good to know.

I really hope to get this cleared up in the near future.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Mike Christie wrote:
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on suse re-discovers all iscsi targets which 
>> were set
>> to automatic login?
>> To avoid deadlocks on the root fs there is patch which limits the 
>> number of retries on first login.
>> When doing so, it sets back all the default parameters (overriding any 
>> user definitions).
>> I think it should be like in redhat - just login to all the targets 
>> which are automatic.
>>
>> Another issue is that the script logouts only from automatic nodes 
>> (not from all nodes as in redhat).
>> This causes a bug, when iscsi is stopped while manual node is still 
>> logged-in (session is active).
>> The result is that iscsid is down but session is still alive - 
>> iscsiadm -m session shows this stale session.
>> I suggest that we do the same as redhat, any objections?
>>  
>>
>> Also, what is the purpose of "node.startup" parameter?
>> When is it in use?
>>
> 
> node.startup should be renamed record.startup. The possible values are 
> automatic, manual and onboot. When the init scripts start they can run 
> over the db, check which records the user has requested 
> automatic startup for, and log in at that time.

For the redhat ones, iscsiadm loops over the records when it does

iscsiadm -m node --loginall=automatic

> 
> onboot is used for the session used for boot/root. It just signals 
> the tools to handle it differently. During shutdown for example we 
> cannot kill that session when the init script stop is done, because it 
> is still needed for root.

iscsiadm -m node --logoutall=all does not log out the records marked onboot.
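
Put together, the start and stop halves of an init script could be sketched roughly as below. This is only an illustration built from the two iscsiadm commands above; the function name iscsi_sessions and the ISCSIADM override variable are hypothetical, and a real init script also has to start and stop iscsid.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual Red Hat or SUSE init script.
# ISCSIADM can be overridden (for example with "echo") for a dry run.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# iscsi_sessions start|stop
iscsi_sessions() {
    case "$1" in
        # Log in to every record whose node.startup is automatic.
        start) "$ISCSIADM" -m node --loginall=automatic ;;
        # Log out of all sessions; records marked onboot are left alone.
        stop)  "$ISCSIADM" -m node --logoutall=all ;;
        *)     echo "usage: iscsi_sessions start|stop" >&2; return 2 ;;
    esac
}
```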




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Hannes Reinecke wrote:
> Hi Doron,
> 
> Doron Shoham wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discovers all iscsi targets which 
>>> were set
>>> to automatic login?
>>> To avoid deadlocks on the root fs there is patch which limits the 
>>> number of retries on first login.
>>> When doing so, it sets back all the default parameters (overriding 
>>> any user definitions).
>>> I think it should be like in redhat - just login to all the targets 
>>> which are automatic.
>>>
> That's what we tried initially. However, certain switches take quite a 
> bit of time for the Spanning-Tree
> Protocol to work out the route, during which time any connect() attempt 
> returns with -EHOSTUNREACH.
> If we do an automatic login, the login request is sent from the kernel 
> directly. And any connect()
> failure from the kernel is taken as a terminal error, hence the login 
> fails.

Are we talking about the same thing that keeps coming up? :)

I swear someone from Voltaire asked this before. You gave the same 
reply. And then I said you can increase node.session.initial_login_retry_max 
so that we retry the login in almost all cases (everything except CHAP 
failures and target-not-there errors). If we get -EHOSTUNREACH we will 
retry up to node.session.initial_login_retry_max times. There is a 
1 second delay between retries, so the total delay is up to 
node.session.initial_login_retry_max seconds. I then said that for 
-EHOSTUNREACH I can add a check so that we always test for this and 
always retry, so the user does not have to set 
node.session.initial_login_retry_max, but I was not sure if there was 
a case where we would not want to retry.
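
The retry behaviour described above can be sketched as a small shell loop. This is an illustration of the semantics only, not iscsid's code; the function name retry_login and the RETRY_DELAY variable are hypothetical, and the sketch assumes one initial attempt followed by up to MAX retries.

```shell
#!/bin/sh
# Illustration only: one initial attempt plus up to MAX retries, with a
# fixed delay between attempts (1 second unless RETRY_DELAY is set).

# retry_login MAX COMMAND [ARGS...]
retry_login() {
    max="$1"; shift
    attempt=0
    while :; do
        # Run the login command; stop as soon as it succeeds.
        "$@" && return 0
        attempt=$((attempt + 1))
        # Give up once the retry budget is used up.
        [ "$attempt" -gt "$max" ] && return 1
        sleep "${RETRY_DELAY:-1}"
    done
}
```

For example, retry_login 4 iscsiadm -m node -T "$target" -p "$portal" --login would wait at most about 4 seconds in total before giving up, where $target and $portal stand for a real record.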

I can even increase the default node.session.initial_login_retry_max. It 
is only 4 right now. We do all the logins in parallel now, so the max 
delay would basically be node.session.initial_login_retry_max seconds. 
Previously, when we did one portal at a time, we might have had to wait 
node.session.initial_login_retry_max seconds for each portal or, in 
cases like EQL, for each device.




Re: open-iscsi init script on suse

2008-09-23 Thread Mike Christie

Doron Shoham wrote:
> Hi,
> 
> Why does the init script on suse re-discovers all iscsi targets which were set
> to automatic login?
> To avoid deadlocks on the root fs there is patch which limits the number of 
> retries on first login.
> When doing so, it sets back all the default parameters (overriding any user 
> definitions).
> I think it should be like in redhat - just login to all the targets which are 
> automatic.
> 
> Another issue is that the script logouts only from automatic nodes (not from 
> all nodes as in redhat).
> This causes a bug, when iscsi is stopped while manual node is still logged-in 
> (session is active).
> The result is that iscsid is down but session is still alive - iscsiadm -m 
> session shows this stale session.
> I suggest that we do the same as redhat, any objections?
>  
> 
> Also, what is the purpose of "node.startup" parameter?
> When is it in use?
> 

node.startup should be renamed record.startup. The possible values are 
automatic, manual and onboot. When the init scripts start they can run 
over the db, check which records the user has requested 
automatic startup for, and log in at that time.

onboot is used for the session used for boot/root. It just signals 
the tools to handle it differently. During shutdown for example we 
cannot kill that session when the init script stop is done, because it 
is still needed for root.

manual is used because a lot of targets will return all the portals on 
the target. Some of these portals may be disabled or not even connected 
to the network. Instead of iscsiadm/iscsid wasting time trying to log in, 
admins can mark them as manual and the init scripts will not auto start 
them. Why not just delete them if they cannot be used? I do not know.
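
A minimal sketch of marking one record this way, assuming the usual iscsiadm update syntax: the helper name set_node_startup, the ISCSIADM override variable and the target/portal values in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of open-iscsi.
# ISCSIADM can be overridden (for example with "echo") for a dry run.
ISCSIADM="${ISCSIADM:-iscsiadm}"

# set_node_startup TARGET PORTAL MODE
# MODE is one of the three values listed above.
set_node_startup() {
    target="$1"; portal="$2"; mode="$3"
    case "$mode" in
        automatic|manual|onboot) ;;
        *) echo "invalid startup mode: $mode" >&2; return 2 ;;
    esac
    "$ISCSIADM" -m node -T "$target" -p "$portal" -o update \
        -n node.startup -v "$mode"
}
```

Usage, with placeholder names: set_node_startup iqn.2008-09.com.example:disk1 192.168.1.11:3260 manual keeps a disconnected portal in the db without the init script trying to log in to it.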




Re: open-iscsi init script on suse

2008-09-23 Thread Doron Shoham

> Also, what is the purpose of "node.startup" parameter?
> When is it in use?
> 

Hi Mike,
Can you please explain this?

Thanks,
Doron




Re: open-iscsi init script on suse

2008-09-22 Thread Eli Dorfman

On Mon, Sep 22, 2008 at 9:39 AM, Hannes Reinecke <[EMAIL PROTECTED]> wrote:
>
> Hi Doron,
>
> Doron Shoham wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discovers all iscsi targets which were 
>>> set
>>> to automatic login?
>>> To avoid deadlocks on the root fs there is patch which limits the number of 
>>> retries on first login.
>>> When doing so, it sets back all the default parameters (overriding any user 
>>> definitions).
>>> I think it should be like in redhat - just login to all the targets which 
>>> are automatic.
>>>
> That's what we tried initially. However, certain switches take quite a bit of 
> time for the Spanning-Tree
> Protocol to work out the route, during which time any connect() attempt 
> returns with -EHOSTUNREACH.
> If we do an automatic login, the login request is sent from the kernel 
> directly. And any connect()
> failure from the kernel is taken as a terminal error, hence the login fails.
> The best we can do here is to make this re-discovery conditional, which would 
> allow customers not
> suffering from STP failures to get a faster booting time.

The current implementation only partially solves the issue and creates
another problem instead: node parameters are changed.
What if the first login ignored this error and retried anyway? This
is not the cleanest solution, but it would satisfy both requirements.

>
>>> Another issue is that the script logouts only from automatic nodes (not 
>>> from all nodes as in redhat).
>>> This causes a bug, when iscsi is stopped while manual node is still 
>>> logged-in (session is active).
>>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>>> session shows this stale session.
>>> I suggest that we do the same as redhat, any objections?
>>>
> Ouch. You touched a very complicated topic. I've had long discussions and 
> patches with NetApp on
> how to get iscsi shutdown right. It's not only that we have stale nodes 
> (which would be ok, given
> that we're shutting down anyway), but it's also well possible that some 
> crucial filesystem bits
> are in fact served by iSCSI, so we definitely shouldn'd be shutting them 
> down, regardless of any
> automatic settings.

Having stale nodes is not OK, since we may use "iscsi stop" not only
when the machine shuts down
but also to change node parameters (e.g. node_transport set to iser).
The dependency of a filesystem on iSCSI should be resolved
independently by the user.
This applies to both automatic and manual sessions.
What we suggest is to log out of all nodes (not only the automatic ones).

>
> There's a bugzilla open to get this right (Novell bug#392080), you're welcome 
> to join and get
> this sorted out.
I could not find this bug, please send a link.


Thanks,
Eli




Re: open-iscsi init script on suse

2008-09-21 Thread Hannes Reinecke

Hi Doron,

Doron Shoham wrote:
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on suse re-discovers all iscsi targets which were 
>> set
>> to automatic login?
>> To avoid deadlocks on the root fs there is patch which limits the number of 
>> retries on first login.
>> When doing so, it sets back all the default parameters (overriding any user 
>> definitions).
>> I think it should be like in redhat - just login to all the targets which 
>> are automatic.
>>
That's what we tried initially. However, certain switches take quite a bit
of time for the Spanning-Tree Protocol to work out the route, during which
time any connect() attempt returns with -EHOSTUNREACH.
If we do an automatic login, the login request is sent from the kernel
directly. And any connect() failure from the kernel is taken as a terminal
error, hence the login fails.
The best we can do here is to make this re-discovery conditional, which
would allow customers not suffering from STP failures to get a faster
booting time.

>> Another issue is that the script logouts only from automatic nodes (not from 
>> all nodes as in redhat).
>> This causes a bug, when iscsi is stopped while manual node is still 
>> logged-in (session is active).
>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>> session shows this stale session.
>> I suggest that we do the same as redhat, any objections?
>>  
Ouch. You touched a very complicated topic. I've had long discussions and
patches with NetApp on how to get iscsi shutdown right. It's not only that
we have stale nodes (which would be ok, given that we're shutting down
anyway), but it's also well possible that some crucial filesystem bits
are in fact served by iSCSI, so we definitely shouldn't be shutting them
down, regardless of any automatic settings.

There's a bugzilla open to get this right (Novell bug#392080), you're
welcome to join and get this sorted out.


>> Also, what is the purpose of "node.startup" parameter?
>> When is it in use?
>>
Don't know. Ask Mike, he implemented it.
Probably a leftover.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: open-iscsi init script on suse

2008-09-21 Thread Doron Shoham

Doron Shoham wrote:
> Hi,
> 
> Why does the init script on suse re-discovers all iscsi targets which were set
> to automatic login?
> To avoid deadlocks on the root fs there is patch which limits the number of 
> retries on first login.
> When doing so, it sets back all the default parameters (overriding any user 
> definitions).
> I think it should be like in redhat - just login to all the targets which are 
> automatic.
> 
> Another issue is that the script logouts only from automatic nodes (not from 
> all nodes as in redhat).
> This causes a bug, when iscsi is stopped while manual node is still logged-in 
> (session is active).
> The result is that iscsid is down but session is still alive - iscsiadm -m 
> session shows this stale session.
> I suggest that we do the same as redhat, any objections?
>  
> 
> Also, what is the purpose of "node.startup" parameter?
> When is it in use?
> 
> 
> Thanks,
> Doron
> 

Hi,

Does my suggestion sound OK?

Doron








open-iscsi init script on suse

2008-09-17 Thread Doron Shoham

Hi,

Why does the init script on suse rediscover all iscsi targets which were set
to automatic login?
To avoid deadlocks on the root fs there is a patch which limits the number of 
retries on first login.
When doing so, it resets all the parameters to their defaults (overriding any 
user definitions).
I think it should be like in redhat - just log in to all the targets which are 
automatic.

Another issue is that the script logs out only from automatic nodes (not from 
all nodes as in redhat).
This causes a bug when iscsi is stopped while a manual node is still logged in 
(session is active).
The result is that iscsid is down but the session is still alive - iscsiadm -m 
session shows this stale session.
I suggest that we do the same as redhat, any objections?
 

Also, what is the purpose of "node.startup" parameter?
When is it in use?


Thanks,
Doron
