Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Mike Christie wrote:
> Hannes Reinecke wrote:
>> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>>> Hannes Reinecke wrote:
>>>> Hi Doron,
>>>> Doron Shoham wrote:
>>>>> Doron Shoham wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Why does the init script on suse re-discovers all iscsi targets which
>>>>>> were set
>>>>>> to automatic login?
>>>>>> To avoid deadlocks on the root fs there is patch which limits the number
>>>>>> of retries on first login.
>>>>>> When doing so, it sets back all the default parameters (overriding any
>>>>>> user definitions).
>>>>>> I think it should be like in redhat - just login to all the targets
>>>>>> which are automatic.
>>>>>>
>>>> That's what we tried initially. However, certain switches take quite a bit
>>>> of time for the Spanning-Tree
>>>> Protocol to work out the route, during which time any connect() attempt
>>>> returns with -EHOSTUNREACH.
>>>> If we do an automatic login, the login request is sent from the kernel
>>>> directly. And any connect()
>>>> failure from the kernel is taken as a terminal error, hence the login
>>>> fails.
>>> Are we talking about the same thing that keeps coming up :)
>>>
>> I know. Main reason here is that I didn't have time to investigate
> 
> It is ok. I like repeating what I said in this mail more than fixing 
> aic7xxx bugs, so as long as you fix that driver you can do anything here :)
> 
> 
>> this further, so I'll have to fall back to answer the same results
>> I had the last time ...
>>
>>> I swear someone from Voltaire asked this before. You gave the same reply. 
>>> And then I said you can increase node.session.initial_login_retry_max
>>> so we retry the login for all cases (almost all not CHAP or target not 
>>> there errors). If we get -EHOSTUNREACH we will retry up to 
>>> node.session.initial_login_retry_max times (there is a 1 second delay 
>>> between retries so it is a delay of node.session.initial_login_retry_max 
>>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
>>> always test for this and always retry so the user does not have to set 
>>> node.session.initial_login_retry_max but I was not sure if there was a case 
>>> where we would not want to retry.
>>>
>> Problem is that there are valid cases for which we should _not_ retry an
>> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
>> But increasing the initial_login_retry_max value would really help here.
>> Hmm. Will have to check, but this seems like a viable route.
>>
>> Sorry for not being responsive, but I've been kept really busy recently.
>>
> 
> No problem.
> 
> I have been having our users try initial_login_retry_max = 60 and they
> have reported success. For iscsistart, which Red Hat and Fedora use for
> the root session in the initramfs, I just set it to 120.
> 
> For the default let me up the default to something longer than 4. 


Actually, simply raising the default was a bad idea. If we have to wait
for the login_timeout to fire on each attempt, then
initial_login_retry_max = 4 was a nice round number: the max time we had
to wait was 1 minute. If I just increase it (I stupidly tried 45 first),
the possible max default wait grows to 11 minutes :(

So what I did was make initial_login_retry_max just be the max number of 
initial iscsi login timeouts we can withstand and then let other initial 
login failures retry for up to initial_login_retry_max * login_timeout.
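
As a rough sketch of the arithmetic above (this is not code from
open-iscsi): the helper names are made up for illustration, and
login_timeout = 15 seconds is an assumption inferred from 4 retries
adding up to 1 minute.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not open-iscsi code.
ASSUMED_LOGIN_TIMEOUT = 15  # seconds; assumed, inferred from 4 retries == 1 minute
RETRY_DELAY = 1             # seconds between retries, as described in the thread


def worst_case_timeout_wait(retry_max, login_timeout=ASSUMED_LOGIN_TIMEOUT):
    """Max wait when every attempt has to run into login_timeout."""
    return retry_max * login_timeout


def old_fast_failure_window(retry_max, retry_delay=RETRY_DELAY):
    """Old behavior: a quick failure (e.g. -EHOSTUNREACH) uses one retry per delay."""
    return retry_max * retry_delay


def new_fast_failure_window(retry_max, login_timeout=ASSUMED_LOGIN_TIMEOUT):
    """New behavior: quick failures may keep retrying for retry_max * login_timeout."""
    return retry_max * login_timeout


if __name__ == "__main__":
    for retry_max in (4, 45):
        minutes = worst_case_timeout_wait(retry_max) / 60
        print(f"retry_max={retry_max}: up to {minutes:.2f} min of login timeouts")
    print("fast-failure window with retry_max=4:",
          old_fast_failure_window(4), "s before,",
          new_fast_failure_window(4), "s after")
```

Under these assumptions, 4 retries cost at most 60 seconds and 45 retries
cost at most 675 seconds (a bit over 11 minutes), while a fast failure
such as -EHOSTUNREACH gets a 60 second retry window instead of 4 seconds
without the default itself being raised.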

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Hannes Reinecke wrote:
> On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
>> Hannes Reinecke wrote:
>>> Hi Doron,
>>> Doron Shoham wrote:
>>>> Doron Shoham wrote:
>>>>> Hi,
>>>>>
>>>>> Why does the init script on suse re-discovers all iscsi targets which
>>>>> were set
>>>>> to automatic login?
>>>>> To avoid deadlocks on the root fs there is patch which limits the number
>>>>> of retries on first login.
>>>>> When doing so, it sets back all the default parameters (overriding any
>>>>> user definitions).
>>>>> I think it should be like in redhat - just login to all the targets
>>>>> which are automatic.
>>>>>
>>> That's what we tried initially. However, certain switches take quite a bit 
>>> of time for the Spanning-Tree
>>> Protocol to work out the route, during which time any connect() attempt 
>>> returns with -EHOSTUNREACH.
>>> If we do an automatic login, the login request is sent from the kernel 
>>> directly. And any connect()
>>> failure from the kernel is taken as a terminal error, hence the login 
>>> fails.
>> Are we talking about the same thing that keeps coming up :)
>>
> I know. Main reason here is that I didn't have time to investigate

It is ok. I like repeating what I said in this mail more than fixing 
aic7xxx bugs, so as long as you fix that driver you can do anything here :)


> this further, so I'll have to fall back to answer the same results
> I had the last time ...
> 
>> I swear someone from Voltaire asked this before. You gave the same reply. 
>> And then I said you can increase node.session.initial_login_retry_max
>> so we retry the login for all cases (almost all not CHAP or target not 
>> there errors). If we get -EHOSTUNREACH we will retry up to 
>> node.session.initial_login_retry_max times (there is a 1 second delay 
>> between retries so it is a delay of node.session.initial_login_retry_max 
>> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
>> always test for this and always retry so the user does not have to set 
>> node.session.initial_login_retry_max but I was not sure if there was a case 
>> where we would not want to retry.
>>
> Problem is that there are valid cases for which we should _not_ retry an
> -EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
> But increasing the initial_login_retry_max value would really help here.
> Hmm. Will have to check, but this seems like a viable route.
> 
> Sorry for not being responsive, but I've been kept really busy recently.
> 

No problem.

I have been having our users try initial_login_retry_max = 60 and they
have reported success. For iscsistart, which Red Hat and Fedora use for
the root session in the initramfs, I just set it to 120.

For the default let me up the default to something longer than 4.
Because we do all the logins in parallel we do not have to worry about
one login delaying another, so the max wait is just going to be
initial_login_retry_max seconds instead of, in the old worst case,
number_of_portals_or_targets_for_eql * initial_login_retry_max seconds.
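
To make the serial versus parallel comparison concrete, here is a small
sketch. The function names are hypothetical (not from open-iscsi), and it
assumes every attempt fails quickly, so each retry costs only the
1 second delay between retries.

```python
# Hypothetical helpers for illustration; not taken from open-iscsi.
RETRY_DELAY = 1  # seconds between retries, as described in the thread


def serial_worst_case(num_portals, retry_max, retry_delay=RETRY_DELAY):
    """Old behavior: portals are tried one at a time, so the waits add up."""
    return num_portals * retry_max * retry_delay


def parallel_worst_case(num_portals, retry_max, retry_delay=RETRY_DELAY):
    """Parallel logins: all portals retry at once, so only one window is paid."""
    return retry_max * retry_delay if num_portals > 0 else 0


if __name__ == "__main__":
    portals, retry_max = 8, 60
    print("serial worst case:  ", serial_worst_case(portals, retry_max), "s")
    print("parallel worst case:", parallel_worst_case(portals, retry_max), "s")
```

With 8 portals and initial_login_retry_max = 60, the serial case could
take up to 480 seconds, while the parallel case stays at 60 seconds.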




Re: open-iscsi init script on suse

2008-09-24 Thread Mike Christie

Eli Dorfman wrote:
> On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>> Doron Shoham wrote:
>>> Hi,
>>>
>>> Why does the init script on suse re-discovers all iscsi targets which were 
>>> set
>>> to automatic login?
>>> To avoid deadlocks on the root fs there is patch which limits the number of 
>>> retries on first login.
>>> When doing so, it sets back all the default parameters (overriding any user 
>>> definitions).
>>> I think it should be like in redhat - just login to all the targets which 
>>> are automatic.
>>>
>>> Another issue is that the script logouts only from automatic nodes (not 
>>> from all nodes as in redhat).
>>> This causes a bug, when iscsi is stopped while manual node is still 
>>> logged-in (session is active).
>>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>>> session shows this stale session.
>>> I suggest that we do the same as redhat, any objections?
>>>
>>>
>>> Also, what is the purpose of "node.startup" parameter?
>>> When is it in use?
>>>
>> node.startup should be renamed record.startup. The possible values are
>> automatic, manual and onboot. When the init scripts start they can run
>> over the db, check which records the user has requested
>> automatic startup for, and log in at that time.
>>
>> onboot is used for the session used for boot/root. It just signals
>> the tools to handle it differently. During shutdown for example we
>> cannot kill that session when the init script stop is done, because it
>> is still needed for root.
>>
>> manual is used because a lot of targets will return all the portals on
>> the target. Some of these portals may be disabled or not even connected
>> to the network. Instead of iscsiadm/iscsid wasting time trying to log in,
>> admins can mark them as manual and the init scripts will not auto start
>> them. Why not just delete them if they cannot be used? I do not know.
>>
> The question is why there are two node.startup fields and what is the
> difference between them (if any):
> node.startup
> AND
> node.conn[0].startup
> 

node.conn[0].startup is left over from when there was basic MC/s
(multiple connections per session) support. Since we now only support
one connection per session, there is no difference between the two.
Some distro scripts used to check for one or the other, but now the
iscsiadm -m node -L/U commands check for either, to support all users.
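
As a sketch of what "check for either" could look like: this is only an
illustration of the idea, not iscsiadm's actual code. The record is
modeled as a plain dict, and the function name is made up.

```python
# Hypothetical illustration; not the real iscsiadm implementation.
STARTUP_KEYS = ("node.startup", "node.conn[0].startup")


def wants_automatic_startup(record):
    """Return True if either startup field in the record is set to 'automatic'."""
    return any(record.get(key) == "automatic" for key in STARTUP_KEYS)


if __name__ == "__main__":
    records = [
        {"node.startup": "automatic", "node.conn[0].startup": "manual"},
        {"node.startup": "manual", "node.conn[0].startup": "automatic"},
        {"node.startup": "manual", "node.conn[0].startup": "manual"},
    ]
    for record in records:
        print(record, "->", wants_automatic_startup(record))
```

Treating the two fields as equivalent means a record written by an older
distro script that set only one of them is still picked up.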




Re: open-iscsi init script on suse

2008-09-24 Thread Eli Dorfman

On Tue, Sep 23, 2008 at 8:04 PM, Mike Christie <[EMAIL PROTECTED]> wrote:
>
> Doron Shoham wrote:
>> Hi,
>>
>> Why does the init script on suse re-discovers all iscsi targets which were 
>> set
>> to automatic login?
>> To avoid deadlocks on the root fs there is patch which limits the number of 
>> retries on first login.
>> When doing so, it sets back all the default parameters (overriding any user 
>> definitions).
>> I think it should be like in redhat - just login to all the targets which 
>> are automatic.
>>
>> Another issue is that the script logouts only from automatic nodes (not from 
>> all nodes as in redhat).
>> This causes a bug, when iscsi is stopped while manual node is still 
>> logged-in (session is active).
>> The result is that iscsid is down but session is still alive - iscsiadm -m 
>> session shows this stale session.
>> I suggest that we do the same as redhat, any objections?
>>
>>
>> Also, what is the purpose of "node.startup" parameter?
>> When is it in use?
>>
>
> node.startup should be renamed record.startup. The possible values are
> automatic, manual and onboot. When the init scripts start they can run
> over the db, check which records the user has requested
> automatic startup for, and log in at that time.
>
> onboot is used for the session used for boot/root. It just signals
> the tools to handle it differently. During shutdown for example we
> cannot kill that session when the init script stop is done, because it
> is still needed for root.
>
> manual is used because a lot of targets will return all the portals on
> the target. Some of these portals may be disabled or not even connected
> to the network. Instead of iscsiadm/iscsid wasting time trying to log in,
> admins can mark them as manual and the init scripts will not auto start
> them. Why not just delete them if they cannot be used? I do not know.
>
The question is why there are two node.startup fields and what is the
difference between them (if any):
node.startup
AND
node.conn[0].startup

Thanks,
Eli




Re: open-iscsi init script on suse

2008-09-24 Thread Hannes Reinecke

On Tue, Sep 23, 2008 at 12:13:19PM -0500, Mike Christie wrote:
> Hannes Reinecke wrote:
>> Hi Doron,
>> Doron Shoham wrote:
>>> Doron Shoham wrote:
>>>> Hi,
>>>>
>>>> Why does the init script on suse re-discovers all iscsi targets which
>>>> were set
>>>> to automatic login?
>>>> To avoid deadlocks on the root fs there is patch which limits the number
>>>> of retries on first login.
>>>> When doing so, it sets back all the default parameters (overriding any
>>>> user definitions).
>>>> I think it should be like in redhat - just login to all the targets
>>>> which are automatic.
>>>>
>> That's what we tried initially. However, certain switches take quite a bit 
>> of time for the Spanning-Tree
>> Protocol to work out the route, during which time any connect() attempt 
>> returns with -EHOSTUNREACH.
>> If we do an automatic login, the login request is sent from the kernel 
>> directly. And any connect()
>> failure from the kernel is taken as a terminal error, hence the login 
>> fails.
>
> Are we talking about the same thing that keeps coming up :)
>
I know. Main reason here is that I didn't have time to investigate
this further, so I'll have to fall back to answer the same results
I had the last time ...

> I swear someone from Voltaire asked this before. You gave the same reply. 
> And then I said you can increase node.session.initial_login_retry_max
> so we retry the login for all cases (almost all not CHAP or target not 
> there errors). If we get -EHOSTUNREACH we will retry up to 
> node.session.initial_login_retry_max times (there is a 1 second delay 
> between retries so it is a delay of node.session.initial_login_retry_max 
> seconds). I then said that for -EHOSTUNREACH I can add a check so that we 
> always test for this and always retry so the user does not have to set 
> node.session.initial_login_retry_max but I was not sure if there was a case 
> where we would not want to retry.
>
Problem is that there are valid cases for which we should _not_ retry an
-EHOSTUNREACH failure case. So I wouldn't retry for EHOSTUNREACH always.
But increasing the initial_login_retry_max value would really help here.
Hmm. Will have to check, but this seems like a viable route.

Sorry for not being responsive, but I've been kept really busy recently.

> I can even increase the default node.session.initial_login_retry_max. It is 
> only 4 right now. We do all the logins in parallel now, so the max delay 
> would be node.session.initial_login_retry_max seconds basically. Previously 
> when we did one portal at a time, we might have to wait 
> node.session.initial_login_retry_max for each portal or in cases like EQL 
> each device.
Ah. Good to know.

I really hope to get this cleared up in the near future.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
[EMAIL PROTECTED] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
