Re: in /var/log/messages: conn errors and recovery

2008-05-14 Thread a s p a s i a

thanks for this final note/recommendation!  .. i will do so .. have
deployed the golden image though and so far my engineering users have
not complained; seems like they are happy with their iscsi root on
CentOS!

- a.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: in /var/log/messages: conn errors and recovery

2008-05-12 Thread Mike Christie

a s p a s i a wrote:
>> I need more to the log. The parts above this would tell me what happened.
> 
> OK, I will capture complete console log next time, sorry, but I
> rebooted the couple of test boxes that had this 
> 
> 
>> node.session.timeo.replacement_timeout = 172800
>> node.conn[0].timeo.noop_out_timeout = 0
>> node.conn[0].timeo.noop_out_interval = 0
> 
> OK!  will do and observe - I just updated the /etc/iscsi/iscsi.conf
> file and rebooted (working on my "golden-build" server) ...
> 

You need to do one more step. open-iscsi only reads 
/etc/iscsi/iscsid.conf when you do discovery. It uses those values as 
the defaults. So to pick up new values in there you have to rerun the 
discovery command, which will reset the values in the node db based on 
the iscsid.conf ones.

Or you can just run iscsiadm to set those values:

iscsiadm -m node -T target -p ip:port -o update \
    -n node.session.timeo.replacement_timeout -v 172800

Then repeat for each value.
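
The "repeat for each value" step can be sketched as a small loop. This is only an illustration, not part of the original reply: the function names and the DRY_RUN variable are made up for the example, and the target name and portal in the usage line are placeholders. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: one "iscsiadm -o update" per setting for a node record.
# DRY_RUN is a hypothetical switch: 1 (default) prints the commands,
# anything else runs them.

iscsi_root_settings() {
    # name=value pairs taken from the message above
    cat <<'EOF'
node.session.timeo.replacement_timeout=172800
node.conn[0].timeo.noop_out_timeout=0
node.conn[0].timeo.noop_out_interval=0
EOF
}

update_node_settings() {
    target=$1   # target name (placeholder, supply your own)
    portal=$2   # ip:port of the portal (placeholder)
    iscsi_root_settings | while IFS='=' read -r name value; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "iscsiadm -m node -T $target -p $portal -o update -n $name -v $value"
        else
            iscsiadm -m node -T "$target" -p "$portal" -o update -n "$name" -v "$value"
        fi
    done
}
```

Usage, with made-up values: update_node_settings iqn.2008-05.com.example:root 192.168.0.10:3260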


> I also downloaded the latest stable release from the open-iscsi.org
> site, so that it is now on the 869.2 version as opposed to 865 ...
> will observe and see if any strange behaviours occur.
> 





Re: in /var/log/messages: conn errors and recovery

2008-05-09 Thread a s p a s i a

> I need more to the log. The parts above this would tell me what happened.

OK, I will capture complete console log next time, sorry, but I
rebooted the couple of test boxes that had this 


> node.session.timeo.replacement_timeout = 172800
> node.conn[0].timeo.noop_out_timeout = 0
> node.conn[0].timeo.noop_out_interval = 0

OK!  will do and observe - I just updated the /etc/iscsi/iscsi.conf
file and rebooted (working on my "golden-build" server) ...

I also downloaded the latest stable release from the open-iscsi.org
site, so that it is now on the 869.2 version as opposed to 865 ...
will observe and see if any strange behaviours occur.




Re: in /var/log/messages: conn errors and recovery

2008-05-09 Thread Mike Christie

a s p a s i a wrote:
> Hi Mike,
> 
> the last issue is reported by another person - albert, I initiated
> this thread, I think we are getting similar symptoms/errors:
> 
>> What kernel are you using? I thought from the beginning of the thread
>> you were using 2.6.18-53.1.14.el. Is that right?
> 
> Yes, that is right - that is the original thread - one of my iscsi
> hosts, which I have deployed into about 50 machines is running that
> above kernel - Centos 5.1;  and as we exchanged info earlier, it is
> running the following version of open-iscsi:
>  uname -a
> Linux r04s25 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008
> x86_64 x86_64 x86_64 GNU/Linux
> [EMAIL PROTECTED] ~]# iscsiadm -V
> iscsiadm version 2.0-865
> [EMAIL PROTECTED] ~]# iscsistart -v
> iscsistart version 2.0-865
> [EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
> filename:
> /lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
> version:2.0-865
> 
> --
> 
> In the last 2 days, I have deployed this iscsiRoot into about 50
> machines, where a few test/QA folks commenced testing on them.  During
> use ... they have noticed various symptoms such as:
> 
> 1.  Root FileSystem becomes inaccessible, where one is unable to
> perform any further work;  networking seems to be ok (ping'able from
> the outside) but the session is stuck;  Meanwhile at the console, we
> see the following messages:
> 
> scsi 6:0:0:0 rejecting I/O to dead device
> EXT3-fs error (device sde1)


I need more to the log. The parts above this would tell me what happened.

> 
> Upon restarting the box, the iscsiRoot mounted fine and seems like it
> is ok again.
> 
> 2.  This morning I checked the 3 other servers we left to observe and
> noticed I am unable to ssh into them (ssh connection is denied).  I
> walked over to check the console of one of the hosts: the FS root is
> still there and I can ls and cd into various directories, but when I
> try to run a command such as "df", or "cat" to check the
> /var/log/messages file for instance, I get the following error:
> 
> iscsi:  cmd 0x28 is not queued (8)
> end_request: I/O error, dev sde sector 3389502
> -bash:  /bin/df:  Input/output error.
> 
> ..
> 
> These issues I observed on the hosts that are actually being used
> (running application tests, etc.) ... Another same box (my golden
> image) booted iscsiroot on same OS and open-iscsi version does not
> have problems, but it's not doing anything 
> 
> I know we discussed that I should upgrade ... should I upgrade to the
> current stable release?  Before I do, I'd like to know if the above
> are known errors and why?
> 

Are you doing iscsi root for all boxes? If you are and you are using 
Centos 5.1 then I would use iscsi with multipath. I would use iscsi + 
multipath in general for all OSes and setups actually. I do not think 
iscsi root + multipath is supported out of the box in 5.1 so if that is 
not an option I would use the following settings:

node.session.timeo.replacement_timeout = 172800
node.conn[0].timeo.noop_out_timeout = 0
node.conn[0].timeo.noop_out_interval = 0

for the root session. You should set that in the node db for the session 
that will be run as root.

If you are logged into the box and the session is running then do:

iscsiadm -m session -r $sid -o update \
    -n node.session.timeo.replacement_timeout -v 172800

$sid is the session id that you would see when you run
iscsiadm -m session -P 1

Then repeat this for each value and setting above.

If the session is not running then pass the target and portal info for 
the portal that gets logged in for root.

iscsiadm -m node -T target -p ip:port -o update \
    -n node.session.timeo.replacement_timeout -v 172800

then repeat for the other values.

In either case the values will be used on the next reboot. (Strictly, 
they are picked up when the iscsi service is restarted, but if you are 
running root on iscsi we will not restart the service, because that 
would disrupt the root disk.)
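
For the running-session case, a rough sketch of finding $sid for one target and building the update command. It assumes the one-line-per-session listing has the shape "tcp: [1] ip:port,tpgt targetname"; that format is an assumption to check against your own iscsiadm output, and the function names and sample target names are hypothetical.

```shell
#!/bin/sh
# Sketch: pull the session id out of a one-line-per-session listing.
# Assumed line shape (verify on your system):
#   tcp: [1] 192.168.0.10:3260,1 iqn.2008-05.com.example:root

sid_for_target() {
    # $1 = target name; the session listing is read from stdin
    awk -v t="$1" '$4 == t {
        sub(/^\[/, "", $2)   # strip the leading "["
        sub(/\]$/, "", $2)   # strip the trailing "]"
        print $2
    }'
}

session_update_cmd() {
    # $1 = sid, $2 = setting name, $3 = value; only prints the command
    echo "iscsiadm -m session -r $1 -o update -n $2 -v $3"
}
```

Usage, again with a placeholder target name: iscsiadm -m session | sid_for_target iqn.2008-05.com.example:root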




Re: in /var/log/messages: conn errors and recovery

2008-05-09 Thread a s p a s i a

Hi Mike,

the last issue is reported by another person - albert, I initiated
this thread, I think we are getting similar symptoms/errors:

> What kernel are you using? I thought from the beginning of the thread
> you were using 2.6.18-53.1.14.el. Is that right?

Yes, that is right - that is the original thread - one of my iscsi
hosts, which I have deployed into about 50 machines is running that
above kernel - Centos 5.1;  and as we exchanged info earlier, it is
running the following version of open-iscsi:
 uname -a
Linux r04s25 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008
x86_64 x86_64 x86_64 GNU/Linux
[EMAIL PROTECTED] ~]# iscsiadm -V
iscsiadm version 2.0-865
[EMAIL PROTECTED] ~]# iscsistart -v
iscsistart version 2.0-865
[EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
filename:
/lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
version:2.0-865

--

In the last 2 days, I have deployed this iscsiRoot into about 50
machines, where a few test/QA folks commenced testing on them.  During
use ... they have noticed various symptoms such as:

1.  Root FileSystem becomes inaccessible, where one is unable to
perform any further work;  networking seems to be ok (ping'able from
the outside) but the session is stuck;  Meanwhile at the console, we
see the following messages:

scsi 6:0:0:0 rejecting I/O to dead device
EXT3-fs error (device sde1)

Upon restarting the box, the iscsiRoot mounted fine and seems like it
is ok again.

2.  This morning I checked the 3 other servers we left to observe and
noticed I am unable to ssh into them (ssh connection is denied).  I
walked over to check the console of one of the hosts: the FS root is
still there and I can ls and cd into various directories, but when I
try to run a command such as "df", or "cat" to check the
/var/log/messages file for instance, I get the following error:

iscsi:  cmd 0x28 is not queued (8)
end_request: I/O error, dev sde sector 3389502
-bash:  /bin/df:  Input/output error.

..

These issues I observed on the hosts that are actually being used
(running application tests, etc.) ... Another same box (my golden
image) booted iscsiroot on same OS and open-iscsi version does not
have problems, but it's not doing anything 

I know we discussed that I should upgrade ... should I upgrade to the
current stable release?  Before I do, I'd like to know if the above
are known errors and why?

thanks in advance,

A.





-- 
A S P A S I A
. . . . . . . . . . ..




Re: in /var/log/messages: conn errors and recovery

2008-05-09 Thread Mike Christie

[EMAIL PROTECTED] wrote:
> Hi Mike,
> 
> using the Wasabi target and the latest open-iscsi git version (as of
> today), I still get the errors, which according to the Wasabi log are a
> timeout problem. It only happens when logging in, which used to take
> about one second or less but now takes 5-10 seconds. The messages log
> on the initiator shows this:
> 
> scsi6 : iSCSI Initiator over TCP/IP
>  connection1:0: detected conn error (1011)
>  connection1:0: detected conn error (1011)
>  connection1:0: detected conn error (1011)
>  connection1:0: detected conn error (1011)
> scsi scan: 66 byte inquiry failed.  Consider BLIST_INQUIRY_36 for this
> device


It looks like it is not liking the inquiry command the scsi layer sends 
it. Do you see anything about bad/invalid scsi commands or something 
about INQUIRYs in the log?

What kernel are you using? I thought from the beginning of the thread 
you were using 2.6.18-53.1.14.el. Is that right?


> scsi 6:0:0:0: Direct-Access Wasabi   WSB/iSCSI0401 PQ: 0
> ANSI: 5
> sd 6:0:0:0: [sdb] 32768 512-byte hardware sectors (167772 MB)
> sd 6:0:0:0: [sdb] Write Protect is off
> sd 6:0:0:0: [sdb] Mode Sense: 5b 00 00 08
> sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 6:0:0:0: [sdb] 32768 512-byte hardware sectors (167772 MB)
> sd 6:0:0:0: [sdb] Write Protect is off
> sd 6:0:0:0: [sdb] Mode Sense: 5b 00 00 08
> sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
>  sdb: sdb1
> sd 6:0:0:0: [sdb] Attached SCSI disk
> sd 6:0:0:0: Attached scsi generic sg2 type 0
> 
> 
> Albert
> 
> On May 8, 11:46 pm, Mike Christie <[EMAIL PROTECTED]> wrote:
>> a s p a s i a wrote:
>>
>>
>>
>>>> Are you using 869 and do you also see the nop out timeout messages or do
>>>> you just see these connection error messages?
>>> just the above connections errors...
>>> 865 version:
>>> [EMAIL PROTECTED] ~]# iscsiadm -V
>>> iscsiadm version 2.0-865
>>> [EMAIL PROTECTED] ~]# iscsistart -v
>>> iscsistart version 2.0-865
>>> [EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
>>> filename:
>>> /lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
>>> version:2.0-865
>>>> Yeah, read the README section 8 for how to modify the nop and replacement
>>>> timeout settings for iscsi root.
>>> yeah ... i'll do so .. maybe adjust and see if it reappears ...
>>> in searching through current /var/log/messages, seems like the errors
>>> only appeared twice yesterday and once this morning ..
>>> no big deal, but interesting to check.
>> You want to make sure you are using 2.0-865.15 if you are using the 865
>> series. There was a bug in the early 865 releases where during writes we
>> were not tracking data right, and either we or the target would drop the
>> session (you would see the error messages you posted about) to get us
>> back on track.





Re: in /var/log/messages: conn errors and recovery

2008-05-09 Thread albert . pauw

Hi Mike,

using the Wasabi target and the latest open-iscsi git version (as of
today), I still get the errors, which according to the Wasabi log are a
timeout problem. It only happens when logging in, which used to take
about one second or less but now takes 5-10 seconds. The messages log
on the initiator shows this:

scsi6 : iSCSI Initiator over TCP/IP
 connection1:0: detected conn error (1011)
 connection1:0: detected conn error (1011)
 connection1:0: detected conn error (1011)
 connection1:0: detected conn error (1011)
scsi scan: 66 byte inquiry failed.  Consider BLIST_INQUIRY_36 for this device
scsi 6:0:0:0: Direct-Access Wasabi   WSB/iSCSI0401 PQ: 0 ANSI: 5
sd 6:0:0:0: [sdb] 32768 512-byte hardware sectors (167772 MB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: 5b 00 00 08
sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:0:0: [sdb] 32768 512-byte hardware sectors (167772 MB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: 5b 00 00 08
sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 6:0:0:0: [sdb] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg2 type 0


Albert

On May 8, 11:46 pm, Mike Christie <[EMAIL PROTECTED]> wrote:
> a s p a s i a wrote:
>
>
>
> >> Are you using 869 and do you also see the nop out timeout messages or do
> >> you just see these connection error messages?
>
> > just the above connections errors...
>
> > 865 version:
>
> > [EMAIL PROTECTED] ~]# iscsiadm -V
> > iscsiadm version 2.0-865
> > [EMAIL PROTECTED] ~]# iscsistart -v
> > iscsistart version 2.0-865
> > [EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
> > filename:
> > /lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
> > version:2.0-865
>
> >> Yeah, read the README section 8 for how to modify the nop and replacement
> >> timeout settings for iscsi root.
>
> > yeah ... i'll do so .. maybe adjust and see if it reappears ...
>
> > in searching through current /var/log/messages, seems like the errors
> > only appeared twice yesterday and once this morning ..
>
> > no big deal, but interesting to check.
>
> You want to make sure you are using 2.0-865.15 if you are using the 865
> series. There was a bug in the early 865 releases where during writes we
> were not tracking data right, and either we or the target would drop the
> session (you would see the error messages you posted about) to get us
> back on track.



Re: in /var/log/messages: conn errors and recovery

2008-05-08 Thread Mike Christie

a s p a s i a wrote:
>> Are you using 869 and do you also see the nop out timeout messages or do
>> you just see these connection error messages?
>>
> 
> just the above connections errors...
> 
> 865 version:
> 
> [EMAIL PROTECTED] ~]# iscsiadm -V
> iscsiadm version 2.0-865
> [EMAIL PROTECTED] ~]# iscsistart -v
> iscsistart version 2.0-865
> [EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
> filename:
> /lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
> version:2.0-865
> 
> 
>> Yeah, read the README section 8 for how to modify the nop and replacement
>> timeout settings for iscsi root.
> 
> yeah ... i'll do so .. maybe adjust and see if it reappears ...
> 
> in searching through current /var/log/messages, seems like the errors
> only appeared twice yesterday and once this morning ..
> 
> no big deal, but interesting to check.
> 

You want to make sure you are using 2.0-865.15 if you are using the 865 
series. There was a bug in the early 865 releases where during writes we 
were not tracking data right, and either we or the target would drop the 
session (you would see the error messages you posted about) to get us 
back on track.
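
A quick way to apply that version advice, as a sketch only. It assumes version strings shaped like "2.0-865" or "2.0-865.15"; what "iscsiadm -V" actually prints for a point release is an assumption here, and the function name is made up for the example.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) when a version string is in the 865 series
# but older than 2.0-865.15, i.e. when the advice above says to update.
# Assumes strings like "2.0-865" or "2.0-865.15" with a numeric suffix.

needs_865_update() {
    ver=${1#2.0-}          # "865" or "865.15"
    series=${ver%%.*}      # "865"
    case $ver in
        *.*) minor=${ver#*.} ;;   # "15"
        *)   minor=0 ;;           # no point release in the string
    esac
    [ "$series" = "865" ] && [ "$minor" -lt 15 ]
}
```

For example: needs_865_update "$(iscsiadm -V | awk '{print $NF}')" && echo "update to 2.0-865.15 or later in the series"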




Re: in /var/log/messages: conn errors and recovery

2008-05-08 Thread a s p a s i a

> Are you using 869 and do you also see the nop out timeout messages or do
> you just see these connection error messages?
>

just the above connections errors...

865 version:

[EMAIL PROTECTED] ~]# iscsiadm -V
iscsiadm version 2.0-865
[EMAIL PROTECTED] ~]# iscsistart -v
iscsistart version 2.0-865
[EMAIL PROTECTED] ~]# modinfo scsi_transport_iscsi
filename:
/lib/modules/2.6.18-53.1.14.el5/kernel/drivers/scsi/scsi_transport_iscsi.ko
version:2.0-865


> Yeah, read the README section 8 for how to modify the nop and replacement
> timeout settings for iscsi root.

yeah ... i'll do so .. maybe adjust and see if it reappears ...

in searching through current /var/log/messages, seems like the errors
only appeared twice yesterday and once this morning ..

no big deal, but interesting to check.

- a.
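
To see how often the errors show up, something like the following could count them per day. It is a sketch that assumes the classic "Mon DD HH:MM:SS host ..." syslog prefix seen in this thread and unwrapped log lines; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count "detected conn error" lines per day in a syslog file.
# Assumes each entry starts with month and day, e.g. "May  8 08:55:48".

conn_errors_per_day() {
    # $1 = log file, for example /var/log/messages
    awk '/detected conn error/ { count[$1 " " $2]++ }
         END { for (day in count) print day, count[day] }' "$1" | sort
}
```

Run it as: conn_errors_per_day /var/log/messages
Each output line is the month, the day, and the number of matching lines for that day.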




Re: in /var/log/messages: conn errors and recovery

2008-05-08 Thread Mike Christie

aspasia wrote:
> Hello all,
> 
> After seeing the announcement on the regression of the current
> release, I looked more closely into my /var/log/messages and noticed
> that once in a while my iscsi connections get the following:
> 
> May  8 08:55:48 r05s23 kernel:  connection1:0: iscsi: detected conn
> error (1011)
> May  8 08:55:49 r05s23 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> May  8 08:55:51 r05s23 iscsid: connection1:0 is operational after
> recovery (2 attempts)
> 
> Seems like it recovers, but are these critical issues?  My iscsi

Are you using 869 and do you also see the nop out timeout messages or do 
you just see these connection error messages?

> device is being mounted as my root; should I increase some parameter
> in /etc/iscsi/iscsi.conf?
> 

Yeah, read the README section 8 for how to modify the nop and replacement 
timeout settings for iscsi root.


> Any recommendation would be greatly appreciated.
> 
> A.





in /var/log/messages: conn errors and recovery

2008-05-08 Thread aspasia

Hello all,

After seeing the announcement on the regression of the current
release, I looked more closely into my /var/log/messages and noticed
that once in a while my iscsi connections get the following:

May  8 08:55:48 r05s23 kernel:  connection1:0: iscsi: detected conn error (1011)
May  8 08:55:49 r05s23 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
May  8 08:55:51 r05s23 iscsid: connection1:0 is operational after recovery (2 attempts)

Seems like it recovers, but are these critical issues?  My iscsi
device is being mounted as my root; should I increase some parameter
in /etc/iscsi/iscsi.conf?

Any recommendation would be greatly appreciated.

A.