Re: information on the config option -- node.session.iscsi.FastAbort = No

2010-04-28 Thread Mike Christie

On 04/28/2010 12:43 PM, Mike Christie wrote:

On 04/28/2010 10:40 AM, maguar887 wrote:

We are currently running open iscsi version 2.0-871 on RHEL 5.3
(2.6.18-92.1.6.0.2.el5) against a Dell Equallogic iScsi SAN group
(firmware 4.3.5)



You need to upgrade your kernel. It had a bug with eql targets in the
async logout path. It is fixed in 5.4 and 5.5 kernels.



Actually it was fixed in 5.3 too.

The weird thing here is that we get this error *before* we get the async 
logout request.


Do you have multiple sessions? Is sdt accessed through session 22 (run
iscsiadm -m session -P 3 to see) or a different session? If it is a
different session, is there more log output, and can you send it?




Apr 18 11:18:11 serv02 kernel: sd 24:0:0:0: SCSI error: return code = 0x00070000
Apr 18 11:18:11 serv02 kernel: end_request: I/O error, dev sdt, sector 27424
Apr 18 11:18:11 serv02 kernel: sd 24:0:0:0: SCSI error: return code = 0x00070000
Apr 18 11:18:11 serv02 kernel: end_request: I/O error, dev sdt, sector 27520
Apr 18 11:18:11 serv02 kernel: Buffer I/O error on device sdt1, logical block 3436
Apr 18 11:18:11 serv02 kernel: lost page write due to I/O error on sdt1
Apr 18 11:18:11 serv02 kernel: Aborting journal on device sdt1.
Apr 18 11:18:11 serv02 kernel: ext3_abort called.
Apr 18 11:18:11 serv02 kernel: EXT3-fs error (device sdt1): ext3_journal_start_sb: Detected aborted journal
Apr 18 11:18:11 serv02 kernel: Remounting filesystem read-only
Apr 18 11:18:12 serv02 iscsid: Target requests logout within 3 seconds for connection
Apr 18 11:18:16 serv02 iscsid: connection22:0 is operational after recovery (1 attempts)





--
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: information on the config option -- node.session.iscsi.FastAbort = No

2010-04-28 Thread Mike Christie




Could you also take an ethereal trace?

Looking at the iscsi code for that kernel, the only place to get
0x00070000 seems to be this:


if (rhdr->response != ISCSI_STATUS_CMD_COMPLETED) {
        sc->result = DID_ERROR << 16;
        goto out;
}

which means the target did not complete the command. The 
initiator/scsi-layer would have retried this command up to 5 times 
before failing.
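
For reference, the Linux SCSI layer packs four bytes into the 32-bit result: the status byte in bits 0-7, the message byte in bits 8-15, the host byte in bits 16-23, and the driver byte in bits 24-31. DID_ERROR is 0x07, so shifting it left by 16 gives 0x00070000 with every other byte zero. Below is a minimal user-space sketch of that decoding. The helper names and the SKETCH_DID_ERROR macro are made up for illustration and are not the kernel's own macros.

```c
#include <stdint.h>
#include <stdio.h>

/* Numeric value of DID_ERROR in the Linux SCSI headers. */
#define SKETCH_DID_ERROR 0x07

/* Hypothetical helpers: split a 32-bit SCSI result into its four bytes. */
static unsigned status_byte_of(uint32_t result) { return result & 0xff; }
static unsigned msg_byte_of(uint32_t result)    { return (result >> 8) & 0xff; }
static unsigned host_byte_of(uint32_t result)   { return (result >> 16) & 0xff; }
static unsigned driver_byte_of(uint32_t result) { return (result >> 24) & 0xff; }

/* Print each byte of a result value and flag the DID_ERROR case. */
static void print_scsi_result(uint32_t result)
{
    printf("result 0x%08x: driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n",
           (unsigned)result, driver_byte_of(result), host_byte_of(result),
           msg_byte_of(result), status_byte_of(result));
    if (host_byte_of(result) == SKETCH_DID_ERROR)
        printf("host byte is DID_ERROR\n");
}
```

Calling print_scsi_result(0x00070000) reports a host byte of 0x07 and zeros elsewhere, which is consistent with the result being set by the check quoted above and not by a SCSI status returned from the target.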





Re: information on the config option -- node.session.iscsi.FastAbort = No

2010-04-28 Thread Mike Christie

On 04/28/2010 02:07 PM, maguar887 wrote:

Mike,

Thanks for the info!

Do you know exactly which kernel it was patched in?
What is available to us is:
 2.6.18-194.0.0.0.4.el5


Forget the "upgrade your kernel" comment. The bug I was thinking about was 
fixed in the kernel you were using, 2.6.18-92.1.6.0.2.el5 (that is 
actually the kernel for RHEL 5.2 but you said you were using 5.3 btw).
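
When comparing the kernels mentioned in this thread, it can help to pull the build number (the field right after "2.6.18-") out of the release string. The following is a rough sketch in C, assuming RHEL 5 style release strings. The function names are hypothetical, and the threshold of 92 only reflects the statement above that 2.6.18-92.1.6.0.2.el5 already has the fix. It also assumes later builds keep that fix.

```c
#include <stdio.h>

/* Build that this thread says already contains the fix (2.6.18-92...). */
#define SKETCH_FIXED_BUILD 92

/* Hypothetical helper: return the build number that follows "2.6.18-",
 * or -1 if the string does not look like a RHEL 5 kernel release. */
static int rhel5_kernel_build(const char *release)
{
    int build;

    if (sscanf(release, "2.6.18-%d", &build) != 1)
        return -1;
    return build;
}

/* Assumption: any build at or above SKETCH_FIXED_BUILD carries the fix.
 * Unparseable strings (-1) are reported as not having it. */
static int has_async_logout_fix(const char *release)
{
    return rhel5_kernel_build(release) >= SKETCH_FIXED_BUILD;
}
```

With the release strings from this thread, "2.6.18-92.1.6.0.2.el5" parses to build 92 and "2.6.18-194.0.0.0.4.el5" to build 194.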



Could you get me an ethereal trace, so I can see if the target is failing 
the IO on us?






Matt







Re: information on the config option -- node.session.iscsi.FastAbort = No

2010-04-28 Thread maguar887
Sorry for the confusion, we had 2 separate systems, and I pulled the
info from the wrong one.

The system with the issue turns out to be RHEL 5.2
with this kernel:
2.6.18-53.1.19.0.1.el5

I'll try to reproduce it and get a packet capture.

