Re: Open/iSCSI + Logout Response Timeout + replacement_timeout firing

2008-06-30 Thread Konrad Rzeszutek

 Ah, if your disks are using write back cache then you are going to hit 
 some problems. So if you see this in /var/log/messages when you log in:
 
 kernel: sd 9:0:0:1: [sdb] Write cache: enabled,
 
 then later when you run iscsiadm to log out you see:
 
 kernel: sd 9:0:0:1: [sdb] Synchronizing SCSI cache
 
 Then you are going to hit problems due to the scsi sysfs interface 
 changing on us. iscsiadm is going to hang. IO is going to hang. You 
 basically have to reboot the box by hand.
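
A quick way to look for the first of those two messages is to grep the log
for it. The sketch below is only an illustration: the function name
write_cache_enabled is made up, and it assumes the kernel messages land in
/var/log/messages in the format quoted above.

```shell
#!/bin/sh
# Return 0 if the log shows "Write cache: enabled" for the given disk.
# $1: disk name as it appears in the log (e.g. sdb)
# $2: log file to search (assumed default: /var/log/messages)
write_cache_enabled() {
    disk="$1"; log="${2:-/var/log/messages}"
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -qF -- "[$disk] Write cache: enabled" "$log"
}
```

For example, `write_cache_enabled sdb && echo "sdb uses a write back cache"`.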

Mike,

Are you sure about this? When the sysfs entries are deleted (during the 
iscsiadm logout phase), the SCSI midlayer finishes all of the I/Os and the
last operation is sending the SYNCHRONIZE CACHE command. Wouldn't that
quiesce I/O? Granted, this means doing these steps, which are outside the
normal iscsiadm flow:
 1). flush dirty pages (call 'sync')
 2). delete the sysfs entries (echo 1 > /sys/block/sdX/device/delete)
 3). wait until /sys/class/scsi_host/hostZ/host_busy reaches zero
 4). iscsiadm -m logout
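
The four steps above can be sketched as a small shell script. This is a
rough outline, not a tested procedure: the function names, the SYSFS_ROOT
override (handy for dry runs) and the timeouts are made up for illustration,
and the device, host, target and portal arguments are placeholders.

```shell
#!/bin/sh
# Sketch of the manual quiesce-then-logout sequence.
# SYSFS_ROOT can be pointed at a scratch directory for a dry run.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Poll host_busy until it reads 0; give up after $2 seconds (default 30).
wait_host_idle() {
    host="$1"; timeout="${2:-30}"; waited=0
    busy_file="$SYSFS_ROOT/class/scsi_host/$host/host_busy"
    while [ "$(cat "$busy_file")" != "0" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# $1: disk (sdX)  $2: SCSI host (hostZ)  $3: target IQN  $4: portal
quiesce_and_logout() {
    dev="$1"; host="$2"; target="$3"; portal="$4"
    sync                                              # 1) flush dirty pages
    echo 1 > "$SYSFS_ROOT/block/$dev/device/delete"   # 2) delete the device
    wait_host_idle "$host" 60 || {                    # 3) wait for host_busy == 0
        echo "$host still busy, not logging out" >&2
        return 1
    }
    iscsiadm -m node -T "$target" -p "$portal" --logout   # 4) log out
}
```

Refusing to log out when host_busy never reaches zero is a deliberate choice
here: it leaves the session up rather than tearing it down under pending IO.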


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: open-iscsi configuration for wasabi storage builder

2008-06-30 Thread Dominik L. Borkowski

On Friday 27 June 2008 15:48:13 Mike Christie wrote:
  Would it be worth getting a sniffer dump from the existing 2.0-866
  initiator?


I placed the dump at:
http://staff.vbi.vt.edu/dom/debug/debug.tar.bz2

In that archive I included the tcpdump capture, a sample script session 
showing which commands were issued, the initiator configs and the kernel 
logs. 

Not sure if joining this thread with Ken's would be a good idea. Somehow I 
didn't notice his when I was searching the archive. However, a lot of his 
symptoms fit my problem, including the OS completely freezing.

Thank you,
dom




Re: Open/iSCSI + Logout Response Timeout + replacement_timeout firing

2008-06-30 Thread Mike Christie

Nicholas A. Bellinger wrote:
 On Sun, 2008-06-29 at 01:36 -0500, Mike Christie wrote:
 Nicholas A. Bellinger wrote:
 Hi Mike!

 Hey, looks like you are doing some more cool stuff. Don't you have a job 
 where you have to hack on boring stuff like the rest of us :)

 
 What Jerome said. :-)
 
 On Sat, 2008-06-28 at 15:33 -0500, Mike Christie wrote:
 What version of open-iscsi and kernel are you using? And are you using 
 the kernel modules with open-iscsi or the ones that come with the kernel?

 Whoops, forgot to include that tid-bit:

 open-iscsi: 2.0.730-1etch1

 kernel: I am using v2.6.22.19-kdb, and Jerome is using
 v2.6.22-4-vserver-amd64

 Ah, if your disks are using write back cache then you are going to hit 
 some problems. So if you see this in /var/log/messages when you log in:

 kernel: sd 9:0:0:1: [sdb] Write cache: enabled,

 then later when you run iscsiadm to log out you see:

 kernel: sd 9:0:0:1: [sdb] Synchronizing SCSI cache

 Then you are going to hit problems due to the scsi sysfs interface 
 changing on us. iscsiadm is going to hang. IO is going to hang. You 
 basically have to reboot the box by hand.

 
 Yep, so the LIO-CORE does NOT emulate Write Cache Enable (although there

Oh. There are some other issues that we can hit with IO getting failed 
and applications hanging waiting for IO to be sent but it gets lost. Try 
the newer code where all that should be fixed.

 is some Read Cache code :-) bit in the caching mode page in the virtual
 storage object case (IBLOCK, FILE, etc).  We are using the LIO-Core
 IBLOCK plugin to export DRBD's struct block_device, which uses the
 generic emulation of MODE_SENSE*, which leaves buf[2] untouched in the
 caching mode page.  
 
 For the PSCSI subsystem plugin, these mode pages obviously come from the
 underlying hardware.  I even recall that libata does emulate this down
 to SATA/PATA bits for us. (thanks jgarzik and co)  
 
 On a related note, did you and Tomo ever get around to implementing
 write/read caching emulation in STGT for virtual devices..?  It's

I am not sure of the exact status. We normally use the buffer cache and so 
I think we always report that we support write caching. I think someone 
just sent patches to turn it off.

 definitely something that is on my long term list for doing generically
 amongst virtual devices in LIO-Core v3.x.
 
 Many thanks for your most valuable of time,
 

No problem.

 --nab
 
 
 
  





Re: Open/iSCSI + Logout Response Timeout + replacement_timeout firing

2008-06-30 Thread Mike Christie

Konrad Rzeszutek wrote:
 Ah, if your disks are using write back cache then you are going to hit 
 some problems. So if you see this in /var/log/messages when you log in:

 kernel: sd 9:0:0:1: [sdb] Write cache: enabled,

 then later when you run iscsiadm to log out you see:

 kernel: sd 9:0:0:1: [sdb] Synchronizing SCSI cache

 Then you are going to hit problems due to the scsi sysfs interface 
 changing on us. iscsiadm is going to hang. IO is going to hang. You 
 basically have to reboot the box by hand.
 
 Mike,
 
 Are you sure about this? When the sysfs entries are deleted (during the 
 iscsiadm logout phase), the SCSI midlayer finishes all of the I/Os and the
 last operation is sending the SYNCHRONIZE CACHE command. Wouldn't that
 quiesce I/O? 
 Granted

See below. You are right if everything goes ok.

 this means doing these steps, which are outside the normal iscsiadm flow:
  1). flush dirty pages (call 'sync')
  2). delete the sysfs entries (echo 1 > /sys/block/sdX/device/delete)
  3). wait until /sys/class/scsi_host/hostZ/host_busy reaches zero
  4). iscsiadm -m logout



The problem that I described occurs when we run the iscsiadm logout 
command and it uses the sysfs delete file. In 2.6.18, when iscsiadm wrote 
to that attr, the write would return only after the cache sync was sent 
and the device was fully deleted in the kernel. In 2.6.21 and above it 
returns right away. So iscsiadm's logout code would write to that attr and 
think the devices were cleaned up, then it would call the iscsi shutdown 
code, which would send a logout and clean up the kernel session, 
connection and host structs, thinking that the devices were properly 
flushed. But IO could still be waiting to get written, so all types of fun 
things can happen, like:

We could get to the scsi host remove call while all the scsi device sysfs 
delete calls were still starting up, so the host remove call and those 
delete calls would race (this is how we would have bypassed the host_busy 
check in the connection deletion function in the kernel). When this 
happens, if the sysfs delete device got the scan mutex first, but the 
iscsi shutdown code had blocked the devices while we were trying to remove 
the host, then the iscsiadm logout command will hang: the delete device 
would wait forever to try and send the command (it is not yet in the host, 
so the command timer is not running, and the device is blocked), and the 
remove host call is waiting on the scan mutex, which the delete device 
call holds.

If you have multiple devices then the remove host command can also end 
up failing IO, because we will have sent the logout and later set the 
session internal state to terminate, and incoming IO on the other 
devices that was queued will be failed when the remove devices functions 
flush the IO.

If you do not have a write back cache we have other problems, where IO 
can be failed when it should not have been, for the reason above: the 
logout is sent, the terminate bit is set, and the remove host runs before 
the devices were properly removed, and that causes IO to be failed.

And actually in some kernels you can hang (the app would hang, not 
iscsiadm in this case) when a cache sync was not needed, because when we 
removed the host it would delete the device, but IO would be stuck in the 
queues and no one did an unplug of the queue when the scsi device was 
removed. We added an iscsi_unblock_session call in iscsi_session_teardown 
to flush everything so at least apps would not hang there (but that 
resulted in IO getting failed like above).
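
Since the write to the delete attr returns right away on 2.6.21 and above,
a script that removes devices by hand cannot treat the write returning as
"device gone". One rough workaround is to poll until a sysfs path
disappears, as in the sketch below. The function name wait_path_gone and
the timeout are made up for illustration. Whether a given path (the usage
comment picks /sys/block/sdX) only disappears once the cache sync has
completed is an assumption that would need checking against the kernel in
use, so combining it with the host_busy check still seems prudent.

```shell
#!/bin/sh
# Wait until a path no longer exists; give up after a timeout.
# $1: path to watch   $2: timeout in seconds (default 30)
wait_path_gone() {
    path="$1"; timeout="${2:-30}"; waited=0
    while [ -e "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Hypothetical usage, with sdX as a placeholder:
#   echo 1 > /sys/block/sdX/device/delete
#   wait_path_gone /sys/block/sdX 60 || echo "sdX still present" >&2
```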
