What version of open-iscsi and kernel are you using? And are you using the kernel modules with open-iscsi or the ones that come with the kernel?
Nicholas A. Bellinger wrote:
>>
>> The problem is that the failure of the outstanding I/Os does not seem to
>> be occurring in all cases. In particular, a iscsiadm --logout I believe
>> is getting issued, and said logout request failing/timing out because
>> DRBD_TARGET has been released. It is at this point where umount for the
>> ext3 mount and/or sync hangs indefinitely. When the problem occurs, it
>> looks like this from the kernel ring buffer:
>>
>> iscsi_deallocate_extra_thread_sets:285: ***OPS*** Stopped 1 thread set(s) (2 total threads).
>> iscsi_deallocate_extra_thread_sets:285: ***OPS*** Stopped 2 thread set(s) (4 total threads).
>> session10: iscsi: session recovery timed out after 120 secs
>> sd 51:0:0:0: scsi: Device offlined - not ready after error recovery

If you see this, then all IO that was sent to the device, and any new IO, should be failed to the FS and block layer, like below.

There is a bug in some kernels, though, where running an iscsiadm logout command can hang and lead to weird problems, because the scsi layer is broken. If you use open-iscsi 869.2's kernel modules or the iscsi modules in 18.104.22.168 or newer, then this is fixed. I am not sure that is what you are seeing, because we see IO failed upwards here. Also, once we see "Device offlined", the scsi layer is going to fail the IO when it hits the scsi prep functions, so it never even reaches us.

If there is IO stuck in the driver, you could do

    cat /sys/class/scsi_host/hostX/host_busy

to check. That file prints the number of commands the scsi layer has sent to the driver that the driver has not yet returned (that is, how many commands are outstanding).
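If you do not know which hostX belongs to the iSCSI session, a minimal sketch like the following prints host_busy for every SCSI host. It assumes a POSIX shell; show_host_busy is a made-up helper name, and the optional directory argument exists only so the function can be pointed at a test tree instead of the real sysfs.

```shell
# Hypothetical helper (not part of open-iscsi): print "hostN: <count>"
# for every SCSI host found under the given sysfs directory.
show_host_busy() {
    base=${1:-/sys/class/scsi_host}
    for f in "$base"/host*/host_busy; do
        # Skip the unexpanded glob pattern when no hosts exist.
        [ -r "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        printf '%s: %s\n' "$host" "$(cat "$f")"
    done
}
```

Calling show_host_busy with no argument reads /sys/class/scsi_host. A count that stays above zero after the device has been offlined would suggest commands are still stuck in the driver.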
>> sd 51:0:0:0: [sdg] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
>> end_request: I/O error, dev sdg, sector 0
>> Buffer I/O error on device sdg, logical block 0
>> lost page write due to I/O error on sdg
>>
>> I should mention that we are not doing any I/O to said iSCSI LUN via
>> Open/iSCSI other than the filesystem metadata for ext3 umount and
>> SYNCHRONIZE_CACHE CDB during struct scsi_device deregistration. From
>> experience with Core-iSCSI, I know the logout path is tricky wrt
>> exceptions (I spent months on it to handle all cases with Immediate and
>> Non Immediate Logout, as well as doing logouts on the fly from the same
>> connection in MC/S and different connections in MC/S :-)
>>
>> So the question is:
>>
>> I) When a ISCSI_INIT_LOGOUT_REQ is not returned with a
>> ISCSI_TARGET_LOGOUT_RSP and replacement_timeout fires, are all
>> outstanding I/Os for that particular session being completed with a
>> non-recoverable exception..? Has anyone ever run into this case and/or
>> tested it..?

If the connection is down when you run iscsiadm logout, we will not send a logout, and the replacement_timeout does not come into play. We just fast fail the connection, clean up the commands and kernel resources, and iscsiadm returns (yeah, pretty bad, I know; it is on the TODO).

If the connection is up when you run iscsiadm logout, and the connection drops while the logout is in flight, we are again lazy: we just fail, clean up, and return right away. The replacement_timeout does not come into play here either.

If you run 869.2 from open-iscsi.org and build with

    make DEBUG_SCSI=1 DEBUG_TCP=1
    make DEBUG_SCSI=1 DEBUG_TCP=1 install

and send all the log output, I can tell you better what is going on.
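To summarize the logout cases described above, here is a rough model in shell. It is not open-iscsi code: the function name, its arguments, and the message strings are all made up for illustration, and the "up, no drop" branch is my assumption about the normal path, which the text above does not spell out.

```shell
# Hypothetical model of the iscsiadm logout outcomes described above.
# usage: logout_outcome <up|down> [drops]
logout_outcome() {
    conn_state=$1
    event=${2:-none}
    case "$conn_state:$event" in
        down:*)
            # Connection already down: no logout PDU is sent.
            echo "no logout sent; fast fail, clean up, return" ;;
        up:drops)
            # Connection drops while the logout is in flight.
            echo "logout sent; connection dropped; fail, clean up, return" ;;
        up:none)
            # Assumed normal case: wait for the target's logout response.
            echo "logout sent; wait for logout response" ;;
        *)
            echo "unknown state" >&2
            return 1 ;;
    esac
}
```

In neither failure branch does replacement_timeout appear, which matches the point that the timer does not come into play for logouts.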