If you kill iscsid and you get

- I was able to killall -9 iscsid, which then spit this out
       Logging out of session [sid: 1, target: iqn.yadayada.example.com,
portal: 192.168.2.32,3260]
       iscsiadm: got read error (0/0), daemon died?
       iscsiadm: Could not logout of [sid: 1, target:
iqn.yadayada.example.com, portal: 192.168.2.32,3260]
       iscsiadm: initiator reported error (18 - could not communicate to
iscsid)
       iscsiadm: Could not logout of all requested sessions

then it means iscsiadm was still trying to log out of the sessions and
clean up. The disk could be in recovery.

Killing iscsid will not do anything nice, by the way, so do not do it.
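The clean path, for reference, is to log the sessions out with iscsiadm rather than killing the daemon. A minimal sketch, using the placeholder target/portal values from the log output above, and guarded with `command -v` so it is a no-op on a box without open-iscsi installed:

```shell
#!/bin/bash
# Sketch: clean session teardown instead of `killall -9 iscsid`.
# TARGET/PORTAL are the placeholder values from the log output above.
TARGET="iqn.yadayada.example.com"
PORTAL="192.168.2.32:3260"
LOGOUT_CMD="iscsiadm -m node -T $TARGET -p $PORTAL --logout"

# Only attempt the logout if iscsiadm is actually installed.
if command -v iscsiadm >/dev/null 2>&1; then
    $LOGOUT_CMD || echo "logout failed; the session may still be in recovery"
fi
```

If the logout itself hangs, that points back at IO still queued in the kernel, which is the case being debugged in this thread.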


When you say iscsiadm does nothing, what do you mean? Does it return
right away or does it just hang? It sounds like it is hanging waiting
for the kernel. In this case, before you log out, could you run:

iscsiadm -m session -P 3

and send that output?
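If it is easier to attach as a file, a small sketch for capturing that output (the file name here is just an example, and the command is guarded so the script is harmless on a box without open-iscsi):

```shell
#!/bin/bash
# Sketch: capture the full session/device state to a file to attach.
OUT="iscsi-session-info.txt"
# Guarded so this is a no-op where iscsiadm is not installed.
if command -v iscsiadm >/dev/null 2>&1; then
    iscsiadm -m session -P 3 > "$OUT" 2>&1 || true
fi
```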

Could you also send the /var/log/messages output from the time of the
failure until all the tests are run, so I can see what happened to the
device.

And then before you run your logout test do:

[note: these are the names for RHEL 6; in RHEL 5 they will be different]
echo 1 > /sys/module/scsi_transport_iscsi/debug_session
echo 1 > /sys/module/scsi_transport_iscsi/debug_conn
echo 1 > /sys/module/libiscsi/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/debug_libiscsi_eh


then run your logout test and send all of the /var/log/messages output.
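Putting the steps above together, a sketch of a wrapper that flips the debug flags on, leaves a gap for the logout test, and then grabs the tail of /var/log/messages to send along. The flag paths are the RHEL 6 names from above; the output file name is just an example:

```shell
#!/bin/bash
# Sketch: enable iscsi kernel debug logging, run the logout test,
# then collect the logs to send.  RHEL 6 sysfs parameter names.
FLAGS="/sys/module/scsi_transport_iscsi/debug_session
/sys/module/scsi_transport_iscsi/debug_conn
/sys/module/libiscsi/debug_libiscsi_session
/sys/module/libiscsi/debug_libiscsi_eh"

set_debug() {
    # $1 is 1 to enable, 0 to disable; flags that do not exist are skipped.
    for f in $FLAGS; do
        [ -w "$f" ] && echo "$1" > "$f"
    done
    return 0
}

set_debug 1
# ... run the logout test here, e.g.:
# iscsiadm -m node -T <target> -p <portal> --logout
set_debug 0

# Grab the recent syslog messages to send along with the report.
if [ -r /var/log/messages ]; then
    tail -n 1000 /var/log/messages > iscsi-logout-debug.log
fi
```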



On 03/19/2012 07:50 AM, awiddersh...@hotmail.com wrote:
> I am having a similar issue. When our iSCSI connection is down for longer
> than node.session.timeo.replacement_timeout, then, from what I understand,
> it hands control back to multipath if you are using it, or to the
> application/kernel, to handle the case of a disk going completely
> missing. The typical response is then to remount the disk read-only, and
> it seems the only way to get it back to rw after the connection is
> re-established is to completely umount and mount, which is not really ideal.
> 
> Sometimes though the disk will get into some weird funk state where when
> you umount it just hangs. If you try to log out of your sessions it
> hangs. If you try to stop iscsid it hangs. If you kill -9 iscsid it
> doesn't die. Everything becomes pretty much unusable and then you have
> no choice but to hard power the system. The disks just get stuck
> there... I'm guessing because I/O got queued.
> 
> Seems the only way to avoid situations like this is to set the timeout
> fairly high so it just continues to block I/O and never reaches this stage.
> 
> On Wednesday, June 25, 2008 10:40:36 AM UTC-4, Mike Christie wrote:
> 
>     An Oneironaut wrote:
>     > Hey all,
>     >
>     >    So here is an update.  I tried multi-path and it made no difference
>     > for my problem.  I was able to get umount to work, however the problem
>     > just propagated over to logout.  So for example here is what I would
>     > do in error recovery.
>     >
>     > 1.  Disconnect the iSCSI
>     > 2.  Wait for the timeout.
>     > 3.  Wait till all the iSCSI debug messages subside and the volume gets
>     > remounted as readonly.
>     > 4.  Reconnect the iSCSI
>     > 5.  Execute a 'umount -lf' on the mount point <-- IN the past I had
>     > stopped my app from trying to write here, now I just umounted.
>     > 6.  Execute a logout: iqn.2000-08.com.intransa:ivsms.dg1.storage1 --
>     > logout
>     > -->  It ends up freezing here.
> 
> 
>     The system hangs or does iscsiadm hang?
> 
>     >
>     >      I tried a bunch of stuff to get around this, but nothing worked
>     > for me.  So I upgraded my kernel from 2.6.16 to 2.6.18 and all of a
>     > sudden the problem was gone.  I ran about 50 tests and could not
>     > reproduce.  Now I'm wondering what could have changed from 2.6.16 to
>     > 2.6.18 that would remedy this problem.  I'd rather just simply patch
>     > 2.6.16 with whatever changes make this work, because it's hard for me
>     > to upgrade to a new kernel without causing other problems.  Does
>     > anyone out there know?  I'm going to look through the change logs and
>     > see if anything stands out but if any of you people know I'd
>     > appreciate your input.
>     >      Mike you said in your response:
>     >
>     > "The other thing is that if you get those errors and then you pull the
>     > cable back in so IO can execute again, you are in a weird state. I am
>     > not sure what the FS will do and supports at the point it has already
>     > had some IO fail."
>     >
>     >     Is this implying that the filesystems themselves are the culprit?
>     >
> 
>     I do not know exactly. You would have to ask the FS guys. Give them the
>     FS and journal errors and they can describe what state the fs is in and
>     what can be done.
> 
>     I can only help with the iscsi part :) If #6 causes the system to hang
>     let me know. iscsiadm logout should not do this.
> 
>     > Thanks.
>     >
>     > On Jun 23, 1:53 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
>     >> An Oneironaut wrote:
>     >>>      I posted a little while back about this, but I still seem to be
>     >>> having trouble with this issue.  Originally I tried to set up my
>     >>> iSCSI connection so that it had a 24 day timeout period and the
>     >>> no-op timers would be disabled.  However this timeout led to a
>     >>> variety of issues including causing umount, reboot, and other
>     >>> commands to hang.
>     >> Did you get IO errors before you tried those commands? If IO is still
>     >> internally queued, you would want to run the iscsiadm command to log
>     >> out, which in this case would just fail everything. When the FS gets
>     >> all the errors for the IO it had outstanding, I think you can then
>     >> forcibly unmount it.
>     >>
>     >> The reboot command is going to hang if you have IO queued still. You
>     >> need to do the logout command first (the iscsi init scripts should
>     >> force a logout too).
>     >>
>     >>
>     >>
>     >>>      So in the end the long timeout proved to be too much trouble
>     >>> so I moved back to the 120s timeout with noop timers enabled.
>     >>> However even this is causing me trouble.
>     >>>      Currently I am using my iSCSI device to store video, which
>     >>> means I am sending a large amount of data over the network into my
>     >>> device at a pretty high rate.  In my tests if I cut the connection
>     >>> sometimes things will work out fine.  The connection gets cut and
>     >>> after 120s I get a whole slew of iSCSI "queuing" errors and such,
>     >>> and finally the iSCSI device gets remounted as read only.  Once the
>     >>> error messages stop, if I stop all of my video archiving, reconnect
>     >>> the iSCSI device, log out, umount the iSCSI, remount the iSCSI, log
>     >>> back in, and restart my video archiver, everything will work fine.
>     >>>     However in other cases when I cut the connection the iSCSI
>     >>> debugs won't be as numerous and it goes to read only mode almost
>     >>> immediately.  When I try the above steps to recover, my system hangs
>     >>> like before on the umount.  I used KDB to get a dump of what is
>     >>> going on during the umount and will add it to this message.  It
>     >>> appears that the umount process has context switched out waiting for
>     >>> the io to complete. The io waiting to complete is the 'sync'ing of
>     >>> the buffers, which either never happens or completes without waking
>     >>> up umount.
>     >>>     I've tried numerous things to get around this umount issue,
>     >>> including a variety of umount flags, the remount command, and long
>     >>> delays in my code and the kernel code.  But nothing has worked up to
>     >>> this point.  I'm currently working on version 2.6.16 of the kernel
>     >>> with open-iscsi version 2.0-865.9.  I am going to try the latest and
>     >>> greatest to see if that helps at all.  I'm convinced that the
>     >>> current problem has something to do with a quick change to 'ro' mode
>     >>> vs a slower change to 'ro' mode after timeout.
>     >>> If you guys have any advice or insight I'd appreciate the help.  I
>     >>> posted the debug in the files section.  The filename is
>     >>> umount_hang.rtf.  I've bolded the area where the umount gets called.
>     >> Is this the script you are running? I did not see the bolded stuff.
>     >>
>     >> #!/bin/bash -x
>     >> umount -f /media1_0
>     >> rmdir /media1_0
>     >> iscsiadm -m node -p 172.19.153.14:3260,0 -T \
>     >> iqn.2000-08.com.intransa:ivsms.dg1.storage1 --logout
>     >> iscsiadm -m node -p 172.19.153.14:3260,0 -T \
>     >> iqn.2000-08.com.intransa:ivsms.dg1.storage1 -o delete
>     >>
>     >> If you run this command and IO is queued because the
>     >> replacement_timeout has not expired yet (if you pulled a cable, then
>     >> the initiator detected it and was trying to log back in), then the
>     >> unmount is going to hang or fail or who knows. It is not going to do
>     >> what you want though.
>     >>
>     >> If you are trying to force the unmount and do not care about data
>     >> getting written then you can just do the logout command, then do the
>     >> unmount command.
>     >>
>     >> The other thing is that if you get those errors and then you pull the
>     >> cable back in so IO can execute again, you are in a weird state. I am
>     >> not sure what the FS will do and supports at the point it has already
>     >> had some IO fail.
>     > >
>     >
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "open-iscsi" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/open-iscsi/-/7mealYvYOrsJ.
> To post to this group, send email to open-iscsi@googlegroups.com.
> To unsubscribe from this group, send email to
> open-iscsi+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/open-iscsi?hl=en.
