On 03/23/2010 10:13 AM, James Hammer wrote:
Mike Christie wrote:
On 03/22/2010 03:38 PM, James Hammer wrote:
Every time I reboot my server it hangs on the multipath devices.

The server is Debian based. I've had this problem with all kernels I've
tried (2.6.18, 2.6.24, 2.6.32). In /etc/multipath.conf, no_path_retry is
set to queue

Here are snippets from the reboot log:

<snip>
Stopping multipath daemon: multipathd.
...
Shutting down LVM Volume Groupsdevice-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80. mult

Are there file systems mounted on the multipath device?


As far as I can tell, there are *no* file systems mounted on the
multipath device. This multipath device is used by a virtual machine.
The virtual machine is turned off at that point. The 'mount' command on
the physical host does not list the multipath device as being mounted.

This is what I have found...I ran the whole shutdown sequence manually,
i.e. running each script in /etc/rc0.d manually in order (with
*no_path_retry* set to *queue*). Between each shutdown script, I ran
'*multipath -f mpath5*' to try and remove the multipath device manually.
Each time I got this result:

mpath5: map in use

All the way down until I got to the last 3 scripts:

S50lvm2 -> ../init.d/lvm2
S60umountroot -> ../init.d/umountroot
S90halt -> ../init.d/halt



When that lvm2 script gets run to shut down lvm2, I again get the
"multipath: Failing path" results:

Shutting down LVM Volume Groupsdevice-mapper: multipath: Failing path 8:48.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:80.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed
device-mapper: multipath: Failing path 8:64.
device-mapper: uevent: dm_send_uevents: kobject_uevent_env failed

That hangs indefinitely.

Now, if I do the same thing with *no_path_retry* set to *fail* the
sequence goes similarly, except that when I run */etc/init.d/lvm2 stop*
I get the same as above followed by a few of these lines:

/dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
end_request: I/O error, dev dm-9, sector 20971776

Then the script finishes and the reboot can proceed.

So the key seems to be the *no_path_retry* setting.

From my tests, things go much better with *no_path_retry* set to
*queue* when the connection to the iSCSI server is interrupted.

So, is it possible to get those paths to "fail" with *no_path_retry* set
to *queue* so the reboot can continue?


I do not know if you can easily do this, and I am not sure it is safe in your case. It does look, though, from the first iSCSI messages:

Disconnecting iSCSI targets:Logging out of session [sid: 1,....
Logging out of session [sid: 2,....
Logging out of session [sid: 3,....
sd 8:0:0:0: [sde] Synchronizing SCSI cache
sd 9:0:0:0: [sdd] Synchronizing SCSI cache
sd 10:0:0:0: [sdf] Synchronizing SCSI cache
 connection2:0: detected conn error (1020)
 connection1:0: detected conn error (1020)
 connection3:0: detected conn error (1020)
Logout of [sid: 1...successful
Logout of [sid: 2...successful
Stopping iSCSI initiator server:.

that the iSCSI layer has logged out of the sessions and cleaned up at its layer, so at this point no IO is going to get executed.

The problem, and the reason I do not think it is safe to rerun with no_path_retry 0, is that there is still IO somewhere in the multipath/block layer queues. When you see:

> /dev/dm-9: read failed after 0 of 2048 at 0: Input/output error
> end_request: I/O error, dev dm-9, sector 20971776

it means some IO that was in that queue failed. If it was a write to some disk, it means that you lost data.

What you (or the Debian scripts) want to do is shut down multipath first, so the higher-level queues have flushed their data out. Then shut down iSCSI.

Or do something to flush the multipath queues and shut that down, then shut down iSCSI.
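
As a rough, untested sketch of that ordering: the map name mpath5 is taken from your mail, and the `run` and `stop_storage` helpers and the DRY_RUN switch are made up here for illustration. This is not what the Debian init scripts actually do, only the order I have in mind.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of
# multipath-tools, open-iscsi, or the Debian init scripts.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Tear down storage top-down so queued IO can drain while paths are still up.
stop_storage() {
    map="$1"    # multipath map name, e.g. mpath5

    # Remove the multipath map first; stop here if that fails
    # (for example with "map in use") instead of pulling the paths away.
    run multipath -f "$map" || return 1

    # Only now log out of the iSCSI sessions.
    run iscsiadm -m node --logoutall=all
}
```

Running `DRY_RUN=1 stop_storage mpath5` only prints the two commands, so you can check the order before trying it for real. If `multipath -f` still reports "map in use", something above the map is holding it open, and that has to be closed first.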
