On 07/01/2011 08:57 AM, Seth Simons wrote:
> Unfortunately there's very little technical documentation. This is one
> of HP's 'low end' SANs. I'll see if HP support has anything to say
> about it.
> 
> I'm not really having luck with be2iscsi, now that I've managed to get
> it to login to the target, I'm getting the following kernel panic
> after around 15-20 minutes of the machine being up and logged in
> (copied from the console). Any ideas? I am using multipath.
> 

The hung task warnings are saying that some IO has taken longer than the
hung task timeout value, which looks like it is 2 minutes for you.
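
The hung task timeout is a kernel tunable. Below is a minimal sketch for checking it on the host. It assumes the kernel was built with hung task detection, so that /proc/sys/kernel/hung_task_timeout_secs exists. The helper names are hypothetical and are not part of any tool:

```python
from pathlib import Path

HUNG_TASK_PATH = Path("/proc/sys/kernel/hung_task_timeout_secs")


def read_hung_task_timeout(path: Path = HUNG_TASK_PATH) -> int:
    """Return the hung task timeout in seconds; 0 means the detector is disabled."""
    return int(path.read_text().strip())


def would_warn(blocked_secs: float, timeout_secs: int) -> bool:
    """Rough check: would a task blocked this long exceed the hung task timeout?

    The detector only scans periodically, so a real warning can appear later.
    """
    return timeout_secs > 0 and blocked_secs > timeout_secs
```

On this machine `read_hung_task_timeout()` would be expected to return 120.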

Are you doing any type of port down/up type of test?

> Debian GNU/Linux 6.0 thm-vmutil01 ttyS0
> 

Is there any line before this?

> thm-vmutil01 login: [ 2642.519605]  connection1:0: Could not send nopout

The iSCSI layer sends an iSCSI ping (a NOP-Out PDU) every
noop_out_interval seconds (see node.conn[0].timeo.noop_out_interval in
iscsid.conf) when there is no SCSI traffic. It also sends one when the
target sends us a nop (the target is pinging us) and we are replying to it.

That message indicates that the iSCSI layer could not send a nop. This
can fail if the target sends us a lot of nops and we cannot allocate
memory to reply, if the driver cannot allocate memory, or if the session
is not logged in.
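
The rules above can be modelled as a small decision function. This is a simplified sketch of the behaviour described here, not the actual libiscsi code. `ConnState`, `should_send_nop` and `nop_send_failure` are hypothetical names. Time since the last receive stands in for "no SCSI traffic":

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnState:
    """Hypothetical snapshot of one iSCSI connection (illustration only)."""
    now: float                 # current time, in seconds
    last_rx: float             # when we last received anything from the target
    noop_out_interval: float   # ping interval from iscsid.conf; 0 disables pings
    target_nop_pending: bool   # the target pinged us and expects a reply
    logged_in: bool            # the session is in the logged-in state
    can_alloc: bool            # memory for the NOP-Out could be allocated


def should_send_nop(c: ConnState) -> bool:
    """Ping after a full idle interval, or reply to a ping from the target."""
    idle_ping = c.noop_out_interval > 0 and c.now - c.last_rx >= c.noop_out_interval
    return idle_ping or c.target_nop_pending


def nop_send_failure(c: ConnState) -> Optional[str]:
    """Return why a wanted NOP-Out could not be sent, or None if it was sent."""
    if not should_send_nop(c):
        return None
    if not c.logged_in:
        return "session not logged in"
    if not c.can_alloc:
        return "could not allocate memory for the nop"
    return None
```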


> [ 2662.491488] (beiscsi_process_cq():1953):CQ Error 13, reset CID 0x0...

Here we see the target drop the connection.

> [ 2662.547658]  connection1:0: detected conn error (1011)


The iSCSI layer is logging that.
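
The number in parentheses is an iSCSI error code. The sketch below decodes a few of them. It assumes the values of `enum iscsi_err` in the kernel header include/scsi/iscsi_if.h, so check that header for the full list. The table is deliberately partial, and the helper names are hypothetical:

```python
import re
from typing import Optional, Tuple

ISCSI_ERR_BASE = 1000

# Partial table, assumed from enum iscsi_err in include/scsi/iscsi_if.h.
ISCSI_CONN_ERRORS = {
    ISCSI_ERR_BASE + 11: "ISCSI_ERR_CONN_FAILED",
    ISCSI_ERR_BASE + 20: "ISCSI_ERR_TCP_CONN_CLOSE",
    ISCSI_ERR_BASE + 22: "ISCSI_ERR_NOP_TIMEDOUT",
}

CONN_ERR_RE = re.compile(r"(connection\d+:\d+): detected conn error \((\d+)\)")


def decode_conn_error(code: int) -> str:
    """Map a conn error code to its symbolic name, if this table knows it."""
    return ISCSI_CONN_ERRORS.get(
        code, f"unknown (offset {code - ISCSI_ERR_BASE} from ISCSI_ERR_BASE)")


def parse_conn_error(line: str) -> Optional[Tuple[str, str]]:
    """Extract (connection, error name) from a 'detected conn error' log line."""
    m = CONN_ERR_RE.search(line)
    if m is None:
        return None
    return m.group(1), decode_conn_error(int(m.group(2)))
```

Under that assumption, 1011 is a generic "connection failed" rather than a nop timeout.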


So what could have happened is that the target sent us a ping, we could
not allocate resources to reply, and so the target dropped the connection.

Or

The log messages were printed out of order (sending the nop and handling
the target drop can happen on different processors). In that case the
target dropped the connection first, which set the session to the not
logged in state, and that caused the "Could not send nopout" failure.
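
To compare the two scenarios, it helps to line up the kernel timestamps and the gaps between them. The sketch below does that. It searches rather than anchors the timestamp because the first message landed on the login prompt line. The timestamps record when each message was printed, so on their own they cannot prove which event caused the other. The helper names are hypothetical:

```python
import re
from typing import List, Tuple

# Matches "[ 2642.519605]  connection1:0: ..." anywhere in the line.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_kernel_log(lines: List[str]) -> List[Tuple[float, str]]:
    """Extract (timestamp, message) pairs sorted by time, skipping other lines."""
    events = []
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return sorted(events)


def gaps(events: List[Tuple[float, str]]) -> List[float]:
    """Seconds elapsed between consecutive events."""
    return [round(b[0] - a[0], 6) for a, b in zip(events, events[1:])]
```

Fed the first three log lines above, this shows roughly 20 seconds between "Could not send nopout" and the CQ error.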



> [ 2683.577311] device-mapper: multipath: Failing path 8:0.


Multipath sends a test command every so often. It has just figured out
that the path affected by the dropped connection is down.
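
The failed path is identified by its block major:minor pair. The sketch below turns 8:0 into a device name. The result is sda, the same disk as the `sd 0:0:0:0: [sda]` errors further down. It hard-codes the classic sd numbering on block major 8, where each disk owns 16 minors. On a live system, `ls -l /sys/dev/block/8:0` (or `lsblk`) is the reliable check. `sd_name` is a hypothetical helper:

```python
def sd_name(major: int, minor: int) -> str:
    """Name an sd disk or partition from major:minor (major 8, sda..sdp only)."""
    if major != 8:
        raise ValueError("only block major 8 (sda..sdp) is handled in this sketch")
    # 16 minors per disk: minor 0 is the whole disk, 1-15 are partitions.
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name if part == 0 else f"{name}{part}"
```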


> [ 2791.052787]  connection2:0: Could not send nopout
> [ 2811.027636] (beiscsi_process_cq():1953):CQ Error 13, reset CID 0x40...
> [ 2811.084076]  connection2:0: detected conn error (1011)


The same thing happens to session 2.

> [ 2833.072003] sd 0:0:0:0: timing out command, waited 180s

This means the command has been running for at least 180 secs, so the
SCSI layer is now going to fail it.
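
The 180s figure most likely comes from the following arithmetic. This sketch assumes that the SCSI midlayer gives a command (allowed + 1) * timeout in total before printing "timing out command, waited Ns". It also assumes the sd defaults of a 30 s per-command timeout and 5 retries. The per-device timeout can be checked in /sys/block/sdX/device/timeout. `scsi_total_wait` is a hypothetical helper:

```python
def scsi_total_wait(timeout_secs: int, allowed_retries: int) -> int:
    """Assumed midlayer limit on how long one command may keep retrying."""
    # Every attempt (the first try plus each retry) gets one full timeout.
    return (allowed_retries + 1) * timeout_secs


# With the assumed sd defaults this gives (5 + 1) * 30 = 180 seconds.
limit = scsi_total_wait(30, 5)
```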

> [ 2833.120683] sd 0:0:0:0: [sda] Unhandled error code
> [ 2833.162344] sd 0:0:0:0: [sda]  Result: hostbyte=DID_IMM_RETRY


Here is the strange thing. When the session goes down, the iSCSI layer
temporarily requeues the IO with the code DID_IMM_RETRY. At the same time
it sets the devices/paths into the blocked state (see
/sys/block/sdX/device/state). So we should not be seeing that "waited
180s" error.

What should happen is that the iSCSI layer blocks the devices/paths and
IO is queued. Then, if we can log back in, IO starts again. If we cannot
log in within node.session.timeo.replacement_timeout seconds, the iSCSI
layer unblocks the devices/paths and fails IO upwards to the
block/multipath/FS layers.
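
The recovery rule above can be sketched as a small function. This is a simplified model, not the iSCSI layer's code. `recovery_outcome` is a hypothetical name. `replacement_timeout` stands for node.session.timeo.replacement_timeout. Treating a relogin exactly at the deadline as a success is an arbitrary modelling choice:

```python
from typing import Optional, Tuple


def recovery_outcome(drop_time: float,
                     replacement_timeout: float,
                     relogin_time: Optional[float]) -> Tuple[str, float]:
    """Simplified model of iSCSI session recovery after a connection drop.

    While recovering, the devices/paths are blocked and IO stays queued.
    Returns ("resumed", t) if relogin succeeds by the deadline, otherwise
    ("failed", deadline): the paths are unblocked and queued IO is failed
    upwards to the block/multipath/FS layers.
    """
    deadline = drop_time + replacement_timeout
    if relogin_time is not None and relogin_time <= deadline:
        return "resumed", relogin_time
    return "failed", deadline
```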

So with the default replacement_timeout setting (120 secs), you should
see the message:

session recovery timed out after X secs

before you see the hung task messages below.
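
Applying that to the timestamps in this log gives a rough schedule. This sketch measures from each "detected conn error" line and assumes the default 120 s replacement_timeout. It also assumes that sda's path belongs to session 1. The multipath "Failing path 8:0" line following connection1's error suggests this, but it is not confirmed. The names are hypothetical:

```python
# Timestamps copied from the log in this thread.
CONN_ERRORS = {
    "connection1:0": 2662.547658,
    "connection2:0": 2811.084076,
}
COMMAND_TIMEOUT_AT = 2833.072003  # "sd 0:0:0:0: timing out command, waited 180s"


def expected_recovery_timeout(conn_error_at: float,
                              replacement_timeout: float = 120.0) -> float:
    """When "session recovery timed out" should appear if relogin never succeeds."""
    return conn_error_at + replacement_timeout


for conn, t in CONN_ERRORS.items():
    expected = expected_recovery_timeout(t)
    order = "before" if expected < COMMAND_TIMEOUT_AT else "after"
    print(f"{conn}: expected around {expected:.1f}, {order} the command timeout")
```

For session 1, the recovery timeout message would be due about 50 seconds before the "waited 180s" line, and that message is missing from the log.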

Is this easy to replicate? There is just too much going wrong here. If
it happens again, can you run

cat /sys/block/sdX/device/state

and tell me the values and run

iscsiadm -m session -P 3

and send all the output?
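
The following sketch collects both pieces of information in one go. The function names are hypothetical. `scsi_device_states` takes the sysfs directory as a parameter so it can be tested elsewhere. `iscsi_session_details` shells out to the same iscsiadm command as above and needs the open-iscsi tools and root:

```python
import subprocess
from pathlib import Path
from typing import Dict


def scsi_device_states(sysfs_block: Path = Path("/sys/block")) -> Dict[str, str]:
    """Read sdX/device/state for every sd disk (e.g. 'running', 'blocked')."""
    states = {}
    for state_file in sorted(sysfs_block.glob("sd*/device/state")):
        disk = state_file.parent.parent.name
        states[disk] = state_file.read_text().strip()
    return states


def iscsi_session_details() -> str:
    """Capture the output of 'iscsiadm -m session -P 3' (stdout and stderr)."""
    result = subprocess.run(["iscsiadm", "-m", "session", "-P", "3"],
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr
```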

Would you also be able to run a patch that will add some extra debugging
to the driver and iscsi layer?

I will try to contact HP and get access to a box like this. Jay is
leaving on vacation, so I do not think he will be able to help for a
couple of days.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.
