We make an application that calls the iscsi tools to mount
an iscsi server. We then generate file-IO against the
mounted disks to load-test the iscsi servers...

One customer is testing failover in their iscsi server
and it is causing our system to crash.

I am curious if anyone has any ideas about what the problem
might be?

sd 11:0:0:3: [sdk] Result: hostbyte=DID_TRANSPORT_FAILFAST

If you look at more of the logs do you see a

session recovery timed out after X secs

message before you start to see the messages you posted?

The DID_TRANSPORT_FAILFAST means something happened to the connection. We tried to relogin for node.session.timeo.replacement_timeout seconds. We couldn't, so we failed the IO up to the scsi/block/FS layers.
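
If it helps to check the ordering without reading the whole log by eye, here is a rough Python sketch. It only matches the two message fragments quoted in this thread; the function name check_log and the idea of feeding it your kernel log lines are my own assumptions, not something the iscsi tools provide.

```python
import re

# Fragments copied from the messages quoted in this thread.
RECOVERY_RE = re.compile(r"session recovery timed out after (\d+) secs")
FAILFAST_RE = re.compile(r"hostbyte=DID_TRANSPORT_FAILFAST")


def check_log(lines):
    """Scan log lines and return a 3-tuple:

    (timeout_secs, failfast_count, failfast_before_timeout)

    timeout_secs is the value from the first recovery-timeout message,
    or None if no such message was seen. failfast_before_timeout counts
    FAILFAST results that appeared before any recovery-timeout message.
    """
    timeout_secs = None
    failfast_count = 0
    failfast_before_timeout = 0
    for line in lines:
        m = RECOVERY_RE.search(line)
        if m:
            if timeout_secs is None:
                timeout_secs = int(m.group(1))
            continue
        if FAILFAST_RE.search(line):
            failfast_count += 1
            if timeout_secs is None:
                failfast_before_timeout += 1
    return timeout_secs, failfast_count, failfast_before_timeout
```

You could call it with an open file object for wherever your distro writes kernel messages (that path varies, so I am not guessing it here). If timeout_secs comes back as None, or failfast_before_timeout is not zero, then the failures are not simply following the recovery timeout and it would be worth posting more of the log.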

What do you mean by iscsi server? Is the iscsi server the iscsi target or does iscsi server mean the server running the iscsi initiator?

And what does failover mean in this context and how were you testing it? Some sort of cluster failover across iscsi targets? Failover across portals on the target? Were you disabling/enabling controllers, starting/stopping targets? Did you mean to use dm-multipath with iscsi?

