On 02/22/2010 09:32 AM, Or Gerlitz wrote:
Mike Christie wrote:
2. I wasn't sure whether the transport has a role in detecting session
failure and, if so, what that role is.
It varies from transport to transport.
For iscsi_tcp we do not really have a nice way to figure out if
someone just tripped over a cable, so that is where the nop comes from.
We can tell if the TCP state changes, so you can see
iscsi_tcp_state_change notify the upper layers of a problem in that case.
Understood. Still, the nop-out based watchdog serves all transports, correct?
I'd like to narrow things down and understand whether the transport has a role, and if so, what it is:
For the nop-out path, the transport just has to send/recv the nop.
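To make the nop-out watchdog concrete, here is a minimal Python model of the logic as I understand it from this thread. It is not the libiscsi code: the class NopWatchdog, its methods and the on_conn_failure callback are hypothetical, and time is passed in explicitly so the behaviour is easy to follow. The only transport-specific piece is the send_nop callback, which matches the point above that the transport just has to send/recv the nop.

```python
from typing import Callable, Optional


class NopWatchdog:
    """Hypothetical model of a nop-out watchdog (not the libiscsi code).

    A nop-out is sent when nothing has been received for `interval`
    seconds; if no reply arrives within `timeout` seconds of sending it,
    the connection is declared failed.
    """

    def __init__(self, interval: float, timeout: float,
                 send_nop: Callable[[], None],
                 on_conn_failure: Callable[[], None]) -> None:
        self.interval = interval
        self.timeout = timeout
        self.send_nop = send_nop                # the transport only has to send it
        self.on_conn_failure = on_conn_failure  # stands in for iscsi_conn_failure
        self.last_recv = 0.0
        self.ping_sent_at: Optional[float] = None
        self.failed = False

    def on_recv(self, now: float) -> None:
        """Any PDU from the target (including the nop-in reply) proves the link is alive."""
        self.last_recv = now
        self.ping_sent_at = None

    def tick(self, now: float) -> None:
        """Called periodically by a timer; `now` is the current time in seconds."""
        if self.failed:
            return
        if self.ping_sent_at is not None:
            # A nop is outstanding: fail the connection if the reply is overdue.
            if now - self.ping_sent_at >= self.timeout:
                self.failed = True
                self.on_conn_failure()
        elif now - self.last_recv >= self.interval:
            # Link has been idle for too long: probe it with a nop-out.
            self.ping_sent_at = now
            self.send_nop()
```

With interval=5, timeout=5, one tick per second and a target that never answers (the tripped-over-a-cable case), the model sends one nop at t=5 and reports a connection failure at t=10.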
Some iscsi drivers will run iscsi_conn/session_failure when they
discover a link-down event or someone doing ifdown. I thought this is
sort of what you are able to do with iser_cma_handler->iser_disconnected_handler
or with the call to iscsi_conn_failure in iser_handle_comp_error.
Yes, we call iscsi_conn/session_failure, but I wasn't really sure whether multipathing
works for non-TCP transports if they never make these calls, or whether they have to make them.
They do not have to make those calls for multipath to work. Multipath
will work better if the transport can signal when there is a problem,
because we can stop using a bad path and get IO going to a working path
faster. If the transport does nothing then we have to rely on the scsi
error handler/timeout to detect the problem and that is very slow.
If there are other places you can detect a link failure type of problem
you would want to call iscsi_conn_failure, so the iscsi layer can begin
trying to recover the connection and let dm-multipath know there is a problem.
I understand that once there's a timeout on the nop-out watchdog, iscsi
will call ep_disconnect, correct? Currently our ep_disconnect is sometimes too
Yes. You should also change your ep_disconnect, because it is not
supposed to block (did we talk about this, or was it just bnx2i?): a
blocking ep_disconnect stops iscsid from processing other events.
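A toy Python illustration of why a blocking ep_disconnect hurts, assuming a single-threaded daemon loop that handles one event at a time (a simplification of iscsid). Endpoint, event_loop and the event names are hypothetical; the real ep_disconnect is an iscsi_transport callback in the kernel, so an actual driver fix would look different. The point is only the shape: start the teardown and return, so later events are not stuck behind it.

```python
import queue
import threading
import time
from typing import List, Tuple


class Endpoint:
    """Hypothetical endpoint whose teardown takes a while (e.g. waiting on the peer)."""

    def __init__(self, teardown_seconds: float) -> None:
        self.teardown_seconds = teardown_seconds
        self.closed = threading.Event()

    def _slow_teardown(self) -> None:
        time.sleep(self.teardown_seconds)  # stands in for waiting on the transport
        self.closed.set()

    def ep_disconnect(self) -> None:
        """Non-blocking: start the teardown in a background thread and return at once."""
        threading.Thread(target=self._slow_teardown, daemon=True).start()


def event_loop(events: "queue.Queue[str]", ep: Endpoint,
               handled: List[Tuple[str, bool]]) -> None:
    """Toy daemon loop: events are handled in order, so a blocking
    ep_disconnect would delay every event queued behind it."""
    while True:
        ev = events.get()
        if ev == "stop":
            return
        if ev == "disconnect":
            ep.ep_disconnect()
        # Record the event and whether the endpoint had finished closing by then.
        handled.append((ev, ep.closed.is_set()))
```

Queueing "disconnect", then an unrelated event, then "stop" with a 0.5 s teardown, the loop handles both events and returns well before the endpoint reports closed; if ep_disconnect called _slow_teardown directly, the second event would wait the full 0.5 s.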
and I can change that. But I still wasn't sure whether, for iscsi to let dm-multipath know
there is a failure, something is needed on the transport side or not...
I do not think there is anything special. It should handle an error like
it would if multipath were not used. The user will set iscsi timers
such as replacement_timeout and the nop timeout differently if they are using
multipath.
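A rough back-of-the-envelope for why these timers matter with multipath: if a link dies silently (no transport error is reported), the nop-out watchdog needs up to about noop_out_interval + noop_out_timeout to notice, and the session then waits replacement_timeout before failing IO up to dm-multipath. If the transport reports the failure itself (e.g. by calling iscsi_conn_failure), the detection term drops out. The function below is a hypothetical helper that just adds those terms up; it ignores timer granularity and the SCSI command timeout. The values used afterwards are assumptions: 5/5/120 are, as far as I know, the iscsid.conf defaults for node.conn[0].timeo.noop_out_interval, node.conn[0].timeo.noop_out_timeout and node.session.timeo.replacement_timeout, and 15 is only an example of a lower multipath setting.

```python
def worst_case_failover(noop_out_interval: float, noop_out_timeout: float,
                        replacement_timeout: float,
                        transport_reports_failure: bool = False) -> float:
    """Hypothetical rough upper bound (seconds) from a link failure until IO
    is failed back to dm-multipath. Ignores timer granularity and SCSI timeouts."""
    if transport_reports_failure:
        detect = 0  # transport signalled the problem right away
    else:
        # Link dies just after the last PDU: a nop goes out after the idle
        # interval, then the watchdog waits noop_out_timeout for the reply.
        detect = noop_out_interval + noop_out_timeout
    # The session stays blocked while iscsid tries to relogin; IO is failed
    # upward only after replacement_timeout expires.
    return detect + replacement_timeout
```

With the assumed defaults this gives 130 s; lowering replacement_timeout to 15 gives 25 s, and 15 s when the transport reports the failure itself, which is why a transport that signals problems lets multipath switch paths faster.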
I do see that there's an shost param to ep_connect, is there a way it
can give me a hint on the source IP?
I do not think it can help iser as it is today. Remember when we talked
about a shost per physical/virtual resource vs. a shost per session?
This is another place where that came in. bnx2i, cxgb3i and be2iscsi
allocate a host per port/netdev, so that is how they know the src they should
I will have to think about how to do it for iser as it is today with the host
How about extending the ep_connect user/netlink/kernel/iscsi_transport
framework to support the functionality provided by the user-space code of
bind_conn_to_iface or
Basically, since the connection establishment framework is IP-based, I would
to just get some source IP in the kernel when ep_connect is called. I saw the
comment on why bind_src_by_address is problematic, but this doesn't apply to
Which comment are you talking about? Are you talking about bind() not
doing what you would want for iscsi_tcp (the target sometimes sends data to
the wrong port), or about the case where you use DHCP and so
the IPs could change across boots?
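For reference, this is what choosing a source IP with bind() looks like for a plain TCP socket in userspace: bind to (src_ip, 0) before connect(). It is a generic Python sketch, not open-iscsi's bind_src_by_address, and connect_from is a hypothetical helper. Binding only fixes the source address in the packets; which interfaces they actually leave and arrive on is still up to routing and ARP on the host, which is one plausible way the wrong-port behaviour mentioned above can show up.

```python
import socket
from typing import Tuple


def connect_from(src_ip: str, dst: Tuple[str, int],
                 timeout: float = 5.0) -> socket.socket:
    """Hypothetical helper: open a TCP connection whose local address is src_ip.

    Port 0 lets the kernel pick an ephemeral source port. Binding does not
    select the outgoing interface; the routing table still does that.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.bind((src_ip, 0))  # fix the source IP before connecting
        sock.connect(dst)
    except OSError:
        sock.close()
        raise
    return sock
```

After connecting, sock.getsockname() reports src_ip as the local address; if src_ip is not configured on any local interface, bind() fails with OSError instead.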
A question for you. Some people do not like using the netdev name
for the binding, since it can change between boots. The default method is
to use iface.hwaddress instead of iface.net_ifacename. For iscsi it is
just the MAC. For iser, how big is the RNIC's equivalent of the MAC?
iser currently works over IB, and at some point we'll also make it work over
With IB, the RNIC is an IPoIB NIC whose HW address (the equivalent of the MAC) is 20 bytes long.
It turns out that some of these 20 bytes may change... the part which is burned
So is there anything in there that is static and can be used to identify
is called the GUID and is 8 bytes long. Here you see two IPoIB NICs, ib0 and ib1;
the port GUIDs they are using are 00:02:c9:03:00:02:6b:df and
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
8: ib1: <BROADCAST,MULTICAST> mtu 2044 qdisc pfifo_fast qlen 256
If you are really interested in learning how these 20 bytes are composed, it's in the
flags:QPN:GID (1:3:16 bytes), where GID is of the form PREFIX:GUID (8:8 bytes); do
wget http://ietf.org/rfc/rfc4391.txt and see section "9.1.1. Link-Layer
Note that the ifconfig output is buggy, so you should use $ ip address show
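A small Python sketch of the layout described above: split a 20-byte IPoIB hardware address into flags (1 byte), QPN (3 bytes) and GID (16 bytes), and the GID into an 8-byte subnet prefix and the 8-byte port GUID. The function and type names are made up, and the example address used afterwards is hypothetical: it reuses ib0's port GUID from above with a link-local fe80:: prefix and an invented flags byte and QPN. Per the thread, the GUID is the part that stays fixed, so it is the piece a binding would want to match on.

```python
from typing import NamedTuple


class IPoIBHwAddr(NamedTuple):
    flags: int             # 1 byte of flag/reserved bits
    qpn: int               # 3-byte queue pair number
    subnet_prefix: bytes   # first 8 bytes of the GID
    guid: bytes            # last 8 bytes of the GID: the burned-in port GUID


def parse_ipoib_hwaddr(text: str) -> IPoIBHwAddr:
    """Split a colon-separated 20-byte IPoIB address as flags:QPN:GID (1:3:16),
    with GID = PREFIX:GUID (8:8)."""
    raw = bytes(int(part, 16) for part in text.split(":"))
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    return IPoIBHwAddr(
        flags=raw[0],
        qpn=int.from_bytes(raw[1:4], "big"),
        subnet_prefix=raw[4:12],
        guid=raw[12:20],
    )


def fmt_bytes(b: bytes) -> str:
    """Format bytes the way ip(8) prints hardware addresses."""
    return ":".join(f"{x:02x}" for x in b)
```

For the hypothetical address 80:00:04:04:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:df, fmt_bytes(parse_ipoib_hwaddr(addr).guid) gives 00:02:c9:03:00:02:6b:df, i.e. ib0's port GUID, regardless of what the flags and QPN bytes are.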
Anyway, I wasn't really sure if/how binding the iface by hw address works
in open-iscsi; specifically, I wasn't able to track down which library exports
net_get_netdev_from_hwaddress ... but I am quite sure this (binding iface to hw
address and not netdev) works well for iscsi-tcp and offloads, correct?
That iscsi code actually uses the same sys/lib calls as ifconfig.
Bind by hw address or netdev works with bnx2i and cxgb3i, because they
are tied to the netdev and export both values.
be2iscsi and qla4xxx use bind by hwaddress, because they have no
interaction with the network subsystem, so they only have the hw address.
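Since the open-iscsi helper reportedly uses the same calls as ifconfig, here is a different technique for the same lookup, shown only as an illustration: on Linux each interface's hardware address is also readable from /sys/class/net/<dev>/address, so mapping a hwaddress back to a netdev name is a directory scan. netdev_from_hwaddress is a hypothetical name (it is not net_get_netdev_from_hwaddress), and the sysfs_net parameter exists only so the function can be pointed at a test directory. For IPoIB it compares all 20 bytes, so if the non-GUID part changes the match fails; comparing only the last 8 bytes (the GUID) would be one way around that.

```python
import os
from typing import Optional


def netdev_from_hwaddress(hwaddr: str,
                          sysfs_net: str = "/sys/class/net") -> Optional[str]:
    """Hypothetical helper: return the interface whose hardware address equals hwaddr.

    Reads <sysfs_net>/<dev>/address (colon-separated hex; 6 bytes for
    Ethernet, 20 for IPoIB) and compares case-insensitively.
    """
    wanted = hwaddr.strip().lower()
    try:
        names = sorted(os.listdir(sysfs_net))
    except FileNotFoundError:
        return None
    for name in names:
        try:
            with open(os.path.join(sysfs_net, name, "address")) as f:
                if f.read().strip().lower() == wanted:
                    return name
        except OSError:
            continue  # no readable address file: skip this entry
    return None
```

On a live system, netdev_from_hwaddress("00:11:22:33:44:55") returns the matching name such as "eth0", or None if no interface has that address.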