Re: iscsi ifaces / multipathing / etc

2010-03-08 Thread Or Gerlitz
Mike Christie wrote:
 I started to do a pseudo patch to see how it would work 

With the DHCP thing at hand possibly changing the initiator IP address, I think 
what we want to do for iser is use NIC names for ifaces, as I do now for 
iscsi/tcp, with the user taking care of NIC name persistence across reboots 
through udev et al. I hope we can implement a loop which goes over the IP 
addresses associated with this NIC and finds one which is in the subnet of 
the target portal we're connecting to. This would be the source IP fed into the 
enhanced ep_connect exported by iser in the kernel.
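
The loop described above could look roughly like the following user-space sketch. It is only an illustration under stated assumptions: the helper names same_subnet() and find_src_for_portal() are hypothetical and not part of open-iscsi, only IPv4 is handled, and the first matching address wins.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* True when src and dst fall in the same IPv4 subnet under mask.
 * All three values are in network byte order. */
static int same_subnet(uint32_t src, uint32_t mask, uint32_t dst)
{
    return (src & mask) == (dst & mask);
}

/*
 * Hypothetical helper (not an open-iscsi function): walk the addresses
 * configured on netdev `ifname` and copy the first IPv4 address that
 * shares a subnet with `portal` into `src`.
 * Returns 0 on success, -1 if nothing matches or getifaddrs() fails.
 */
static int find_src_for_portal(const char *ifname,
                               const struct in_addr *portal,
                               struct in_addr *src)
{
    struct ifaddrs *ifap, *ifa;
    int ret = -1;

    assert(ifname && portal && src);

    if (getifaddrs(&ifap) < 0)
        return -1;

    for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
        const struct sockaddr_in *addr, *mask;

        /* Entries without an address or netmask cannot be matched. */
        if (!ifa->ifa_addr || !ifa->ifa_netmask)
            continue;
        if (ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, ifname))
            continue;

        addr = (const struct sockaddr_in *)ifa->ifa_addr;
        mask = (const struct sockaddr_in *)ifa->ifa_netmask;
        if (same_subnet(addr->sin_addr.s_addr, mask->sin_addr.s_addr,
                        portal->s_addr)) {
            *src = addr->sin_addr;
            ret = 0;
            break;
        }
    }

    freeifaddrs(ifap);
    return ret;
}
```

With the routing table quoted in this message, calling find_src_for_portal("ib0", ...) for portal 10.10.0.91 would be expected to pick the ib0 address on 10.10.0.0/16; the result would then be handed to ep_connect as the source address.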

 # ip r s
 10.10.1.91 dev ib1  scope link
 10.10.0.91 dev ib0  scope link
 10.10.0.0/16 dev ib0  proto kernel  scope link  src 10.10.5.157
 10.10.0.0/16 dev ib1  proto kernel  scope link  src 10.10.5.158

 With this do you get 2 paths then? For iface binding we want to try and
 get 4. So if got really unlucky and ib0's port and the target portal
 connected to ib1 died, then you could still have two usable paths.

Yes, I have two paths this way, which serves me well at this point. 

Four paths may be problematic in case the system is a small-scale one, e.g. 
wired with two switches, each connected to all initiators and targets, with 
1-2 inter-switch links (ISLs). Two of the paths pass over an ISL and should 
be prioritized lower than the two which don't. Is there a way to do that?

Or.

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: iscsi ifaces / multipathing / etc

2010-03-03 Thread Or Gerlitz
Mike Christie wrote:
 Ok let me clarify one other thing too. With the network layer we get a
 ethX, or whatever you set up your udev rules to use, for each port. That
 is what I am saying above represents the device name and port. The iscsi
 drivers that interact with the linux net layer like bnx2i, cxgb3i and
 iscsi_tcp supports using the netdevice name (same name you see in
 /sys/class/net or ifconfig) for iscsi iface binding.

yep, understood.

 We are talking about the iscsi iface.net_ifacename param right? It is
 the name you see in sysfs or in ifconfig. You set it to
 iface.net_ifacename and it gets passed directly down to in the
 BINDTODEVICE setsockopt.

sure, understood as well.

 For drivers that do not have a netdevice name because they do not
 interact with the network layer (they have an entire net stack in
 firmware) we use the MAC. So this includes qla4xxx and be2iscsi, and
 iscsi_tcp also supports this. In that case we loop over the net
 interfaces and match the MAC with a ethX and then pass that to
 BINDTODEVICE setsockopt.

understood. As I have explained, the MAC is problematic for iser for two reasons: 
under bonding it will not function well (we use the fail_over_mac option of 
bonding), and also part of the MAC isn't persistent today, so we would have to 
enhance the iscsi code to allow using only a portion of it.


Or.




Re: iscsi ifaces / multipathing / etc

2010-03-03 Thread Or Gerlitz
Mike Christie wrote:
 Maybe that is where I am missing your point. One of the reasons for this
 is to tell the kernel to ignore the routing rules and use the port we
 want. So I was saying if there is no a API to do this then just add one
 like someone did for tcp and the socket code.

A bind-to-device-like interface doesn't exist in the rdma-cm API, which is 
currently based only on IP. However, as I wrote, once you know a source IP, it 
is possible to bypass the routing rules by calling rdma_bind on this IP or by 
providing it as the source address in rdma_resolve_addr. With this at hand, I 
don't think I want to add a new interface to the rdma-cm now.

 I understand using IP is a bit problematic when DHCP is used for the
 initiator, let me think about that a bit, thanks

 How about this, We can add a:
 ISCSI_UEVENT_TRANSPORT_EP_CONNECT_THROUGH_SRC = UEVENT_BASE + 20
 And for that we can pass the src addr after the dest addr like how you wanted

yes, basically, I think this is the way / route (...) I want to go through. The 
downside is that this touches three pieces of the code (user-space, kernel 
transport, and iser), but I don't think we have much (any) other choice.

 
 For userspace, I think it is wrong to make the user interface based on
 the kernel API. If we used something like the name you see in ifconfig
 (ibX is the default for most drivers I think) then users can set the
 iface.net_ifacename to it. iscsid can just loop over the interfaces and
 match ibX with the ip and pass that down. The loop would be pretty much
 the same as in get_hwaddress_from_netdev() where we loop over the
 interfaces but it would do

If we are going to provide a source IP to the kernel, then why go through 
the code that loops over interfaces? Is this because today the semantics of 
iface.ipaddress are "set this address on the net device associated with iscsi 
offload", such that you don't want to create confusion/complexity? If this is 
the case, and you want the iser users to specify a netdevice name such that user 
space translates it to a source IP, then the code below has to be somewhat more 
involved:

 
 transport.c:
 
 iser_transport {
 .ep_connect = iser_ep_connect;
 };
 
 somewhere.c:
 getifaddrs()
 
 for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
 
 iser_ep_connect()
 if (!strcmp(ifa->ifa_name, match_name))
 ktransport_ep_connect( ifa->ifa_addr);

here this comparison isn't sufficient, as someone might put a few IPs on the 
net-device and we want to find one of them which is in the subnet of the target 
portal (sorry...)

Or.

 
 
 
 netlink.c
 ktransport_ep_connect( , struct sockaddr *src_addr)
 {
 if (src_addr)
 ev->type = ISCSI_UEVENT_TRANSPORT_EP_CONNECT_THROUGH_SRC
 else if
 else
 
 .
 
 memcpy(setparam_buf + sizeof(*ev), dst_addr, addrlen)
 if (src_addr)
 memcpy(setparam_buf + currlen, src_addr, src_addrlen)
 
 }
 
 For the userspace interface for the user we can do pretty much anything.




Re: iscsi ifaces / multipathing / etc

2010-03-02 Thread Or Gerlitz
Mike Christie micha...@cs.wisc.edu wrote:

 If the device name and port do not change normally that seems better to
 me since it works like the other drivers

just to be on the same page, when you say the other drivers
(=transports) this doesn't include tcp, correct? For tcp, what's
supported is the --net-- device name, such that a BINDTODEVICE
setsockopt is done to bind the socket to this netdevice.

As for the IB device name and port number, indeed they don't change
normally, but OTOH the rdma-cm API which iser is using is IP based and
resolves IB local/remote addresses (= GUIDs which relate to
devices/ports) from IP addresses/routing rules and not the other way
around.

I understand using IP is a bit problematic when DHCP is used for the
initiator, let me think about that a bit, thanks

Or.




Re: iscsi ifaces / multipathing / etc

2010-03-02 Thread Mike Christie

On 03/02/2010 02:42 PM, Or Gerlitz wrote:

Mike Christie micha...@cs.wisc.edu wrote:


If the device name and port do not change normally that seems better to
me since it works like the other drivers


just to be on the same page, when you say the other drivers
(=transports) this doesn't include tcp, correct? for the tcp what's


Ok let me clarify one other thing too. With the network layer we get an 
ethX, or whatever you set up your udev rules to use, for each port. That 
is what I am saying above represents the device name and port. The iscsi 
drivers that interact with the linux net layer, like bnx2i, cxgb3i and 
iscsi_tcp, support using the netdevice name (same name you see in 
/sys/class/net or ifconfig) for iscsi iface binding.





supported in the --net-- device name such that a BINDTODEVICE
setsockopt is done to bind the socket to this netdevice.


We are talking about the iscsi iface.net_ifacename param, right? It is 
the name you see in sysfs or in ifconfig. You set 
iface.net_ifacename to it and it gets passed directly down in the 
BINDTODEVICE setsockopt.



For drivers that do not have a netdevice name because they do not 
interact with the network layer (they have an entire net stack in 
firmware) we use the MAC. So this includes qla4xxx and be2iscsi, and 
iscsi_tcp also supports this. In that case we loop over the net 
interfaces and match the MAC with an ethX and then pass that to 
BINDTODEVICE setsockopt.
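
The MAC-to-netdev matching loop could be sketched as follows. This is an assumption-laden illustration and not the open-iscsi implementation: parse_mac() and netdev_from_mac() are hypothetical names, only 6-byte Ethernet addresses are handled, and the lookup uses the AF_PACKET entries that getifaddrs() reports on Linux.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <netpacket/packet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Parse "aa:bb:cc:dd:ee:ff" into 6 bytes. Returns 0 on success, -1 on
 * malformed input (including trailing characters). */
static int parse_mac(const char *s, unsigned char mac[6])
{
    unsigned int b[6];
    char extra;
    int i;

    if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        mac[i] = (unsigned char)b[i];
    return 0;
}

/*
 * Hypothetical helper: find the netdev whose link-layer address equals
 * `mac_str` and copy its name into `name`.
 * Returns 0 on success, -1 if the MAC is malformed or no netdev matches.
 */
static int netdev_from_mac(const char *mac_str, char *name, size_t len)
{
    struct ifaddrs *ifap, *ifa;
    unsigned char mac[6];
    int ret = -1;

    assert(mac_str && name && len > 0);

    if (parse_mac(mac_str, mac) < 0 || getifaddrs(&ifap) < 0)
        return -1;

    for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
        const struct sockaddr_ll *sll;

        /* Link-layer addresses are reported as AF_PACKET entries. */
        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_PACKET)
            continue;
        sll = (const struct sockaddr_ll *)ifa->ifa_addr;
        if (sll->sll_halen == 6 && !memcmp(sll->sll_addr, mac, 6)) {
            snprintf(name, len, "%s", ifa->ifa_name);
            ret = 0;
            break;
        }
    }

    freeifaddrs(ifap);
    return ret;
}
```

The name returned here is what would then be passed to the BINDTODEVICE setsockopt. Note that the fixed 6-byte comparison is exactly what does not carry over to the 20-byte IPoIB addresses discussed elsewhere in this thread.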





As for the IB device name and port number, indeed they don't change
normally, but OTOH the rdma-cm API which iser is using is IP based and
resolves IB local/remote addresses (= GUIDs which relate to
devices/ports) from IP addresses/routing rules and not the other way
around.

I understand using IP is a bit problematic when DHCP is used for the
initiator, let me think about that a bit, thanks

Or.






Re: iscsi ifaces / multipathing / etc

2010-03-02 Thread Mike Christie

On 03/02/2010 02:42 PM, Or Gerlitz wrote:

As for the IB device name and port number, indeed they don't change
normally, but OTOH the rdma-cm API which iser is using is IP based and
resolves IB local/remote addresses (= GUIDs which relate to
devices/ports) from IP addresses/routing rules and not the other way
around.


Maybe that is where I am missing your point. One of the reasons for this 
is to tell the kernel to ignore the routing rules and use the port we 
want. So I was saying if there is no API to do this then just add one 
like someone did for tcp and the socket code.


Assuming the interface you want to use does what we want then let's use 
it and move on.




I understand using IP is a bit problematic when DHCP is used for the
initiator, let me think about that a bit, thanks



How about this.

We can add a:

ISCSI_UEVENT_TRANSPORT_EP_CONNECT_THROUGH_SRC = UEVENT_BASE + 20

And for that we can pass the src addr after the dest addr like how you 
wanted.


For userspace, I think it is wrong to make the user interface based on 
the kernel API. If we used something like the name you see in ifconfig 
(ibX is the default for most drivers I think) then users can set the 
iface.net_ifacename to it. iscsid can just loop over the interfaces and 
match ibX with the ip and pass that down. The loop would be pretty much 
the same as in get_hwaddress_from_netdev() where we loop over the 
interfaces but it would do


transport.c:

iser_transport {
.ep_connect = iser_ep_connect;
};

somewhere.c:
getifaddrs()

for (ifa = ifap; ifa; ifa = ifa->ifa_next) {

iser_ep_connect()
if (!strcmp(ifa->ifa_name, match_name))
ktransport_ep_connect( ifa->ifa_addr);



netlink.c
ktransport_ep_connect( , struct sockaddr *src_addr)
{
if (src_addr)
ev->type = ISCSI_UEVENT_TRANSPORT_EP_CONNECT_THROUGH_SRC
else if
else

.

memcpy(setparam_buf + sizeof(*ev), dst_addr, addrlen)
if (src_addr)
memcpy(setparam_buf + currlen, src_addr, src_addrlen)

}

For the userspace interface for the user we can do pretty much anything.
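
The buffer layout in the netlink.c pseudo patch above (event header, then destination address, then an optional source address) can be sketched in plain C as below. Everything named here is an assumption made for illustration: struct fake_uevent, the EP_CONNECT values, and pack_ep_connect() are stand-ins, not the real open-iscsi netlink definitions.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Stand-in for the real netlink event header; the layout is hypothetical. */
struct fake_uevent {
    uint32_t type;
    uint32_t reserved;
};

/* Hypothetical event numbers, only meaningful inside this sketch. */
enum { EP_CONNECT = 1, EP_CONNECT_THROUGH_SRC = 2 };

/* Size of the concrete sockaddr for the address family, 0 if unsupported. */
static size_t sockaddr_len(const struct sockaddr *sa)
{
    switch (sa->sa_family) {
    case AF_INET:
        return sizeof(struct sockaddr_in);
    case AF_INET6:
        return sizeof(struct sockaddr_in6);
    default:
        return 0;
    }
}

/*
 * Pack header + dst (+ src when given) into buf.
 * Returns the number of bytes written, or 0 when an address family is
 * unsupported or buf is too small.
 */
static size_t pack_ep_connect(unsigned char *buf, size_t buflen,
                              const struct sockaddr *dst,
                              const struct sockaddr *src)
{
    struct fake_uevent ev;
    size_t dlen, slen = 0, off = sizeof(ev);

    assert(buf && dst);

    dlen = sockaddr_len(dst);
    if (src)
        slen = sockaddr_len(src);
    if (!dlen || (src && !slen) || buflen < off + dlen + slen)
        return 0;

    /* The event type tells the receiver whether a source address follows. */
    memset(&ev, 0, sizeof(ev));
    ev.type = src ? EP_CONNECT_THROUGH_SRC : EP_CONNECT;
    memcpy(buf, &ev, sizeof(ev));

    memcpy(buf + off, dst, dlen);
    off += dlen;
    if (src) {
        memcpy(buf + off, src, slen);
        off += slen;
    }
    return off;
}
```

Using a separate event type keeps the existing message layout unchanged for transports that do not pass a source address, which matches the intent of adding a new UEVENT value rather than changing the old one.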




Re: iscsi ifaces / multipathing / etc

2010-03-01 Thread Mike Christie

On 03/01/2010 02:46 AM, Or Gerlitz wrote:

Mike Christie wrote:

Ah never mind. For some reason I thought you had to have a mask, but if
you give rdma_resolve_addr an addr then it will do the right thing and
use only the port you wanted, right?


YES. Providing rdma_resolve_addr a source address is like calling rdma_bind 
with this source address and then calling rdma_resolve_addr with only a 
destination address. So it's like bind(2) in that respect. Currently the rdma 
stack, through the include/rdma/rdma_cm.h rdma-cm API, doesn't support things 
like SO_BINDTODEVICE to either a network or an rdma device. But even if it 
would/will, I prefer to stay with IP addresses.
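
The bind(2) analogy above can be made concrete with an ordinary TCP socket. The sketch below is only the socket-side analogue and does not use the rdma-cm API; connect_from_src() is a hypothetical name. In the kernel rdma-cm, the corresponding step would be passing the source address to rdma_resolve_addr (or calling rdma_bind_addr first), which is not shown here.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a TCP connection to `dst` that originates
 * from the local address `src`, by binding before connecting.
 * Returns the connected fd, or -1 on any failure (for example when
 * `src` is not configured on this host).
 */
static int connect_from_src(const struct sockaddr_in *src,
                            const struct sockaddr_in *dst)
{
    int fd;

    assert(src && dst);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* bind() pins the local (source) address; connect() picks the peer. */
    if (bind(fd, (const struct sockaddr *)src, sizeof(*src)) < 0 ||
        connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Binding by address selects the source IP but, unlike SO_BINDTODEVICE, leaves the choice of outgoing netdev to the route table, which is the distinction being discussed for iscsi_tcp in this thread.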



I am not sure I got why you prefer to use the IP? Was your reason that 
part you wrote about iser being IP based?




I understand how SO_BINDTODEVICE is used for the tcp transport, but it's all done 
in user space, and later when the connection is bound to the end-point (-- 
socket created/bound/connected from user space) things are moved to the kernel. 
This isn't the case with iser. I believe that at this point we agree that there 
should be a way to pass the source address, bound by the user to the iscsi 
interface, to the kernel iser transport code, correct?



Whether it is done in the kernel or userspace or if you support 
SO_BINDTODEVICE or not is not an issue. We can change userspace so you 
get a device like tcp and offload and/or we can change the kernel in any 
sane way so you can bind by whatever. You do not have to support 
SO_BINDTODEVICE. You can work like how the offload drivers do.


I am not sure where you get that I agree ip address is best. All I am 
saying above is I think I see the API you wanted to use.


If the device name and port do not change normally that seems better to 
me since it works like the other drivers.





Re: iscsi ifaces / multipathing / etc

2010-03-01 Thread Mike Christie

On 03/01/2010 06:29 PM, Mike Christie wrote:

If the device name and port do not change normally that seems better to
me since it works like the other drivers.



Oh yeah, just to be clear, I am saying I prefer above, but that is based 
on what I understand today. As I said I did not understand why you think 
IP based is best for iser when all other drivers use the other option. 
Beat your point into my head if I am not getting it :) I am open to change.





Re: iscsi ifaces / multipathing / etc

2010-02-26 Thread Mike Christie

On 02/25/2010 11:24 PM, Mike Christie wrote:

Do you mean you need a src IP for rdma_resolve_addr? If so, if you have
a src ip, can you override what rdma_resolve_addr gives you for cases
like where you have two ports on the same subnet?



Ah never mind. For some reason I thought you had to have a mask, but if 
you give rdma_resolve_addr an addr then it will do the right thing and 
use only the port you wanted, right?





Re: iscsi ifaces / multipathing / etc

2010-02-25 Thread Or Gerlitz
Mike Christie wrote:
 I am fine with either. The netdev name (ethX) also has the same problems
 where udev can change it. It is there for aliases or vlans where we
 cannot use hwaddress since multiple netdevs have the same MAC.

yes, correct both netdevice name and hwaddress suffer from what you describe, 
shit happens.

 Do you have similar issues with ib device names?

Without HW changes, no. If someone plugs in another card of the same 
type between two boots, then yes. 

 What about bonding? Can you use bonding/trunking with iser? Is that going to 
 cause troubles?

yes, indeed, we support bonding, but unlike in ethernet where the bond MAC 
address is typically that of the current/active slave device, over ipoib, 
bonding automatically sets the fail_over_mac option to active and the bond HW 
address is the one of the active slave, which means we don't want an iscsi iser 
interface to be associated with a hwaddress in that case. 

So we are left with net device names or IP addresses. I don't want to go with 
hw device names since the whole addressing framework of iser is the same as the 
one used for TCP, i.e. based on IP addresses. 

I prefer the source IP address, or at least the source IP address and mask, 
through which I can get a source IP on this subnet even if DHCP has changed the 
source IP between reboots.

Thinking about this a bit more, DHCP-related changes between reboots can be quite 
destructive to iscsi/nfs etc. since the target IP can change. If this is the 
case and one really wants to avoid that, one needs to use host names for the 
portal and not IP addresses, right? This looks to me like going a bit too far.

All in all, when non default interface is needed/used (e.g for multipathing), I 
am quite sure we need to have some sort of source ip for iser in ep_connect, 
please let me know what you think would be the easy/best or close to either of 
(...) way to have that.

Or. 







Re: iscsi ifaces / multipathing / etc

2010-02-25 Thread Mike Christie

On 02/25/2010 06:03 AM, Or Gerlitz wrote:

Mike Christie wrote:

I am fine with either. The netdev name (ethX) also has the same problems
where udev can change it. It is there for aliases or vlans where we
cannot use hwaddress since multiple netdevs have the same MAC.


yes, correct both netdevice name and hwaddress suffer from what you describe, 
shit happens.


Do you have similar issues with ib device names?


Without HW changes, no. If someone plugs in another card of the same
type between two boots, then yes.


What about bonding? Can you use bonding/trunking with iser? Is that going to 
cause troubles?


yes, indeed, we support bonding, but unlike in ethernet where the bond MAC 
address is typically that of the current/active slave device, over ipoib, 
bonding automatically sets the fail_over_mac option to active and the bond HW 
address is the one of the active slave, which means we don't want an iscsi iser 
interface to be associated with a hwaddress in that case.

So we are left with net device names or IP addresses. I don't want to go with 
hw device names since the whole addressing framework of iser is the same as the 
one used for TCP, i.e. based on IP addresses.

I prefer the source IP address, or at least the source IP address and mask, 
through which I can get a source IP on this subnet even if DHCP has changed the 
source IP between reboots.

Thinking about this a bit more, DHCP-related changes between reboots can be quite 
destructive to iscsi/nfs etc. since the target IP can change. If this is the 
case and one really wants to avoid that, one needs to use host names for the 
portal and not IP addresses, right? This looks to me like going a bit too far.


I think on the target side you normally use static ips (I think you 
normally have a lot fewer target portals than initiators). People also 
actually do use hostnames for the target portals too. There have been a 
couple bug reports on the list on how open-iscsi does not support them 
correctly.




All in all, when non default interface is needed/used (e.g for multipathing), I 
am quite sure we need to have some sort of source ip for iser in ep_connect, 
please let me know what you think would be the easy/best or close to either of 
(...) way to have that.



Do you mean you need a src IP for rdma_resolve_addr? If so, if you have 
a src ip, can you override what rdma_resolve_addr gives you for cases 
like where you have two ports on the same subnet?





Re: iscsi ifaces / multipathing / etc

2010-02-25 Thread Mike Christie

On 02/25/2010 11:24 PM, Mike Christie wrote:

All in all, when non default interface is needed/used (e.g for
multipathing), I am quite sure we need to have some sort of source ip
for iser in ep_connect, please let me know what you think would be the
easy/best or close to either of (...) way to have that.



Do you mean you need a src IP for rdma_resolve_addr? If so, if you have
a src ip, can you override what rdma_resolve_addr gives you for cases
like where you have two ports on the same subnet?



Oh yeah, if you are wondering, for iscsi_tcp iface binding there is a 
sockopt to tell the network layer to forget what it would do (use the 
route table) normally, and instead send IO through the netdev we pass 
it. So in userspace we use the netdev we have from iface.net_ifacename 
or match the MAC with a netdev and then pass that to the sockopt. Do you 
have something similar for ib?





Re: iscsi ifaces / multipathing / etc

2010-02-24 Thread Or Gerlitz
Mike Christie wrote:
 So is there anything in there that is static and can be used to identify the 
 port?

 With srp you can bind a target to a rnic port, right (from srp_add_port
 it looks like the add_target file is on the srp classes rnic port
 object)? In userspace for the setup then how do you make sure that
 across boots the target is accessed through the same port? Is the
 ib_device->name and the port persistent?

Yes, the struct ib_device->name and the ports of this device are persistent.
The NIC device name may change because of udev, hot-plugs etc. As for making sure
that across boots we go through the same physical source point for the path, 
you probably have a point here, which we can solve for iser if we go with the 
hwaddress as the identifier of the iscsi interface. Thoughts?

Or.




Re: iscsi ifaces / multipathing / etc

2010-02-24 Thread Mike Christie

On 02/24/2010 10:04 AM, Or Gerlitz wrote:

Mike Christie wrote:

So is there anything in there that is static and can be used to identify the 
port?



With srp you can bind a target to a rnic port, right (from srp_add_port
it looks like the add_target file is on the srp classes rnic port
object)? In userspace for the setup then how do you make sure that
across boots the target is accessed through the same port? Is the
ib_device->name and the port persistent?


Yes, the struct ib_device->name and the ports of this device are persistent.
The NIC device name may change because of udev, hot-plugs etc. As for making sure
that across boots we go through the same physical source point for the path, you
probably have a point here, which we can solve for iser if we go with the 
hwaddress as the identifier of the iscsi interface. Thoughts?



I am fine with either. The netdev name (ethX) also has the same problems 
where udev can change it. It is there for aliases or vlans where we 
cannot use hwaddress since multiple netdevs have the same MAC.


Do you have similar issues with ib device names?

What about bonding? Can you use bonding/trunking with iser? Is that 
going to cause troubles?





Re: iscsi ifaces / multipathing / etc

2010-02-22 Thread Or Gerlitz
Mike Christie wrote:
 2. I wasn't sure if there is and if yes what is the transport role in
 detecting session failure.

 It varies from transport to transport. 
 For iscsi_tcp we do not really have a nice way to figure out if the
 someone just tripped over a cable so that is where the nop comes from.
 We can tell if the tcp state changes and so you can see
 iscsi_tcp_state_change notify the upper layers of a problem for that.

understood. Still, the noop-out based watch-dog serves all transports, correct?
I'd like to narrow things down and understand if/what is the transport role:

 Some iscsi drivers will run iscsi_conn/session_failure when they
 discover a link down event or someone doing ifdown. I thought this is
 sort of what you are able to do with 
 iser_cma_handler->iser_disconnected_handler 
 or with the call to iscsi_conn_failure in iser_handle_comp_error

Yes, we call iscsi_conn/session_failure, but I wasn't really sure whether 
multipathing works for non-tcp transports if they never make these calls, or 
whether they have to.

 If there are other places you can detect a link failure type of problem
 you would want to call iscsi_conn_failure, so the iscsi layer can begin
 trying to recover the connection and let dm-multipath know there is a problem.

I understand that once there's a timeout on the noop-out watch-dog, the iscsi 
layer will call ep_disconnect, correct? Currently our ep_disconnect is sometimes 
too slow and I can change that. But still, I wasn't sure whether something is 
needed at the transport side for iscsi to let dm-multipath know that there is a 
failure or not... 
 
 I do see that there's an shost param to ep_connect, is there a way it
 can give me a hint on the source IP?
 
 I do not think it can help iser as it is today. Remember when we talked
 about a shost per some physical/virtual resource vs a shost per session.
 This is another place where that came in. bnx2i, cxgb3i and be2iscsi
 allocate a host per port/netdev, so that is how they know the src they should 
 be using.
 I will have to think about how to do it for iser as it is today with the host 
 per session

How about extending the ep_connect user/netlink/kernel/iscsi_transport 
framework to support the functionality provided by the user space code of 
bind_conn_to_iface or bind_src_by_address? Basically, since the connection 
establishment framework is IP based, I would be happy to just get some source 
IP in the kernel when ep_connect is called. I saw the comment on why 
bind_src_by_address is problematic, but this doesn't apply to IB/iser.

 A question for you. Some people do not like using the netdev name
 for the binding since it can change between boots. The default method is
 to use  iface.hwaddress instead of iface.net_ifacename. For iscsi it is
 just the MAC. For iser how big is the RNIC's equivalent of the MAC?

iser is working now over IB and at some point we'll make it work also over 
iWARP. 

With IB, the RNIC is an IPoIB NIC whose HW address (equiv of MAC) is 20 bytes long.
It turns out that some of these 20 bytes may change... the part which is burned in
is called the GUID and is 8 bytes long. Here you see two IPoIB NICs, ib0 and ib1, 
and the port GUIDs they are using are 00:02:c9:03:00:02:6b:df and 
00:02:c9:03:00:02:6b:e0

 7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
     link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:df
 8: ib1: <BROADCAST,MULTICAST> mtu 2044 qdisc pfifo_fast qlen 256
     link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:e0

If you are really interested to learn how these 20 bytes are composed, it's in 
the form of flags:QPN:GID (1:3:16 bytes), where GID is of the form PREFIX:GUID 
(8:8 bytes). Do wget http://ietf.org/rfc/rfc4391.txt and see section 9.1.1, 
Link-Layer Address/Hardware Address. 
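
To make the 1:3:16 layout concrete, here is a small sketch that splits a textual 20-byte IPoIB hardware address into its fields. The struct and function names (ipoib_hwaddr, parse_ipoib_hwaddr) are hypothetical and chosen only for this illustration; the field split follows the flags:QPN:GID and PREFIX:GUID description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPOIB_HWADDR_LEN 20

/* Decoded fields of a 20-byte IPoIB link-layer address. */
struct ipoib_hwaddr {
    uint8_t  flags;      /* byte 0 */
    uint32_t qpn;        /* bytes 1-3, 24-bit queue pair number */
    uint8_t  prefix[8];  /* bytes 4-11, GID subnet prefix */
    uint8_t  guid[8];    /* bytes 12-19, port GUID (the burned-in part) */
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse "xx:xx:...:xx" (exactly 20 colon-separated hex bytes) into out.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_ipoib_hwaddr(const char *s, struct ipoib_hwaddr *out)
{
    uint8_t raw[IPOIB_HWADDR_LEN];
    int i;

    assert(s && out);

    for (i = 0; i < IPOIB_HWADDR_LEN; i++) {
        int hi = hexval(s[0]);
        int lo = hi < 0 ? -1 : hexval(s[1]);

        if (hi < 0 || lo < 0)
            return -1;
        raw[i] = (uint8_t)((hi << 4) | lo);
        s += 2;
        /* Every byte except the last is followed by a colon. */
        if (i < IPOIB_HWADDR_LEN - 1) {
            if (*s != ':')
                return -1;
            s++;
        }
    }
    if (*s != '\0')
        return -1;

    out->flags = raw[0];
    out->qpn = ((uint32_t)raw[1] << 16) | ((uint32_t)raw[2] << 8) | raw[3];
    memcpy(out->prefix, raw + 4, 8);
    memcpy(out->guid, raw + 12, 8);
    return 0;
}
```

Applied to the ib0 address shown above, the last eight bytes come out as 00:02:c9:03:00:02:6b:df, the port GUID quoted in the text. An iface match that compared only the guid field would ignore the bytes that may change.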

Note that the ifconfig output is buggy, so you should use $ ip address show. 
Anyway, I wasn't really sure if/how the iface binding by hw address is working
in open iscsi; specifically, I wasn't able to track which library exports 
net_get_netdev_from_hwaddress ... but I am quite sure this (binding iface to hw 
address and not netdev) works well for iscsi-tcp and offloads, correct?

Or.




Re: iscsi ifaces / multipathing / etc

2010-02-22 Thread Mike Christie

On 02/22/2010 09:32 AM, Or Gerlitz wrote:

Mike Christie wrote:

2. I wasn't sure if there is and if yes what is the transport role in
detecting session failure.



It varies from transport to transport.
For iscsi_tcp we do not really have a nice way to figure out if the
someone just tripped over a cable so that is where the nop comes from.
We can tell if the tcp state changes and so you can see
iscsi_tcp_state_change notify the upper layers of a problem for that.


understood. Still, the noop-out based watch-dog serve all transports, correct?


Yes.


I'd like to narrow down things and understand if/what is the transport role:



For the nop out path, the transport just has to send/recv the nop 
pdu/response.




Some iscsi drivers will run iscsi_conn/session_failure when they
discover a link down event or someone doing ifdown. I thought this is
sort of what you are able to do with iser_cma_handler->iser_disconnected_handler
or with the call to iscsi_conn_failure in iser_handle_comp_error


Yes, we call iscsi_conn/session_failure but I wasn't really sure if multipathing
works for non tcp transports if they never make these calls or they have to.



They do not have to make those calls for multipath to work. Multipath 
will work better if the transport can signal when there is a problem, 
because we can stop using a bad path and get IO going to a working path 
faster. If the transport does nothing then we have to rely on the scsi 
error handler/timeout to detect the problem and that is very slow.







If there are other places you can detect a link failure type of problem
you would want to call iscsi_conn_failure, so the iscsi layer can begin
trying to recover the connection and let dm-multipath know there is a problem.


I understand that once there's timeout on the noop out watch-dog, the iscsi 
layer
will call ep_disconnect, correct? currently our ep_disconnect is sometime too 
slow



Yes. You should also change your ep_disconnect because it is not 
supposed to block (did we talk about this or was this just bnx2i), since 
it will stop iscsid from processing other events.




and I can change that. But still, I wasn't sure whether something is needed at 
the transport side for iscsi to let dm-multipath know that there is a failure 
or not...


I do not think there is anything special. It should handle an error like 
it would if multipath was not used. The user will set the iscsi timers 
like replacement_timeout and nop timeout differently if they are using 
multipath.






I do see that there's an shost param to ep_connect, is there a way it
can give me a hint on the source IP?



I do not think it can help iser as it is today. Remember when we talked
about a shost per some physical/virtual resource vs a shost per session.
This is another place where that came in. bnx2i, cxgb3i and be2iscsi
allocate a host per port/netdev, so that is how they know the src they should 
be using.
I will have to think about how to do it for iser as it is today with the host 
per session


How about extending the ep_connect user/netlink/kernel/iscsi_transport 
framework to support the functionality provided by the user space code of 
bind_conn_to_iface or bind_src_by_address? Basically, since the connection 
establishment framework is IP based, I would be happy to just get some source 
IP in the kernel when ep_connect is called. I saw the comment on why 
bind_src_by_address is problematic, but this doesn't apply to IB/iser.


Which comment are you talking about? Are you talking about bind() not 
doing what you would want for iscsi_tcp (target sometimes sends data to 
the wrong port) or are you talking about if you were to use DHCP and so 
the IPs could change over boots?






A question for you. Some people do not like using the netdev name
for the binding since it can change between boots. The default method is
to use  iface.hwaddress instead of iface.net_ifacename. For iscsi it is
just the MAC. For iser how big is the RNIC's equivalent of the MAC?


iser is working now over IB and at some point we'll make it work also over 
iWARP.

With IB, the RNIC is IPoIB NIC whose HW address (equiv of MAC) is 20 bytes long.
It turns out that some of these 20 bytes may change... the part which is burned




So is there anything in there that is static and can be used to identify 
the port?






is called GUID and is 8 bytes long, here you see two IPoIB NICs, ib0 and ib1 
and the
port GUIDs they are using are 00:02:c9:03:00:02:6b:df and 
00:02:c9:03:00:02:6b:e0


7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast qlen 256
    link/infiniband 80:00:00:48:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:df
8: ib1: <BROADCAST,MULTICAST> mtu 2044 qdisc pfifo_fast qlen 256
    link/infiniband 80:00:00:49:fe:80:00:00:00:00:00:00:00:02:c9:03:00:02:6b:e0


If you are really interested to learn how these 20 bytes are composed, it's in 
the form of flags:QPN:GID (1:3:16 bytes), where GID is of the form PREFIX:GUID 
(8:8 bytes). Do
wget