Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
Sagi Grimberg wrote on 01/08/2015 05:45 AM:
>> RFC 3720 namely requires that iSCSI numbering is session-wide. This means maintaining a single counter for all MC/S sessions. Such a counter would be a contention point. I'm afraid that because of that counter, performance on a multi-socket initiator system with a scsi-mq implementation based on MC/S could be worse than with the approach with multiple iSER targets. Hence my preference for an approach based on multiple independent iSER connections instead of MC/S.
>
> So this comment is spot on the pros/cons of the discussion (we might want to leave something for LSF ;)). MCS would not allow a completely lockless data-path due to command ordering. On the other hand, implementing some kind of multiple-sessions solution feels somewhat like a misfit (at least in my view).
>
> One of my thoughts about how to overcome the contention on command sequence numbering was to suggest some kind of negotiable relaxed ordering mode, but of course I don't have anything figured out yet.

The Linux SCSI/block stack neither uses nor guarantees any command order. Applications that require command ordering enforce it by queue draining (i.e. they wait until all previous commands have finished). Hence, MC/S-enforced command ordering is overkill, and it additionally comes with a non-zero performance cost.

Don't do MC/S, do independent connections. You know the KISS principle. The memory overhead of setting up the extra iSCSI sessions should be negligible.

Vlad

--
You received this message because you are subscribed to the Google Groups open-iscsi group. To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com. To post to this group, send email to open-iscsi@googlegroups.com. Visit this group at http://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/d/optout.
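The contention argument in this message can be pictured with a small sketch. It is only an illustration in Python, not code from open-iscsi or the kernel: the class names `SharedCmdSN` and `PerSessionCmdSN` are hypothetical, and Python threads do not reproduce real multi-socket cache-line contention. What it shows is the structural difference between one session-wide counter that every submitting context must lock and independent per-session counters that are each touched by a single context.

```python
import threading


class SharedCmdSN:
    """Hypothetical model of a session-wide CmdSN: one counter, one lock."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()  # every submitting context contends here

    def allocate(self):
        with self._lock:
            sn = self._next
            self._next = (sn + 1) & 0xFFFFFFFF  # CmdSN is a 32-bit value
            return sn


class PerSessionCmdSN:
    """Hypothetical model of independent sessions: one counter per session.

    Assumes each session is only ever used from a single context, so no
    lock is taken.
    """

    def __init__(self, nr_sessions, start=1):
        self._next = [start] * nr_sessions

    def allocate(self, session):
        sn = self._next[session]
        self._next[session] = (sn + 1) & 0xFFFFFFFF
        return sn
```

With the shared variant, all submitters serialize on one lock and receive numbers from a single sequence. With the per-session variant, each session numbers its commands independently, which is why no ordering is implied across sessions.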
Re: [PATCH 4/5] iscsiuio CFLAGS fixes
Chris Leech cle...@redhat.com wrote on 12.01.2015 at 20:24 in message 1421090651-8333-5-git-send-email-cle...@redhat.com:
> try and keep existing CFLAGS from environment for packagers
> ---
>  iscsiuio/configure.ac | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/iscsiuio/configure.ac b/iscsiuio/configure.ac
> index d619598..7ee1e73 100644
> --- a/iscsiuio/configure.ac
> +++ b/iscsiuio/configure.ac
> @@ -53,7 +53,7 @@ AC_LIBTOOL_DLOPEN
>  # libtool stuff
>  AC_PROG_LIBTOOL
>
> -CFLAGS="-O2 -Wall"
> +CFLAGS="${CFLAGS} -O2 -Wall"

Don't you have to use either += or := for that? See "6.6 Appending More Text to Variables" in the GNU make info...

>  ## check for --enable-debug first before checking CFLAGS before
>  ## so that we don't mix -O and -g
>  AC_ARG_ENABLE(debug,
> --
> 2.1.0
Re: [PATCH 2/5] add discovery as a valid mode in iscsiadm.8
Chris Leech cle...@redhat.com wrote on 12.01.2015 at 20:24 in message 1421090651-8333-3-git-send-email-cle...@redhat.com:
> ---
>  doc/iscsiadm.8 | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/doc/iscsiadm.8 b/doc/iscsiadm.8
> index 9a945d1..05793b2 100644
> --- a/doc/iscsiadm.8
> +++ b/doc/iscsiadm.8
> @@ -174,13 +174,13 @@ for session mode).
>  .TP
>  \fB\-m, \-\-mode \fIop\fR
>  specify the mode. \fIop\fR
> -must be one of \fIdiscoverydb\fR, \fInode\fR, \fIfw\fR, \fIhost\fR \fIiface\fR or \fIsession\fR.
> +must be one of \fIdiscovery\fR, \fIdiscoverydb\fR, \fInode\fR, \fIfw\fR, \fIhost\fR \fIiface\fR or \fIsession\fR.
>  .IP
> -If no other options are specified: for \fIdiscoverydb\fR and \fInode\fR, all
> -of their respective records are displayed; for \fIsession\fR, all active
> -sessions and connections are displayed; for \fIfw\fR, all boot firmware
> -values are displayed; for \fIhost\fR, all iSCSI hosts are displayed; and
> -for \fIiface\fR, all ifaces setup in /etc/iscsi/ifaces are displayed.
> +If no other options are specified: for \fIdiscovery\fR, \fIdiscoverydb\fR and
> +\fInode\fR, all of their respective records are displayed; for \fIsession\fR,
> +all active sessions and connections are displayed; for \fIfw\fR, all boot
> +firmware values are displayed; for \fIhost\fR, all iSCSI hosts are displayed;
> +and for \fIiface\fR, all ifaces setup in /etc/iscsi/ifaces are displayed.
>  .TP
>  \fB\-n\fR, \fB\-\-name=\fIname\fR

Hi!

A matter of style: I think these font escape sequences like \fI make the manual source quite hard to read, and most likely this is why macros exist for that. So instead of writing (just one example)
--
.TP
\fB\-n\fR, \fB\-\-name=\fIname\fR
--
use
--
.TP
.BR \-n ,
.BR \-\-name= name
--
Maybe someone is willing to beautify the manual page that way.

Regards,
Ulrich
Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/12/2015 2:56 PM, Bart Van Assche wrote:
> On 01/11/15 10:40, Sagi Grimberg wrote:
>> I would say there is no need for specific coordination from iSCSI PoV. This is exactly what flow steering is designed for. As I see it, in order to get the TX/RX to match rings, the user can attach 5-tuple rules (using standard ethtool) to steer packets to the right rings.
>
> Hello Sagi,
>
> Can the 5-tuple rules be chosen such that it is guaranteed that the sockets used to implement per-CPU queues are spread evenly over MSI-X completion vectors? If not, would it help to add a socket option to the Linux network stack that allows selecting the TX ring explicitly, just like ib_create_cq() in the Linux RDMA stack allows selecting a completion vector explicitly? My concerns are as follows:
> - If the number of queues exceeds the number of MSI-X vectors then I expect that it will be much easier to guarantee even spreading by selecting TX queues explicitly instead of relying on a hashing scheme.
> - On multi-socket systems it is important to process completion interrupts on the CPU socket from where the I/O was initiated. I'm not sure it is possible to guarantee this when using a hashing algorithm to select the TX ring.

Hey Bart,

Your concerns are correct. Flow steering rules will guarantee that each socket will have a different TX/RX ring, but not necessarily the correct TX/RX ring. These issues have been addressed in the networking subsystem.

Thinking more on this out loud: there is the TX challenge, getting the HW queue selection to match the TX ring selection (which might not be the same according to the flow hash). The first thing that comes to mind is XPS (Transmit Packet Steering).

From Documentation/networking/scaling.txt:

  Transmit Packet Steering is a mechanism for intelligently selecting which transmit queue to use when transmitting a packet on a multi-queue device. To accomplish this, a mapping from CPU to hardware queue(s) is recorded. The goal of this mapping is usually to assign queues exclusively to a subset of CPUs, where the transmit completions for these queues are processed on a CPU within this set.

About the RX challenge, I think RFS (Receive Flow Steering) will probably be the best fit here since RX packets will be steered to the CPU where the application is running.

From Documentation/networking/scaling.txt:

  The goal of RFS is to increase datacache hitrate by steering kernel processing of packets to the CPU where the application thread consuming the packet is running. RFS relies on the same RPS mechanisms to enqueue packets onto the backlog of another CPU and to wake up that CPU. In RFS, packets are not forwarded directly by the value of their hash, but the hash is used as index into a flow lookup table. This table maps flows to the CPUs where those flows are being processed.

This definitely needs some more thinking. CC'ing Or Gerlitz, who has a lot of experience in the networking stack...

Sagi.
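As a rough illustration of the CPU-to-queue mapping that the XPS excerpt describes, the following Python sketch computes the hex CPU bitmaps one would write to the per-queue `xps_cpus` files documented in scaling.txt. It is a dry run only: nothing is written to sysfs, the helper names `cpu_mask` and `xps_plan` are hypothetical, and the round-robin CPU assignment is an assumption chosen for simplicity. It does not take NUMA topology into account, which is the concern raised above.

```python
def cpu_mask(cpus):
    """Return a hex CPU bitmap, in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    groups = []
    while True:
        groups.append(format(mask & 0xFFFFFFFF, "08x"))
        mask >>= 32
        if not mask:
            break
    # Most significant group first; drop leading zeros of the first group.
    return ",".join(reversed(groups)).lstrip("0") or "0"


def xps_plan(dev, nr_queues, nr_cpus):
    """Map each TX queue to a set of CPUs, assigned round-robin (an assumption)."""
    plan = {}
    for queue in range(nr_queues):
        cpus = [cpu for cpu in range(nr_cpus) if cpu % nr_queues == queue]
        path = f"/sys/class/net/{dev}/queues/tx-{queue}/xps_cpus"
        plan[path] = cpu_mask(cpus)
    return plan


if __name__ == "__main__":
    # "eth0", 4 queues and 8 CPUs are example values only.
    for path, mask in xps_plan("eth0", 4, 8).items():
        print(f"echo {mask} > {path}")
```

Applying such a plan requires root and a multi-queue device. The sketch only prints the commands so the mapping can be reviewed first.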
Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 1/12/2015 10:05 PM, Mike Christie wrote:
> On 01/11/2015 03:23 AM, Sagi Grimberg wrote:
>> On 1/9/2015 8:00 PM, Michael Christie wrote:
>>> SNIP
>>>> Session-wide command sequence number synchronization isn't something to be removed as part of the MQ work. It's an iSCSI/iSER protocol requirement.
>>>>
>>>> That is, the expected + maximum sequence numbers are returned as part of every response PDU, which the initiator uses to determine when the command sequence number window is open so new non-immediate commands may be sent to the target.
>>>>
>>>> So, given some manner of session-wide synchronization is required between different contexts for the existing single connection case to update the command sequence number and check when the window opens, it's a fallacy to claim MC/S adds some type of new initiator-specific synchronization overhead vs. single connection code.
>>>
>>> I think you are assuming we are leaving the iscsi code as it is today.
>>>
>>> For the non-MCS mq session-per-CPU design, we would be allocating and binding the session and its resources to specific CPUs. They would only be accessed by the threads on that one CPU, so we get our serialization/synchronization from that. That is why we are saying we do not need something like atomic_t/spin_locks for the sequence number handling for this type of implementation.
>>>
>>> If we just tried to do this with the old code where the session could be accessed on multiple CPUs then you are right, we need locks/atomics like how we do in the MCS case.
>>
>> I don't think we will want to restrict session per CPU. There is a tradeoff question of system resources. We might want to allow a user to configure multiple HW queues but still not use too much of the system resources. So the session locks would still be used but definitely less congested...
>
> Are you talking about specifically the session per CPU or also MCS and doing a connection per CPU?

This applies to both.

> Based on the srp work, how bad do you think it will be to do a session/connection per CPU? What are you thinking will be more common? Session per 4 CPUs? 2 CPUs? 8?

This is a matter of degree, which demonstrates why we need to let the user choose. I don't think there is a magic number here; there is a tradeoff between performance and memory footprint.

> There is also multipath to take into account here. We could do a mq/MCS session/connection per CPU (or group of CPUs) then also one of those per transport path. We could also do a mq/MCS session/connection per transport path, then bind those to specific CPUs. Or something in between.

Is it a good idea to tie the iSCSI implementation to multipath? I've seen deployments where multipath was not used for HA (NIC bonding was used for that).

The srp implementation allowed the user to choose the number of channels per target and the default was chosen by empirical results (Bart, please correct me if I'm wrong here).

Sagi.
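The command window mentioned in the quoted text (the target returns ExpCmdSN and MaxCmdSN in every response, and the initiator sends non-immediate commands only while the window is open) can be sketched as follows. This is a simplified Python model under stated assumptions, not the open-iscsi implementation: the class `CmdWindow` and its method names are hypothetical, and comparisons use 32-bit serial-number arithmetic so that wrap-around is handled. The state held here is exactly what has to be synchronized when several contexts share one session.

```python
SN_MASK = 0xFFFFFFFF


def sn_lt(a, b):
    """a < b in 32-bit serial-number arithmetic (values wrap at 2**32)."""
    return a != b and ((b - a) & SN_MASK) < 0x80000000


def sn_lte(a, b):
    """a <= b in 32-bit serial-number arithmetic."""
    return a == b or sn_lt(a, b)


class CmdWindow:
    """Hypothetical model of the per-session command window."""

    def __init__(self, cmd_sn, exp_cmd_sn, max_cmd_sn):
        self.cmd_sn = cmd_sn          # next CmdSN the initiator will assign
        self.exp_cmd_sn = exp_cmd_sn  # latest ExpCmdSN seen from the target
        self.max_cmd_sn = max_cmd_sn  # latest MaxCmdSN seen from the target

    def is_open(self):
        # A new non-immediate command may be sent while CmdSN <= MaxCmdSN.
        return sn_lte(self.cmd_sn, self.max_cmd_sn)

    def send(self):
        if not self.is_open():
            raise RuntimeError("command window closed")
        sn = self.cmd_sn
        self.cmd_sn = (sn + 1) & SN_MASK
        return sn

    def on_response(self, exp_cmd_sn, max_cmd_sn):
        # Only move the window forward; stale values are ignored.
        if sn_lt(self.exp_cmd_sn, exp_cmd_sn):
            self.exp_cmd_sn = exp_cmd_sn
        if sn_lt(self.max_cmd_sn, max_cmd_sn):
            self.max_cmd_sn = max_cmd_sn
```

If each session is bound to one CPU, as proposed in the thread, a structure like this is only touched from that CPU and needs no lock. If it is shared across CPUs, `send()` and `on_response()` would have to be serialized.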
Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion
On 01/09/15 12:39, Sagi Grimberg wrote:
> On 1/8/2015 4:11 PM, Bart Van Assche wrote:
>> On 01/08/15 14:45, Sagi Grimberg wrote:
>>> Actually I started with that approach, but the independent connections under a single session (I-T-Nexus) violate the command ordering requirement. Plus, such a solution is specific to iSER...
>>
>> Which command ordering requirement are you referring to? The Linux storage stack does not guarantee that block layer or SCSI commands will be processed in the same order as these commands have been submitted.
>
> I was referring to the iSCSI session requirement. I initially thought of an approach to maintain multiple iSER connections under a single session but pretty soon I realized that preserving command ordering this way is not feasible. So independent iSER connections means independent iSCSI sessions (each with a single connection). This is indeed another choice, which we are clearly debating...
>
> I'm just wondering if we are not trying to force-fit this model. What would this model look like? We will need to define another entity to track and maintain the sessions and to allocate the scsi_host. Will that be communicated to user-space? What will error recovery look like?

Hello Sagi,

As you probably remember, scsi-mq support was added in the SRP initiator by changing the 1:1 relationship between scsi_host and RDMA connection into a 1:n relationship. I don't know how much work it would take to implement a similar transformation in the SCSI initiator. Maybe we should wait until Mike's workday starts so that Mike has a chance to comment on this.

Bart.
Re: bind_src_by_address() is disabled?
Hi Mike, thanks for the reply. See below.

On Monday, January 12, 2015 at 11:22:41 AM UTC-8, Mike Christie wrote:
> On 1/9/15, 8:28 PM, Thomas Dwyer III wrote:
>> Hi folks,
>>
>> I spent some time browsing through this forum but I was unable to find an explanation for this comment referring to the disabled bind_src_by_address() function in io.c:
>>
>>   This is not supported for now, because it is not exactly what we want.
>>   It also turns out that targets will send packets to other interfaces
>>   causing all types of weird things to happen.
>>
>> I found several posts from people referring to this specific comment but I did not find an explanation. Is it possible that the author of this comment was referring to the ARP flux issue, which may cause a target to associate the bound IP address with the MAC address from an interface other than the one specified with SO_BINDTODEVICE? If so, I don't see how avoiding the call to bind() solves this problem. I would appreciate a reply from anyone who might know what "weird things" means in this context.
>
> bind_src_by_address() only did a bind() and was expecting all traffic to flow through the interface with the specified ip address.

Clearly that's a bad assumption on the part of the administrator. If restricting traffic to a particular interface is desired, bind() is the wrong approach. That's what SO_BINDTODEVICE is for.

> If you have multiple interfaces on the same subnet, the network layer would send/recv on any of them. This ended up causing issues with packets not getting sent/received or received in incorrect orders to the iscsi layers on the initiator/target side.

How does incorrect ordering occur? This is TCP, right? Correct ordering is guaranteed regardless of which interface(s) are used.

> SO_BINDTODEVICE is not related to bind() (was not sure about your comment about avoiding bind when using that sockopt). It tells the kernel to ignore the normal routing tables and to just use the interface we specify with that call.

Precisely. SO_BINDTODEVICE and bind() are two very different things, for two very different purposes. One cannot be used as a substitute for the other.

> Why are you asking about this? Do you need something like bind by ip?

Yes, I do. I have an environment where a single interface has multiple different IP addresses configured. For example:

# ip addr show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9f:4c:5d brd ff:ff:ff:ff:ff:ff
    inet 10.108.53.143/21 scope global eth0
    inet 10.108.53.243/21 scope global secondary eth0
    inet6 fe80::250:56ff:fe9f:4c5d/64 scope link
       valid_lft forever preferred_lft forever

In this example, my target will only accept logins from 10.108.53.243. Unless we call bind(), there's no way to make this work.

Tom.III
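The use case in this message, choosing the source address of an outgoing TCP connection, comes down to calling bind() before connect(). The sketch below shows this with Python's standard socket module. It is not the disabled bind_src_by_address() code from io.c, and the function name `connect_from` is hypothetical.

```python
import socket


def connect_from(src_ip, dst_ip, dst_port, timeout=5.0):
    """Open a TCP connection whose local (source) address is src_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # Port 0 lets the kernel pick an ephemeral source port.
        sock.bind((src_ip, 0))
        sock.connect((dst_ip, dst_port))
    except OSError:
        sock.close()
        raise
    return sock
```

For the setup shown above, the call would look like `connect_from("10.108.53.243", target_ip, 3260)`, where `target_ip` stands for the portal address (not given in the message) and 3260 is the standard iSCSI port. Note that bind() only selects the source address; which interface the packets leave through is still decided by routing, which is the distinction from SO_BINDTODEVICE drawn in the message.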
Re: bind_src_by_address() is disabled?
On Monday, January 12, 2015 at 12:54:38 PM UTC-8, Mike Christie wrote:
> Thomas, let me know if your question was for functionality you needed or just looking through the code and were curious.

It's functionality that I really want. In fact, I'm here in this forum because strace on iscsid showed it was never calling bind(), and when I went to add the bind() call myself I discovered all the code I wanted was already there but disabled.

How does this proposal sound:

- If the administrator configures an explicit interface name with iface.net_ifacename, use SO_BINDTODEVICE.
- If the administrator configures an explicit IP address with iface.ipaddress, use bind().
- If the administrator configures both of the above, use both of the above.
- If the administrator configures neither, use neither.

I think this would give administrators all the flexibility they need. Is there a downside to this proposal that I'm not seeing?

Thanks,
Tom.III
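The four cases of the proposal above can be written down as a small decision function. This Python sketch is only an illustration of the proposed policy, not a patch for iscsid: the function names `binding_steps` and `apply_binding` are hypothetical, the parameters mirror the iface.net_ifacename and iface.ipaddress settings named in the message, and the numeric fallback for SO_BINDTODEVICE is the Linux value.

```python
import socket

# Older Python builds may not export the constant; 25 is the Linux value.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def binding_steps(net_ifacename=None, ipaddress=None):
    """Return the socket setup steps implied by the iface settings."""
    steps = []
    if net_ifacename:
        steps.append(("SO_BINDTODEVICE", net_ifacename))
    if ipaddress:
        steps.append(("bind", ipaddress))
    return steps


def apply_binding(sock, steps):
    """Apply the steps to a not-yet-connected socket."""
    for kind, value in steps:
        if kind == "SO_BINDTODEVICE":
            # Restricts the socket to one device; may need CAP_NET_RAW.
            sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                            value.encode() + b"\0")
        else:
            # Selects the source address; port 0 means any ephemeral port.
            sock.bind((value, 0))
```

Keeping the decision separate from its application makes the policy easy to check without privileges: `binding_steps("eth0", "10.108.53.243")` yields both steps, and `binding_steps()` yields none.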
Re: bind_src_by_address() is disabled?
On 01/12/2015 02:03 PM, tom...@gmail.com wrote:
> How does incorrect ordering occur? This is TCP, right? Correct ordering is guaranteed regardless of which interface(s) are used.

Software/implementation bugs.
Re: bind_src_by_address() is disabled?
On 01/12/2015 05:30 PM, Thomas Dwyer III wrote:
> I think this would give administrators all the flexibility they need. Is there a downside to this proposal that I'm not seeing?

You do not have to debug and support it, so that is why it was ifdef/commented out :) I am open to accepting patches for it though.

Why was creating a network alias, then using that for the net_ifacename, not usable for you though?

For the ip binding stuff, did you already try uncommenting/ifdeffing the bind by ip related code? I think there is other related code that has to be updated. I will look at it when I get some time, but that is not going to be for a couple of weeks or so.
Re: bind_src_by_address() is disabled?
On 01/13/2015 03:03 PM, Mike Christie wrote:
> On 01/12/2015 05:30 PM, Thomas Dwyer III wrote:
>> I think this would give administrators all the flexibility they need. Is there a downside to this proposal that I'm not seeing?
>
> You do not have to debug and support it, so that is why it was ifdef/commented out :) I am open to accepting patches for it though.
>
> Why was creating a network alias, then using that for the net_ifacename not usable for you though?

You cannot use BINDTODEVICE with old-style network alias, only a 'real-ish' device.

Could use mac-vlans though...

Thanks,
Ben

> For the ip binding stuff, did you already try uncommenting/ifdeffing the bind by ip related code? I think there is other related code that has to be updated. I will look at it when I get some time, but that is not going to be for a couple weeks or so.

--
Ben Greear gree...@candelatech.com
Candela Technologies Inc http://www.candelatech.com