Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

2015-01-13 Thread Vladislav Bolkhovitin
Sagi Grimberg wrote on 01/08/2015 05:45 AM:
 RFC 3720 requires that iSCSI command numbering is
 session-wide. This means maintaining a single counter for all connections
 of an MC/S session. Such a counter would be a contention point. I'm afraid
 that because of that counter, performance on a multi-socket initiator system
 with a scsi-mq implementation based on MC/S could be worse than with the
 approach with multiple iSER targets. Hence my preference for an approach
 based on multiple independent iSER connections instead of MC/S.
 
 So this comment is spot on regarding the pros/cons of the discussion (we
 might want to leave something for LSF ;)).
 MCS would not allow a completely lockless data path due to command
 ordering. On the other hand, implementing some kind of multiple-session
 solution feels somewhat like a misfit (at least in my view).
 
 One of my thoughts about how to overcome the contention on command
 sequence numbering was to suggest some kind of negotiable relaxed
 ordering mode, but of course I don't have anything figured out yet.

The Linux SCSI/block stack neither uses nor guarantees any command order.
Applications that require command ordering enforce it by queue draining
(i.e. waiting until all previous commands have finished). Hence, the command
order enforced by MC/S is overkill, and it additionally comes with a non-zero
performance cost.

Don't do MC/S; do independent connections. You know the KISS principle. The
memory overhead of setting up the extra iSCSI sessions should be negligible.
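To make the contention argument concrete, here is a minimal C11 sketch contrasting the two designs. The structure and function names (`mcs_session`, `sq_session`, `mcs_next_cmd_sn()`, `sq_next_cmd_sn()`) are hypothetical and do not come from open-iscsi or the kernel. Under MC/S every connection draws CmdSN from one session-wide counter, so all submitting CPUs perform an atomic read-modify-write on the same cache line; with independent single-connection sessions, each session owns its counter and, assuming only one CPU ever submits on it, a plain increment suffices.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical MC/S session: all connections share one CmdSN counter. */
struct mcs_session {
    _Atomic uint32_t cmd_sn;   /* session-wide; every submitting CPU touches it */
};

/* Hypothetical single-connection session, assumed to be used by exactly
 * one submitting CPU, so no atomic operation is needed. */
struct sq_session {
    uint32_t cmd_sn;           /* private to the owning CPU */
};

/* Allocate the next CmdSN from the shared counter. Unsigned atomic
 * addition wraps modulo 2^32, matching 32-bit serial number arithmetic. */
static uint32_t mcs_next_cmd_sn(struct mcs_session *s)
{
    return atomic_fetch_add_explicit(&s->cmd_sn, 1, memory_order_relaxed);
}

/* Allocate the next CmdSN from a CPU-private session: a plain increment. */
static uint32_t sq_next_cmd_sn(struct sq_session *s)
{
    return s->cmd_sn++;
}
```

Both versions hand out the same sequence; the difference is only that the atomic one serializes every CPU on a single cache line, while the private one avoids that under the stated one-CPU-per-session assumption.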

Vlad

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [PATCH 4/5] iscsiuio CFLAGS fixes

2015-01-13 Thread Ulrich Windl
 Chris Leech cle...@redhat.com wrote on 12.01.2015 at 20:24 in message
1421090651-8333-5-git-send-email-cle...@redhat.com:
 try and keep existing CFLAGS from environment for packagers
 ---
  iscsiuio/configure.ac | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/iscsiuio/configure.ac b/iscsiuio/configure.ac
 index d619598..7ee1e73 100644
 --- a/iscsiuio/configure.ac
 +++ b/iscsiuio/configure.ac
 @@ -53,7 +53,7 @@ AC_LIBTOOL_DLOPEN
  # libtool stuff
  AC_PROG_LIBTOOL
  
 -CFLAGS="-O2 -Wall"
 +CFLAGS="${CFLAGS} -O2 -Wall"

Don't you have to use either += or := for that? See section 6.6, "Appending
More Text to Variables", in the GNU make info manual...


  ## check for --enable-debug first before checking CFLAGS before
  ## so that we don't mix -O and -g
  AC_ARG_ENABLE(debug,
 -- 
 2.1.0
 






Re: [PATCH 2/5] add discovery as a valid mode in iscsiadm.8

2015-01-13 Thread Ulrich Windl
 Chris Leech cle...@redhat.com wrote on 12.01.2015 at 20:24 in message
1421090651-8333-3-git-send-email-cle...@redhat.com:
 ---
  doc/iscsiadm.8 | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)
 
 diff --git a/doc/iscsiadm.8 b/doc/iscsiadm.8
 index 9a945d1..05793b2 100644
 --- a/doc/iscsiadm.8
 +++ b/doc/iscsiadm.8
 @@ -174,13 +174,13 @@ for session mode).
  .TP
  \fB\-m, \-\-mode \fIop\fR
  specify the mode. \fIop\fR
 -must be one of \fIdiscoverydb\fR, \fInode\fR, \fIfw\fR, \fIhost\fR \fIiface\fR or \fIsession\fR.
 +must be one of \fIdiscovery\fR, \fIdiscoverydb\fR, \fInode\fR, \fIfw\fR, \fIhost\fR \fIiface\fR or \fIsession\fR.
  .IP
 -If no other options are specified: for \fIdiscoverydb\fR and \fInode\fR, all
 -of their respective records are displayed; for \fIsession\fR, all active
 -sessions and connections are displayed; for \fIfw\fR, all boot firmware
 -values are displayed; for \fIhost\fR, all iSCSI hosts are displayed; and
 -for \fIiface\fR, all ifaces setup in /etc/iscsi/ifaces are displayed.
 +If no other options are specified: for \fIdiscovery\fR, \fIdiscoverydb\fR and
 +\fInode\fR, all of their respective records are displayed; for \fIsession\fR,
 +all active sessions and connections are displayed; for \fIfw\fR, all boot
 +firmware values are displayed; for \fIhost\fR, all iSCSI hosts are displayed;
 +and for \fIiface\fR, all ifaces setup in /etc/iscsi/ifaces are displayed.
  
  .TP
  \fB\-n\fR, \fB\-\-name=\fIname\fR

Hi!

A matter of style: I think these font escape sequences like \fI make the
manual source quite hard to read, and that is most likely why macros exist
for this purpose. So instead of writing (just one example)
--
.TP
\fB\-n\fR, \fB\-\-name=\fIname\fR
--
use
--
.TP
.BR \-n ,
.BI \-\-name= name
--

Maybe someone is willing to beautify the manual page that way.

Regards,
Ulrich




Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

2015-01-13 Thread Sagi Grimberg

On 1/12/2015 2:56 PM, Bart Van Assche wrote:

On 01/11/15 10:40, Sagi Grimberg wrote:

I would say there is no need for specific coordination from the iSCSI
point of view. This is exactly what flow steering is designed for. As I
see it, in order to get TX and RX onto matching rings, the user can attach
5-tuple rules (using standard ethtool) to steer packets to the right rings.


Hello Sagi,

Can the 5-tuple rules be chosen such that it is guaranteed that the
sockets used to implement per-CPU queues are spread evenly over MSI-X
completion vectors? If not, would it help to add a socket option to the
Linux network stack that allows the TX ring to be selected explicitly,
just like ib_create_cq() in the Linux RDMA stack allows a completion
vector to be selected explicitly? My concerns are as follows:
- If the number of queues exceeds the number of MSI-X vectors, then I
   expect that it will be much easier to guarantee even spreading by
   selecting TX queues explicitly instead of relying on a hashing scheme.
- On multi-socket systems it is important to process completion
   interrupts on the CPU socket from which the I/O was initiated. I'm
   not sure it is possible to guarantee this when using a hashing
   algorithm to select the TX ring.



Hey Bart,

Your concerns are correct. Flow steering rules will guarantee that each
socket will have a different TX/RX ring, but not necessarily the
correct TX/RX ring. These issues have been addressed in the
networking subsystem.

Thinking more on this out loud,

There is the TX challenge: getting the HW queue selection to match the
TX ring selection (which might differ, depending on the flow hash).
The first thing that comes to mind is XPS (Transmit Packet Steering).


From Documentation/networking/scaling.txt:
Transmit Packet Steering is a mechanism for intelligently selecting
which transmit queue to use when transmitting a packet on a multi-queue
device. To accomplish this, a mapping from CPU to hardware queue(s) is
recorded. The goal of this mapping is usually to assign queues
exclusively to a subset of CPUs, where the transmit completions for
these queues are processed on a CPU within this set.

As for the RX challenge, I think RFS (Receive Flow Steering) will
probably be the best fit here, since RX packets will be steered to the
CPU where the application is running.

From Documentation/networking/scaling.txt:
The goal of RFS is to increase datacache hitrate by steering
kernel processing of packets to the CPU where the application thread
consuming the packet is running. RFS relies on the same RPS mechanisms
to enqueue packets onto the backlog of another CPU and to wake up that
CPU. In RFS, packets are not forwarded directly by the value of their
hash, but the hash is used as index into a flow lookup table. This
table maps flows to the CPUs where those flows are being processed.
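The flow-table idea in that excerpt can be modeled in a few lines. This is a simplified user-space model, not the kernel's implementation: the table size, the names `flow_table`, `rfs_record()` and `rfs_lookup()`, and the fallback CPU parameter are assumptions for illustration. The application side records which CPU last consumed a flow; the receive side uses the packet's flow hash as an index into the table to find the CPU to steer to.

```c
#include <stdint.h>

#define FLOW_TABLE_SIZE 4096u   /* assumed size; must be a power of two */
#define NO_CPU 0xffffu          /* marker for "no CPU recorded yet" */

/* Simplified RFS-style flow table: (flow hash & mask) -> CPU. */
struct flow_table {
    uint16_t cpu[FLOW_TABLE_SIZE];
};

static void rfs_init(struct flow_table *t)
{
    for (uint32_t i = 0; i < FLOW_TABLE_SIZE; i++)
        t->cpu[i] = NO_CPU;
}

/* Application side: record the CPU on which the consuming thread runs
 * (roughly what the kernel does during recvmsg()/sendmsg()). */
static void rfs_record(struct flow_table *t, uint32_t flow_hash, uint16_t cpu)
{
    t->cpu[flow_hash & (FLOW_TABLE_SIZE - 1)] = cpu;
}

/* Receive side: the hash is an index into the table, not a CPU number.
 * Falls back to 'default_cpu' (e.g. the CPU plain RPS would choose)
 * when nothing has been recorded for this slot. */
static uint16_t rfs_lookup(const struct flow_table *t, uint32_t flow_hash,
                           uint16_t default_cpu)
{
    uint16_t cpu = t->cpu[flow_hash & (FLOW_TABLE_SIZE - 1)];
    return cpu == NO_CPU ? default_cpu : cpu;
}
```

Because the table is indexed by a masked hash, two flows can share a slot and overwrite each other's entry; the model accepts that in exchange for a fixed-size, lock-free lookup.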

This definitely needs some more thinking. CC'ing Or Gerlitz, who has
a lot of experience with the networking stack...

Sagi.



Re: [Lsf-pc] [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

2015-01-13 Thread Sagi Grimberg

On 1/12/2015 10:05 PM, Mike Christie wrote:

On 01/11/2015 03:23 AM, Sagi Grimberg wrote:

On 1/9/2015 8:00 PM, Michael Christie wrote:
SNIP




Session-wide command sequence number synchronization isn't something to
be removed as part of the MQ work. It's an iSCSI/iSER protocol
requirement.

That is, the expected and maximum command sequence numbers (ExpCmdSN and
MaxCmdSN) are returned as part of every response PDU, which the initiator
uses to determine when the command sequence number window is open so that
new non-immediate commands may be sent to the target.
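The window handling described above can be sketched as follows. RFC 3720 compares CmdSN, ExpCmdSN and MaxCmdSN using 32-bit serial number arithmetic (RFC 1982), so a plain `<` breaks at wrap-around. The names `sna_lt()`, `cmdsn_state`, `cmdsn_update()` and `cmdsn_window_open()` are hypothetical, and the sketch covers only non-immediate commands. The update step is the part that must be synchronized session-wide, since responses can arrive on any connection.

```c
#include <stdbool.h>
#include <stdint.h>

/* RFC 1982 serial comparison: true if a precedes b. The RFC leaves the
 * case of values exactly 2^31 apart undefined; here it returns false. */
static bool sna_lt(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(b - a) < 0x80000000u;
}

static bool sna_lte(uint32_t a, uint32_t b)
{
    return a == b || sna_lt(a, b);
}

/* Hypothetical per-session state. */
struct cmdsn_state {
    uint32_t cmd_sn;      /* CmdSN for the next non-immediate command */
    uint32_t exp_cmd_sn;  /* latest ExpCmdSN accepted from the target */
    uint32_t max_cmd_sn;  /* latest MaxCmdSN accepted from the target */
};

/* Apply ExpCmdSN/MaxCmdSN from a response PDU: ignore both if
 * MaxCmdSN < ExpCmdSN - 1, and only move each value forward. */
static void cmdsn_update(struct cmdsn_state *s, uint32_t exp, uint32_t max)
{
    if (sna_lt(max, exp - 1))
        return;
    if (sna_lt(s->exp_cmd_sn, exp))
        s->exp_cmd_sn = exp;
    if (sna_lt(s->max_cmd_sn, max))
        s->max_cmd_sn = max;
}

/* The window is open while the next CmdSN does not exceed MaxCmdSN;
 * MaxCmdSN == ExpCmdSN - 1 therefore means a closed (zero) window. */
static bool cmdsn_window_open(const struct cmdsn_state *s)
{
    return sna_lte(s->cmd_sn, s->max_cmd_sn);
}
```

With MC/S, `cmdsn_update()` and the check-then-increment of `cmd_sn` run on every connection of the session, which is why they need a lock or atomics; with one session per CPU, they can stay plain code.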

So, given that some manner of session-wide synchronization is required
between different contexts for the existing single-connection case to
update the command sequence number and check when the window opens, it's
a fallacy to claim MC/S adds some type of new initiator-specific
synchronization overhead vs. single-connection code.


I think you are assuming we are leaving the iscsi code as it is today.

For the non-MCS mq session per CPU design, we would be allocating and
binding the session and its resources to specific CPUs. They would
only be accessed by the threads on that one CPU, so we get our
serialization/synchronization from that. That is why we are saying we
do not need something like atomic_t/spin_locks for the sequence number
handling for this type of implementation.

If we just tried to do this with the old code where the session could
be accessed on multiple CPUs then you are right, we need locks/atomics
like how we do in the MCS case.



I don't think we will want to restrict ourselves to a session per CPU.
There is a tradeoff question of system resources. We might want to allow
a user to configure multiple HW queues but still not use too many system
resources. So the session locks would still be used, but they would
definitely be less congested...


Are you talking about specifically the session per CPU or also MCS and
doing a connection per CPU?


This applies to both.



Based on the srp work, how bad do you think it will be to do a
session/connection per CPU? What are you thinking will be more common?
Session per 4 CPU? 2 CPUs? 8?


This is a matter of degree, which demonstrates why we need to let the
user choose. I don't think there is a magic number here; there is a
tradeoff between performance and memory footprint.
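One way to expose that tradeoff is a single user-visible knob, for example the number of CPUs per session, from which the number of sessions (or connections) and the CPU-to-session mapping follow. The knob and the helpers `nr_sessions()` and `session_for_cpu()` are hypothetical; they are not an existing open-iscsi or SRP parameter. A ratio of 1 gives the per-CPU case where session state needs no locking, while larger ratios save memory at the cost of sharing each session's locks.

```c
#include <assert.h>

/* Number of sessions needed when each session serves up to
 * 'cpus_per_session' CPUs (rounded up so that every CPU is covered). */
static unsigned nr_sessions(unsigned nr_cpus, unsigned cpus_per_session)
{
    assert(cpus_per_session > 0);
    return (nr_cpus + cpus_per_session - 1) / cpus_per_session;
}

/* Session used by a submitting CPU: consecutive CPU numbers share a
 * session. Whether they also share a socket depends on how the platform
 * numbers its CPUs. */
static unsigned session_for_cpu(unsigned cpu, unsigned cpus_per_session)
{
    assert(cpus_per_session > 0);
    return cpu / cpus_per_session;
}
```

For example, 16 CPUs with 4 CPUs per session gives 4 sessions, and CPU 9 submits on session 2. When the CPU count is not a multiple of the ratio, the last session serves fewer CPUs.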



There is also multipath to take into account here. We could do an mq/MCS
session/connection per CPU (or group of CPUs), then also one of those per
transport path. We could also do an mq/MCS session/connection per
transport path, then bind those to specific CPUs. Or something in between.



Is it a good idea to tie the iSCSI implementation to multipath? I've seen
deployments where multipath was not used for HA (NIC bonding was used
for that).

The srp implementation allowed the user to choose the number of
channels per target and the default was chosen by empirical results
(Bart, please correct me if I'm wrong here).

Sagi.



Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

2015-01-13 Thread Bart Van Assche
On 01/09/15 12:39, Sagi Grimberg wrote:
 On 1/8/2015 4:11 PM, Bart Van Assche wrote:
 On 01/08/15 14:45, Sagi Grimberg wrote:
 Actually I started with that approach, but independent connections
 under a single session (I-T nexus) violate the command ordering
 requirement. Plus, such a solution is specific to iSER...

 Which command ordering requirement are you referring to? The Linux
 storage stack does not guarantee that block layer or SCSI commands will
 be processed in the same order as these commands have been submitted.

 I was referring to the iSCSI session requirement. I initially thought of
 an approach to maintain multiple iSER connections under a single session,
 but pretty soon I realized that preserving command ordering this way
 is not feasible. So independent iSER connections means independent
 iSCSI sessions (each with a single connection). This is indeed another
 choice, which we are clearly debating...

 I'm just wondering if we are not trying to force-fit this model. What
 would this model look like? We will need to define another entity to
 track and maintain the sessions and to allocate the scsi_host. Will that
 be communicated to user space? What will error recovery look like?

Hello Sagi,

As you probably remember, scsi-mq support was added to the SRP initiator
by changing the 1:1 relationship between scsi_host and RDMA connection
into a 1:n relationship. I don't know how much work it would take to
implement a similar transformation in the iSCSI initiator. Maybe we
should wait until Mike's workday starts so that Mike has a chance to
comment on this.

Bart.






Re: bind_src_by_address() is disabled?

2015-01-13 Thread tomiii
Hi Mike, thanks for the reply. See below.

On Monday, January 12, 2015 at 11:22:41 AM UTC-8, Mike Christie wrote:

 On 1/9/15, 8:28 PM, Thomas Dwyer III wrote: 
  Hi folks, 
  
  I spent some time browsing through this forum, but I was unable to find
  an explanation for this comment referring to the disabled
  bind_src_by_address() function in io.c:
  
  *This is not supported for now, because it is not exactly what we want.* 
  *It also turns out that targets will send packets to other interfaces* 
  *causing all types of weird things to happen.* 
  
  I found several posts from people referring to this specific comment, but
  I did not find an explanation. Is it possible that the author of this
  comment was referring to the ARP flux issue, which may cause a target to
  associate the bound IP address with the MAC address from an interface
  other than the one specified with SO_BINDTODEVICE? If so, I don't see how
  avoiding the call to bind() solves this problem. I would appreciate a
  reply from anyone who might know what "weird things" means in this context.
  

 bind_src_by_address() only did a bind() and was expecting all traffic to
 flow through the interface with the specified IP address.


Clearly that's a bad assumption on the part of the administrator. If 
restricting traffic to a particular interface is desired, bind() is the 
wrong approach. That's what SO_BINDTODEVICE is for.

 

 If you have multiple interfaces on the same subnet, the network layer
 could send/receive on any of them. This ended up causing issues with
 packets not getting sent/received, or being received in the wrong order
 by the iSCSI layers on the initiator/target side.



How does incorrect ordering occur? This is TCP, right? Correct ordering is 
guaranteed regardless of which interface(s) are used.

 


 SO_BINDTODEVICE is not related to bind() (I was not sure about your
 comment about avoiding bind() when using that sockopt). It tells the
 kernel to ignore the normal routing tables and to just use the interface
 we specify with that call.


Precisely. SO_BINDTODEVICE and bind() are two very different things, for 
two very different purposes. One cannot be used as a substitute for the 
other.
 


 Why are you asking about this? Do you need something like bind by IP? 


Yes, I do. I have an environment where a single interface has multiple 
different IP addresses configured. For example:

# ip addr show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9f:4c:5d brd ff:ff:ff:ff:ff:ff
    inet 10.108.53.143/21 scope global eth0
    inet 10.108.53.243/21 scope global secondary eth0
    inet6 fe80::250:56ff:fe9f:4c5d/64 scope link 
       valid_lft forever preferred_lft forever


In this example, my target will only accept logins from 10.108.53.243. 
Unless we call bind(), there's no way to make this work.


Tom.III



Re: bind_src_by_address() is disabled?

2015-01-13 Thread tomiii
On Monday, January 12, 2015 at 12:54:38 PM UTC-8, Mike Christie wrote:


 Thomas, let me know if your question was for functionality you needed or 
 just looking through the code and were curious. 


It's functionality that I really want. In fact, I'm here in this forum 
because strace on iscsid showed it was never calling bind(), and when I 
went to add the bind() call myself I discovered all the code I wanted was 
already there but disabled.

How does this proposal sound:

- If the administrator configures an explicit interface name with
  iface.net_ifacename, use SO_BINDTODEVICE.
- If the administrator configures an explicit IP address with
  iface.ipaddress, use bind().
- If the administrator configures both of the above, use both of the above.
- If the administrator configures neither, use neither.
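A minimal sketch of the proposed logic, assuming an IPv4 TCP socket that has not yet been connected. The function name `iface_bind_socket()` and its parameters are hypothetical stand-ins for whatever iscsid would read from iface.net_ifacename and iface.ipaddress; this is not existing open-iscsi code. On many kernels SO_BINDTODEVICE requires CAP_NET_RAW.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper following the four cases of the proposal:
 *   ifname only -> SO_BINDTODEVICE
 *   ipaddr only -> bind() to that source address
 *   both        -> both
 *   neither     -> leave the socket alone (routing decides)
 * Returns 0 on success, -1 on error with errno set. IPv4 only. */
static int iface_bind_socket(int fd, const char *ifname, const char *ipaddr)
{
    if (ifname != NULL && ifname[0] != '\0') {
        /* Restrict the socket to one device instead of the routed one. */
        if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, ifname,
                       (socklen_t)(strlen(ifname) + 1)) < 0)
            return -1;
    }
    if (ipaddr != NULL && ipaddr[0] != '\0') {
        struct sockaddr_in src;
        memset(&src, 0, sizeof(src));
        src.sin_family = AF_INET;
        src.sin_port = htons(0);   /* port 0: kernel picks an ephemeral port */
        if (inet_pton(AF_INET, ipaddr, &src.sin_addr) != 1) {
            errno = EINVAL;
            return -1;
        }
        /* Fix the source address the target sees (e.g. a secondary IP). */
        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
            return -1;
    }
    return 0;
}
```

With the addresses from the `ip addr` output above, calling `iface_bind_socket(fd, NULL, "10.108.53.243")` before connect() would make the login originate from the secondary address, which is the case plain SO_BINDTODEVICE cannot express.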

I think this would give administrators all the flexibility they need. Is 
there a downside to this proposal that I'm not seeing?


Thanks,
Tom.III



Re: bind_src_by_address() is disabled?

2015-01-13 Thread Mike Christie
On 01/12/2015 02:03 PM, tom...@gmail.com wrote:
 
 How does incorrect ordering occur? This is TCP, right? Correct ordering
 is guaranteed regardless of which interface(s) are used.

Software/implementation bugs.



Re: bind_src_by_address() is disabled?

2015-01-13 Thread Mike Christie
On 01/12/2015 05:30 PM, Thomas Dwyer III wrote:
 I think this would give administrators all the flexibility they need. Is
 there a downside to this proposal that I'm not seeing?

You do not have to debug and support it, so that is why it was
ifdef/commented out :) I am open to accepting patches for it though.

Why wasn't creating a network alias, then using that for net_ifacename,
usable for you, though?

For the IP binding, did you already try uncommenting/un-ifdeffing the
bind-by-IP code? I think there is other related code that has to
be updated. I will look at it when I get some time, but that is not
going to be for a couple of weeks or so.



Re: bind_src_by_address() is disabled?

2015-01-13 Thread Ben Greear
On 01/13/2015 03:03 PM, Mike Christie wrote:
 On 01/12/2015 05:30 PM, Thomas Dwyer III wrote:
 I think this would give administrators all the flexibility they need. Is
 there a downside to this proposal that I'm not seeing?
 
 You do not have to debug and support it, so that is why it was
 ifdef/commented out :) I am open to accepting patches for it though.
 
 Why was creating a network alias, then using that for the net_ifacename
 not usable for you though?

You cannot use SO_BINDTODEVICE with an old-style network alias, only with a
'real-ish' device. You could use macvlans, though...

Thanks,
Ben

 
 For the ip binding stuff, did you already try uncommenting/ifdeffing the
 bind by ip related code? I think there is other related code that has to
 be updated. I will look at it when I get some time, but that is not
 going to be for a couple weeks or so.
 


-- 
Ben Greear gree...@candelatech.com
Candela Technologies Inc  http://www.candelatech.com
