Re: [PATCH 04/11] IB/srp: keep processing commands during host removal

2012-11-26 Thread Bart Van Assche
On 11/26/12 05:44, David Dillow wrote: From: Bart Van Assche Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from inside scsi_remove_host() (see also the sd_shutdown() call in sd_remove()). Make sure that these commands have a chance to reach the SCSI d

Re: [PATCH 02/11] IB/srp: simplify state tracking

2012-11-26 Thread Bart Van Assche
On 11/26/12 05:44, David Dillow wrote: The state of the target has several conditions that overlap, making it easier to model as a bit-field of exceptional conditions rather than an enum of all possible states. Bart Van Assche did the hard work of identifying the states that can be removed, and

Re: [PATCH 05/11] IB/srp: destroy and recreate QP and CQs on each connection

2012-11-26 Thread Bart Van Assche
On 11/26/12 05:44, David Dillow wrote: From: Ishai Rabinovitz HW QP FATAL errors persist over a reset operation, but we can recover from that by recreating the QP and associated CQs for each connection. Creating a new QP/CQ also completely forecloses any possibility of getting stale completions

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-11-27 Thread Bart Van Assche
On 11/26/12 05:44, David Dillow wrote: Here is a first, UNTESTED, pass at preparing a merge of Bart's SRP HA work to upstream. It is not complete, as I have not yet added the transport layer error handling and related patches. It is also currently missing the patch to maintain a single connection

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-11-28 Thread Bart Van Assche
On 11/27/12 23:13, Or Gerlitz wrote: On Tue, Nov 27, 2012 at 6:34 PM, Bart Van Assche wrote: Thanks Dave for doing all this work. A reworked and retested patch series that should address all comments that have been posted so far can be found here: http://github.com/bvanassche/linux/srp-ha. I

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-05 Thread Bart Van Assche
On 12/05/12 19:23, Or Gerlitz wrote: On Fri, Nov 30, 2012 at 4:21 AM, David Dillow wrote: [...] Modulo a few style issues (braces around one line if branches, etc.) and having three state variables vs one, I can live with everything up to aabfa852acd27962 at git://github.com/bvanassche/linux.gi

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-05 Thread Bart Van Assche
On 12/05/12 19:50, Bart Van Assche wrote: On 12/05/12 19:23, Or Gerlitz wrote: On Fri, Nov 30, 2012 at 4:21 AM, David Dillow wrote: [...] Modulo a few style issues (braces around one line if branches, etc.) and having three state variables vs one, I can live with everything up to

Re: srp-ha backport

2012-12-06 Thread Bart Van Assche
On 12/06/12 10:52, Vasiliy Tolstov wrote: I have now switched from the SLES kernel to 3.6.7. All works fine, but your patches from GitHub now produce some errors: /sbin/service openibd restart Unloading ib_srp [FAILED] Removing 'ib_srp': Device or resource busy xen11:~ #

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-06 Thread Bart Van Assche
On 12/05/12 22:32, Or Gerlitz wrote: On Wed, Dec 5, 2012 at 8:50 PM, Bart Van Assche wrote: [...] The only way to make I/O work reliably if a failure can occur at the transport layer is to use multipathd on top of ib_srp. If a connection fails for some reason, then the SRP SCSI host will be

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-06 Thread Bart Van Assche
On 12/06/12 15:27, Or Gerlitz wrote: The core problem here seems to be that scsi_remove_host simply never ends. Hello Or, The later patches in the srp-ha patch series avoided such behavior by checking whether the connection between SRP initiator and target is unique, and by removing duplicat

[PATCH for-next] IB/srp: Make SCSI error handling finish

2012-12-07 Thread Bart Van Assche
If a SCSI command times out it is passed to the SCSI error handler. The SCSI error handler keeps trying to abort a command until aborting succeeded or the command has been finished. Avoid that attempts to abort a command without RDMA RC connection trigger an endless loop. Signed-off-by: Bart Van

Re: [PATCH 00/11] First pass at merging Bart's HA work

2012-12-08 Thread Bart Van Assche
On 12/07/12 22:47, Vu Pham wrote: I applied your latest patch [PATCH for-next] IB/srp: Make SCSI error handling finish and test Let me capture what I'm seeing: Host has two paths (scsi_host 7 & 8) to target thru two physical ports 1 & 2 [root@rsws42 ~]# multipath -l size=50G features='0' hwhan

Re: [PATCH] [RFC] osm_log printing incorrectly assumes that pthread_t is not opaque type

2012-12-09 Thread Bart Van Assche
On 12/09/12 10:26, Garrett Cooper wrote: +#if defined(__FreeBSD__) +"%s %02d %02d:%02d:%02d %06d [%p] 0x%02x -> %s", +#else "%s %02d %02d:%02d:%02d %06d [%04X] 0x%02x -> %s", +#endif Please cast the pthread_t value to an unsigned long long or another integral type. Su
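
The cast suggested in the reply above can be sketched as follows. This is a minimal illustration, not code from osm_log: the helper name `format_thread_id` is hypothetical, and it assumes that `pthread_t` is an integer or pointer type on the target platform (as on Linux and FreeBSD). POSIX also permits a struct, in which case the cast would not compile.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of osm_log): format a thread ID as hex.
 * Assumption: pthread_t is an integer or pointer type. Going through
 * uintptr_t first keeps the conversion well-defined for pointer types.
 */
static int format_thread_id(char *buf, size_t len, pthread_t tid)
{
	return snprintf(buf, len, "[%04llX]",
			(unsigned long long)(uintptr_t)tid);
}
```

With a single integral format specifier, the `#if defined(__FreeBSD__)` branch in the quoted diff would no longer be needed on platforms where the assumption holds.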

Re: [PATCH 3/3] Avoid linker error with clang 3.0

2012-12-09 Thread Bart Van Assche
On 12/09/12 12:07, Garrett Cooper wrote: It seems that there's a bug when linking inlined functions with clang; this issue will need to be upstreamed and reverified with clang 3.2. Signed-off-by: Garrett Cooper --- osmtest/osmtest.c | 2 +- 1 file changed, 1 insertion(+), 1 delet

[PATCH v2] IB/SRP patches for kernel 3.8

2012-12-14 Thread Bart Van Assche
Hello Dave, It would be appreciated if you could have a look at the following two patches: * Track connection state properly. Apparently an assignment statement had not been dropped while it should have been dropped. * Avoid endless SCSI error handling loop after cable pull. Thanks, Bart. -

[PATCH v2 1/2] IB/srp: Track connection state properly

2012-12-14 Thread Bart Van Assche
The connection state must be initialized before srp_connect_target() is invoked. Drop the assignment in srp_add_target() since scsi_host_alloc() zero-initializes the Scsi_Host structure anyway. This patch makes ib_srp again report the first QP error. Signed-off-by: Bart Van Assche Cc: David

[PATCH v2 2/2] IB/srp: Avoid endless SCSI error handling loop

2012-12-14 Thread Bart Van Assche
if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even if host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Roland Dreier Reported-by: Or Gerlitz Cc: Vu Pham Cc: Alex Turin --- drivers/infiniband/ulp

Re: [PATCH v2 2/2] IB/srp: Avoid endless SCSI error handling loop

2012-12-14 Thread Bart Van Assche
On 12/14/12 16:55, David Dillow wrote: On Fri, 2012-12-14 at 16:38 +0100, Bart Van Assche wrote: If a SCSI command times out it is passed to the SCSI error handler. The SCSI error handler will try to abort the command that timed out. If aborting failed a device reset will be attempted. If the

Re: [PATCH v2 2/2] IB/srp: Avoid endless SCSI error handling loop

2012-12-14 Thread Bart Van Assche
On 12/14/12 17:19, David Dillow wrote: On Fri, 2012-12-14 at 17:12 +0100, Bart Van Assche wrote: On 12/14/12 16:55, David Dillow wrote: This is much more than your original patch that Alex claimed fixed his issues; are you not merging two separate issues? Also, there's no r

Re: [PATCH v2 2/2] IB/srp: Avoid endless SCSI error handling loop

2012-12-19 Thread Bart Van Assche
On 12/19/12 05:09, David Dillow wrote: Did you update the patch? I think I'm on-board with the idea. Sorry for the delay. I will post the updated patch series. Bart.

[PATCH v3 0/3] IB/SRP patches for kernel 3.8

2012-12-19 Thread Bart Van Assche
This patch series avoids that SCSI error handling triggers an endless loop and also restores reporting of QP errors in the kernel log. Changes between v3 and v2: - As proposed by Dave, added a patch that prevents sending of a task management function over a closed connection. Changes between

[PATCH v3 1/3] IB/srp: Track connection state properly

2012-12-19 Thread Bart Van Assche
The connection state must be initialized before srp_connect_target() is invoked. Drop the assignment in srp_add_target() since it occurs after srp_connect_target() and since scsi_host_alloc() zero-initializes the Scsi_Host structure anyway. Signed-off-by: Bart Van Assche Acked-by: David Dillow

[PATCH v3 2/3] IB/srp: Avoid sending a task management function needlessly

2012-12-19 Thread Bart Van Assche
Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Roland Dreier --- drivers/infiniband/ulp/srp/ib_srp.c |5 +++-- 1 file changed, 3
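
The guard described in this patch summary might look roughly like the user-space model below. It is only a sketch under stated assumptions: `struct sketch_target` and `sketch_send_tsk_mgmt()` are hypothetical stand-ins, not the ib_srp code, and the field names `connected` and `qp_in_error` are borrowed from the condition quoted in a later posting in this listing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-target state. */
struct sketch_target {
	bool connected;   /* RDMA RC connection is established */
	bool qp_in_error; /* queue pair has entered the error state */
	int tmf_sent;     /* number of task management functions sent */
};

/*
 * Returns 0 if the task management function was "sent", -1 if it was
 * skipped because sending could not succeed anyway.
 */
static int sketch_send_tsk_mgmt(struct sketch_target *t)
{
	if (!t->connected || t->qp_in_error)
		return -1;
	t->tmf_sent++;
	return 0;
}
```

Returning early avoids waiting for a response that cannot arrive over a closed or failed connection.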

[PATCH v3 3/3] IB/srp: Avoid endless SCSI error handling loop

2012-12-19 Thread Bart Van Assche
. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche Cc: David Dillow Cc

Re: [PATCH v3 1/3] IB/srp: Track connection state properly

2012-12-20 Thread Bart Van Assche
On 12/19/12 19:04, David Dillow wrote: On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: The connection state must be initialized before srp_connect_target() is invoked. Drop the assignment in srp_add_target() since it occurs after srp_connect_target() and since scsi_host_alloc() zero

Re: [PATCH v3 0/3] IB/SRP patches for kernel 3.8

2012-12-20 Thread Bart Van Assche
On 12/20/12 13:38, Or Gerlitz wrote: I think few days ago you had a patch on your tree named "Save and restore host_scribble during error handling", is it possible we need this here for happy removal of the scsi host? No. Host removal works fine even without that patch. That's because srp_abor

Re: [PATCH v3 1/3] IB/srp: Track connection state properly

2012-12-20 Thread Bart Van Assche
On 12/20/12 16:10, David Dillow wrote: On Thu, 2012-12-20 at 09:13 +0100, Bart Van Assche wrote: On 12/19/12 19:04, David Dillow wrote: On Wed, 2012-12-19 at 15:21 +0100, Bart Van Assche wrote: The connection state must be initialized before srp_connect_target() is invoked. Drop the

[PATCH v4 1/3] IB/srp: Track connection state properly

2012-12-20 Thread Bart Van Assche
Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier --- drivers/infiniband/ulp/srp/ib_srp.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp

Re: [PATCH 1/2] IB/rds: Correct ib_api use with gs_dma_address/sg_dma_len

2012-12-25 Thread Bart Van Assche
On 12/21/12 19:01, Mike Marciniszyn wrote: 0b088e00 ("RDS: Use page_remainder_alloc() for recv bufs") added uses of sg_dma_len() and sg_dma_address(). This makes RDS DOA with the qib driver. IB ulps should use ib_sg_dma_len() and ib_sg_dma_address respectively since some HCAs overload

Re: [PATCH] IB/srp: disconnect to SRP target before removing SCSI host

2013-01-07 Thread Bart Van Assche
This patch is based on the srp-ha-v3.7 tree by Bart Van Assche. See also <https://github.com/advance38/linux/tree/ib-srp-remove-target-v3.7>. If necessary, I could rebase it on the stable tree. Signed-off-by: Dongsu Park Cc: Sebastian Riemer Cc: Bart Van Assche Cc: David Dillow Cc: Rol

Re: [PATCH] IB/srp: disconnect to SRP target before removing SCSI host

2013-01-11 Thread Bart Van Assche
On 01/11/13 15:07, Dongsu Park wrote: However, that action will hang forever until the target machine comes up again. Precisely it's blocked on scsi_execute() directly after sending SYNCHRONIZE_CACHE command to the first target of the host. As IB stack is not able to give any response, further ta

Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script

2013-01-30 Thread Bart Van Assche
On 01/29/13 18:18, Alex Netes wrote: During opensm RPM packaging, `chkconfig --add opensmd` is called. `chkconfig --add` creates the appropriate entry as specified by the default values in the init script. Having opensmd run by default on boot isn't desired. Signed-off-by: Alex Netes --- conf

Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script

2013-01-30 Thread Bart Van Assche
On 01/30/13 16:43, Doug Ledford wrote: On 01/30/13 03:59, Bart Van Assche wrote: On 01/29/13 18:18, Alex Netes wrote: During opensm RPM packaging, `chkconfig --add opensmd` is called. `chkconfig --add` creates the appropriate entry as specified by the default values in the init script. Having

Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script

2013-01-30 Thread Bart Van Assche
On 01/30/13 18:48, Doug Ledford wrote: On 1/30/2013 11:00 AM, Bart Van Assche wrote: Which convention is followed for other packages ? This is what I found in the Fedora 18 iscsi-initiator-utils package (http://be.mirror.eurid.eu/fedora/linux/releases/18/Fedora/source/SRPMS/i/iscsi-initiator

[LSF/MM TOPIC] Reducing the SRP initiator failover time

2013-02-01 Thread Bart Van Assche
It is known that it takes about two to three minutes before the upstream SRP initiator fails over from a failed path to a working path. This is not only considered longer than acceptable but is also longer than other Linux SCSI initiators (e.g. iSCSI and FC). Progress so far with improving the

[PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-01 Thread Bart Van Assche
This patch series avoids that SCSI error handling triggers an endless loop and also restores reporting of QP errors in the kernel log. Changes between v3 and v2: - As proposed by Dave, added a patch that prevents sending of a task management function over a closed connection. Changes between

[PATCH for 3.8 v3, resend 1/3] IB/srp: Track connection state properly

2013-02-01 Thread Bart Van Assche
Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier --- drivers/infiniband/ulp/srp/ib_srp.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers/infiniband/ulp

[PATCH for 3.8 v3, resend 2/3] IB/srp: Avoid sending a task management function needlessly

2013-02-01 Thread Bart Van Assche
Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier --- drivers/infiniband/ulp/srp/ib_srp.c |5 +++-- 1 file changed, 3

[PATCH for 3.8 v3, resend 3/3] IB/srp: Avoid endless SCSI error handling loop

2013-02-01 Thread Bart Van Assche
. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche Acked-by: David Dillow

Re: [PATCH] opensm/configure.in: Remove Default-Start from opensmd init script

2013-02-04 Thread Bart Van Assche
On 02/04/13 16:36, Alex Netes wrote: On 09:20 Thu 31 Jan , Doug Ledford wrote: On 01/31/13 02:21, Alex Netes wrote: On 14:24 Wed 30 Jan , Doug Ledford wrote: On 1/30/2013 2:12 PM, Bart Van Assche wrote: On 01/30/13 18:48, Doug Ledford wrote: On 1/30/2013 11:00 AM, Bart Van Assche

Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-05 Thread Bart Van Assche
On 02/04/13 22:11, Or Gerlitz wrote: On Fri, Feb 1, 2013 at 5:18 PM, Bart Van Assche wrote: This patch series avoids that SCSI error handling triggers an endless loop and also restores reporting of QP errors in the kernel log. Bart, You wrote "resend" in the subject line, anythi

Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-05 Thread Bart Van Assche
On 02/05/13 21:54, Or Gerlitz wrote: On Tue, Feb 5, 2013 at 6:25 PM, Bart Van Assche wrote: On 02/04/13 22:11, Or Gerlitz wrote: Bart, I'd like to sharpen the point: could you please clarify if the series posted to linux-rdma stands for itself in the sense that SRP HA scheme X (please

Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-05 Thread Bart Van Assche
On 02/06/13 08:44, Or Gerlitz wrote: On 06/02/2013 09:22, Bart Van Assche wrote: A huge number of patches have been taken upstream between 3.8-rc1 and 3.8-rc6. I have retested these three patches with 3.8-rc6 and would appreciate if you would also repeat your tests. not really... this is

Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-07 Thread Bart Van Assche
On 02/06/13 22:42, Vu Pham wrote: Conclusion: 1. disable the port/path long enough >35 minutes, we have dangling scsi host. 2. enable the port within 30 minute, scsi host re-establish connection, path re-instate and then scsi_host was removed (no entry in sysfs) I attached a log here to show wha

Re: [PATCH for 3.8 v3, resend 0/3] IB/SRP patches for kernel 3.8

2013-02-07 Thread Bart Van Assche
On 02/07/13 10:41, Or Gerlitz wrote: (BTW - if the fourth patch that Vu used "save & restore host_scribble during error handling" is also needed, maybe you add it to this series, so they are reviewed/accepted together). Hello Or, The three patches I posted guarantee timely host removal even w

Re: [PATCH] IB/lmx4: silence GCC warning

2013-02-13 Thread Bart Van Assche
On 10/29/12 10:50, Paul Bolle wrote: On Wed, 2012-10-10 at 09:23 +0200, Jack Morgenstein wrote: You could use: u16 uninitialized_var(vlan); instead. I guess we'd better just wait and see whether uninitialized_var() survives before discussing your suggestion (see the thread starting at ht

[PATCH] IB/srp: Fail I/O requests if the transport is offline

2013-02-15 Thread Bart Van Assche
of failing requests if (!target->connected || target->qp_in_error) such that the SCSI error handler has a chance to retry commands after a transport layer failure occurred. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Or Gerlitz Cc: Vu Pham --- drivers/infiniband/ulp/srp/ib_srp.c

Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline

2013-02-21 Thread Bart Van Assche
On 02/18/13 05:06, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp/ib_srp.c +++ b/drivers/infiniband/ulp/srp

[PATCH v4 0/4] IB/srp fixes

2013-02-21 Thread Bart Van Assche
This patch series avoids that SCSI error handling triggers an endless loop and also restores reporting of QP errors in the kernel log. Changes between v4 and v3: - Added a patch that ensures that the SCSI host gets removed in time when a user space process keeps queueing I/O during removal, e.

[PATCH v4 1/4] IB/srp: Track connection state properly

2013-02-21 Thread Bart Van Assche
Remove an assignment that incorrectly overwrites the connection state update by srp_connect_target(). Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: # 3.8 --- drivers/infiniband/ulp/srp/ib_srp.c |1 - 1 file changed, 1 deletion(-) diff --git a/drivers

[PATCH v4 2/4] IB/srp: Avoid sending a task management function needlessly

2013-02-21 Thread Bart Van Assche
Do not send a task management function if sending will fail anyway because either there is no RDMA/RC connection or the QP is in the error state. Signed-off-by: Bart Van Assche Acked-by: David Dillow Cc: Roland Dreier Cc: # 3.8 --- drivers/infiniband/ulp/srp/ib_srp.c |5 +++-- 1 file

[PATCH v4 3/4] IB/srp: Avoid endless SCSI error handling loop

2013-02-21 Thread Bart Van Assche
. Modify the SCSI error handling functions in ib_srp as follows: - Abort SCSI commands properly even if the QP is in the error state. - Make srp_reset_host() reset SCSI requests even after host removal has already started or if reconnecting fails. Signed-off-by: Bart Van Assche Acked-by: David Dillow

[PATCH v4 4/4] IB/srp: Fail I/O requests if the transport is offline

2013-02-21 Thread Bart Van Assche
of failing requests if (!target->connected || target->qp_in_error) such that the SCSI error handler has a chance to retry commands after a transport layer failure occurred. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Or Gerlitz Cc: Vu Pham Cc: # 3.8 --- drivers/infiniband/u

Re: [PATCH] IB/srp: Fail I/O requests if the transport is offline

2013-02-24 Thread Bart Van Assche
On 02/18/13 09:11, Sagi Grimberg wrote: On 2/18/2013 6:06 AM, David Dillow wrote: On Fri, 2013-02-15 at 10:39 +0100, Bart Van Assche wrote: diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index 8a7eb9f..b34752d 100644 --- a/drivers/infiniband/ulp/srp

Re: srptools ("Virtual" ibnetdiscover command fails)

2013-02-25 Thread Bart Van Assche
On 02/06/13 11:40, Sebastian Riemer wrote: On 06.02.2013 11:20, Or Gerlitz wrote: On 06/02/2013 12:04, Mathis GAVILLON wrote: Just a last question : is that possible VFs lid to be different from PF one ? NO, we've implemented a "shared port" model, so all functions on the same IB port use the

Re: v3.7: Unloading ib_ipoib triggers circular locking dependency complaint

2013-03-12 Thread Bart Van Assche
On 11/26/12 09:00, Or Gerlitz wrote: On Fri, Nov 23, 2012 at 2:10 PM, Bart Van Assche wrote: Apparently unloading the ib_ipoib kernel module triggers a circular locking dependency complaint. Has anyone already been looking into this ? Yes, I see that this happens here e.g when doing hot

Re: [RFC ib_srp-backport] ib_srp: bind fast IO failing to QP timeout

2013-03-19 Thread Bart Van Assche
On 03/19/13 11:16, Sebastian Riemer wrote: Hi Bart, now I've got my priority on SRP again. I've also noticed that your ib_srp-backport doesn't fail the IO fast enough. The fast_io_fail_tmo only comes into play after the QP is already in timeout and the "terminate_rport_io" function is missing.

Re: [PATCH] ipoib: fix hard_header return value

2013-03-26 Thread Bart Van Assche
On 03/26/13 17:24, Doug Ledford wrote: If you have a patched up dhcp server (and dhclient), they will use AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB. This has worked since forever if you use OFED kernels or one of the distribution kernels. However, when testing an upstream kernel

Re: [PATCH 0/3] Least attached vector support

2013-04-02 Thread Bart Van Assche
Yevgeny Petrilin writes: Hello Roland, Those patches were submitted a while ago, I cleaned them up a little and generated against your latest git. They allow the hw driver to choose to which EQ a CQ would be attached, considering the load on its eqs. (replying to an e-mail from three ye

Re: [PATCH 4/5] tipc: add InfiniBand media type

2013-04-07 Thread Bart Van Assche
On 04/03/13 14:43, Patrick McHardy wrote: diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h +#ifdef CONFIG_TIPC_MEDIA_IB +int tipc_ib_media_start(void); +void tipc_ib_media_stop(void); +#else +int tipc_ib_media_start(void) { return 0; } +void tipc_ib_media_stop(void) { return; } +#endif Is
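
The reply is cut off in this listing, so the following is not a reconstruction of it. Independently of what it goes on to ask, stubs placed in a header under `#else` are conventionally declared `static inline`, because plain function definitions in a header are emitted in every translation unit that includes it and then collide at link time. A minimal sketch, with the hypothetical names `CONFIG_SKETCH_MEDIA_IB`, `sketch_ib_media_start()` and `sketch_ib_media_stop()` standing in for the quoted ones:

```c
/* Contents of a header; all names here are hypothetical placeholders. */
#ifdef CONFIG_SKETCH_MEDIA_IB
int sketch_ib_media_start(void);
void sketch_ib_media_stop(void);
#else
/* static inline: each includer gets a private copy, so no link clash. */
static inline int sketch_ib_media_start(void) { return 0; }
static inline void sketch_ib_media_stop(void) { }
#endif
```

Callers can then use both functions unconditionally, without their own `#ifdef` blocks.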

Re: linux 3.8.6 and srp backports

2013-04-08 Thread Bart Van Assche
On 04/08/13 10:00, Vasiliy Tolstov wrote: Hello. Some time ago, when I was using kernel 3.6, I used https://github.com/bvanassche/ib_srp-backport/ for the SRP drivers on my Linux server. Now I'm using 3.8.6. Do I need something from https://github.com/bvanassche/ib_srp-backport/ or are all patches already

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-03 Thread Bart Van Assche
On 04/30/13 09:34, Vasiliy Tolstov wrote: What is main difference between bvanassche repo and sriemer ? Good question. As soon as I have the time I will try to find a single approach that works for everyone and post a new patch series for review on the linux-rdma mailing list such that these

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-06 Thread Bart Van Assche
On 05/06/13 10:44, Sebastian Riemer wrote: Sorry Bart, but a reconnect with just the commit message "IB/srp: Add kernel-level transport layer recovery" and no further description isn't very trustworthy for me. I also wonder why you need so much locking. Hello Sebastian, There is a very good re

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-14 Thread Bart Van Assche
On 05/14/13 12:00, Vasiliy Tolstov wrote: If I need faster reconnects and the ability to close a session from the initiator side on QLogic hardware, is that possible? Or do these patches only cover Mellanox cards? The ability to close a session from the initiator side went upstream in kernel 3.8 (/sys/c

Re: [ANNOUNCE] SRP: ProfitBricks publishes its SRP Initiator patches

2013-05-15 Thread Bart Van Assche
On 05/15/13 07:12, Vasiliy Tolstov wrote: Thanks. What about closing a session from the target side? For example, what if I need to close the SRP session and block all access from a specific initiator? The traditional approach to block access from a specific initiator is to modify the LUN masking configuration a

Re: MLX4 Cq Question

2013-05-21 Thread Bart Van Assche
On 05/21/13 11:40, Or Gerlitz wrote: 2. is possible in the Linux kernel for one hard irq callback to flash on CPU X while another hard irq callback is running on the same CPU? I think that from kernel 2.6.35 on MSI IRQs are no longer nested. See also http://git.kernel.org/cgit/linux/kernel/gi

Re: Patch: Support for Xeon Phi

2013-05-24 Thread Bart Van Assche
On 05/24/13 19:02, Bryce Lelbach wrote: The attached patch modifies the kernel Infiniband drivers to support the Xeon Phi co-processor. This patch is a modified version of a patch from Intel's MPSS framework (specifically, from the "KNC_gold_update_1-2.1.4982-15-rhel-6.3" package),

Re: Combining distro IB tools and OFED

2013-06-08 Thread Bart Van Assche
On 06/08/13 04:31, Bruce McKenzie wrote: I've compiled Ubuntu 13.04 with kernel 3.6.11 and OFED 2 from Mellanox, and it works OK; performance is a little better with SRP. Some packages don't seem to work, i.e. some srptools and IB-diags commands fail, which looks like those tools haven't been tested

Re: How to do replication right with SRP or remote storage?

2013-06-10 Thread Bart Van Assche
On 06/10/13 14:05, Sebastian Riemer wrote: Perhaps, I should collect all guys who require MD RAID-1 for remote storage replication in order to put some pressure on Neil. If I remember correctly one of the things Neil is trying to explain to md users is that when md is used without write-intent

[PATCH 0/14] IB SRP initiator patches for kernel 3.11

2013-06-12 Thread Bart Van Assche
The purpose of this InfiniBand SRP initiator patch series is as follows: - Make the SRP initiator driver better suited for use in a H.A. setup. Speed up failover by reducing the IB RC retry count and by notifying multipathd faster about transport layer failures by adding fast_io_fail_tmo and

[PATCH 01/14] IB/srp: Fix remove_one crash due to resource exhaustion

2013-06-12 Thread Bart Van Assche
+0x16/0x1b [bvanassche: Shortened patch description] Signed-off-by: Dotan Barak Reviewed-by: Eli Cohen Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c |2 ++ 1 file changed, 2 insertions(+) diff --

[PATCH 02/14] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()

2013-06-12 Thread Bart Van Assche
Prevent srp_claim_req() from claiming a command while srp_queuecommand() is still busy queueing the same command. Found this via source reading. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c

[PATCH 03/14] IB/srp: Avoid that srp_reset_host() is skipped after a TL error

2013-06-12 Thread Bart Van Assche
after a transport layer error. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b

[PATCH 04/14] IB/srp: Skip host settle delay

2013-06-12 Thread Bart Van Assche
The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian

[PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-12 Thread Bart Van Assche
or handler will cause the SRP initiator to reconnect, which will cause I/O over the second connection to fail. Avoid such ping-pong behavior by disabling relogins. Note: if reconnecting manually is necessary, that is possible by deleting and recreating an rport via sysfs. Signed-off-by: Bart Van

[PATCH 06/14] IB/srp: Keep rport as long as the IB transport layer

2013-06-12 Thread Bart Van Assche
queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback may get invoked after srp_remove_host() has finished. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: James Bottomley Cc: David Dillow Cc: Vu Pham Cc: Sebastian

[PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-12 Thread Bart Van Assche
after having detected a transport layer problem and before failing I/O. - Support for implementing dev_loss_tmo, the time that should elapse after having detected a transport layer problem and before removing a remote port. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: James Bottomley
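
How the two timeouts named in this series divide the time after a transport layer failure can be modeled as below. This is a hedged sketch, not the scsi_transport_srp implementation: the enum, the function name, and the convention that a negative value disables a timeout are all assumptions made for illustration.

```c
/* Hypothetical states of a remote port after a transport layer failure. */
enum sketch_rport_state {
	SKETCH_BLOCKED,   /* I/O is held back, waiting for a reconnect */
	SKETCH_FAIL_FAST, /* fast_io_fail_tmo expired: I/O is failed */
	SKETCH_LOST       /* dev_loss_tmo expired: remote port is removed */
};

/*
 * elapsed: seconds since the transport layer problem was detected.
 * Assumption: a negative timeout means "disabled".
 */
static enum sketch_rport_state
sketch_rport_state_after(int elapsed, int fast_io_fail_tmo, int dev_loss_tmo)
{
	if (dev_loss_tmo >= 0 && elapsed >= dev_loss_tmo)
		return SKETCH_LOST;
	if (fast_io_fail_tmo >= 0 && elapsed >= fast_io_fail_tmo)
		return SKETCH_FAIL_FAST;
	return SKETCH_BLOCKED;
}
```

In a multipath setup, a short fast_io_fail_tmo lets the multipath layer learn about the failed path early, while a longer dev_loss_tmo leaves time for the path to come back before the remote port is removed.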

[PATCH 08/14] IB/srp: Add srp_terminate_io()

2013-06-12 Thread Bart Van Assche
Finish all outstanding I/O requests after fast_io_fail_tmo expired, which speeds up failover in a multipath setup. This patch is a reworked version of a patch from Sebastian Riemer. Reported-by: Sebastian Riemer Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc

[PATCH 09/14] IB/srp: Use SRP transport layer error recovery

2013-06-12 Thread Bart Van Assche
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Roland Dreier Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 123 +-- drivers/infiniband/ulp/srp

[PATCH 10/14] IB/srp: Start timers if a transport layer error occurs

2013-06-12 Thread Bart Van Assche
Start the reconnect timer, fast_io_fail timer and dev_loss timer if a transport layer error occurs. Signed-off-by: Bart Van Assche Cc: David Dillow Cc: Roland Dreier Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 19 +++ drivers/infiniband/ulp

[PATCH 11/14] IB/srp: Fail SCSI commands silently

2013-06-12 Thread Bart Van Assche
description] Signed-off-by: Sebastian Riemer Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham --- drivers/infiniband/ulp/srp/ib_srp.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c index

[PATCH 12/14] IB/srp: Make HCA completion vector configurable

2013-06-12 Thread Bart Van Assche
ws to reduce latency on an initiator connected to multiple SRP targets but also allows to improve throughput. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c | 26 -- drivers/infin

[PATCH 13/14] IB/srp: Make transport layer retry count configurable

2013-06-12 Thread Bart Van Assche
tivating the SCSI error handler on an IB path with a regular BER or due to brief IB network congestion. [bvanassche: Rewrote patch description] Signed-off-by: Vu Pham Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer --- drivers/infiniband/ulp/

[PATCH 14/14] IB/srp: Bump driver version and release date

2013-06-12 Thread Bart Van Assche
From: Vu Pham Signed-off-by: Vu Pham Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Sebastian Riemer --- drivers/infiniband/ulp/srp/ib_srp.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers

Re: [PATCH 01/14] IB/srp: Fix remove_one crash due to resource exhaustion

2013-06-12 Thread Bart Van Assche
On 06/12/13 15:20, Bart Van Assche wrote: If the add_one callback fails during driver load no resources are allocated so there isn't a need to release any resources. Trying to clean the resource may lead to the following kernel panic: BUG: unable to handle kernel NULL pointer dereferen

Re: [PATCH 02/14] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()

2013-06-12 Thread Bart Van Assche
On 06/12/13 16:58, Sebastian Riemer wrote: Wait a minute, so you've changed this commit to also hold that target lock in the following functions in error case: srp_unmap_data(), srp_put_tx_iu() This is different from: https://github.com/bvanassche/ib_srp-backport/commit/6ce0e30dbb69973926df8429

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-12 Thread Bart Van Assche
via sysfs. Signed-off-by: Bart Van Assche Cc: Roland Dreier Cc: David Dillow Cc: Vu Pham Cc: Sebastian Riemer Thanks Bart for refreshing this new patch set. I think you should add Signed-off-by for Sebastian ? This patch differs slightly from what Sebastian had posted. But if Sebastian a

Re: [PATCH 03/14] IB/srp: Avoid that srp_reset_host() is skipped after a TL error

2013-06-13 Thread Bart Van Assche
On 06/13/13 11:30, Sebastian Riemer wrote: On 12.06.2013 15:23, Bart Van Assche wrote: The SCSI error handler assumes that the transport layer is operational if an eh_abort_handler() returns SUCCESS. Hence let srp_abort() only return SUCCESS if sending the ABORT TASK task management function

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-13 Thread Bart Van Assche
On 06/13/13 15:57, Sebastian Riemer wrote: > You've only changed the style of this function. Functionality is still > the same. Fine for me. > > But why do you put it that high in the source code? > Do you (still) need it for something else? > > I would put it directly in front of srp_create_targ

Re: recommend setting for dev_loss_tmo and fast_io_fail_tmo

2013-06-13 Thread Bart Van Assche
On 06/13/13 17:52, Vasiliy Tolstov wrote: Hello. I'm using 3.9.5 and ib_srp_backport from github. What are the recommended settings for dev_loss_tmo and fast_io_fail_tmo? Currently I have dev_loss_tmo = 60 and fast_io_fail_tmo = 40. Is that right? I need to wait no more than 100 seconds for a failed path. (i'm use so

Re: [PATCH 05/14] IB/srp: Maintain a single connection per I_T nexus

2013-06-13 Thread Bart Van Assche
On 06/13/13 19:50, Vu Pham wrote: Hello Bart, +/** + * srp_conn_unique() - check whether the connection to a target is unique + */ +static bool srp_conn_unique(struct srp_host *host, +struct srp_target_port *target) +{ +struct srp_target_port *t; +bool ret = false; + +

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-14 Thread Bart Van Assche
On 06/13/13 21:43, Vu Pham wrote: > Hello Bart, > >> >> +What:/sys/class/srp_remote_ports/port-:/dev_loss_tmo >> +Date:September 1, 2013 >> +KernelVersion:3.11 >> +Contact:linux-s...@vger.kernel.org, linux-rdma@vger.kernel.org >> +Description:Number of seconds the SCSI

Re: recommend setting for dev_loss_tmo and fast_io_fail_tmo

2013-06-14 Thread Bart Van Assche
On 06/13/13 20:21, Vasiliy Tolstov wrote: P.S. With kernel 3.9.5, do I need your backported ib_srp driver, or can I use the mainline kernel drivers for faster reconnects? P.P.S. Can you provide the subject of the e-mail or a link to the patches to switch off dev_loss_tmo? Hello Vasiliy, Does this mean t

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-15 Thread Bart Van Assche
On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: +/** + * srp_tmo_valid() - check timeout combination validity + * + * If no fast I/O fail timeout has been configured then the device loss timeout + * must be below SCSI_DEVICE_BLOCK_MAX_TIMEOUT. If a fast I/O fail timeout has +
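The timeout rule quoted in the snippet above can be sketched in user-space C. This is not the srp_tmo_valid() from the patch: the helper name sketch_tmo_valid() and the macro SKETCH_BLOCK_MAX_TIMEOUT are hypothetical, the value 600 is assumed to match SCSI_DEVICE_BLOCK_MAX_TIMEOUT in the kernel headers, and a negative value is assumed to mean "timeout not configured". The quoted comment is cut off after its first sentence, so the second check (fast_io_fail_tmo must be smaller than dev_loss_tmo) is an assumption rather than something the snippet states.

```c
#include <stdbool.h>

/* Assumption: mirrors SCSI_DEVICE_BLOCK_MAX_TIMEOUT (seconds); verify against your tree. */
#define SKETCH_BLOCK_MAX_TIMEOUT 600

/*
 * Hypothetical helper, not the function from the patch.
 * Assumption: a negative timeout means "not configured".
 */
static bool sketch_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
{
	/* Stated in the quoted comment: without a fast I/O fail timeout,
	 * the device loss timeout must be below the block maximum. */
	if (fast_io_fail_tmo < 0)
		return dev_loss_tmo < SKETCH_BLOCK_MAX_TIMEOUT;

	/* Assumption (the quoted comment is truncated here): fast I/O fail
	 * should expire before the device loss timeout does. */
	if (dev_loss_tmo >= 0 && fast_io_fail_tmo >= dev_loss_tmo)
		return false;

	return true;
}
```

Under these assumptions, the combination asked about earlier in the thread (fast_io_fail_tmo = 40, dev_loss_tmo = 60) passes the check, while swapping the two values does not.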

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-17 Thread Bart Van Assche
On 06/17/13 08:18, Hannes Reinecke wrote: On 06/15/2013 11:52 AM, Bart Van Assche wrote: On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: +/** + * srp_tmo_valid() - check timeout combination validity + * + * If no fast I/O fail timeout has been configured then the device

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-17 Thread Bart Van Assche
On 06/17/13 09:14, Hannes Reinecke wrote: On 06/17/2013 09:04 AM, Bart Van Assche wrote: I agree that the value of fast_io_fail_tmo should be kept small. Although as you explained changing the SCSI device state into SDEV_BLOCK doesn't help for I/O that has already been queued on a failed

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-19 Thread Bart Van Assche
On 06/18/13 18:59, Vu Pham wrote: Bart Van Assche wrote: On 06/14/13 19:59, Vu Pham wrote: On 06/13/13 21:43, Vu Pham wrote: If rport's state is already SRP_RPORT_BLOCKED, I don't think we need to do extra block with scsi_block_requests() Please keep in mind that srp_reconnect_r

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-19 Thread Bart Van Assche
On 06/19/13 15:44, Jack Wang wrote: + /* +* It can occur that after fast_io_fail_tmo expired and before +* dev_loss_tmo expired that the SCSI error handler has +* offlined one or more devices. scsi_target_unblock() doesn't +

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Bart Van Assche
On 06/23/13 23:13, Mike Christie wrote: > On 06/12/2013 08:28 AM, Bart Van Assche wrote: >> +/* >> + * It can occur that after fast_io_fail_tmo expired and before >> + * dev_loss_tmo expired that the SCSI error handler has >> +

Re: [PATCH 07/14] scsi_transport_srp: Add transport layer error handling

2013-06-24 Thread Bart Van Assche
On 06/24/13 15:48, Jack Wang wrote: I'm not sure it's possible to avoid such a race without introducing a new mutex. How about something like the (untested) SCSI core patch below, and invoking scsi_block_eh() and scsi_unblock_eh() around any reconnect activity not initiated from the SCSI EH threa
