On Sun, Jan 10, 2010 at 7:41 AM, Roland Dreier <rdre...@cisco.com> wrote:
>
>  > The patch I posted is really fixing the original bug. The problem was
>  > that neither the SRP target nor the SRP initiator had support for
>  > SRP_CRED_REQ. Support for SRP_CRED_REQ has to be added to both
>  > software components in order to fix this bug.
>
> There's no way for the target to return credits through responses? I
> agree that we should implement the full SRP spec in the initiator but it
> seems unfortunate to force both an initiator and target upgrade to fix
> what really appears to be a target bug. This means anyone running a
> pre-2.6.34 kernel won't be able to use the SCST SRP target reliably.

Please let me explain why the SCST SRP target behaves as observed, why
this behavior is not specific to SCST, and which workaround is available
for users of pre-2.6.34 SRP initiators.

As you know, an SRP target passes the so-called req_lim value to the SRP
initiator via the REQUEST LIMIT DELTA field of the SRP_LOGIN_RSP
information unit. Let's call this value RL. As specified in the SRP r16a
document, an initiator may never send more than RL - 2 unanswered SRP_CMD
information units to an SRP target.

When an SRP target processes an SRP_CMD request, it can, for example, use
one of the following strategies:

1. Using the buffer in which the SRP_CMD request was received to build
   the response. In this case, once the response has been built, the
   target calls ib_post_send() and waits until the send completion has
   been received before it makes that buffer available for receiving
   again by calling ib_post_recv().

2. Using separate sets of buffers for receiving SRP_CMD requests and for
   sending back SRP_RSP responses. In this case the target can re-enable
   receiving for the buffer in which the SRP_CMD request was received
   before the SRP_RSP response is sent back.

Regarding approach (2): with this approach the value of the REQUEST LIMIT
DELTA field in the SRP_RSP information unit will always equal one, and it
will never be necessary for the target to send an SRP_CRED_REQ
information unit to the initiator.

Regarding approach (1): since for each SRP_RSP response sent back by the
target ib_post_recv() is called in the target after ib_post_send(), at
least for the first SRP_RSP response the REQUEST LIMIT DELTA field will
be equal to zero. And for a target that is able to process all received
SRP_CMD information units in parallel, it can happen that the SRP target
sends a contiguous series of (RL - 2) SRP_RSP information units to the
initiator with the REQUEST LIMIT DELTA field equal to zero.
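
To make the credit accounting described above concrete, here is a minimal
Python sketch of it. This is a toy model under stated assumptions, not
code taken from ib_srp or ib_srpt: the names Initiator, simulate and
RESERVED are hypothetical, the reserve of two credits is simply the
"RL - 2" rule quoted above, and the target is assumed to answer all
outstanding commands before any send completion is processed.

```python
# Toy model of SRP credit accounting. NOT taken from ib_srp or ib_srpt;
# every name in this file is hypothetical and exists only for illustration.

RESERVED = 2  # assumption from the text: at most RL - 2 unanswered SRP_CMDs


class Initiator:
    def __init__(self, rl, handles_cred_req):
        self.req_lim = rl  # initial value taken from SRP_LOGIN_RSP
        self.handles_cred_req = handles_cred_req

    def can_send(self):
        # The initiator stops sending once only the reserve is left.
        return self.req_lim > RESERVED

    def send_cmd(self):
        self.req_lim -= 1  # each SRP_CMD consumes one credit

    def on_rsp(self, delta):
        self.req_lim += delta  # REQUEST LIMIT DELTA carried by an SRP_RSP

    def on_cred_req(self, delta):
        # An initiator without SRP_CRED_REQ support drops the credits.
        if self.handles_cred_req:
            self.req_lim += delta


def simulate(rl, approach, handles_cred_req=False):
    """Return (req_lim, can_send) after one fully parallel batch."""
    ini = Initiator(rl, handles_cred_req)

    # The initiator sends commands until it runs out of usable credits.
    in_flight = 0
    while ini.can_send():
        ini.send_cmd()
        in_flight += 1

    unreported = 0  # receive buffers reposted after their SRP_RSP was sent
    for _ in range(in_flight):
        if approach == 2:
            # Separate buffers: the receive buffer is reposted before the
            # response goes out, so every SRP_RSP returns one credit.
            ini.on_rsp(1)
        else:
            # Approach (1): the buffer is reposted only after the send
            # completion, so the SRP_RSP itself carries a delta of zero.
            ini.on_rsp(0)
            unreported += 1

    # Only an SRP_CRED_REQ can still deliver the credits that were freed
    # after the last response had already been posted.
    if unreported:
        ini.on_cred_req(unreported)

    return ini.req_lim, ini.can_send()
```

With RL = 8, simulate(8, approach=1) returns (2, False): the initiator is
stuck. simulate(8, approach=1, handles_cred_req=True) and
simulate(8, approach=2) both return (8, True).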

As a consequence of such a series of zero-delta responses, the value of
the req_lim variable in the SRP initiator will be equal to 2 and the
initiator won't send any further SRP_CMD requests to the target. A
scenario for how to get the SRP initiator into this state can be found
in http://bugzilla.kernel.org/show_bug.cgi?id=14235. The only way out of
this deadlock is for the target to send an SRP_CRED_REQ information unit
with a non-zero REQUEST LIMIT DELTA field to the initiator, and for the
SRP initiator to process this SRP_CRED_REQ information unit.

Because of this possible SRP initiator lockup, SCST-SRPT users have been
advised to disable parallel processing of information units in this SRP
target (by specifying the ib_srpt kernel module parameter thread=1).

My conclusion is that the SRP initiator lockup explained in
http://bugzilla.kernel.org/show_bug.cgi?id=14235 is not specific to
SCST-SRPT, but that this lockup can be triggered by any SRP target that
processes SRP_CMD requests in parallel. So as far as I can see the
choices we have are:

* Document that SRP_CRED_REQ support is missing in the Linux SRP
  initiator, and hence that command processing in SRP targets must be
  complicated by making sure that the target never sends (RL - 2)
  contiguous SRP_RSP information units to the SRP initiator with the
  REQUEST LIMIT DELTA field equal to zero.

* Add support for the SRP_CRED_REQ information unit to the Linux SRP
  initiator.

Note: I do not know of any SRP targets that implement approach (2). As
far as I know all SRP targets use approach (1).

Bart.