Re: [PATCH 2/2] block: Improve error handling verbosity
Looks fine,

Reviewed-by: Christoph Hellwig
Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ
Looks like I won't get the major error status changes into 4.12, so
let's go with these patches for now:

Reviewed-by: Christoph Hellwig
Re: [PATCH 08/23] scsi: introduce a result field in struct scsi_request
On Wed, Apr 19, 2017 at 09:43:00PM -0400, Martin K. Petersen wrote:
> Really wish we could just obliterate drivers/ide.

Same here. It's been _the_ major pain point for doing block layer heavy
lifting for the last one or two years.
Re: [PATCH 02/23] block: remove the blk_execute_rq return value
On Wed, Apr 19, 2017 at 09:07:45PM +0000, Bart Van Assche wrote:
> > +	blk_execute_rq(or->request->q, NULL, or->request, 0);
> > +	error = or->request ? -EIO : 0;
>
> Hello Christoph,
>
> Did you perhaps intend or->request->errors instead of or->request?

Yes, thanks.
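For reference, the corrected error mapping would presumably read as
follows (a minimal sketch based on Bart's comment; any nonzero
->errors value is mapped to -EIO):

	blk_execute_rq(or->request->q, NULL, or->request, 0);
	error = or->request->errors ? -EIO : 0;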
[PATCH-v1 22/22] lpfc revision 11.2.0.12
From: James Smart

Update lpfc version to reflect this set of changes.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index d4e95e2..1c26dc6 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package.                                     *
  *******************************************************************/
 
-#define LPFC_DRIVER_VERSION "11.2.0.10"
+#define LPFC_DRIVER_VERSION "11.2.0.12"
 
 #define LPFC_DRIVER_NAME		"lpfc"
 
 /* Used for SLI 2/3 */
--
2.1.0
[PATCH-v1 21/22] Fix Express lane queue creation.
From: James Smart

The older SLI4 adapters only supported the 64 byte WQE entry size; the
newer adapter firmware supports both 64 and 128 byte WQE entry sizes.
The Express lane WQ was not being created with the 128 byte WQE size
when that size was supported. Not having the right WQE size created
for the Express lane work queue caused the firmware to overwrite the
LUN identifier in the FCP header. This patch correctly creates the
Express lane work queue with the supported size.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_init.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index e084a66..7d3b199 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12040,6 +12040,7 @@ int
 lpfc_fof_queue_create(struct lpfc_hba *phba)
 {
 	struct lpfc_queue *qdesc;
+	uint32_t wqesize;
 
 	/* Create FOF EQ */
 	qdesc = lpfc_sli4_queue_alloc(phba, phba->sli4_hba.eq_esize,
@@ -12060,8 +12061,11 @@ lpfc_fof_queue_create(struct lpfc_hba *phba)
 		phba->sli4_hba.oas_cq = qdesc;
 
 		/* Create OAS WQ */
-		qdesc = lpfc_sli4_queue_alloc(phba, phba->sli4_hba.wq_esize,
+		wqesize = (phba->fcp_embed_io) ?
+			LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
+		qdesc = lpfc_sli4_queue_alloc(phba, wqesize,
 					      phba->sli4_hba.wq_ecount);
+
 		if (!qdesc)
 			goto out_error;
--
2.1.0
[PATCH-v1 14/22] Remove hba lock from NVMET issue WQE.
From: James Smart

An unnecessary lock was being taken; the ring lock is sufficient to
protect the work queue submission. This was noticed while doing
performance testing: the hbalock is not needed to issue I/O to the
NVME work queue.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index c8a4941..5a994a8 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -517,7 +517,6 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
 		container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
 	struct lpfc_hba *phba = ctxp->phba;
 	struct lpfc_iocbq *nvmewqeq;
-	unsigned long iflags;
 	int rc, id;
 
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
@@ -571,10 +570,7 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
 	lpfc_nvmeio_data(phba, "NVMET FCP CMND: xri x%x op x%x len x%x\n",
 			 ctxp->oxid, rsp->op, rsp->rsplen);
 
-	/* For now we take hbalock */
-	spin_lock_irqsave(&phba->hbalock, iflags);
 	rc = lpfc_sli4_issue_wqe(phba, LPFC_FCP_RING, nvmewqeq);
-	spin_unlock_irqrestore(&phba->hbalock, iflags);
 	if (rc == WQE_SUCCESS) {
 		ctxp->flag |= LPFC_NVMET_IO_INP;
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
--
2.1.0
[PATCH-v1 08/22] Remove NULL ptr check before kfree.
From: James Smart

The NULL pointer check is not necessary; kfree() performs it
internally. Remove the check.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_debugfs.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 913eed8..76857e2 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -5700,10 +5700,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
 	struct lpfc_hba   *phba = vport->phba;
 
-	if (vport->disc_trc) {
-		kfree(vport->disc_trc);
-		vport->disc_trc = NULL;
-	}
+	kfree(vport->disc_trc);
+	vport->disc_trc = NULL;
 
 	debugfs_remove(vport->debug_disc_trc); /* discovery_trace */
 	vport->debug_disc_trc = NULL;
@@ -5770,10 +5768,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
 		debugfs_remove(phba->debug_readRef); /* readRef */
 		phba->debug_readRef = NULL;
 
-		if (phba->slow_ring_trc) {
-			kfree(phba->slow_ring_trc);
-			phba->slow_ring_trc = NULL;
-		}
+		kfree(phba->slow_ring_trc);
+		phba->slow_ring_trc = NULL;
 
 		/* slow_ring_trace */
 		debugfs_remove(phba->debug_slow_ring_trc);
--
2.1.0
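The pattern, distilled (kfree() is documented to be a no-op when passed
NULL, so the guard adds nothing; "ptr" is a placeholder name):

	/* Before: redundant NULL check around the free */
	if (ptr) {
		kfree(ptr);
		ptr = NULL;
	}

	/* After: kfree(NULL) is safe, so the guard can go */
	kfree(ptr);
	ptr = NULL;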
[PATCH-v1 04/22] Fix log message in completion path.
From: James Smart

In the lpfc_nvme_io_cmd_wqe_cmpl routine the driver was printing two
pointers and the DID for the rport whenever an IO completed on a node
that had transitioned to a non-active state. There is no need to print
the node pointer address for a node that is not active; the DID should
be enough for debugging.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvme.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 2a87428..bc6946f 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -787,9 +787,8 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct lpfc_iocbq *pwqeIn,
 	ndlp = rport->ndlp;
 	if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
-				 "6061 rport %p, ndlp %p, DID x%06x ndlp "
-				 "not ready.\n",
-				 rport, ndlp, rport->remoteport->port_id);
+				 "6061 rport %p, DID x%06x node not ready.\n",
+				 rport, rport->remoteport->port_id);
 
 		ndlp = lpfc_findnode_did(vport, rport->remoteport->port_id);
 		if (!ndlp) {
--
2.1.0
[PATCH-v1 07/22] Remove unused defines for NVME PostBuf.
From: James Smart

These defines for posting buffers for the nvmet target were never
used. Remove them.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvme.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 6277796..2582f46 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -22,9 +22,6 @@
  ********************************************************************/
 
 #define LPFC_NVME_DEFAULT_SEGS	(64 + 1)	/* 256K IOs */
-#define LPFC_NVMET_MIN_POSTBUF		16
-#define LPFC_NVMET_DEFAULT_POSTBUF	1024
-#define LPFC_NVMET_MAX_POSTBUF		4096
 #define LPFC_NVME_WQSIZE	256
 
 #define LPFC_NVME_ERSP_LEN	0x20
--
2.1.0
[PATCH-v1 16/22] Fix crash after issuing LIP reset
From: James Smart

When an RPI is not available, the driver sends a WQE with an invalid
RPI value, which is rejected by the HBA:

 lpfc 0000:82:00.3: 1:3154 BLS ABORT RSP failed, data: x3/xa0320008
 lpfc :2753 PLOGI failure DID:FFFFFA Status:x3/xa0240008

In this case, the driver accesses the rpi_ids array out of bounds.

Fix: Check the return value of lpfc_sli4_alloc_rpi(). Do not allocate
an lpfc_nodelist entry if an RPI is not available. When an RPI is not
available, we will get discovery timeouts and command drops for some
of the vports, as seen below:

 lpfc :0273 Unexpected discovery timeout, vport State x0
 lpfc :0230 Unexpected timeout, hba link state x5
 lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_bsg.c     |  4
 drivers/scsi/lpfc/lpfc_crtn.h    |  2 +-
 drivers/scsi/lpfc/lpfc_els.c     | 35
 drivers/scsi/lpfc/lpfc_hbadisc.c | 47 +-
 drivers/scsi/lpfc/lpfc_init.c    | 49 +---
 drivers/scsi/lpfc/lpfc_sli.c     |  3 +--
 drivers/scsi/lpfc/lpfc_vport.c   |  3 +-
 7 files changed, 79 insertions(+), 64 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 18157d2..a1686c2 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -2486,6 +2486,10 @@ static int lpfcdiag_loop_self_reg(struct lpfc_hba *phba, uint16_t *rpi)
 				mbox, *rpi);
 	else {
 		*rpi = lpfc_sli4_alloc_rpi(phba);
+		if (*rpi == LPFC_RPI_ALLOC_ERROR) {
+			mempool_free(mbox, phba->mbox_mem_pool);
+			return -EBUSY;
+		}
 		status = lpfc_reg_rpi(phba, phba->pport->vpi,
 				      phba->pport->fc_myDID,
 				      (uint8_t *)&phba->pport->fc_sparam,

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 54e6ac4..d859aff 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -99,7 +99,7 @@ void lpfc_issue_reg_vpi(struct lpfc_hba *, struct lpfc_vport *);
 int lpfc_check_sli_ndlp(struct lpfc_hba *, struct lpfc_sli_ring *,
 			struct lpfc_iocbq *, struct lpfc_nodelist *);
-void lpfc_nlp_init(struct lpfc_vport *, struct lpfc_nodelist *, uint32_t);
+struct lpfc_nodelist *lpfc_nlp_init(struct lpfc_vport *vport, uint32_t did);
 struct lpfc_nodelist *lpfc_nlp_get(struct lpfc_nodelist *);
 int  lpfc_nlp_put(struct lpfc_nodelist *);
 int  lpfc_nlp_not_used(struct lpfc_nodelist *ndlp);

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d6e0f58..ffdcac0 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -895,10 +895,9 @@ lpfc_cmpl_els_flogi_nport(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 			 * Cannot find existing Fabric ndlp, so allocate a
 			 * new one
 			 */
-			ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+			ndlp = lpfc_nlp_init(vport, PT2PT_RemoteID);
 			if (!ndlp)
 				goto fail;
-			lpfc_nlp_init(vport, ndlp, PT2PT_RemoteID);
 		} else if (!NLP_CHK_NODE_ACT(ndlp)) {
 			ndlp = lpfc_enable_node(vport, ndlp,
 						NLP_STE_UNUSED_NODE);
@@ -1364,7 +1363,6 @@ lpfc_els_abort_flogi(struct lpfc_hba *phba)
 int
 lpfc_initial_flogi(struct lpfc_vport *vport)
 {
-	struct lpfc_hba *phba = vport->phba;
 	struct lpfc_nodelist *ndlp;
 
 	vport->port_state = LPFC_FLOGI;
@@ -1374,10 +1372,9 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
 	ndlp = lpfc_findnode_did(vport, Fabric_DID);
 	if (!ndlp) {
 		/* Cannot find existing Fabric ndlp, so allocate a new one */
-		ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+		ndlp = lpfc_nlp_init(vport, Fabric_DID);
 		if (!ndlp)
 			return 0;
-		lpfc_nlp_init(vport, ndlp, Fabric_DID);
 		/* Set the node type */
 		ndlp->nlp_type |= NLP_FABRIC;
 		/* Put ndlp onto node list */
@@ -1418,17 +1415,15 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
 int
 lpfc_initial_fdisc(struct lpfc_vport *vport)
 {
-	struct lpfc_hba *phba = vport->phba;
 	struct lpfc_nodelist *ndlp;
 
 	/* First look for the Fabric ndlp */
 	ndlp = lpfc_findnode_did(vport, Fabric_DID);
 	if (!ndlp) {
 		/* Cannot find existing Fabric ndlp, so allocate a new one */
-		ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+		ndlp = lpfc_nlp_init(vport, Fabric_DID);
 		if (!ndlp)
 			return 0;
-		lpfc_nlp_init(vport, ndlp, Fabric_DID);
[PATCH-v1 06/22] Fix spelling in comments.
From: James Smart

Comment should have said Repost.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45..7087c55 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -6338,7 +6338,7 @@ lpfc_sli4_get_allocated_extnts(struct lpfc_hba *phba, uint16_t type,
 }
 
 /**
- * lpfc_sli4_repost_sgl_list - Repsot the buffers sgl pages as block
+ * lpfc_sli4_repost_sgl_list - Repost the buffers sgl pages as block
  * @phba: pointer to lpfc hba data structure.
  * @pring: Pointer to driver SLI ring object.
  * @sgl_list: linked link of sgl buffers to post
--
2.1.0
[PATCH-v1 01/22] Standardize nvme SGL segment count
From: James Smart

Standardize the default SGL segment count for the nvme target and
initiator. The driver needs to make them the same for clarity.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvme.h  | 4 +---
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.h | 4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 1347deb..6277796 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -21,9 +21,7 @@
  * included with this package.                                     *
  ********************************************************************/
 
-#define LPFC_NVME_MIN_SEGS		16
-#define LPFC_NVME_DEFAULT_SEGS		66	/* 256K IOs - 64 + 2 */
-#define LPFC_NVME_MAX_SEGS		510
+#define LPFC_NVME_DEFAULT_SEGS	(64 + 1)	/* 256K IOs */
 #define LPFC_NVMET_MIN_POSTBUF		16
 #define LPFC_NVMET_DEFAULT_POSTBUF	1024
 #define LPFC_NVMET_MAX_POSTBUF		4096

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index d488c33..c8a4941 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -709,7 +709,7 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
 	pinfo.port_id = vport->fc_myDID;
 
 	lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-	lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt;
+	lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
 	lpfc_tgttemplate.target_features = NVMET_FCTGTFEAT_READDATA_RSP |
 					   NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED |
 					   NVMET_FCTGTFEAT_CMD_IN_ISR |

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.h b/drivers/scsi/lpfc/lpfc_nvmet.h
index 02735fc..d8bac4c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.h
+++ b/drivers/scsi/lpfc/lpfc_nvmet.h
@@ -21,9 +21,7 @@
  * included with this package.                                     *
  ********************************************************************/
 
-#define LPFC_NVMET_MIN_SEGS		16
-#define LPFC_NVMET_DEFAULT_SEGS		64	/* 256K IOs */
-#define LPFC_NVMET_MAX_SEGS		510
+#define LPFC_NVMET_DEFAULT_SEGS	(64 + 1)	/* 256K IOs */
 
 #define LPFC_NVMET_SUCCESS_LEN	12
 
 /* Used for NVME Target */
--
2.1.0
[PATCH-v1 12/22] Fix driver usage of 128B WQEs when WQ_CREATE is V1.
From: James Smart

There are two versions of a structure for queue creation and setup
that the driver shares with FW; the driver was only treating it as
version 0. Verify WQ_CREATE with 128B WQEs in V0 and V1.

Code review of another bug showed the driver passing 128B WQEs and
8 pages in WQ_CREATE with V0. Code inspection/instrumentation showed
that the driver uses V0 in WQ_CREATE, and if the caller passes a
queue->entry_size of 128B, the driver sets the hdr_version to V1, so
all is good. When I tested the V1 WQ_CREATE, the mailbox failed,
causing the driver to unload.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index d606944..1519fdf 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -14741,6 +14741,9 @@ lpfc_wq_create(struct lpfc_hba *phba, struct lpfc_queue *wq,
 	case LPFC_Q_CREATE_VERSION_1:
 		bf_set(lpfc_mbx_wq_create_wqe_count, &wq_create->u.request_1,
 		       wq->entry_count);
+		bf_set(lpfc_mbox_hdr_version, &shdr->request,
+		       LPFC_Q_CREATE_VERSION_1);
+
 		switch (wq->entry_size) {
 		default:
 		case 64:
--
2.1.0
[PATCH-v1 20/22] Update ABORT processing for NVMET.
From: James Smart

The driver with nvme had this routine stubbed. Right now the
XRI_ABORTED_CQE is not handled and the FC NVMET Transport has a new
API for the driver.

Missing code path, new NVME abort API. Update ABORT processing for
NVMET.

There are 3 new FC NVMET Transport API/template routines for NVMET:

lpfc_nvmet_xmt_fcp_release
  This NVMET template callback routine is called to release the
  context associated with an IO. This routine is ALWAYS called last,
  even if the IO was aborted or completed in error.

lpfc_nvmet_xmt_fcp_abort
  This NVMET template callback routine is called to abort an exchange
  that has an IO in progress.

nvmet_fc_rcv_fcp_req
  When the lpfc driver receives an ABTS, this NVME FC transport layer
  callback routine is called. For this case there are 2 paths through
  the driver: the driver either has an outstanding exchange/context
  for the XRI to be aborted or not. If not, a BA_RJT is issued;
  otherwise a BA_ACC.

NVMET driver abort paths:

There are 2 paths for aborting an IO. The first is when we receive an
IO and decide not to process it because of a lack of resources. An
unsolicited ABTS is immediately sent back to the initiator as a
response:
  lpfc_nvmet_unsol_fcp_buffer
    lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)

The second is when we sent the IO up to the NVMET transport layer to
process, and for some reason the NVME Transport layer decided to abort
the IO before it completed all its phases. For this case there are 2
paths through the driver: the driver either has an outstanding
TSEND/TRECEIVE/TRSP WQE or no outstanding WQEs are present for the
exchange/context:
  lpfc_nvmet_xmt_fcp_abort
    if (LPFC_NVMET_IO_INP)
      lpfc_nvmet_sol_fcp_issue_abort  (ABORT_WQE)
        lpfc_nvmet_sol_fcp_abort_cmp
    else
      lpfc_nvmet_unsol_fcp_issue_abort
        lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)
          lpfc_nvmet_unsol_fcp_abort_cmp

Context flags:
LPFC_NVMET_IOP - this flag signifies an IO is in progress on the
  exchange.
LPFC_NVMET_XBUSY - this flag indicates the IO completed but the
  firmware is still busy with the corresponding exchange. The exchange
  should not be reused until after a XRI_ABORTED_CQE is received for
  that exchange.
LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on
  the exchange.
LPFC_NVMET_CTX_RLS - this flag signifies a context free was requested,
  but we are deferring it due to an XBUSY or ABORT in progress.

A ctxlock is added to the context structure that is used whenever
these flags are set/read within the context of an IO. The
LPFC_NVMET_CTX_RLS flag is only set in the defer_release routine when
the transport has resolved all IO associated with the buffer. The flag
is cleared when the CTX is associated with a new IO.

An exchange can have both an LPFC_NVMET_XBUSY and an
LPFC_NVMET_ABORT_OP condition active simultaneously. Both conditions
must complete before the exchange is freed.

When the abort callback (lpfc_nvmet_xmt_fcp_abort) is invoked:

If there is an outstanding IO, the driver will issue an ABORT_WQE.
This should result in 3 completions for the exchange:
1) IO cmpl with the XB bit set
2) Abort WQE cmpl
3) XRI_ABORTED_CQE cmpl
For this scenario, after completion #1, the NVMET Transport IO rsp
callback is called. After completion #2, no action is taken with
respect to the exchange/context. After completion #3, the exchange
context is free for reuse on another IO.

If there is no outstanding activity on the exchange, the driver will
send an ABTS to the initiator. Upon completion of this WQE, the
exchange/context is freed for reuse on another IO.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_crtn.h    |   7 +
 drivers/scsi/lpfc/lpfc_debugfs.c |  55 --
 drivers/scsi/lpfc/lpfc_hw4.h     |   3 +
 drivers/scsi/lpfc/lpfc_init.c    |  52 +++---
 drivers/scsi/lpfc/lpfc_mbox.c    |   7 +-
 drivers/scsi/lpfc/lpfc_nvme.c    |  45 +++--
 drivers/scsi/lpfc/lpfc_nvmet.c   | 349 +++
 drivers/scsi/lpfc/lpfc_nvmet.h   |  10 +-
 drivers/scsi/lpfc/lpfc_sli.c     |   7 +-
 drivers/scsi/lpfc/lpfc_sli4.h    |   2 +-
 10 files changed, 412 insertions(+), 125 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index d859aff..24da922 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -24,6 +24,7 @@ typedef int (*node_filter)(struct lpfc_nodelist *, void *);
 
 struct fc_rport;
 struct fc_frame_header;
+struct lpfc_nvmet_rcv_ctx;
 void lpfc_down_link(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_sli_read_link_ste(struct lpfc_hba *);
 void lpfc_dump_mem(struct lpfc_hba *, LPFC_MBOXQ_t *, uint16_t, uint16_t);
@@ -245,6 +246,10 @@ struct hbq_dmabuf *lpfc_sli4_rb_alloc(struct lpfc_hba *);
 void lpfc_sli4_rb_free(struct lpfc_hba *, struct hbq_dmabuf *);
 struct rqb_dmabuf *lpfc_sli4_nvmet_alloc(struct lpfc_hba *phba);
 void lpfc_sli4_nvmet_free(struct lpfc_hba *phba, struct rqb_
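To make the flag interaction concrete, here is a condensed sketch of
the deferred-release decision described above (the lock, field, and
flag names follow the commit message; the helper name and body are
illustrative only, not the driver's actual code):

	/* Invoked when the transport is done with an IO context. */
	static void nvmet_ctx_release_sketch(struct lpfc_nvmet_rcv_ctx *ctxp)
	{
		unsigned long flags;
		bool defer;

		spin_lock_irqsave(&ctxp->ctxlock, flags);
		/* XRI still owned by firmware, or an ABORT_WQE outstanding:
		 * remember that a free was requested and defer it until the
		 * XRI_ABORTED_CQE / abort completion arrives.
		 */
		defer = ctxp->flag & (LPFC_NVMET_XBUSY | LPFC_NVMET_ABORT_OP);
		if (defer)
			ctxp->flag |= LPFC_NVMET_CTX_RLS;
		spin_unlock_irqrestore(&ctxp->ctxlock, flags);

		if (!defer)
			free_or_recycle_context(ctxp);	/* placeholder */
	}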
[PATCH-v1 17/22] Fix max_sgl_segments settings for NVME / NVMET
From: James Smart

NVME segment counts cannot be set to an arbitrarily large number. The
existing module parameter lpfc_sg_seg_cnt is used for both SCSI and
NVME. Limit the module parameter lpfc_sg_seg_cnt to 128, with the
default being 64 for both NVME and NVMET, assuming NVME is enabled in
the driver for that port. The driver will set max_sgl_segments in the
NVME/NVMET template to lpfc_sg_seg_cnt + 1.

Looks good,
Reviewed-by: Johannes Thumshirn

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc.h       |  3 ++-
 drivers/scsi/lpfc/lpfc_nvme.c  | 18 ++
 drivers/scsi/lpfc/lpfc_nvmet.c | 19 +++
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 257bbdd..0fc1450 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -56,7 +56,7 @@ struct lpfc_sli2_slim;
 #define LPFC_MAX_SG_SEG_CNT	4096	/* sg element count per scsi cmnd */
 #define LPFC_MAX_SGL_SEG_CNT	512	/* SGL element count per scsi cmnd */
 #define LPFC_MAX_BPL_SEG_CNT	4096	/* BPL element count per scsi cmnd */
-#define LPFC_MIN_NVME_SEG_CNT	254
+#define LPFC_MAX_NVME_SEG_CNT	128	/* max SGL element cnt per NVME cmnd */
 
 #define LPFC_MAX_SGE_SIZE	0x80000000 /* Maximum data allowed in a SGE */
 #define LPFC_IOCB_LIST_CNT	2250	/* list of IOCBs for fast-path usage. */
@@ -781,6 +781,7 @@ struct lpfc_hba {
 	uint32_t cfg_nvmet_fb_size;
 	uint32_t cfg_total_seg_cnt;
 	uint32_t cfg_sg_seg_cnt;
+	uint32_t cfg_nvme_seg_cnt;
 	uint32_t cfg_sg_dma_buf_size;
 	uint64_t cfg_soft_wwnn;
 	uint64_t cfg_soft_wwpn;

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index efe1242..1a83ade 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1113,12 +1113,12 @@ lpfc_nvme_prep_io_dma(struct lpfc_vport *vport,
 		first_data_sgl = sgl;
 		lpfc_ncmd->seg_cnt = nCmd->sg_cnt;
-		if (lpfc_ncmd->seg_cnt > phba->cfg_sg_seg_cnt) {
+		if (lpfc_ncmd->seg_cnt > phba->cfg_nvme_seg_cnt) {
 			lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
 					"6058 Too many sg segments from "
 					"NVME Transport.  Max %d, "
 					"nvmeIO sg_cnt %d\n",
-					phba->cfg_sg_seg_cnt,
+					phba->cfg_nvme_seg_cnt,
 					lpfc_ncmd->seg_cnt);
 			lpfc_ncmd->seg_cnt = 0;
 			return 1;
@@ -2157,8 +2157,18 @@ lpfc_nvme_create_localport(struct lpfc_vport *vport)
 	nfcp_info.node_name = wwn_to_u64(vport->fc_nodename.u.wwn);
 	nfcp_info.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
 
-	/* For now need + 1 to get around NVME transport logic */
-	lpfc_nvme_template.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
+	/* Limit to LPFC_MAX_NVME_SEG_CNT.
+	 * For now need + 1 to get around NVME transport logic.
+	 */
+	if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+		lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME | LOG_INIT,
+				 "6300 Reducing sg segment cnt to %d\n",
+				 LPFC_MAX_NVME_SEG_CNT);
+		phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+	} else {
+		phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+	}
+	lpfc_nvme_template.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
 	lpfc_nvme_template.max_hw_queues = phba->cfg_nvme_io_channel;
 
 	/* localport is allocated from the stack, but the registration

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 8348bb5..d0dda42 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -704,8 +704,19 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
 	pinfo.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
 	pinfo.port_id = vport->fc_myDID;
 
+	/* Limit to LPFC_MAX_NVME_SEG_CNT.
+	 * For now need + 1 to get around NVME transport logic.
+	 */
+	if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+		lpfc_printf_log(phba, KERN_INFO, LOG_NVME | LOG_INIT,
+				"6400 Reducing sg segment cnt to %d\n",
+				LPFC_MAX_NVME_SEG_CNT);
+		phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+	} else {
+		phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+	}
+	lpfc_tgttemplate.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
 	lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-	lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
 	lpfc_tgttemplate.target_features = NVMET_FCTGTFEAT_READDATA_RSP |
[PATCH-v1 15/22] Fix driver load issues when MRQ=8
From: James Smart

The symptom is that the driver will fail to log in to the fabric
because it is out of iocb resources. There is a one-to-one
relationship between MRQs (receive buffers for NVMET-FC) and iocbs,
and the default number of IOCBs was not accounting for the number of
MRQs being created. This fix aligns the number of MRQ resources with
the total resources so that fabric events can be handled when needed.

Also, initialize the ctxlock on FCP commands, NOT LS commands, and
modify log messages so that the log output can be correlated with the
analyzer trace.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_init.c  | 12
 drivers/scsi/lpfc/lpfc_nvme.c  | 23 ---
 drivers/scsi/lpfc/lpfc_nvmet.c |  6 +++---
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index cca7f81..88977b8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11061,7 +11061,7 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
 	struct lpfc_hba   *phba;
 	struct lpfc_vport *vport = NULL;
 	struct Scsi_Host  *shost = NULL;
-	int error;
+	int error, cnt;
 	uint32_t cfg_mode, intr_mode;
 
 	/* Allocate memory for HBA structure */
@@ -11095,11 +11095,15 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const struct pci_device_id *pid)
 		goto out_unset_pci_mem_s4;
 	}
 
+	cnt = phba->cfg_iocb_cnt * 1024;
+	if (phba->nvmet_support)
+		cnt += (phba->cfg_nvmet_mrq_post * phba->cfg_nvmet_mrq);
+
 	/* Initialize and populate the iocb list per host */
 	lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
-			"2821 initialize iocb list %d.\n",
-			phba->cfg_iocb_cnt*1024);
-	error = lpfc_init_iocb_list(phba, phba->cfg_iocb_cnt*1024);
+			"2821 initialize iocb list %d total %d\n",
+			phba->cfg_iocb_cnt, cnt);
+	error = lpfc_init_iocb_list(phba, cnt);
 
 	if (error) {
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index f23496b..efe1242 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1494,6 +1494,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
 				 "io buffer.  Skipping abort req.\n");
 		return;
 	}
+	nvmereq_wqe = &lpfc_nbuf->cur_iocbq;
 
 	/*
 	 * The lpfc_nbuf and the mapped nvme_fcreq in the driver's
@@ -1507,20 +1508,19 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 				 "6143 NVME req mismatch: "
 				 "lpfc_nbuf %p nvmeCmd %p, "
-				 "pnvme_fcreq %p.  Skipping Abort\n",
+				 "pnvme_fcreq %p.  Skipping Abort xri x%x\n",
 				 lpfc_nbuf, lpfc_nbuf->nvmeCmd,
-				 pnvme_fcreq);
+				 pnvme_fcreq, nvmereq_wqe->sli4_xritag);
 		return;
 	}
 
 	/* Don't abort IOs no longer on the pending queue. */
-	nvmereq_wqe = &lpfc_nbuf->cur_iocbq;
 	if (!(nvmereq_wqe->iocb_flag & LPFC_IO_ON_TXCMPLQ)) {
 		spin_unlock_irqrestore(&phba->hbalock, flags);
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 				 "6142 NVME IO req %p not queued - skipping "
-				 "abort req\n",
-				 pnvme_fcreq);
+				 "abort req xri x%x\n",
+				 pnvme_fcreq, nvmereq_wqe->sli4_xritag);
 		return;
 	}
 
@@ -1534,8 +1534,9 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 				 "6144 Outstanding NVME I/O Abort Request "
 				 "still pending on nvme_fcreq %p, "
-				 "lpfc_ncmd %p\n",
-				 pnvme_fcreq, lpfc_nbuf);
+				 "lpfc_ncmd %p xri x%x\n",
+				 pnvme_fcreq, lpfc_nbuf,
+				 nvmereq_wqe->sli4_xritag);
 		return;
 	}
 
@@ -1544,8 +1545,8 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port *pnvme_lport,
 		spin_unlock_irqrestore(&phba->hbalock, flags);
 		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 				 "6136 No available abort wqes. Skipping "
-				 "Abts req for nvme_fcreq %p.\n",
-				 pnvme_fcreq);
[PATCH-v1 13/22] Fix nvme initiator handling when not enabled.
From: James Smart

Fix nvme initiator handling when CONFIG_LPFC_NVME_INITIATOR is not
enabled. With the updated nvme upstream driver sources, loading the
driver with nvme enabled resulted in this Oops:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
 IP: lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 PGD 0
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 10256 Comm: lpfc_worker_0 Tainted
 Hardware name: ...
 task: ffff881028191c40 task.stack: ffff880ffdf00000
 RIP: 0010:lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 RSP: 0018:ffff880ffdf03c20 EFLAGS: 00010202

Cause: As the initiator driver completes discovery at different
stages, it calls lpfc_nvme_update_localport to hint that the DID and
role may have changed. The implementation of
lpfc_nvme_update_localport was not validating the localport or the
lport during execution of the update_localport routine. With the
recent upstream additions to the driver, the create_localport routine
didn't run, so the localport was NULL, causing the page-fault Oops.

Fix: Add the CONFIG_LPFC_NVME_INITIATOR preprocessor inclusions to
lpfc_nvme_update_localport to turn off all routine processing when the
running kernel does not have NVME configured. Add NULL pointer checks
on the localport and lport in lpfc_nvme_update_localport; log messages
if they are NULL, and just exit. Also fix one alignment issue, and
replace the ifdef with the IS_ENABLED macro.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_attr.c |  2 +-
 drivers/scsi/lpfc/lpfc_nvme.c | 15 +--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 22819af..4b32d02 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3335,7 +3335,7 @@ LPFC_ATTR_R(enable_fc4_type, LPFC_ENABLE_FCP,
  * percentage will go to NVME.
  */
 LPFC_ATTR_R(xri_split, 50, 10, 90,
-	"Division of XRI resources between SCSI and NVME");
+	    "Division of XRI resources between SCSI and NVME");

 /*
 # lpfc_log_verbose: Only turn this flag on if you are willing to risk being

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index bc6946f..f23496b 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2263,12 +2263,23 @@ lpfc_nvme_destroy_localport(struct lpfc_vport *vport)
 void
 lpfc_nvme_update_localport(struct lpfc_vport *vport)
 {
+#if (IS_ENABLED(CONFIG_NVME_FC))
 	struct nvme_fc_local_port *localport;
 	struct lpfc_nvme_lport *lport;
 
 	localport = vport->localport;
+	if (!localport) {
+		lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+				 "6710 Update NVME fail. No localport\n");
+		return;
+	}
 	lport = (struct lpfc_nvme_lport *)localport->private;
-
+	if (!lport) {
+		lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+				 "6171 Update NVME fail. localP %p, No lport\n",
+				 localport);
+		return;
+	}
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME,
 			 "6012 Update NVME lport %p did x%x\n",
 			 localport, vport->fc_myDID);
@@ -2282,7 +2293,7 @@ lpfc_nvme_update_localport(struct lpfc_vport *vport)
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 			 "6030 bound lport %p to DID x%06x\n",
 			 lport, localport->port_id);
-
+#endif
 }
 
 int
--
2.1.0
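The IS_ENABLED() idiom used in this fix is worth spelling out: unlike
a bare #ifdef CONFIG_FOO, which only matches a built-in (=y) option,
IS_ENABLED(CONFIG_FOO) is true for both =y and =m. A generic sketch of
the pattern (hypothetical config symbol, type, and function names):

	void example_update(struct example_port *port)
	{
	#if (IS_ENABLED(CONFIG_EXAMPLE_FC))
		/* Compiled when CONFIG_EXAMPLE_FC is y or m; the whole
		 * body drops out, leaving an empty stub, when it is unset.
		 */
		if (!port)
			return;
		/* ... real work ... */
	#endif
	}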
[PATCH-v1 11/22] Fix driver unload/reload operation.
From: James Smart

There are a couple of different load/unload issues fixed with this
patch. One of the issues was reported by Junichi Nomura; a patch was
submitted by Johannes Thumshirn which did fix one of the problems, but
the fix in this patch separates the pring free from the queue free and
does not set the parameter passed in to NULL.

Issues:
(1) The driver could not be unloaded and reloaded without some Oops or
    panic occurring.
(2) The driver was panicking because of a corruption in the memory
    manager when the iocb list was getting allocated.

The root cause of the memory corruption was a double free of the Work
Queue ring pointer memory: freed once in lpfc_sli4_queue_free when the
CQ was destroyed, and again in lpfc_sli4_queue_free when the WQ was
destroyed. The pring free and the queue free were separated; the pring
free was moved to the wq destroy routine because it is a better
logical fit to delete the ring with the wq.

checkpatch flagged several alignment issues that were also corrected
with this patch. The mboxq was never initialized correctly before it
was used by the driver; this patch corrects that issue as well.

Reported-by: Junichi Nomura
Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_init.c | 46 +++
 drivers/scsi/lpfc/lpfc_sli.c  |  7 ++-
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index b56da01..cca7f81 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -5815,6 +5815,12 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
 	INIT_LIST_HEAD(&phba->sli4_hba.lpfc_vfi_blk_list);
 	INIT_LIST_HEAD(&phba->lpfc_vpi_blk_list);
 
+	/* Initialize mboxq lists. If the early init routines fail
+	 * these lists need to be correctly initialized.
+	 */
+	INIT_LIST_HEAD(&phba->sli.mboxq);
+	INIT_LIST_HEAD(&phba->sli.mboxq_cmpl);
+
 	/* initialize optic_state to 0xFF */
 	phba->sli4_hba.lnk_info.optic_state = 0xff;
 
@@ -5880,6 +5886,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
 				"READ_NV, mbxStatus x%x\n",
 				bf_get(lpfc_mqe_command, &mboxq->u.mqe),
 				bf_get(lpfc_mqe_status, &mboxq->u.mqe));
+		mempool_free(mboxq, phba->mbox_mem_pool);
 		rc = -EIO;
 		goto out_free_bsmbx;
 	}
@@ -7805,7 +7812,7 @@ lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 
 	/* Create Fast Path FCP WQs */
 	wqesize = (phba->fcp_embed_io) ?
-			LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
+		LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
 	qdesc = lpfc_sli4_queue_alloc(phba, wqesize, phba->sli4_hba.wq_ecount);
 	if (!qdesc) {
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
@@ -7836,7 +7843,7 @@ int
 lpfc_sli4_queue_create(struct lpfc_hba *phba)
 {
 	struct lpfc_queue *qdesc;
-	int idx, io_channel, max;
+	int idx, io_channel;
 
 	/*
 	 * Create HBA Record arrays.
@@ -7997,15 +8004,6 @@ lpfc_sli4_queue_create(struct lpfc_hba *phba)
 		if (lpfc_alloc_nvme_wq_cq(phba, idx))
 			goto out_error;
 
-	/* allocate MRQ CQs */
-	max = phba->cfg_nvme_io_channel;
-	if (max < phba->cfg_nvmet_mrq)
-		max = phba->cfg_nvmet_mrq;
-
-	for (idx = 0; idx < max; idx++)
-		if (lpfc_alloc_nvme_wq_cq(phba, idx))
-			goto out_error;
-
 	if (phba->nvmet_support) {
 		for (idx = 0; idx < phba->cfg_nvmet_mrq; idx++) {
 			qdesc = lpfc_sli4_queue_alloc(phba,
@@ -8227,11 +8225,11 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
 
 	/* Release FCP cqs */
 	lpfc_sli4_release_queues(&phba->sli4_hba.fcp_cq,
-			phba->cfg_fcp_io_channel);
+				 phba->cfg_fcp_io_channel);
 
 	/* Release FCP wqs */
 	lpfc_sli4_release_queues(&phba->sli4_hba.fcp_wq,
-			phba->cfg_fcp_io_channel);
+				 phba->cfg_fcp_io_channel);
 
 	/* Release FCP CQ mapping array */
 	lpfc_sli4_release_queue_map(&phba->sli4_hba.fcp_cq_map);
@@ -8577,15 +8575,15 @@ lpfc_sli4_queue_setup(struct lpfc_hba *phba)
 		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
 				"0528 %s not allocated\n",
 				phba->sli4_hba.mbx_cq ?
-				"Mailbox WQ" : "Mailbox CQ");
+					"Mailbox WQ" : "Mailbox CQ");
 		rc = -ENOMEM;
 		goto out_destroy;
 	}
 
 	rc = lpfc_create_wq_cq(phba, phba->sli4_hba.hba_eq[0
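The double free at the heart of issue (2), in generic form
(illustrative structure and function names; the point is that exactly
one teardown path may own a shared allocation):

	struct queue {
		struct ring *pring;	/* shared between CQ and WQ teardown */
	};

	/* Buggy: both teardown paths free the same pointer. */
	void cq_destroy(struct queue *q) { kfree(q->pring); }	/* free #1 */
	void wq_destroy(struct queue *q) { kfree(q->pring); }	/* free #2:
								 * heap corruption */

	/* Fixed: only the WQ teardown, the ring's logical owner, frees it. */
	void wq_destroy_fixed(struct queue *q)
	{
		kfree(q->pring);
		q->pring = NULL;
	}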
[PATCH-v1 05/22] Add debug messages for nvme/fcp resource allocation.
From: James Smart

The XRI resources are split into pools for NVME and FCP IO when NVME
is enabled, but there was no message in the log that identified this
allocation. Add a debug message to log the XRI split.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_init.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 6cc561b..b56da01 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3508,6 +3508,12 @@ lpfc_sli4_scsi_sgl_update(struct lpfc_hba *phba)
 	spin_unlock(&phba->scsi_buf_list_put_lock);
 	spin_unlock_irq(&phba->scsi_buf_list_get_lock);
 
+	lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
+			"6060 Current allocated SCSI xri-sgl count:%d, "
+			"maximum SCSI xri count:%d (split:%d)\n",
+			phba->sli4_hba.scsi_xri_cnt,
+			phba->sli4_hba.scsi_xri_max, phba->cfg_xri_split);
+
 	if (phba->sli4_hba.scsi_xri_cnt > phba->sli4_hba.scsi_xri_max) {
 		/* max scsi xri shrinked below the allocated scsi buffers */
 		scsi_xri_cnt = phba->sli4_hba.scsi_xri_cnt -
--
2.1.0
[PATCH-v1 09/22] Fix extra line print in rqpair debug print.
From: James Smart

An extra blank line was being added to the rqpair printing. Remove the
extra line feed.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_debugfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 76857e2..55a8d8f 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -3128,8 +3128,6 @@ __lpfc_idiag_print_rqpair(struct lpfc_queue *qp, struct lpfc_queue *datqp,
 			datqp->queue_id, datqp->entry_count,
 			datqp->entry_size, datqp->host_index,
 			datqp->hba_index);
 
-	len += snprintf(pbuffer + len,
-			LPFC_QUE_INFO_GET_BUF_SIZE - len, "\n");
 	return len;
 }
--
2.1.0
[PATCH-v1 18/22] Add Fabric assigned WWN support.
From: James Smart

Add support for Fabric-assigned WWPN and WWNN. Firmware sends the
first FLOGI to the fabric with the vendor version changes. On link up,
the driver gets an updated service parameter with the FAWWN-assigned
port name. The driver sends a second FLOGI with the updated fawwpn and
modifies vport->fc_portname in the driver.

Note: soft wwpn will not be allowed when fawwpn is enabled.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc.h         |  2 ++
 drivers/scsi/lpfc/lpfc_attr.c    |  8
 drivers/scsi/lpfc/lpfc_els.c     |  8 +---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 22 +-
 drivers/scsi/lpfc/lpfc_hw.h      |  3 +++
 drivers/scsi/lpfc/lpfc_hw4.h     |  1 +
 drivers/scsi/lpfc/lpfc_init.c    | 31 ---
 7 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 0fc1450..6d7840b 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -474,6 +474,8 @@ struct lpfc_vport {
 	unsigned long rcv_buffer_time_stamp;
 	uint32_t vport_flag;
 #define STATIC_VPORT	1
+#define FAWWPN_SET	2
+#define FAWWPN_PARAM_CHG	4
 
 	uint16_t fdmi_num_disc;
 	uint32_t fdmi_hba_mask;

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 4b32d02..e9d956d 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -2292,6 +2292,8 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct device_attribute *attr,
 	struct lpfc_vport *vport = (struct lpfc_vport *) shost->hostdata;
 	struct lpfc_hba   *phba = vport->phba;
 	unsigned int cnt = count;
+	uint8_t vvvl = vport->fc_sparam.cmn.valid_vendor_ver_level;
+	u32 *fawwpn_key = (uint32_t *)&vport->fc_sparam.un.vendorVersion[0];
 
 	/*
 	 * We're doing a simple sanity check for soft_wwpn setting.
@@ -2305,6 +2307,12 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct device_attribute *attr,
 	 * here. The intent is to protect against the random user or
 	 * application that is just writing attributes.
 	 */
+	if (vvvl == 1 && (cpu_to_be32(*fawwpn_key)) == FAPWWN_KEY_VENDOR) {
+		lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+				"0051 "LPFC_DRIVER_NAME" soft wwpn can not"
+				" be enabled: fawwpn is enabled\n");
+		return -EINVAL;
+	}
 
 	/* count may include a LF at end of string */
 	if (buf[cnt-1] == '\n')

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index ffdcac0..35fc260 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -603,9 +603,11 @@ lpfc_check_clean_addr_bit(struct lpfc_vport *vport,
 	    memcmp(&vport->fabric_portname, &sp->portName,
 		   sizeof(struct lpfc_name)) ||
 	    memcmp(&vport->fabric_nodename, &sp->nodeName,
-		   sizeof(struct lpfc_name)))
+		   sizeof(struct lpfc_name)) ||
+	    (vport->vport_flag & FAWWPN_PARAM_CHG)) {
 		fabric_param_changed = 1;
-
+		vport->vport_flag &= ~FAWWPN_PARAM_CHG;
+	}
 	/*
 	 * Word 1 Bit 31 in common service parameter is overloaded.
 	 * Word 1 Bit 31 in FLOGI request is multiple NPort request
@@ -8755,7 +8757,7 @@ lpfc_issue_els_fdisc(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
 	pcmd += sizeof(uint32_t); /* Node Name */
 	pcmd += sizeof(uint32_t); /* Node Name */
 	memcpy(pcmd, &vport->fc_nodename, 8);
-
+	memset(sp->un.vendorVersion, 0, sizeof(sp->un.vendorVersion));
 	lpfc_set_disctmo(vport);
 
 	phba->fc_stat.elsXmitFDISC++;

diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index d313dde..e824ec1 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -3002,6 +3002,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 	MAILBOX_t *mb = &pmb->u.mb;
 	struct lpfc_dmabuf *mp = (struct lpfc_dmabuf *) pmb->context1;
 	struct lpfc_vport  *vport = pmb->vport;
+	struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
 	struct serv_parm *sp = &vport->fc_sparam;
 	uint32_t ed_tov;
 
@@ -3031,6 +3032,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 	}
 
 	lpfc_update_vport_wwn(vport);
+	fc_host_port_name(shost) = wwn_to_u64(vport->fc_portname.u.wwn);
 	if (vport->port_type == LPFC_PHYSICAL_PORT) {
 		memcpy(&phba->wwnn, &vport->fc_nodename, sizeof(phba->wwnn));
 		memcpy(&phba->wwpn, &vport->fc_portname, sizeof(phba->wwnn));
@@ -3309,6 +3311,7 @@ lpfc_mbx_cmpl_read_topology(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
 	struct lpfc_sli_ring *pring;
 	MAILBOX_t *mb = &pmb->u.mb;
 	struct lpfc_dmabuf *
[PATCH-v1 19/22] Fix implicit logo and RSCN handling for NVMET
From: James Smart

NVMET didn't have any RSCN handling at all and would not execute an
implicit LOGO when receiving a PLOGI from an rport that NVMET had in
state UNMAPPED.

Clean up the logic in lpfc_nlp_state_cleanup for initiators (FCP and
NVME). NVMET should not respond to RSCN, including allocating new
ndlps, so this code was conditionalized on nvmet_support being true.
The check for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved below
the check for nvmet_support to allow NVMET to recover initiator nodes
correctly. The implicit logo was introduced with lpfc_rcv_plogi when
NVMET gets a PLOGI on an ndlp in UNMAPPED state. The RSCN handling was
modified so that NVMET does not respond to an RSCN; instead NVMET
sends a GID_FT and determines whether an NVME initiator it has is
UNMAPPED but no longer in the zone membership.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_ct.c        | 68 +-
 drivers/scsi/lpfc/lpfc_disc.h      |  1 +
 drivers/scsi/lpfc/lpfc_els.c       | 23 ++---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 62 ++
 drivers/scsi/lpfc/lpfc_nportdisc.c |  8 +++--
 5 files changed, 110 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_ct.c b/drivers/scsi/lpfc/lpfc_ct.c
index d3e9af9..1487406 100644
--- a/drivers/scsi/lpfc/lpfc_ct.c
+++ b/drivers/scsi/lpfc/lpfc_ct.c
@@ -537,19 +537,53 @@ lpfc_prep_node_fc4type(struct lpfc_vport *vport, uint32_t Did, uint8_t fc4_type)
 		}
 	}
 }
 
+static void
+lpfc_ns_rsp_audit_did(struct lpfc_vport *vport, uint32_t Did, uint8_t fc4_type)
+{
+	struct lpfc_hba *phba = vport->phba;
+	struct lpfc_nodelist *ndlp = NULL;
+	struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+
+	/*
+	 * To conserve rpi's, filter out addresses for other
+	 * vports on the same physical HBAs.
+	 */
+	if (Did != vport->fc_myDID &&
+	    (!lpfc_find_vport_by_did(phba, Did) ||
+	     vport->cfg_peer_port_login)) {
+		if (!phba->nvmet_support) {
+			/* FCPI/NVMEI path. Process Did */
+			lpfc_prep_node_fc4type(vport, Did, fc4_type);
+			return;
+		}
+		/* NVMET path.  NVMET only cares about NVMEI nodes. */
+		list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
+			if (ndlp->nlp_type != NLP_NVME_INITIATOR ||
+			    ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)
+				continue;
+			spin_lock_irq(shost->host_lock);
+			if (ndlp->nlp_DID == Did)
+				ndlp->nlp_flag &= ~NLP_NVMET_RECOV;
+			else
+				ndlp->nlp_flag |= NLP_NVMET_RECOV;
+			spin_unlock_irq(shost->host_lock);
+		}
+	}
+}
+
 static int
 lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
 	    uint32_t Size)
 {
-	struct lpfc_hba *phba = vport->phba;
 	struct lpfc_sli_ct_request *Response =
 		(struct lpfc_sli_ct_request *) mp->virt;
-	struct lpfc_nodelist *ndlp = NULL;
 	struct lpfc_dmabuf *mlast, *next_mp;
 	uint32_t *ctptr = (uint32_t *) & Response->un.gid.PortType;
 	uint32_t Did, CTentry;
 	int Cnt;
 	struct list_head head;
+	struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+	struct lpfc_nodelist *ndlp = NULL;
 
 	lpfc_set_disctmo(vport);
 	vport->num_disc_nodes = 0;
@@ -574,19 +608,7 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
 			/* Get next DID from NameServer List */
 			CTentry = *ctptr++;
 			Did = ((be32_to_cpu(CTentry)) & Mask_DID);
-
-			ndlp = NULL;
-
-			/*
-			 * Check for rscn processing or not
-			 * To conserve rpi's, filter out addresses for other
-			 * vports on the same physical HBAs.
-			 */
-			if ((Did != vport->fc_myDID) &&
-			    ((lpfc_find_vport_by_did(phba, Did) == NULL) ||
-			     vport->cfg_peer_port_login))
-				lpfc_prep_node_fc4type(vport, Did, fc4_type);
-
+			lpfc_ns_rsp_audit_did(vport, Did, fc4_type);
 			if (CTentry & (cpu_to_be32(SLI_CT_LAST_ENTRY)))
 				goto nsout1;
 
@@ -596,6 +618,22 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
 
 	}
 
+	/* All GID_FT entries processed.  If the driver is running in
+	 * in target mode, put impacted nodes into recovery and drop
+	 * the RPI to flush outstanding IO.
+	 */
+	if (vport->p
[PATCH-v1 10/22] Fix PRLI ACC rsp for NVME
From: James Smart

The PRLI ACC from the target was FCP-oriented; word 0 was wrong. This
was noticed by another nvmet-fc vendor that was testing the lpfc
nvme-fc initiator with their target. Verified results with an
analyzer:

PRLI
 BC B5 56 56 22 61 04 00 00 61 00 00 01 29 00 00
 20 00 00 00 00 10 FF FF 00 00 00 00 20 14 00 18
 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20
 00 00 00 00 9C D8 DA C9 BC 95 75 75

ACC
 BC B5 56 56 23 61 00 00 00 61 04 00 01 98 00 00
 30 00 00 00 00 10 00 18 00 00 00 00 02 14 00 18
 28 00 01 00 00 00 00 00 00 00 00 00 00 00 00 18
 00 00 00 00 B0 6B 07 57 BC B5 75 75

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_els.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d9c61d0..d6e0f58 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -4403,7 +4403,7 @@ lpfc_els_rsp_prli_acc(struct lpfc_vport *vport, struct lpfc_iocbq *oldiocb,
 	pcmd = (uint8_t *) (((struct lpfc_dmabuf *) elsiocb->context2)->virt);
 	memset(pcmd, 0, cmdsize);
 
-	*((uint32_t *) (pcmd)) = (ELS_CMD_ACC | (ELS_CMD_PRLI & ~ELS_RSP_MASK));
+	*((uint32_t *)(pcmd)) = elsrspcmd;
 	pcmd += sizeof(uint32_t);
 
 	/* For PRLI, remainder of payload is PRLI parameter page */
--
2.1.0
[PATCH-v1 02/22] Fix nvme unregister port timeout.
From: James Smart

During some link event testing it was observed that the
wait_for_completion_timeout in lpfc_nvme_unregister_port was timing
out all the time, with the initiator claiming the
nvme_fc_unregister_remoteport upcall was not completing the unregister
in the time allotted:

 [ 2186.151317] lpfc 0000:07:00.0: 0:(0):6169 Unreg nvme wait failed 0

wait_for_completion_timeout returns 0 when the wait has been
outstanding for the jiffies passed by the caller. In this error
message, the nvme initiator passed the value 5 - meaning 5 jiffies -
and this is just wrong. Calculate 5 seconds in jiffies and pass that
value instead. Also, the log message for the unregister timeout was
reduced, because a timeout failure is the same as a timeout.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvme.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 0024de1..a39d72c 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2409,6 +2409,7 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 	struct lpfc_nvme_lport *lport;
 	struct lpfc_nvme_rport *rport;
 	struct nvme_fc_remote_port *remoteport;
+	unsigned long wait_tmo;
 
 	localport = vport->localport;
 
@@ -2451,11 +2452,12 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 		 * before proceeding.  This guarantees the transport and driver
 		 * have completed the unreg process.
 		 */
-		ret = wait_for_completion_timeout(&rport->rport_unreg_done, 5);
+		wait_tmo = msecs_to_jiffies(5000);
+		ret = wait_for_completion_timeout(&rport->rport_unreg_done,
+						  wait_tmo);
 		if (ret == 0) {
 			lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
-					 "6169 Unreg nvme wait failed %d\n",
-					 ret);
+					 "6169 Unreg nvme wait timeout\n");
 		}
 	}
 	return;
--
2.1.0
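The underlying rule: wait_for_completion_timeout() takes its timeout
in jiffies, so a human-readable duration must be converted first. A
minimal sketch (the completion variable is hypothetical):

	#include <linux/completion.h>
	#include <linux/jiffies.h>

	static DECLARE_COMPLETION(done);

	/* 5 seconds, expressed in jiffies regardless of CONFIG_HZ */
	unsigned long timeout = msecs_to_jiffies(5000);

	/* Returns 0 on timeout, remaining jiffies otherwise */
	if (wait_for_completion_timeout(&done, timeout) == 0)
		pr_warn("wait timed out\n");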
[PATCH-v1 03/22] Fix rejected nvme LS Req.
From: James Smart

In this case, the NVME initiator is sending an LS REQ command on an
NDLP that is not MAPPED, and the FW rejects it. The lpfc_nvme_ls_req
routine checks for a NULL ndlp pointer but does not check the NDLP
state, which allows the routine to send an LS IO when the ndlp is
disconnected.

Check the ndlp for NULL and that it is an actual node, and require
either Target and MAPPED or Initiator and UNMAPPED. This avoids
Fabric nodes getting the Create Association or Create Connection
commands. Initiators are free to Reject either Create.

Also, some of the message numbers in lpfc_nvme_ls_req were changed
because they were already used in other log messages.

Signed-off-by: Dick Kennedy
Signed-off-by: James Smart
---
 drivers/scsi/lpfc/lpfc_nvme.c | 25 ++++++++++++++++++-------
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index a39d72c..2a87428 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -417,11 +417,26 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 	vport = lport->vport;
 
 	ndlp = lpfc_findnode_did(vport, pnvme_rport->port_id);
-	if (!ndlp) {
-		lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
-				 "6043 Could not find node for DID %x\n",
+	if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
+		lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+				 "6051 DID x%06x not an active rport.\n",
 				 pnvme_rport->port_id);
-		return 1;
+		return -ENODEV;
+	}
+
+	/* The remote node has to be a mapped nvme target or an
+	 * unmapped nvme initiator or it's an error.
+	 */
+	if (((ndlp->nlp_type & NLP_NVME_TARGET) &&
+	     (ndlp->nlp_state != NLP_STE_MAPPED_NODE)) ||
+	    ((ndlp->nlp_type & NLP_NVME_INITIATOR) &&
+	     (ndlp->nlp_state != NLP_STE_UNMAPPED_NODE))) {
+		lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+				 "6088 DID x%06x not ready for "
+				 "IO. State x%x, Type x%x\n",
+				 pnvme_rport->port_id,
+				 ndlp->nlp_state, ndlp->nlp_type);
+		return -ENODEV;
 	}
 	bmp = kmalloc(sizeof(struct lpfc_dmabuf), GFP_KERNEL);
 	if (!bmp) {
@@ -456,7 +471,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 
 	/* Expand print to include key fields. */
 	lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
-			 "6051 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
+			 "6149 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
 			 "rsplen:%d %pad %pad\n",
 			 pnvme_lport, pnvme_rport, pnvme_lsreq,
 			 pnvme_lsreq->rqstlen,
--
2.1.0
[PATCH-v1 00/22] lpfc updates for 11.2.0.12
From: James Smart

This patch set provides a number of bug fixes, code cleanups, and
error handling improvements, mostly in the nvme area of lpfc. There is
one new feature in the series: Fabric Assigned WWN support.

The patches were cut against the linux-block tree, for-4.12/block
branch, and are intended to be pulled in through that tree rather than
the scsi trees. The patches are dependent on the FC nvme/nvmet patches
from the following 2 series:

http://lists.infradead.org/pipermail/linux-nvme/2017-April/009250.html
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009256.html

James Smart (22):
  Standardize nvme SGL segment count
  Fix nvme unregister port timeout.
  Fix rejected nvme LS Req.
  Fix log message in completion path.
  Add debug messages for nvme/fcp resource allocation.
  Fix spelling in comments.
  Remove unused defines for NVME PostBuf.
  Remove NULL ptr check before kfree.
  Fix extra line print in rqpair debug print.
  Fix PRLI ACC rsp for NVME
  Fix driver unload/reload operation.
  Fix driver usage of 128B WQEs when WQ_CREATE is V1.
  Fix nvme initiator handling when not enabled.
  Remove hba lock from NVMET issue WQE.
  Fix driver load issues when MRQ=8
  Fix crash after issuing lip reset
  Fix max_sgl_segments settings for NVME / NVMET
  Add Fabric assigned WWN support.
  Fix implicit logo and RSCN handling for NVMET
  Update ABORT processing for NVMET.
  Fix Express lane queue creation.
  lpfc revision 11.2.0.12

 drivers/scsi/lpfc/lpfc.h           |   5 +-
 drivers/scsi/lpfc/lpfc_attr.c      |  10 +-
 drivers/scsi/lpfc/lpfc_bsg.c       |   4 +
 drivers/scsi/lpfc/lpfc_crtn.h      |   9 +-
 drivers/scsi/lpfc/lpfc_ct.c        |  68 +++---
 drivers/scsi/lpfc/lpfc_debugfs.c   |  69 ---
 drivers/scsi/lpfc/lpfc_disc.h      |   1 +
 drivers/scsi/lpfc/lpfc_els.c       |  68 +++
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 131 +
 drivers/scsi/lpfc/lpfc_hw.h        |   3 +
 drivers/scsi/lpfc/lpfc_hw4.h       |   4 +
 drivers/scsi/lpfc/lpfc_init.c      | 202 +---
 drivers/scsi/lpfc/lpfc_mbox.c      |   7 +-
 drivers/scsi/lpfc/lpfc_nportdisc.c |   8 +-
 drivers/scsi/lpfc/lpfc_nvme.c      | 139 ++
 drivers/scsi/lpfc/lpfc_nvme.h      |   7 +-
 drivers/scsi/lpfc/lpfc_nvmet.c     | 378 -
 drivers/scsi/lpfc/lpfc_nvmet.h     |  14 +-
 drivers/scsi/lpfc/lpfc_sli.c       |  22 ++-
 drivers/scsi/lpfc/lpfc_sli4.h      |   2 +-
 drivers/scsi/lpfc/lpfc_version.h   |   2 +-
 drivers/scsi/lpfc/lpfc_vport.c     |   3 +-
 22 files changed, 814 insertions(+), 342 deletions(-)

--
2.1.0
Re: [PATCH] ibmvscsis: Do not send aborted task response
"Bryant G. Ly" writes: > The driver is sending a response to the aborted task response along > with LIO sending the tmr response. ibmvscsis_tgt does not send the > response to the client until release_cmd time. The reason for this was > because if we did it at queue_status time, then the client would be > free to reuse the tag for that command, but we're still using the tag > until the command is released at release_cmd time, so we chose to > delay sending the response until then. That then caused this issue, > because release_cmd is always called, even if queue_status is not. > > SCSI spec says that the initiator that sends the abort task TM NEVER > gets a response to the aborted op and with the current code it will > send a response. Thus this fix will remove that response if the TAS > bit is set. Somebody please review! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH V2] scsi: mpt3sas: remove redundant wmb
Sinan Kaya writes:

> Due to relaxed ordering requirements on multiple architectures,
> drivers are required to use wmb/rmb/mb combinations when they need to
> guarantee observability between the memory and the HW.
>
> The mpt3sas driver is already using wmb() for this purpose. However,
> it issues a writel() following the wmb(). The writel() function on
> arm/arm64 architectures has an embedded wmb() call inside.
>
> This results in unnecessary performance loss and code duplication.
>
> writel() already guarantees ordering for both cpu and bus; we don't
> need an additional wmb().

Broadcom folks, please review!

-- 
Martin K. Petersen	Oracle Linux Engineering
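The point being reviewed, sketched generically (hypothetical register
offset and value; writel() includes the needed barrier on arm/arm64,
so pairing it with an explicit wmb() is redundant):

	/* Redundant: writel() already orders prior memory writes
	 * ahead of the MMIO store.
	 */
	wmb();
	writel(val, base + REG_DOORBELL);

	/* Equivalent and cheaper: */
	writel(val, base + REG_DOORBELL);

	/* If relaxed MMIO is really wanted, the barrier must be explicit: */
	wmb();
	writel_relaxed(val, base + REG_DOORBELL);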
Re: commit 1b6ac5e3f "fnic: Using rport->dd_data to check rport online instead of rport_lookup."
Joe Jin writes:

> My customer hit the below error when issuing a LIP to the fnic
> controller:
>
> [94702.898408] sd 2:0:4:1: [sdx] tag#1 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [94702.898416] sd 2:0:4:1: [sdx] tag#1 CDB: Write(10) 2a 00 04 56 c0 08 00 00 08 00
> [94702.898420] blk_update_request: I/O error, dev sdx, sector 72794120
> [94702.898455] sd 2:0:4:10: [sdy] tag#2 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [94702.898459] sd 2:0:4:10: [sdy] tag#2 CDB: Read(10) 28 00 00 00 08 a8 00 00 02 00
> [94702.898462] blk_update_request: I/O error, dev sdy, sector 2216
>
> Looking at all the fnic changes, I found it was caused by commit
> 1b6ac5e3f "fnic: Using rport->dd_data to check rport online instead
> of rport_lookup." A question: why was the state check changed from
> RPORT_ST_DELETE to RPORT_ST_READY here?

Satish?

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] [SCSI] aic7xxx: fix order of arguments in function prototype
Colin King writes: > The vendor/device and subvendor/subdevice arguments to the function > prototype ahc_9005_subdevinfo_valid are in the wrong order and need to > be swapped so that the prototype matches the definition. Detected with PVS-Studio. Applied to 4.12/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
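The class of bug is worth spelling out: because every parameter has the same type, a transposed prototype still matches the definition's type signature and compiles cleanly, silently swapping the callers' argument meanings. A sketch (the parameter order shown is illustrative, not copied from the driver):

#include <stdint.h>

/* prototype with vendor/device and subvendor/subdevice transposed: */
static int ahc_9005_subdevinfo_valid(uint16_t device, uint16_t vendor,
				     uint16_t subdevice, uint16_t subvendor);

/* ...while the definition reads: */
static int ahc_9005_subdevinfo_valid(uint16_t vendor, uint16_t device,
				     uint16_t subvendor, uint16_t subdevice)
{
	/* illustrative body */
	return vendor == 0x9005 && subvendor == 0x9005;
}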
Re: [PATCHv6 8/8] scsi: inline command aborts
Bart Van Assche writes: > On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote: >> The block layer always calls the timeout function from a workqueue >> context, so there is no need to have yet another workqueue for >> running command aborts. >> >> [ ... ] >> @@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request >> *req) >> rtn = host->hostt->eh_timed_out(scmd); >> >> if (rtn == BLK_EH_NOT_HANDLED) { >> -if (scsi_abort_command(scmd) != SUCCESS) { >> +int ret; >> + >> +ret = scsi_abort_command(scmd); >> +if (ret == FAILED) { >> set_host_byte(scmd, DID_TIME_OUT); >> scsi_eh_scmd_add(scmd); >> -} >> +} else if (ret == FAST_IO_FAIL) >> +rtn = BLK_EH_RESET_TIMER; >> } > > Has this patch been tested with the traditional block layer? For the > traditional block layer scsi_times_out() is called with the queue lock > held. Does this patch cause .eh_abort_handler(), a function that may > sleep, to be called with the queue lock held? Hannes: Ping! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] bfa: remove bfa_module_s madness
Christoph Hellwig writes: > Just call the functions directly and remove a giant pile of > boilerplate code. Applied to 4.12/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] scsi: qedi,qedf: Use designated initializers
Kees, > Prepare to mark sensitive kernel structures for randomization by > making sure they're using designated initializers. These were > identified during allyesconfig builds of x86, arm, and arm64, with > most initializer fixes extracted from grsecurity. > > For these cases, terminate the list with { }, which will be > zero-filled, instead of undesignated NULLs. Applied to 4.12/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
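The shape of the change, sketched with a hypothetical ops table (the actual qedi/qedf structures differ):

#include <stddef.h>

struct frob_ops {				/* hypothetical ops table */
	int (*start)(void);
	void (*stop)(void);
};

/* Positional initializers silently break once structure layout
 * randomization reorders members: */
static const struct frob_ops old_style[] = {
	{ NULL, NULL },
};

/* Designated initializers name each member, and the terminating { }
 * entry is zero-filled, replacing a row of undesignated NULLs: */
static const struct frob_ops new_style[] = {
	{ .start = NULL },
	{ }	/* sentinel */
};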
Re: [PATCH 08/23] scsi: introduce a result field in struct scsi_request
Christoph Hellwig writes: > This passes on the scsi_cmnd result field to users of passthrough > requests. Currently we abuse req->errors for this purpose, but that > field will go away in its current form. Reviewed-by: Martin K. Petersen > Note that the old IDE code abuses the errors field in very creative > ways and stores all kinds of different values in it. I didn't dare > to touch this magic, so the abuses are brought forward 1:1. Really wish we could just obliterate drivers/ide. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 26/27] scsi: sd: Separate zeroout and discard command choices
Paolo, > Should this be conditional on lbprz, lbpws, lbpws10 and max_ws_blocks? It is intentional that things can be overridden from userland for devices that report the "wrong" thing. We do the same for discard so people can set up udev rules. -- Martin K. Petersen Oracle Linux Engineering
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 3:55 PM, Logan Gunthorpe wrote: > > > On 19/04/17 02:48 PM, Jason Gunthorpe wrote: >> On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: >>>> But.. it could point to a GPU and the GPU struct device could have a >>>> proxy dma_ops like Dan pointed out. >>> >>> Seems a bit awkward to me that in order for the intended use case, you >>> have to proxy the dma_ops. I'd probably still suggest throwing a couple >>> ops for things like this in the dev_pagemap. >> >> Another option is adding a new 'struct completer_dma_ops *' to struct >> device for this use case. >> >> Seems like a waste to expand dev_pagemap when we only need a unique >> value per struct device? > > I feel like expanding dev_pagemap has a much lower impact than expanding > struct device... dev_pagemap is only one instance per zone device region > so expanding it shouldn't be a huge issue. Expanding struct device means > every device struct in the system gets bigger. Especially since we expect a very small subset of devices will ever support p2p.
Re: [PATCH v3 8/8] scsi: Implement blk_mq_ops.show_rq()
On Wed, 2017-04-19 at 19:25 -0400, Martin K. Petersen wrote: > Bart Van Assche writes: > > Show the SCSI CDB, .eh_eflags and .result for pending SCSI commands > > in /sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. > > SCSI_EH_CANCEL_CMD is no more (as of a06586325f37). Thanks Martin. I will remove that flag when I repost this patch series. Bart.
Re: [PATCH] scsi: osd_uld: fix mismerge
Hi Bart, On Wed, 19 Apr 2017 17:46:43 + Bart Van Assche wrote: > > On Wed, 2017-04-19 at 19:44 +0200, Arnd Bergmann wrote: > > Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value") > > Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function") > > Both these patches are fine as far as I can tell. Shouldn't the > "Fixes" tag refer to the merge commit instead of to the commits > that have been merged? Unfortunately, the merge commit is redone every day and will be redone by Linus during the merge window. This fix should be incorporated into the merge commit itself. I should have done it originally, but the conflict was such a mess that I took the easy way out :-) -- Cheers, Stephen Rothwell
Re: [PATCH] scsi: osd_uld: fix mismerge
Hi Arnd, On Wed, 19 Apr 2017 19:44:01 +0200 Arnd Bergmann wrote: > > A mismerge between two branches in linux-next reintroduced a warning that was > previously resolved: > > drivers/scsi/osd/osd_uld.c: In function 'osd_probe': > drivers/scsi/osd/osd_uld.c:457:2: error: ignoring return value of > 'scsi_device_get', declared with attribute warn_unused_result > [-Werror=unused-result] > > The original fix had more complex error unrolling, but as far as I > can tell, this simpler version is now sufficient. > > Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value") > Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function") > Signed-off-by: Arnd Bergmann > --- > drivers/scsi/osd/osd_uld.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c > index 8b9941a5687a..0e56f1eb05dc 100644 > --- a/drivers/scsi/osd/osd_uld.c > +++ b/drivers/scsi/osd/osd_uld.c > @@ -454,7 +454,8 @@ static int osd_probe(struct device *dev) > /* hold one more reference to the scsi_device that will get released >* in __release, in case a logout is happening while fs is mounted >*/ > - scsi_device_get(scsi_device); > + if (scsi_device_get(scsi_device)) > + goto err_retract_minor; > osd_dev_init(&oud->od, scsi_device); > > /* allocate a disk and set it up */ I will add this as a merge fix patch for the merge of the char-misc and scsi trees today. Someone needs to let Linus know when these trees are merged. -- Cheers, Stephen Rothwell
Re: [PATCH v3 8/8] scsi: Implement blk_mq_ops.show_rq()
Bart Van Assche writes: Bart, > Show the SCSI CDB, .eh_eflags and .result for pending SCSI commands > in /sys/kernel/debug/block/*/mq/*/dispatch and */rq_list. SCSI_EH_CANCEL_CMD is no more (as of a06586325f37). -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH v3 0/2] scsi: storvsc: Add support for FC rport
Cathy, > The updated patch set provides a way for drivers (specifically > storvsc in this case) that expose virtualized fc devices > but that do not expose rports to be manually scanned. This is > done by creating a pseudo rport in storvsc and a > corresponding dummy initiator rport role in the fc transport. Applied to 4.12/scsi-queue, thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] virtio_scsi: Always try to read VPD pages
David, > Passed through SCSI targets may have transfer limits which come from > the host SCSI controller or something on the host side other than the > target itself. > > To make this work properly, the hypervisor can adjust the target's VPD > information to advertise these limits. But for that to work, the > guest has to look at the VPD pages, which we won't do by default if it > is an SPC-2 device, even if it does actually support it. Applied to 4.12/scsi-queue. -- Martin K. Petersen Oracle Linux Engineering
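The likely shape of the fix, as a hedged sketch reconstructed from the description (the merged patch may differ in detail); it would be wired up via the host template's .slave_alloc:

#include <scsi/scsi_device.h>
#include <scsi/scsi_devinfo.h>

static int virtscsi_device_alloc(struct scsi_device *sdevice)
{
	/* BLIST_TRY_VPD_PAGES overrides the SPC-2 heuristic and makes
	 * the midlayer read the VPD pages anyway. */
	sdevice->sdev_bflags = BLIST_TRY_VPD_PAGES;
	return 0;
}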
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 02:48 PM, Jason Gunthorpe wrote: > On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: > >>> But.. it could point to a GPU and the GPU struct device could have a >>> proxy dma_ops like Dan pointed out. >> >> Seems a bit awkward to me that in order for the intended use case, you >> have to proxy the dma_ops. I'd probably still suggest throwing a couple >> ops for things like this in the dev_pagemap. > > Another option is adding a new 'struct completer_dma_ops *' to struct > device for this use case. > > Seems like a waste to expand dev_pagemap when we only need a unique > value per struct device? I feel like expanding dev_pagemap has a much lower impact than expanding struct device... dev_pagemap is only one instance per zone device region so expanding it shouldn't be a huge issue. Expanding struct device means every device struct in the system gets bigger. Logan
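To make the two options concrete, a hypothetical sketch; neither variant exists in-tree, and the stand-in struct names and field names are invented for illustration:

struct dma_map_ops;	/* real type lives in <linux/dma-mapping.h> */

/* Jason's option: a second ops pointer on every device in the system */
struct device_variant {
	/* ...all the existing struct device fields... */
	const struct dma_map_ops *completer_dma_ops;
};

/* Logan's option: per-ZONE_DEVICE-region ops hung off dev_pagemap */
struct dev_pagemap_variant {
	/* ...the existing dev_pagemap fields... */
	const struct dma_map_ops *p2p_dma_ops;
};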
Re: [PATCH 23/23] block: remove the errors field from struct request
On Wed, 2017-04-19 at 21:27 +0200, Christoph Hellwig wrote: > Signed-off-by: Christoph Hellwig > Acked-by: Roger Pau Monné > Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Bart Van Assche
Re: [PATCH 02/23] block: remove the blk_execute_rq return value
On Wed, 2017-04-19 at 21:26 +0200, Christoph Hellwig wrote: > --- a/drivers/scsi/osd/osd_initiator.c > +++ b/drivers/scsi/osd/osd_initiator.c > @@ -489,7 +489,10 @@ static void _set_error_resid(struct osd_request *or, > struct request *req, > > int osd_execute_request(struct osd_request *or) > { > - int error = blk_execute_rq(or->request->q, NULL, or->request, 0); > + int error; > + > + blk_execute_rq(or->request->q, NULL, or->request, 0); > + error = or->request ? -EIO : 0; Hello Christoph, Did you perhaps intend or->request->errors instead of or->request? Otherwise this patch looks fine to me. Bart.
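With the field Bart points at, the hunk would presumably read:

	blk_execute_rq(or->request->q, NULL, or->request, 0);
	error = or->request->errors ? -EIO : 0;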
Re: [PATCH 01/23] pd: don't check blk_execute_rq return value.
On Wed, 2017-04-19 at 21:26 +0200, Christoph Hellwig wrote: > The driver never sets req->errors, so blk_execute_rq will always return 0. Reviewed-by: Bart Van Assche
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote: > > But.. it could point to a GPU and the GPU struct device could have a > > proxy dma_ops like Dan pointed out. > > Seems a bit awkward to me that in order for the intended use case, you > have to proxy the dma_ops. I'd probably still suggest throwing a couple > ops for things like this in the dev_pagemap. Another option is adding a new 'struct completer_dma_ops *' to struct device for this use case. Seems like a waste to expand dev_pagemap when we only need a unique value per struct device? Jason
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 01:31 PM, Jason Gunthorpe wrote: > Try it with VT-D turned on. It shouldn't work or there is a notable > security hole in your platform.. Ah, ok. >>> const struct dma_map_ops *comp_ops = get_dma_ops(completer); >>> const struct dma_map_ops *init_ops = get_dma_ops(initiator); >> >> So, in this case, what device does the completer point to? The PCI >> device or a more specific GPU device? If it's the former, who's >> responsible for setting the new dma_ops? Typically the dma_ops are arch >> specific but now you'd be adding ones that are tied to hmm or the gpu. > > Donno, that is for GPU folks to figure out :) > > But.. it could point to a GPU and the GPU struct device could have a > proxy dma_ops like Dan pointed out. Seems a bit awkward to me that in order for the intended use case, you have to proxy the dma_ops. I'd probably still suggest throwing a couple ops for things like this in the dev_pagemap. >> It appears to me like it's calculating the DMA address, and the check is >> just a side requirement. It reads as though it's only doing the check. > > pci_p2p_same_segment_get_pa() then? Ok, I think that's a bit clearer. Logan
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 01:02:49PM -0600, Logan Gunthorpe wrote: > > > On 19/04/17 12:32 PM, Jason Gunthorpe wrote: > > On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > > Not entirely, it would have to call through the whole process > > including the arch_p2p_cross_segment().. > > Hmm, yes. Though it's still not clear what, if anything, > arch_p2p_cross_segment would be doing. Sets up the iommu for arches that place an iommu between the pci root port and other pci root ports. > In my experience, if you are going between host bridges, the CPU > address (or PCI address -- I'm not sure which seeing they are the > same on my system) would still work fine Try it with VT-D turned on. It shouldn't work or there is a notable security hole in your platform.. > > const struct dma_map_ops *comp_ops = get_dma_ops(completer); > > const struct dma_map_ops *init_ops = get_dma_ops(initiator); > So, in this case, what device does the completer point to? The PCI > device or a more specific GPU device? If it's the former, who's > responsible for setting the new dma_ops? Typically the dma_ops are arch > specific but now you'd be adding ones that are tied to hmm or the gpu. Donno, that is for GPU folks to figure out :) But.. it could point to a GPU and the GPU struct device could have a proxy dma_ops like Dan pointed out. > I'm not sure I like the name pci_p2p_same_segment. It reads as though > it's only checking if the devices are not the same segment. Well, that is exactly what it is doing. If it succeeds then the caller knows the DMA will not flow outside the segment and no iommu setup/etc is required. > It appears to me like it's calculating the DMA address, and the check is > just a side requirement. It reads as though it's only doing the check. pci_p2p_same_segment_get_pa() then? Jason
[PATCH 23/23] block: remove the errors field from struct request
Signed-off-by: Christoph Hellwig Acked-by: Roger Pau Monné Reviewed-by: Konrad Rzeszutek Wilk --- block/blk-core.c | 14 +- block/blk-exec.c | 3 +-- block/blk-mq.c | 10 +++--- block/blk-timeout.c | 1 - include/linux/blkdev.h | 2 -- include/trace/events/block.h | 17 +++-- kernel/trace/blktrace.c | 26 -- 7 files changed, 24 insertions(+), 49 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 8654aa0cef6d..48f522a02f54 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1633,7 +1633,6 @@ void init_request_from_bio(struct request *req, struct bio *bio) if (bio->bi_opf & REQ_RAHEAD) req->cmd_flags |= REQ_FAILFAST_MASK; - req->errors = 0; req->__sector = bio->bi_iter.bi_sector; blk_rq_set_prio(req, rq_ioc(bio)); if (ioprio_valid(bio_prio(bio))) @@ -2567,22 +2566,11 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes) { int total_bytes; - trace_block_rq_complete(req->q, req, nr_bytes); + trace_block_rq_complete(req, error, nr_bytes); if (!req->bio) return false; - /* -* For fs requests, rq is just carrier of independent bio's -* and each partial completion should be handled separately. -* Reset per-request error on each partial completion. -* -* TODO: tj: This is too subtle. It would be better to let -* low level drivers do what they see fit. -*/ - if (!blk_rq_is_passthrough(req)) - req->errors = 0; - if (error && !blk_rq_is_passthrough(req) && !(req->rq_flags & RQF_QUIET)) { char *error_type; diff --git a/block/blk-exec.c b/block/blk-exec.c index afa383248c7c..a9451e3b8587 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -69,8 +69,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk, if (unlikely(blk_queue_dying(q))) { rq->rq_flags |= RQF_QUIET; - rq->errors = -ENXIO; - __blk_end_request_all(rq, rq->errors); + __blk_end_request_all(rq, -ENXIO); spin_unlock_irq(q->queue_lock); return; } diff --git a/block/blk-mq.c b/block/blk-mq.c index 1f98db082da3..f5e3cf153ad7 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -213,7 +213,6 @@ void blk_mq_rq_ctx_init(struct request_queue *q, struct blk_mq_ctx *ctx, #endif rq->special = NULL; /* tag was already set */ - rq->errors = 0; rq->extra_len = 0; INIT_LIST_HEAD(&rq->timeout_list); @@ -624,8 +623,7 @@ void blk_mq_abort_requeue_list(struct request_queue *q) rq = list_first_entry(&rq_list, struct request, queuelist); list_del_init(&rq->queuelist); - rq->errors = -EIO; - blk_mq_end_request(rq, rq->errors); + blk_mq_end_request(rq, -EIO); } } EXPORT_SYMBOL(blk_mq_abort_requeue_list); @@ -1032,8 +1030,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list) pr_err("blk-mq: bad return on queue: %d\n", ret); case BLK_MQ_RQ_QUEUE_ERROR: errors++; - rq->errors = -EIO; - blk_mq_end_request(rq, rq->errors); + blk_mq_end_request(rq, -EIO); break; } @@ -1484,8 +1481,7 @@ static void __blk_mq_try_issue_directly(struct request *rq, blk_qc_t *cookie, if (ret == BLK_MQ_RQ_QUEUE_ERROR) { *cookie = BLK_QC_T_NONE; - rq->errors = -EIO; - blk_mq_end_request(rq, rq->errors); + blk_mq_end_request(rq, -EIO); return; } diff --git a/block/blk-timeout.c b/block/blk-timeout.c index a30441a200c0..cbff183f3d9f 100644 --- a/block/blk-timeout.c +++ b/block/blk-timeout.c @@ -89,7 +89,6 @@ static void blk_rq_timed_out(struct request *req) ret = q->rq_timed_out_fn(req); switch (ret) { case BLK_EH_HANDLED: - /* Can we use req->errors here? 
*/ __blk_complete_request(req); break; case BLK_EH_RESET_TIMER: diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index e8006abce8b6..b9b958a3bd08 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -220,8 +220,6 @@ struct request { void *special; /* opaque pointer available for LLD use */ - int errors; - unsigned int extra_len; /* length of alignment and padding */ unsigned long deadline; diff --git a/include/trace/events/block.h b/include/trace/events/block.h index 99ed69fad041..d0dbe60d8a6d 100644 --- a/include/trace/events/block.h +++ b/include/trace/events/bloc
[PATCH 21/23] swim3: remove (commented out) printing of req->errors
Signed-off-by: Christoph Hellwig --- drivers/block/swim3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c index 61b3ffa4f458..ba4809c9bdba 100644 --- a/drivers/block/swim3.c +++ b/drivers/block/swim3.c @@ -343,8 +343,8 @@ static void start_request(struct floppy_state *fs) req->rq_disk->disk_name, req->cmd, (long)blk_rq_pos(req), blk_rq_sectors(req), bio_data(req->bio)); - swim3_dbg(" errors=%d current_nr_sectors=%u\n", - req->errors, blk_rq_cur_sectors(req)); + swim3_dbg(" current_nr_sectors=%u\n", + blk_rq_cur_sectors(req)); #endif if (blk_rq_pos(req) >= fs->total_secs) { -- 2.11.0
[PATCH 20/23] ataflop: switch from req->errors to req->error_count
Signed-off-by: Christoph Hellwig --- drivers/block/ataflop.c | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c index 2104b1b4ccda..fa69ecd52cb5 100644 --- a/drivers/block/ataflop.c +++ b/drivers/block/ataflop.c @@ -617,12 +617,12 @@ static void fd_error( void ) if (!fd_request) return; - fd_request->errors++; - if (fd_request->errors >= MAX_ERRORS) { + fd_request->error_count++; + if (fd_request->error_count >= MAX_ERRORS) { printk(KERN_ERR "fd%d: too many errors.\n", SelectedDrive ); fd_end_request_cur(-EIO); } - else if (fd_request->errors == RECALIBRATE_ERRORS) { + else if (fd_request->error_count == RECALIBRATE_ERRORS) { printk(KERN_WARNING "fd%d: recalibrating\n", SelectedDrive ); if (SelectedDrive != -1) SUD.track = -1; @@ -1386,7 +1386,7 @@ static void setup_req_params( int drive ) ReqData = ReqBuffer + 512 * ReqCnt; if (UseTrackbuffer) - read_track = (ReqCmd == READ && fd_request->errors == 0); + read_track = (ReqCmd == READ && fd_request->error_count == 0); else read_track = 0; @@ -1409,8 +1409,10 @@ static struct request *set_next_request(void) fdc_queue = 0; if (q) { rq = blk_fetch_request(q); - if (rq) + if (rq) { + rq->error_count = 0; break; + } } } while (fdc_queue != old_pos); -- 2.11.0
[PATCH 18/23] block: add a error_count field to struct request
This is for the legacy floppy and ataflop drivers that currently abuse ->errors for this purpose. It's stashed away in a union to not grow the struct size; the other fields are either used by modern drivers for different purposes or by the I/O scheduler before queueing the I/O to drivers. Signed-off-by: Christoph Hellwig --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d8bcd51b8a47..e8006abce8b6 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -175,6 +175,7 @@ struct request { struct rb_node rb_node; /* sort/lookup */ struct bio_vec special_vec; void *completion_data; + int error_count; /* for legacy drivers, don't use */ }; /* -- 2.11.0
[PATCH 17/23] blk-mq: simplify __blk_mq_complete_request
Merge blk_mq_ipi_complete_request and blk_mq_stat_add into their only caller. Signed-off-by: Christoph Hellwig --- block/blk-mq.c | 25 - 1 file changed, 8 insertions(+), 17 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index 0c19df66f5a0..1f98db082da3 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -406,12 +406,19 @@ static void __blk_mq_complete_request_remote(void *data) rq->q->softirq_done_fn(rq); } -static void blk_mq_ipi_complete_request(struct request *rq) +static void __blk_mq_complete_request(struct request *rq) { struct blk_mq_ctx *ctx = rq->mq_ctx; bool shared = false; int cpu; + if (rq->internal_tag != -1) + blk_mq_sched_completed_request(rq); + if (rq->rq_flags & RQF_STATS) { + blk_mq_poll_stats_start(rq->q); + blk_stat_add(rq); + } + if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) { rq->q->softirq_done_fn(rq); return; @@ -432,22 +439,6 @@ static void blk_mq_ipi_complete_request(struct request *rq) put_cpu(); } -static void blk_mq_stat_add(struct request *rq) -{ - if (rq->rq_flags & RQF_STATS) { - blk_mq_poll_stats_start(rq->q); - blk_stat_add(rq); - } -} - -static void __blk_mq_complete_request(struct request *rq) -{ - if (rq->internal_tag != -1) - blk_mq_sched_completed_request(rq); - blk_mq_stat_add(rq); - blk_mq_ipi_complete_request(rq); -} - /** * blk_mq_complete_request - end I/O on a request * @rq:the request being processed -- 2.11.0
[PATCH 19/23] floppy: switch from req->errors to req->error_count
Signed-off-by: Christoph Hellwig --- drivers/block/floppy.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c index ce102ec47ef2..60d4c7653178 100644 --- a/drivers/block/floppy.c +++ b/drivers/block/floppy.c @@ -2805,8 +2805,10 @@ static int set_next_request(void) fdc_queue = 0; if (q) { current_req = blk_fetch_request(q); - if (current_req) + if (current_req) { + current_req->error_count = 0; break; + } } } while (fdc_queue != old_pos); @@ -2866,7 +2868,7 @@ static void redo_fd_request(void) _floppy = floppy_type + DP->autodetect[DRS->probed_format]; } else probing = 0; - errors = &(current_req->errors); + errors = &(current_req->error_count); tmp = make_raw_rw_request(); if (tmp < 2) { request_done(tmp); -- 2.11.0
[PATCH 12/23] dm mpath: don't check for req->errors
We'll get all proper errors reported through ->end_io and ->errors will go away soon. Signed-off-by: Christoph Hellwig --- drivers/md/dm-mpath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index ab55955ed704..2950b145443d 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1492,7 +1492,7 @@ static int do_end_io(struct multipath *m, struct request *clone, */ int r = DM_ENDIO_REQUEUE; - if (!error && !clone->errors) + if (!error) return 0; /* I/O complete */ if (noretry_error(error)) -- 2.11.0
[PATCH 16/23] blk-mq: remove the error argument to blk_mq_complete_request
Now that all drivers that call blk_mq_complete_request have a ->complete callback we can remove the direct call to blk_mq_end_request, as well as the error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig --- block/blk-mq.c | 15 +++ drivers/block/loop.c | 4 ++-- drivers/block/mtip32xx/mtip32xx.c | 4 ++-- drivers/block/nbd.c | 4 ++-- drivers/block/null_blk.c | 2 +- drivers/block/virtio_blk.c | 2 +- drivers/block/xen-blkfront.c | 2 +- drivers/md/dm-rq.c | 2 +- drivers/nvme/host/core.c | 2 +- drivers/nvme/host/nvme.h | 2 +- drivers/scsi/scsi_lib.c | 2 +- include/linux/blk-mq.h | 2 +- 12 files changed, 17 insertions(+), 26 deletions(-) diff --git a/block/blk-mq.c b/block/blk-mq.c index e2ef7b460924..0c19df66f5a0 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -442,17 +442,10 @@ static void blk_mq_stat_add(struct request *rq) static void __blk_mq_complete_request(struct request *rq) { - struct request_queue *q = rq->q; - if (rq->internal_tag != -1) blk_mq_sched_completed_request(rq); - blk_mq_stat_add(rq); - - if (!q->softirq_done_fn) - blk_mq_end_request(rq, rq->errors); - else - blk_mq_ipi_complete_request(rq); + blk_mq_ipi_complete_request(rq); } /** @@ -463,16 +456,14 @@ static void __blk_mq_complete_request(struct request *rq) * Ends all I/O on a request. It does not handle partial completions. * The actual completion happens out-of-order, through a IPI handler. **/ -void blk_mq_complete_request(struct request *rq, int error) +void blk_mq_complete_request(struct request *rq) { struct request_queue *q = rq->q; if (unlikely(blk_should_fake_timeout(q))) return; - if (!blk_mark_rq_complete(rq)) { - rq->errors = error; + if (!blk_mark_rq_complete(rq)) __blk_mq_complete_request(rq); - } } EXPORT_SYMBOL(blk_mq_complete_request); diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 86351b3f7350..994403efee19 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -465,7 +465,7 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2) struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb); cmd->ret = ret; - blk_mq_complete_request(cmd->rq, 0); + blk_mq_complete_request(cmd->rq); } static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, @@ -1685,7 +1685,7 @@ static void loop_handle_cmd(struct loop_cmd *cmd) /* complete non-aio request */ if (!cmd->use_aio || ret) { cmd->ret = ret ? 
-EIO : 0; - blk_mq_complete_request(cmd->rq, 0); + blk_mq_complete_request(cmd->rq); } } diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 7406de29db58..66a6bd83faae 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -242,7 +242,7 @@ static void mtip_async_complete(struct mtip_port *port, rq = mtip_rq_from_tag(dd, tag); cmd->status = status; - blk_mq_complete_request(rq, 0); + blk_mq_complete_request(rq); } /* @@ -4109,7 +4109,7 @@ static void mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv) if (likely(!reserv)) { cmd->status = -ENODEV; - blk_mq_complete_request(rq, 0); + blk_mq_complete_request(rq); } else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) { cmd = mtip_cmd_from_tag(dd, MTIP_TAG_INTERNAL); diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 09a74a66beb1..d387bef07fcc 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -635,7 +635,7 @@ static void recv_work(struct work_struct *work) break; } - blk_mq_complete_request(blk_mq_rq_from_pdu(cmd), 0); + blk_mq_complete_request(blk_mq_rq_from_pdu(cmd)); } atomic_dec(&config->recv_threads); wake_up(&config->recv_wq); @@ -651,7 +651,7 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved) return; cmd = blk_mq_rq_to_pdu(req); cmd->status = -EIO; - blk_mq_complete_request(req, 0); + blk_mq_complete_request(req); } static void nbd_clear_que(struct nbd_device *nbd) diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 24ca85a70fd8..c27cccec368b 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -281,7 +281,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd) case NULL_IRQ_SOFTIRQ: switch (queue_mode) { case NULL_Q_MQ: - blk_mq_complete_request(cmd->rq, 0);
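The driver-side pattern this series converges on, reduced to a sketch (the foo_* names are placeholders; the nbd and mtip32xx patches below are real instances):

#include <linux/blk-mq.h>

struct foo_cmd {			/* per-request driver payload (pdu) */
	int status;
};

/* hard-irq side: stash the outcome, then kick generic completion */
static void foo_hw_done(struct request *rq, int status)
{
	struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);

	cmd->status = status;
	blk_mq_complete_request(rq);	/* no error argument any more */
}

/* ->complete callback: hand the stashed status to the block layer */
static void foo_complete_rq(struct request *rq)
{
	struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);

	blk_mq_end_request(rq, cmd->status);
}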
[PATCH 15/23] xen-blkfront: don't use req->errors
xen-blkfront is the last user of rq->errors for passing errors back to blk-mq, and I'd like to get rid of that. In the longer run the driver should be moving more of the completion processing into .complete, but this is the minimal change to move forward for now. Signed-off-by: Christoph Hellwig Acked-by: Roger Pau Monné Reviewed-by: Konrad Rzeszutek Wilk --- drivers/block/xen-blkfront.c | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index abed296ce605..57866355c060 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -115,6 +115,15 @@ struct split_bio { atomic_t pending; }; +struct blkif_req { + int error; +}; + +static inline struct blkif_req *blkif_req(struct request *rq) +{ + return blk_mq_rq_to_pdu(rq); +} + static DEFINE_MUTEX(blkfront_mutex); static const struct block_device_operations xlvbd_block_fops; @@ -907,8 +916,14 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx, return BLK_MQ_RQ_QUEUE_BUSY; } +static void blkif_complete_rq(struct request *rq) +{ + blk_mq_end_request(rq, blkif_req(rq)->error); +} + static const struct blk_mq_ops blkfront_mq_ops = { .queue_rq = blkif_queue_rq, + .complete = blkif_complete_rq, }; static void blkif_set_queue_limits(struct blkfront_info *info) @@ -969,7 +984,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size, info->tag_set.queue_depth = BLK_RING_SIZE(info); info->tag_set.numa_node = NUMA_NO_NODE; info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; - info->tag_set.cmd_size = 0; + info->tag_set.cmd_size = sizeof(struct blkif_req); info->tag_set.driver_data = info; if (blk_mq_alloc_tag_set(&info->tag_set)) @@ -1543,7 +1558,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) unsigned long flags; struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id; struct blkfront_info *info = rinfo->dev_info; - int error; if (unlikely(info->connected != BLKIF_STATE_CONNECTED)) return IRQ_HANDLED; @@ -1587,37 +1601,36 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) continue; } - error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; + blkif_req(req)->error = (bret->status == BLKIF_RSP_OKAY) ? 
0 : -EIO; switch (bret->operation) { case BLKIF_OP_DISCARD: if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { struct request_queue *rq = info->rq; printk(KERN_WARNING "blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - error = -EOPNOTSUPP; + blkif_req(req)->error = -EOPNOTSUPP; info->feature_discard = 0; info->feature_secdiscard = 0; queue_flag_clear(QUEUE_FLAG_DISCARD, rq); queue_flag_clear(QUEUE_FLAG_SECERASE, rq); } - blk_mq_complete_request(req, error); break; case BLKIF_OP_FLUSH_DISKCACHE: case BLKIF_OP_WRITE_BARRIER: if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { printk(KERN_WARNING "blkfront: %s: %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - error = -EOPNOTSUPP; + blkif_req(req)->error = -EOPNOTSUPP; } if (unlikely(bret->status == BLKIF_RSP_ERROR && rinfo->shadow[id].req.u.rw.nr_segments == 0)) { printk(KERN_WARNING "blkfront: %s: empty %s op failed\n", info->gd->disk_name, op_name(bret->operation)); - error = -EOPNOTSUPP; + blkif_req(req)->error = -EOPNOTSUPP; } - if (unlikely(error)) { - if (error == -EOPNOTSUPP) - error = 0; + if (unlikely(blkif_req(req)->error)) { + if (blkif_req(req)->error == -EOPNOTSUPP) + blkif_req(req)->error = 0; info->feature_fua = 0; info->feature_flush = 0; xlvbd_flush(info); @@ -1629,11 +1642,12 @@ stat
[PATCH 22/23] blktrace: remove the unused block_rq_abort tracepoint
Signed-off-by: Christoph Hellwig --- include/trace/events/block.h | 44 ++-- kernel/trace/blktrace.c | 9 - 2 files changed, 10 insertions(+), 43 deletions(-) diff --git a/include/trace/events/block.h b/include/trace/events/block.h index a88ed13446ff..99ed69fad041 100644 --- a/include/trace/events/block.h +++ b/include/trace/events/block.h @@ -61,7 +61,16 @@ DEFINE_EVENT(block_buffer, block_dirty_buffer, TP_ARGS(bh) ); -DECLARE_EVENT_CLASS(block_rq_with_error, +/** + * block_rq_requeue - place block IO request back on a queue + * @q: queue holding operation + * @rq: block IO operation request + * + * The block operation request @rq is being placed back into queue + * @q. For some reason the request was not completed and needs to be + * put back in the queue. + */ +TRACE_EVENT(block_rq_requeue, TP_PROTO(struct request_queue *q, struct request *rq), @@ -94,39 +103,6 @@ DECLARE_EVENT_CLASS(block_rq_with_error, ); /** - * block_rq_abort - abort block operation request - * @q: queue containing the block operation request - * @rq: block IO operation request - * - * Called immediately after pending block IO operation request @rq in - * queue @q is aborted. The fields in the operation request @rq - * can be examined to determine which device and sectors the pending - * operation would access. - */ -DEFINE_EVENT(block_rq_with_error, block_rq_abort, - - TP_PROTO(struct request_queue *q, struct request *rq), - - TP_ARGS(q, rq) -); - -/** - * block_rq_requeue - place block IO request back on a queue - * @q: queue holding operation - * @rq: block IO operation request - * - * The block operation request @rq is being placed back into queue - * @q. For some reason the request was not completed and needs to be - * put back in the queue. - */ -DEFINE_EVENT(block_rq_with_error, block_rq_requeue, - - TP_PROTO(struct request_queue *q, struct request *rq), - - TP_ARGS(q, rq) -); - -/** * block_rq_complete - block IO operation completed by device driver * @q: queue containing the block operation request * @rq: block operations request diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c index b2058a7f94bd..9f3624dadb09 100644 --- a/kernel/trace/blktrace.c +++ b/kernel/trace/blktrace.c @@ -716,12 +716,6 @@ static void blk_add_trace_rq(struct request_queue *q, struct request *rq, rq->cmd_flags, what, rq->errors, 0, NULL); } -static void blk_add_trace_rq_abort(void *ignore, - struct request_queue *q, struct request *rq) -{ - blk_add_trace_rq(q, rq, blk_rq_bytes(rq), BLK_TA_ABORT); -} - static void blk_add_trace_rq_insert(void *ignore, struct request_queue *q, struct request *rq) { @@ -974,8 +968,6 @@ static void blk_register_tracepoints(void) { int ret; - ret = register_trace_block_rq_abort(blk_add_trace_rq_abort, NULL); - WARN_ON(ret); ret = register_trace_block_rq_insert(blk_add_trace_rq_insert, NULL); WARN_ON(ret); ret = register_trace_block_rq_issue(blk_add_trace_rq_issue, NULL); @@ -1028,7 +1020,6 @@ static void blk_unregister_tracepoints(void) unregister_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL); unregister_trace_block_rq_issue(blk_add_trace_rq_issue, NULL); unregister_trace_block_rq_insert(blk_add_trace_rq_insert, NULL); - unregister_trace_block_rq_abort(blk_add_trace_rq_abort, NULL); tracepoint_synchronize_unregister(); } -- 2.11.0
[PATCH 10/23] null_blk: don't pass always-0 req->errors to blk_mq_complete_request
Signed-off-by: Christoph Hellwig --- drivers/block/null_blk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index f93906ff31e8..24ca85a70fd8 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -281,7 +281,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd) case NULL_IRQ_SOFTIRQ: switch (queue_mode) { case NULL_Q_MQ: - blk_mq_complete_request(cmd->rq, cmd->rq->errors); + blk_mq_complete_request(cmd->rq, 0); break; case NULL_Q_RQ: blk_complete_request(cmd->rq); -- 2.11.0
[PATCH 14/23] mtip32xx: add a status field to struct mtip_cmd
Instead of using req->errors, which will go away. Signed-off-by: Christoph Hellwig --- drivers/block/mtip32xx/mtip32xx.c | 17 +--- drivers/block/mtip32xx/mtip32xx.h | 1 + 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c index 05e3e664ea1b..7406de29db58 100644 --- a/drivers/block/mtip32xx/mtip32xx.c +++ b/drivers/block/mtip32xx/mtip32xx.c @@ -241,7 +241,8 @@ static void mtip_async_complete(struct mtip_port *port, rq = mtip_rq_from_tag(dd, tag); - blk_mq_complete_request(rq, status); + cmd->status = status; + blk_mq_complete_request(rq, 0); } /* @@ -2910,18 +2911,19 @@ static void mtip_softirq_done_fn(struct request *rq) if (unlikely(cmd->unaligned)) up(&dd->port->cmd_slot_unal); - blk_mq_end_request(rq, rq->errors); + blk_mq_end_request(rq, cmd->status); } static void mtip_abort_cmd(struct request *req, void *data, bool reserved) { + struct mtip_cmd *cmd = blk_mq_rq_to_pdu(req); struct driver_data *dd = data; dbg_printk(MTIP_DRV_NAME " Aborting request, tag = %d\n", req->tag); clear_bit(req->tag, dd->port->cmds_to_issue); - req->errors = -EIO; + cmd->status = -EIO; mtip_softirq_done_fn(req); } @@ -3816,7 +3818,6 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx, if (likely(!ret)) return BLK_MQ_RQ_QUEUE_OK; - rq->errors = ret; return BLK_MQ_RQ_QUEUE_ERROR; } @@ -4106,9 +4107,11 @@ static void mtip_no_dev_cleanup(struct request *rq, void *data, bool reserv) struct driver_data *dd = (struct driver_data *)data; struct mtip_cmd *cmd; - if (likely(!reserv)) - blk_mq_complete_request(rq, -ENODEV); - else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) { + if (likely(!reserv)) { + cmd = blk_mq_rq_to_pdu(rq); + cmd->status = -ENODEV; + blk_mq_complete_request(rq, 0); + } else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) { cmd = mtip_cmd_from_tag(dd, MTIP_TAG_INTERNAL); if (cmd->comp_func) diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h index 7617888f7944..57b41528a824 100644 --- a/drivers/block/mtip32xx/mtip32xx.h +++ b/drivers/block/mtip32xx/mtip32xx.h @@ -352,6 +352,7 @@ struct mtip_cmd { int retries; /* The number of retries left for this command. */ int direction; /* Data transfer direction */ + int status; }; /* Structure used to describe a port. */ -- 2.11.0
[PATCH 07/23] virtio_blk: don't use req->errors
Remove passing req->errors (which at that point is always 0) to blk_mq_complete_request, and rely on the virtio status code for the serial number passthrough request. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn --- drivers/block/virtio_blk.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index dbc4e80680b1..8378ad480f77 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -175,19 +175,15 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr, static inline void virtblk_request_done(struct request *req) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); - int error = virtblk_result(vbr); switch (req_op(req)) { case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: virtblk_scsi_request_done(req); break; - case REQ_OP_DRV_IN: - req->errors = (error != 0); - break; } - blk_mq_end_request(req, error); + blk_mq_end_request(req, virtblk_result(vbr)); } static void virtblk_done(struct virtqueue *vq) @@ -205,7 +201,7 @@ static void virtblk_done(struct virtqueue *vq) while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) { struct request *req = blk_mq_rq_from_pdu(vbr); - blk_mq_complete_request(req, req->errors); + blk_mq_complete_request(req, 0); req_done = true; } if (unlikely(virtqueue_is_broken(vq))) @@ -311,7 +307,7 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str) goto out; blk_execute_rq(vblk->disk->queue, vblk->disk, req, false); - err = req->errors ? -EIO : 0; + err = virtblk_result(blk_mq_rq_to_pdu(req)); out: blk_put_request(req); return err; -- 2.11.0
[PATCH 13/23] nbd: don't use req->errors
Add a nbd-specific field instead. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- drivers/block/nbd.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 6e592c2ba8fd..09a74a66beb1 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -115,6 +115,7 @@ struct nbd_cmd { int index; int cookie; struct completion send_complete; + int status; }; #if IS_ENABLED(CONFIG_DEBUG_FS) @@ -244,16 +245,14 @@ static void nbd_size_set(struct nbd_device *nbd, loff_t blocksize, nbd_size_update(nbd); } -static void nbd_end_request(struct nbd_cmd *cmd) +static void nbd_complete_rq(struct request *req) { - struct nbd_device *nbd = cmd->nbd; - struct request *req = blk_mq_rq_from_pdu(cmd); - int error = req->errors ? -EIO : 0; + struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req); - dev_dbg(nbd_to_dev(nbd), "request %p: %s\n", cmd, - error ? "failed" : "done"); + dev_dbg(nbd_to_dev(cmd->nbd), "request %p: %s\n", cmd, + cmd->status ? "failed" : "done"); - blk_mq_complete_request(req, error); + blk_mq_end_request(req, cmd->status); } /* @@ -286,7 +285,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req, struct nbd_config *config; if (!refcount_inc_not_zero(&nbd->config_refs)) { - req->errors = -EIO; + cmd->status = -EIO; return BLK_EH_HANDLED; } @@ -331,7 +330,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct request *req, "Connection timed out\n"); } set_bit(NBD_TIMEDOUT, &config->runtime_flags); - req->errors = -EIO; + cmd->status = -EIO; sock_shutdown(nbd); nbd_config_put(nbd); @@ -574,7 +573,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) if (ntohl(reply.error)) { dev_err(disk_to_dev(nbd->disk), "Other side returned error (%d)\n", ntohl(reply.error)); - req->errors = -EIO; + cmd->status = -EIO; return cmd; } @@ -599,7 +598,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device *nbd, int index) */ if (nbd_disconnected(config) || config->num_connections <= 1) { - req->errors = -EIO; + cmd->status = -EIO; return cmd; } return ERR_PTR(-EIO); @@ -636,7 +635,7 @@ static void recv_work(struct work_struct *work) break; } - nbd_end_request(cmd); + blk_mq_complete_request(blk_mq_rq_from_pdu(cmd), 0); } atomic_dec(&config->recv_threads); wake_up(&config->recv_wq); @@ -651,8 +650,8 @@ static void nbd_clear_req(struct request *req, void *data, bool reserved) if (!blk_mq_request_started(req)) return; cmd = blk_mq_rq_to_pdu(req); - req->errors = -EIO; - nbd_end_request(cmd); + cmd->status = -EIO; + blk_mq_complete_request(req, 0); } static void nbd_clear_que(struct nbd_device *nbd) @@ -740,7 +739,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index) nbd_config_put(nbd); return -EINVAL; } - req->errors = 0; + cmd->status = 0; again: nsock = config->socks[index]; mutex_lock(&nsock->tx_lock); @@ -1408,6 +1407,7 @@ static int nbd_init_request(void *data, struct request *rq, static const struct blk_mq_ops nbd_mq_ops = { .queue_rq = nbd_queue_rq, + .complete = nbd_complete_rq, .init_request = nbd_init_request, .timeout= nbd_xmit_timeout, }; -- 2.11.0
[PATCH 11/23] dm rq: don't pass irrelevant error code to blk_mq_complete_request
dm never uses rq->errors, so there is no need to pass an error argument to blk_mq_complete_request. Signed-off-by: Christoph Hellwig --- drivers/md/dm-rq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c index e60f1b6845be..1173be21f6f6 100644 --- a/drivers/md/dm-rq.c +++ b/drivers/md/dm-rq.c @@ -363,7 +363,7 @@ static void dm_complete_request(struct request *rq, int error) if (!rq->q->mq_ops) blk_complete_request(rq); else - blk_mq_complete_request(rq, error); + blk_mq_complete_request(rq, 0); } /* -- 2.11.0
[PATCH 08/23] scsi: introduce a result field in struct scsi_request
This passes on the scsi_cmnd result field to users of passthrough requests. Currently we abuse req->errors for this purpose, but that field will go away in its current form. Note that the old IDE code abuses the errors field in very creative ways and stores all kinds of different values in it. I didn't dare to touch this magic, so the abuses are brought forward 1:1. Signed-off-by: Christoph Hellwig --- block/bsg-lib.c| 8 block/bsg.c| 12 +-- block/scsi_ioctl.c | 14 ++--- drivers/block/cciss.c | 42 +++--- drivers/block/pktcdvd.c| 2 +- drivers/block/virtio_blk.c | 2 +- drivers/cdrom/cdrom.c | 2 +- drivers/ide/ide-atapi.c| 10 - drivers/ide/ide-cd.c | 20 +- drivers/ide/ide-cd_ioctl.c | 2 +- drivers/ide/ide-devsets.c | 4 ++-- drivers/ide/ide-dma.c | 2 +- drivers/ide/ide-eh.c | 36 drivers/ide/ide-floppy.c | 10 - drivers/ide/ide-io.c | 10 - drivers/ide/ide-ioctls.c | 4 ++-- drivers/ide/ide-park.c | 2 +- drivers/ide/ide-pm.c | 8 drivers/ide/ide-tape.c | 4 ++-- drivers/ide/ide-taskfile.c | 6 +++--- drivers/scsi/osd/osd_initiator.c | 2 +- drivers/scsi/osst.c| 2 +- drivers/scsi/qla2xxx/qla_bsg.c | 6 +++--- drivers/scsi/scsi_lib.c| 15 +++--- drivers/scsi/scsi_transport_sas.c | 2 +- drivers/scsi/sg.c | 2 +- drivers/scsi/st.c | 6 +++--- drivers/target/target_core_pscsi.c | 2 +- fs/nfsd/blocklayout.c | 4 ++-- include/linux/ide.h| 2 +- include/scsi/scsi_request.h| 1 + 31 files changed, 122 insertions(+), 122 deletions(-) diff --git a/block/bsg-lib.c b/block/bsg-lib.c index cd15f9dbb147..0a23dbba2d30 100644 --- a/block/bsg-lib.c +++ b/block/bsg-lib.c @@ -37,7 +37,7 @@ static void bsg_destroy_job(struct kref *kref) struct bsg_job *job = container_of(kref, struct bsg_job, kref); struct request *rq = job->req; - blk_end_request_all(rq, rq->errors); + blk_end_request_all(rq, scsi_req(rq)->result); put_device(job->dev); /* release reference for the request */ @@ -74,7 +74,7 @@ void bsg_job_done(struct bsg_job *job, int result, struct scsi_request *rq = scsi_req(req); int err; - err = job->req->errors = result; + err = scsi_req(job->req)->result = result; if (err < 0) /* we're only returning the result field in the reply */ rq->sense_len = sizeof(u32); @@ -177,7 +177,7 @@ static int bsg_create_job(struct device *dev, struct request *req) * @q: request queue to manage * * On error the create_bsg_job function should return a -Exyz error value - * that will be set to the req->errors. + * that will be set to ->result. * * Drivers/subsys should pass this to the queue init function. 
*/ @@ -201,7 +201,7 @@ static void bsg_request_fn(struct request_queue *q) ret = bsg_create_job(dev, req); if (ret) { - req->errors = ret; + scsi_req(req)->result = ret; blk_end_request_all(req, ret); spin_lock_irq(q->queue_lock); continue; diff --git a/block/bsg.c b/block/bsg.c index 74835dbf0c47..d9da1b613ced 100644 --- a/block/bsg.c +++ b/block/bsg.c @@ -391,13 +391,13 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr, struct scsi_request *req = scsi_req(rq); int ret = 0; - dprintk("rq %p bio %p 0x%x\n", rq, bio, rq->errors); + dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result); /* * fill in all the output members */ - hdr->device_status = rq->errors & 0xff; - hdr->transport_status = host_byte(rq->errors); - hdr->driver_status = driver_byte(rq->errors); + hdr->device_status = req->result & 0xff; + hdr->transport_status = host_byte(req->result); + hdr->driver_status = driver_byte(req->result); hdr->info = 0; if (hdr->device_status || hdr->transport_status || hdr->driver_status) hdr->info |= SG_INFO_CHECK; @@ -431,8 +431,8 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr, * just a protocol response (i.e. non negative), that gets * processed above. */ - if (!ret && rq->errors < 0) - ret = rq->errors; + if (!ret && req->result < 0) + ret = req->result; blk_rq_unmap_user(bio); scsi_req_free_cmd(req); diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index b1
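After this patch a passthrough submitter reads the outcome from the scsi_request rather than from rq->errors; roughly (a sketch, with the request assumed to be already set up):

#include <linux/blkdev.h>
#include <scsi/scsi_cmnd.h>	/* host_byte()/driver_byte() */
#include <scsi/scsi_request.h>

static int submit_and_check(struct request_queue *q, struct gendisk *disk,
			    struct request *rq)
{
	blk_execute_rq(q, disk, rq, 0);	/* returns void after patch 02/23 */

	/* the SCSI result now lives in the scsi_request */
	if (scsi_req(rq)->result)
		return -EIO;	/* host_byte()/driver_byte() give detail */
	return 0;
}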
[PATCH 09/23] loop: zero-fill bio on the submitting cpu
In truth I've just audited which blk-mq drivers don't currently have a complete callback, but I think this change is at least borderline useful. Signed-off-by: Christoph Hellwig Reviewed-by: Ming Lei --- drivers/block/loop.c | 30 ++ drivers/block/loop.h | 1 + 2 files changed, 15 insertions(+), 16 deletions(-) diff --git a/drivers/block/loop.c b/drivers/block/loop.c index 3081d83d2ea3..86351b3f7350 100644 --- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -445,32 +445,27 @@ static int lo_req_flush(struct loop_device *lo, struct request *rq) return ret; } -static inline void handle_partial_read(struct loop_cmd *cmd, long bytes) +static void lo_complete_rq(struct request *rq) { - if (bytes < 0 || op_is_write(req_op(cmd->rq))) - return; + struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq); - if (unlikely(bytes < blk_rq_bytes(cmd->rq))) { + if (unlikely(req_op(cmd->rq) == REQ_OP_READ && cmd->use_aio && +cmd->ret >= 0 && cmd->ret < blk_rq_bytes(cmd->rq))) { struct bio *bio = cmd->rq->bio; - bio_advance(bio, bytes); + bio_advance(bio, cmd->ret); zero_fill_bio(bio); } + + blk_mq_end_request(rq, cmd->ret < 0 ? -EIO : 0); } static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2) { struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb); - struct request *rq = cmd->rq; - - handle_partial_read(cmd, ret); - if (ret > 0) - ret = 0; - else if (ret < 0) - ret = -EIO; - - blk_mq_complete_request(rq, ret); + cmd->ret = ret; + blk_mq_complete_request(cmd->rq, 0); } static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd, @@ -1688,8 +1683,10 @@ static void loop_handle_cmd(struct loop_cmd *cmd) ret = do_req_filebacked(lo, cmd->rq); failed: /* complete non-aio request */ - if (!cmd->use_aio || ret) - blk_mq_complete_request(cmd->rq, ret ? -EIO : 0); + if (!cmd->use_aio || ret) { + cmd->ret = ret ? -EIO : 0; + blk_mq_complete_request(cmd->rq, 0); + } } static void loop_queue_work(struct kthread_work *work) @@ -1715,6 +1712,7 @@ static int loop_init_request(void *data, struct request *rq, static const struct blk_mq_ops loop_mq_ops = { .queue_rq = loop_queue_rq, .init_request = loop_init_request, + .complete = lo_complete_rq, }; static int loop_add(struct loop_device **l, int i) diff --git a/drivers/block/loop.h b/drivers/block/loop.h index fb2237c73e61..fecd3f97ef8c 100644 --- a/drivers/block/loop.h +++ b/drivers/block/loop.h @@ -70,6 +70,7 @@ struct loop_cmd { struct request *rq; struct list_head list; bool use_aio; /* use AIO interface to handle I/O */ + long ret; struct kiocb iocb; }; -- 2.11.0
[PATCH 06/23] virtio: fix spelling of virtblk_scsi_request_done
Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Bart Van Assche --- drivers/block/virtio_blk.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index eaf99022bdc6..dbc4e80680b1 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -111,7 +111,7 @@ static int virtblk_add_req_scsi(struct virtqueue *vq, struct virtblk_req *vbr, return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC); } -static inline void virtblk_scsi_reques_done(struct request *req) +static inline void virtblk_scsi_request_done(struct request *req) { struct virtblk_req *vbr = blk_mq_rq_to_pdu(req); struct virtio_blk *vblk = req->q->queuedata; @@ -144,7 +144,7 @@ static inline int virtblk_add_req_scsi(struct virtqueue *vq, { return -EIO; } -static inline void virtblk_scsi_reques_done(struct request *req) +static inline void virtblk_scsi_request_done(struct request *req) { } #define virtblk_ioctl NULL @@ -180,7 +180,7 @@ static inline void virtblk_request_done(struct request *req) switch (req_op(req)) { case REQ_OP_SCSI_IN: case REQ_OP_SCSI_OUT: - virtblk_scsi_reques_done(req); + virtblk_scsi_request_done(req); break; case REQ_OP_DRV_IN: req->errors = (error != 0); -- 2.11.0
[PATCH 03/23] nvme-fc: fix status code handling in nvme_fc_fcpio_done
nvme_complete_async_event expects the little endian status code including the phase bit, and a new completion handler I plan to introduce will do so as well. Change the status variable into the little endian format with the phase bit used in the NVMe CQE to fix / enable this. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn --- drivers/nvme/host/fc.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index fc42172c796a..aad7f9c0be32 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -1147,7 +1147,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) struct nvme_fc_ctrl *ctrl = op->ctrl; struct nvme_fc_queue *queue = op->queue; struct nvme_completion *cqe = &op->rsp_iu.cqe; - u16 status = NVME_SC_SUCCESS; + __le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1); /* * WARNING: @@ -1182,9 +1182,9 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) sizeof(op->rsp_iu), DMA_FROM_DEVICE); if (atomic_read(&op->state) == FCPOP_STATE_ABORTED) - status = NVME_SC_ABORT_REQ | NVME_SC_DNR; + status = cpu_to_le16((NVME_SC_ABORT_REQ | NVME_SC_DNR) << 1); else if (freq->status) - status = NVME_SC_FC_TRANSPORT_ERROR; + status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1); /* * For the linux implementation, if we have an unsuccesful @@ -1212,7 +1212,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) */ if (freq->transferred_length != be32_to_cpu(op->cmd_iu.data_len)) { - status = NVME_SC_FC_TRANSPORT_ERROR; + status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1); goto done; } op->nreq.result.u64 = 0; @@ -1229,15 +1229,15 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) freq->transferred_length || op->rsp_iu.status_code || op->rqno != le16_to_cpu(cqe->command_id))) { - status = NVME_SC_FC_TRANSPORT_ERROR; + status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1); goto done; } op->nreq.result = cqe->result; - status = le16_to_cpu(cqe->status) >> 1; + status = cqe->status; break; default: - status = NVME_SC_FC_TRANSPORT_ERROR; + status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1); goto done; } @@ -1249,7 +1249,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) return; } - blk_mq_complete_request(rq, status); + blk_mq_complete_request(rq, le16_to_cpu(status) >> 1); } static int -- 2.11.0
[PATCH 05/23] nvme: make nvme_error_status private
Currently it's used by the lightnvm passthrough ioctl, but we'd like to make it private in preparation for block layer specific error codes. Lightnvm already returns the real NVMe status anyway, so I think we can just limit it to returning -EIO for any status set. This will need a careful audit from the lightnvm folks, though. Signed-off-by: Christoph Hellwig --- drivers/nvme/host/core.c | 3 +-- drivers/nvme/host/lightnvm.c | 6 +++--- drivers/nvme/host/nvme.h | 1 - 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index c6f256d74b6b..805f250315ec 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -66,7 +66,7 @@ static DEFINE_SPINLOCK(dev_list_lock); static struct class *nvme_class; -int nvme_error_status(struct request *req) +static int nvme_error_status(struct request *req) { switch (nvme_req(req)->status & 0x7ff) { case NVME_SC_SUCCESS: @@ -77,7 +77,6 @@ int nvme_error_status(struct request *req) return -EIO; } } -EXPORT_SYMBOL_GPL(nvme_error_status); static inline bool nvme_req_needs_retry(struct request *req) { diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index f85e6e57d641..7f407d5f0095 100644 --- a/drivers/nvme/host/lightnvm.c +++ b/drivers/nvme/host/lightnvm.c @@ -599,7 +599,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, __le64 *metadata = NULL; dma_addr_t metadata_dma; DECLARE_COMPLETION_ONSTACK(wait); - int ret; + int ret = 0; rq = nvme_alloc_request(q, (struct nvme_command *)vcmd, 0, NVME_QID_ANY); @@ -671,8 +671,8 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q, if (nvme_req(rq)->flags & NVME_REQ_CANCELLED) ret = -EINTR; - else - ret = nvme_error_status(rq); + else if (nvme_req(rq)->status & 0x7ff) + ret = -EIO; if (result) *result = nvme_req(rq)->status & 0x7ff; if (status) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index d7330f75632d..550037f5efea 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -254,7 +254,6 @@ static inline void nvme_end_request(struct request *req, __le16 status, blk_mq_complete_request(req, 0); } -int nvme_error_status(struct request *req); void nvme_complete_rq(struct request *req); void nvme_cancel_request(struct request *req, void *data, bool reserved); bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl, -- 2.11.0
[PATCH 04/23] nvme: split nvme status from block req->errors
We want our own clearly defined error field for NVMe passthrough commands, and the request errors field is going away in its current form. Just store the status and result field in the nvme_request field from hardirq completion context (using a new helper) and then generate a Linux errno for the block layer only when we actually need it. Because we can't overload the status value with a negative error code for a cancelled command we now have a flags field in struct nvme_request that contains a bit for this condition. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn --- drivers/nvme/host/core.c | 50 +++- drivers/nvme/host/fc.c | 10 - drivers/nvme/host/lightnvm.c | 9 +--- drivers/nvme/host/nvme.h | 33 + drivers/nvme/host/pci.c | 11 +- drivers/nvme/host/rdma.c | 5 ++--- drivers/nvme/target/loop.c | 7 ++- 7 files changed, 65 insertions(+), 60 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index d33f829c3ab7..c6f256d74b6b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -66,11 +66,24 @@ static DEFINE_SPINLOCK(dev_list_lock); static struct class *nvme_class; +int nvme_error_status(struct request *req) +{ + switch (nvme_req(req)->status & 0x7ff) { + case NVME_SC_SUCCESS: + return 0; + case NVME_SC_CAP_EXCEEDED: + return -ENOSPC; + default: + return -EIO; + } +} +EXPORT_SYMBOL_GPL(nvme_error_status); + static inline bool nvme_req_needs_retry(struct request *req) { if (blk_noretry_request(req)) return false; - if (req->errors & NVME_SC_DNR) + if (nvme_req(req)->status & NVME_SC_DNR) return false; if (jiffies - req->start_time >= req->timeout) return false; @@ -81,23 +94,13 @@ static inline bool nvme_req_needs_retry(struct request *req) void nvme_complete_rq(struct request *req) { - int error = 0; - - if (unlikely(req->errors)) { - if (nvme_req_needs_retry(req)) { - nvme_req(req)->retries++; - blk_mq_requeue_request(req, - !blk_mq_queue_stopped(req->q)); - return; - } - - if (blk_rq_is_passthrough(req)) - error = req->errors; - else - error = nvme_error_status(req->errors); + if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) { + nvme_req(req)->retries++; + blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q)); + return; } - blk_mq_end_request(req, error); + blk_mq_end_request(req, nvme_error_status(req)); } EXPORT_SYMBOL_GPL(nvme_complete_rq); @@ -114,7 +117,9 @@ void nvme_cancel_request(struct request *req, void *data, bool reserved) status = NVME_SC_ABORT_REQ; if (blk_queue_dying(req->q)) status |= NVME_SC_DNR; - blk_mq_complete_request(req, status); + nvme_req(req)->status = status; + blk_mq_complete_request(req, 0); + } EXPORT_SYMBOL_GPL(nvme_cancel_request); @@ -357,6 +362,7 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req, if (!(req->rq_flags & RQF_DONTPREP)) { nvme_req(req)->retries = 0; + nvme_req(req)->flags = 0; req->rq_flags |= RQF_DONTPREP; } @@ -413,7 +419,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd, blk_execute_rq(req->q, NULL, req, at_head); if (result) *result = nvme_req(req)->result; - ret = req->errors; + if (nvme_req(req)->flags & NVME_REQ_CANCELLED) + ret = -EINTR; + else + ret = nvme_req(req)->status; out: blk_mq_free_request(req); return ret; @@ -498,7 +507,10 @@ int __nvme_submit_user_cmd(struct request_queue *q, struct nvme_command *cmd, } submit: blk_execute_rq(req->q, disk, req, 0); - ret = req->errors; + if (nvme_req(req)->flags & NVME_REQ_CANCELLED) + ret = -EINTR; + else + ret = nvme_req(req)->status; if (result) *result = 
le32_to_cpu(nvme_req(req)->result.u32); if (meta && !ret && !write) { diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index aad7f9c0be32..450733c8cd24 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -1148,6 +1148,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) struct nvme_fc_queue *queue = op->queue; struct nvme_completion *cqe = &op->rsp_iu.cqe; __le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1); + union nvme_result result; /* * WARNING: @@ -1215,7 +1216,7 @@ nvme_f
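For reference, a sketch of the struct nvme_request this patch ends up with in drivers/nvme/host/nvme.h, reconstructed from the changelog and diffstat above (exact layout may differ slightly in the posted diff):

struct nvme_request {
	struct nvme_command	*cmd;
	union nvme_result	result;
	u8			retries;
	u8			flags;
	u16			status;
};

#define NVME_REQ_CANCELLED	(1 << 0)	/* the cancellation bit the changelog mentions */

static inline struct nvme_request *nvme_req(struct request *req)
{
	return blk_mq_rq_to_pdu(req);	/* nvme_request lives in the request PDU */
}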
[PATCH 02/23] block: remove the blk_execute_rq return value
The function only returns -EIO if rq->errors is non-zero, which is not very useful and lets a large number of callers ignore the return value. Just let the callers figure out their error themselves. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn --- block/blk-exec.c | 8 +--- block/scsi_ioctl.c | 3 ++- drivers/block/virtio_blk.c | 3 ++- drivers/cdrom/cdrom.c| 3 ++- drivers/ide/ide-atapi.c | 3 ++- drivers/ide/ide-cd.c | 3 ++- drivers/ide/ide-cd_ioctl.c | 3 ++- drivers/ide/ide-devsets.c| 4 ++-- drivers/ide/ide-disk.c | 3 +-- drivers/ide/ide-ioctls.c | 7 --- drivers/ide/ide-park.c | 3 ++- drivers/ide/ide-pm.c | 3 ++- drivers/ide/ide-taskfile.c | 4 ++-- drivers/scsi/osd/osd_initiator.c | 5 - fs/nfsd/blocklayout.c| 5 +++-- include/linux/blkdev.h | 2 +- 16 files changed, 34 insertions(+), 28 deletions(-) diff --git a/block/blk-exec.c b/block/blk-exec.c index 8cd0e9bc8dc8..afa383248c7c 100644 --- a/block/blk-exec.c +++ b/block/blk-exec.c @@ -92,11 +92,10 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait); *Insert a fully prepared request at the back of the I/O scheduler queue *for execution and wait for completion. */ -int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk, +void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk, struct request *rq, int at_head) { DECLARE_COMPLETION_ONSTACK(wait); - int err = 0; unsigned long hang_check; rq->end_io_data = &wait; @@ -108,10 +107,5 @@ int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk, while (!wait_for_completion_io_timeout(&wait, hang_check * (HZ/2))); else wait_for_completion_io(&wait); - - if (rq->errors) - err = -EIO; - - return err; } EXPORT_SYMBOL(blk_execute_rq); diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c index 82a43bb19967..b1352143f12f 100644 --- a/block/scsi_ioctl.c +++ b/block/scsi_ioctl.c @@ -547,7 +547,8 @@ static int __blk_send_generic(struct request_queue *q, struct gendisk *bd_disk, scsi_req(rq)->cmd[0] = cmd; scsi_req(rq)->cmd[4] = data; scsi_req(rq)->cmd_len = 6; - err = blk_execute_rq(q, bd_disk, rq, 0); + blk_execute_rq(q, bd_disk, rq, 0); + err = rq->errors ? -EIO : 0; blk_put_request(rq); return err; diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 2d8290169271..eaf99022bdc6 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -310,7 +310,8 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str) if (err) goto out; - err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false); + blk_execute_rq(vblk->disk->queue, vblk->disk, req, false); + err = req->errors ? 
-EIO : 0; out: blk_put_request(req); return err; diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c index 87739649eac2..308501730ab3 100644 --- a/drivers/cdrom/cdrom.c +++ b/drivers/cdrom/cdrom.c @@ -2218,7 +2218,8 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info *cdi, __u8 __user *ubuf, rq->timeout = 60 * HZ; bio = rq->bio; - if (blk_execute_rq(q, cdi->disk, rq, 0)) { + blk_execute_rq(q, cdi->disk, rq, 0); + if (rq->errors) { struct request_sense *s = req->sense; ret = -EIO; cdi->last_sense = s->sense_key; diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c index feb30061123b..1524797e1776 100644 --- a/drivers/ide/ide-atapi.c +++ b/drivers/ide/ide-atapi.c @@ -107,7 +107,8 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk *disk, memcpy(scsi_req(rq)->cmd, pc->c, 12); if (drive->media == ide_tape) scsi_req(rq)->cmd[13] = REQ_IDETAPE_PC1; - error = blk_execute_rq(drive->queue, disk, rq, 0); + blk_execute_rq(drive->queue, disk, rq, 0); + error = rq->errors ? -EIO : 0; put_req: blk_put_request(rq); return error; diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c index 74f1b7dc03f7..95c40afa9120 100644 --- a/drivers/ide/ide-cd.c +++ b/drivers/ide/ide-cd.c @@ -452,7 +452,8 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char *cmd, } } - error = blk_execute_rq(drive->queue, info->disk, rq, 0); + blk_execute_rq(drive->queue, info->disk, rq, 0); + error = rq->errors ? -EIO : 0; if (buffer) *bufflen = scsi_req(rq)->resid_len; diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c index 9fcefbc8425e..f1ab726bd430 100644 --- a/drivers/ide/
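The conversion is mechanical across all sixteen changed files: every caller switches to the same two-line idiom and derives its own errno after the call returns. A minimal sketch of the pattern applied throughout the diff above:

	blk_execute_rq(q, disk, rq, 0);		/* now returns void */
	err = rq->errors ? -EIO : 0;		/* caller derives the errno itself */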
[PATCH 01/23] pd: don't check blk_execute_rq return value.
The driver never sets req->errors, so blk_execute_rq will always return 0. Signed-off-by: Christoph Hellwig --- drivers/block/paride/pd.c | 7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c index b05e151c9b38..7d2402f90978 100644 --- a/drivers/block/paride/pd.c +++ b/drivers/block/paride/pd.c @@ -739,18 +739,15 @@ static int pd_special_command(struct pd_unit *disk, enum action (*func)(struct pd_unit *disk)) { struct request *rq; - int err = 0; rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, __GFP_RECLAIM); if (IS_ERR(rq)) return PTR_ERR(rq); rq->special = func; - - err = blk_execute_rq(disk->gd->queue, disk->gd, rq, 0); - + blk_execute_rq(disk->gd->queue, disk->gd, rq, 0); blk_put_request(rq); - return err; + return 0; } /* kernel glue structures */ -- 2.11.0
kill req->errors V3
Currently the request structure has an errors field that is used in various different ways. The oldest drivers use it as an error count, blk-mq and the generic timeout code assume that it holds a Linux errno for block completions, and various drivers use it for internal status values, often overwriting them with Linux errnos later, that is unless they are processing passthrough requests in which case they'll leave their errors in it. This series kills the ->errors field and replaces it with new fields in the drivers (or an odd hack in a union in struct request for two bitrotting old drivers).

Also available in the following git tree: git://git.infradead.org/users/hch/block.git request-errors

Gitweb: http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/request-errors

Changes since V2:
- reorder one patch to be earlier in the series
- fix the argument to a bsg dprintk
- fix a kerneldoc comment

Changes since V1:
- rebased on top of the latest block for-next tree
- fix error handling in nfsd blocklayout
- dropped "scsi: fix fast-fail for non-passthrough requests"
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 12:32 PM, Jason Gunthorpe wrote: > On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > Not entirely, it would have to call through the whole process > including the arch_p2p_cross_segment().. Hmm, yes. Though it's still not clear what, if anything, arch_p2p_cross_segment would be doing. In my experience, if you are going between host bridges, the CPU address (or PCI address -- I'm not sure which seeing they are the same on my system) would still work fine -- it just _may_ be a bad idea because of performance. Similarly if you are crossing via a QPI bus or similar, I'd expect the CPU address to work fine. But here the performance is even less likely to be any good. > // Try the arch specific helper >const struct dma_map_ops *comp_ops = get_dma_ops(completer); >const struct dma_map_ops *init_ops = get_dma_ops(initiator); So, in this case, what device does the completer point to? The PCI device or a more specific GPU device? If it's the former, who's responsible for setting the new dma_ops? Typically the dma_ops are arch specific but now you'd be adding ones that are tied to hmm or the gpu. >> I'm not sure I like the name pci_p2p_same_segment. It reads as though >> it's only checking if the devices are not the same segment. > > Well, that is exactly what it is doing. If it succeeds then the caller > knows the DMA will not flow outside the segment and no iommu setup/etc > is required. It appears to me like it's calculating the DMA address, and the check is just a side requirement. It reads as though it's only doing the check. Logan
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 11:41 AM, Logan Gunthorpe wrote: > > > On 19/04/17 12:30 PM, Dan Williams wrote: >> Letting others users do the container_of() arrangement means that >> struct page_map needs to become public and move into struct >> dev_pagemap directly. > > Ah, yes, I got a bit turned around by that and failed to notice that > page_map and dev_pagemap are different. Why is it that dev_pagemap > contains pretty much the exact same information as page_map? The only > thing gained that I can see is that the struct resource gains const > protection... > >> ...I think that encapsulation loss is worth it for the gain of clearly >> separating the HMM-case from the base case. > > Agreed. > Yeah, I forgot that dev_pagemap grew those fields, so we don't have any real encapsulation today. So this would just be a pure cleanup to kill struct page_map.
Re: [PATCH v3 3/4] sd: Make synchronize cache upon shutdown asynchronous
On Tue, 2017-04-18 at 16:56 -0700, James Bottomley wrote: > This means the combined 1/3 3/3 patch looks like this: > [ ... ] Hello James, The two attached patches pass my tests. How would you like to proceed with patch 1/2? Would you like to submit it yourself or is it OK for you if I mention you as author and add your Signed-off-by? Bart.From c395ce2aaf6d8a644311f4c55dfa6aa560a93240 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 28 Mar 2017 14:00:17 -0700 Subject: [PATCH 1/2] Introduce scsi_start_queue() This patch does not change any functionality. Signed-off-by: Bart Van Assche Cc: Israel Rukshin Cc: Max Gurtovoy Cc: Hannes Reinecke Cc: Benjamin Block --- drivers/scsi/scsi_lib.c | 25 +++-- drivers/scsi/scsi_priv.h | 1 + 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index eecc005099b2..ffa6e61299a9 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2987,6 +2987,20 @@ scsi_internal_device_block(struct scsi_device *sdev, bool wait) } EXPORT_SYMBOL_GPL(scsi_internal_device_block); +void scsi_start_queue(struct scsi_device *sdev) +{ + struct request_queue *q = sdev->request_queue; + unsigned long flags; + + if (q->mq_ops) { + blk_mq_start_stopped_hw_queues(q, false); + } else { + spin_lock_irqsave(q->queue_lock, flags); + blk_start_queue(q); + spin_unlock_irqrestore(q->queue_lock, flags); + } +} + /** * scsi_internal_device_unblock - resume a device after a block request * @sdev: device to resume @@ -3007,9 +3021,6 @@ int scsi_internal_device_unblock(struct scsi_device *sdev, enum scsi_device_state new_state) { - struct request_queue *q = sdev->request_queue; - unsigned long flags; - /* * Try to transition the scsi device to SDEV_RUNNING or one of the * offlined states and goose the device queue if successful. @@ -3027,13 +3038,7 @@ scsi_internal_device_unblock(struct scsi_device *sdev, sdev->sdev_state != SDEV_OFFLINE) return -EINVAL; - if (q->mq_ops) { - blk_mq_start_stopped_hw_queues(q, false); - } else { - spin_lock_irqsave(q->queue_lock, flags); - blk_start_queue(q); - spin_unlock_irqrestore(q->queue_lock, flags); - } + scsi_start_queue(sdev); return 0; } diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h index f11bd102d6d5..c7629e31a75b 100644 --- a/drivers/scsi/scsi_priv.h +++ b/drivers/scsi/scsi_priv.h @@ -89,6 +89,7 @@ extern void scsi_run_host_queues(struct Scsi_Host *shost); extern void scsi_requeue_run_queue(struct work_struct *work); extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev); extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev); +extern void scsi_start_queue(struct scsi_device *sdev); extern int scsi_mq_setup_tags(struct Scsi_Host *shost); extern void scsi_mq_destroy_tags(struct Scsi_Host *shost); extern int scsi_init_queue(void); -- 2.12.2 From 540cd36d79a46c634368365d3f0f5b8adf4fd687 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 18 Apr 2017 10:11:02 -0700 Subject: [PATCH 2/2] Make __scsi_remove_device go straight from BLOCKED to DEL If a device is blocked, make __scsi_remove_device() cause it to transition to the DEL state. This means that all the commands issued in .shutdown() will error in the mid-layer, thus making the removal proceed without being stopped. This patch is a slightly modified version of a patch from James Bottomley. 
Signed-off-by: Bart Van Assche Cc: James Bottomley Cc: Israel Rukshin Cc: Max Gurtovoy Cc: Hannes Reinecke Cc: Benjamin Block --- drivers/scsi/scsi_lib.c | 2 +- drivers/scsi/scsi_sysfs.c | 15 +-- 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index ffa6e61299a9..376cd1da102c 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2611,7 +2611,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) case SDEV_QUIESCE: case SDEV_OFFLINE: case SDEV_TRANSPORT_OFFLINE: - case SDEV_BLOCK: break; default: goto illegal; @@ -2625,6 +2624,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) case SDEV_OFFLINE: case SDEV_TRANSPORT_OFFLINE: case SDEV_CANCEL: + case SDEV_BLOCK: case SDEV_CREATED_BLOCK: break; default: diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 82dfe07b1d47..732f1873f2fb 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1282,8 +1282,19 @@ void __scsi_remove_device(struct scsi_device *sdev) return; if (sdev->is_visible) { - if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) - return; + /* + * If blocked, we go straight to DEL and restart the queue so + * any commands issued during driver shutdown (like sync + * cache) are errored immediately. + */ + if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) { + if (scsi_dev
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 12:30 PM, Dan Williams wrote: > Letting others users do the container_of() arrangement means that > struct page_map needs to become public and move into struct > dev_pagemap directly. Ah, yes, I got a bit turned around by that and failed to notice that page_map and dev_pagemap are different. Why is it that dev_pagemap contains pretty much the exact same information as page_map? The only thing gained that I can see is that the struct resource gains const protection... > ...I think that encapsulation loss is worth it for the gain of clearly > separating the HMM-case from the base case. Agreed. Logan
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote: > I'm just spit balling here but if HMM wanted to use unaddressable memory > as a DMA target, it could set that function to create a window in gpu > memory, then call the pci_p2p_same_segment and return the result as the > dma address. Not entirely, it would have to call through the whole process including the arch_p2p_cross_segment().. Maybe we can start down the road of using ops for more iommu steps with something like this as the helper: dma_addr_t dma_map_pa(struct device *initiator, struct page *page, void *data) { struct device *completer = get_p2p_completer(page); dma_addr_t pa; if (IS_ERR(completer)) return SYSTEM_MEMORY; // Or maybe ? return init_ops->dma_map_pa(..); // Try the generic method pa = pci_p2p_same_segment(dev, p2p_src, page); if (pa != ERROR) goto out; // Try the arch specific helper const struct dma_map_ops *comp_ops = get_dma_ops(completer); const struct dma_map_ops *init_ops = get_dma_ops(initiator); /* FUTURE: Let something translate a HMM page into a DMA'ble page, eg by mapping it into a GPU window. Maybe this callback lives in devmap ? */ page = comp_ops->translate_dma_page(completer, page); /* New dma_map_op is the same as arch_p2p_cross_segment in prior version. Any arch specific data needed to program the iommu flows through data */ pa = init_ops->p2p_cross_segment_map(completer, initiator, page, data); out: device_put(completer); return pa; } // map_sg op: for (each sgl) { struct page *page = sg_page(s); struct arch_iommu_data data = {}; // pass through to ops->p2p_cross_segment dma_addr_t pa; pa = dma_map_pa(dev, page, &data) if (pa == ERROR) // fail if (!data.needs_iommu) { // Insert PA directly into the result SGL sg++; continue; } // Else pa & data describe how to setup the iommu } > > dma_addr_t pci_p2p_same_segment(struct device *initiator, > struct device *completer, > struct page *page) > I'm not sure I like the name pci_p2p_same_segment. It reads as though > it's only checking if the devices are on the same segment. Well, that is exactly what it is doing. If it succeeds then the caller knows the DMA will not flow outside the segment and no iommu setup/etc is required. That function cannot be expanded to include generic cross-segment traffic, a new function would be needed.. Jason
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 11:19 AM, Logan Gunthorpe wrote: > > > On 19/04/17 12:11 PM, Logan Gunthorpe wrote: >> >> >> On 19/04/17 11:41 AM, Dan Williams wrote: >>> No, not quite ;-). I still don't think we should require the non-HMM >>> to pass NULL for all the HMM arguments. What I like about Logan's >>> proposal is to have a separate create and register steps dev_pagemap. >>> That way call paths that don't care about HMM specifics can just turn >>> around and register the vanilla dev_pagemap. >> >> Would you necessarily even need a create step? I was thinking more along >> the lines that struct dev_pagemap _could_ just be a member in another >> structure. The caller would set the attributes they needed and pass it >> to devm_memremap. (Similar to how we commonly do things with struct >> device, et al). Potentially, that could also get rid of the need for the >> *data pointer HMM is using to get back the struct hmm_devmem seeing >> container_of could be used instead. > > Also, now that I've thought about it a little more, it _may_ be that > many or all of the hmm specific fields in dev_pagemap could move to a > containing struct too... Yes, that's already how we handle struct page_map, it's an internal implementation detail of devm_memremap_pages(). Letting others users do the container_of() arrangement means that struct page_map needs to become public and move into struct dev_pagemap directly. ...I think that encapsulation loss is worth it for the gain of clearly separating the HMM-case from the base case.
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 12:11 PM, Logan Gunthorpe wrote: > > > On 19/04/17 11:41 AM, Dan Williams wrote: >> No, not quite ;-). I still don't think we should require the non-HMM >> to pass NULL for all the HMM arguments. What I like about Logan's >> proposal is to have a separate create and register steps dev_pagemap. >> That way call paths that don't care about HMM specifics can just turn >> around and register the vanilla dev_pagemap. > > Would you necessarily even need a create step? I was thinking more along > the lines that struct dev_pagemap _could_ just be a member in another > structure. The caller would set the attributes they needed and pass it > to devm_memremap. (Similar to how we commonly do things with struct > device, et al). Potentially, that could also get rid of the need for the > *data pointer HMM is using to get back the struct hmm_devmem seeing > container_of could be used instead. Also, now that I've thought about it a little more, it _may_ be that many or all of the hmm specific fields in dev_pagemap could move to a containing struct too... Logan
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 11:41 AM, Dan Williams wrote: > No, not quite ;-). I still don't think we should require the non-HMM > to pass NULL for all the HMM arguments. What I like about Logan's > proposal is to have a separate create and register steps dev_pagemap. > That way call paths that don't care about HMM specifics can just turn > around and register the vanilla dev_pagemap. Would you necessarily even need a create step? I was thinking more along the lines that struct dev_pagemap _could_ just be a member in another structure. The caller would set the attributes they needed and pass it to devm_memremap. (Similar to how we commonly do things with struct device, et al). Potentially, that could also get rid of the need for the *data pointer HMM is using to get back the struct hmm_devmem seeing container_of could be used instead. >> Note i won't make any change now on that front but if it make sense >> i am happy to do it as separate patchset on top of HMM. >> >> Also i don't want p2pmem to be an exclusive or with HMM, we will want >> GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support >> this too. > > Yes, this makes sense I think we really just want to distinguish host > memory or not in terms of the dev_pagemap type. That depends though. If unaddressable memory needs different steps to get to the DMA address (ie if it has to setup a gpu window) then we may still need a way to distinguish the two types of non-host memory. Logan
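A sketch of the container_of() arrangement Logan describes, with a hypothetical type name (struct p2pmem_dev is illustrative, not in the tree); the point is that dev_pagemap becomes an ordinary embedded member and container_of() replaces the *data back-pointer:

struct p2pmem_dev {
	struct device		dev;
	struct dev_pagemap	pgmap;		/* embedded by the caller */
};

static struct p2pmem_dev *to_p2pmem(struct dev_pagemap *pgmap)
{
	return container_of(pgmap, struct p2pmem_dev, pgmap);
}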
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 11:14 AM, Jason Gunthorpe wrote: > I don't see a use for the dma_map function pointer at this point.. Yes, it is kind of like designing for the future. I just find it a little odd calling the pci functions in the iommu. > It doesn't make a lot of sense for the completer of the DMA to provide > a mapping op, the mapping process is *path* specific, not specific to > a completer/initiator. I'm just spit balling here but if HMM wanted to use unaddressable memory as a DMA target, it could set that function to create a window in gpu memory, then call the pci_p2p_same_segment and return the result as the dma address. > dma_addr_t pci_p2p_same_segment(struct device *initiator, > struct device *completer, > struct page *page) I'm not sure I like the name pci_p2p_same_segment. It reads as though it's only checking if the devices are on the same segment. It also may be that, in the future, it supports devices on different segments. I'd call it more like pci_p2p_dma_map. > for (each sgl) { Thanks, this code fleshes things out nicely > To me it looks like the code duplication across the iommu stuff comes > from just duplicating the basic iommu algorithm in every driver. Yes, this is true. > To clean that up I think someone would need to hoist the overall sgl > loop and use more ops callbacks eg allocate_iommu_range, > assign_page_to_range, dealloc_range, etc. This is a problem p2p makes > worse, but isn't directly causing :\ Yup. Logan
Re: blk-mq: remove the error argument to blk_mq_complete_request
On Tue, Apr 18, 2017 at 10:50:18PM +0000, Bart Van Assche wrote: > On Tue, 2017-04-18 at 08:52 -0700, Christoph Hellwig wrote: > > Now that we always have a ->complete callback we can remove the direct > > call to blk_mq_end_request, as well as the error argument to > > blk_mq_complete_request. > Hello Christoph, > Please add a runtime check that issues a warning early if a .complete > callback function is missing. > Are you sure that all blk-mq drivers have a .complete callback? In the request-errors branch of your block git tree I found the following: I think the problem is that my above changelog was wrong - we only require the complete_rq method if the driver calls the blk_mq_complete_request function. Both rbd and ubiblock don't call it in the tree. They probably should, but that's work for a separate series.
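A minimal sketch of the runtime check Bart asks for, assuming it would sit at the top of blk_mq_complete_request() once the error argument is gone (placement and wording hypothetical, not part of the posted series):

void blk_mq_complete_request(struct request *rq)
{
	struct request_queue *q = rq->q;

	/* warn early if a driver calls this without a .complete callback */
	if (WARN_ON_ONCE(!q->mq_ops->complete))
		return;
	...
}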
Re: [PATCH] scsi: osd_uld: fix mismerge
On Wed, 2017-04-19 at 19:44 +0200, Arnd Bergmann wrote: > Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value") > Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function") Hello Arnd, Both these patches are fine as far as I can tell. Shouldn't the "Fixes" tag refer to the merge commit instead of to the commits that have been merged? Bart.
[PATCH] scsi: osd_uld: fix mismerge
A mismerge between two branches in linux-next reintroduced a warning that was previously resolved: drivers/scsi/osd/osd_uld.c: In function 'osd_probe': drivers/scsi/osd/osd_uld.c:457:2: error: ignoring return value of 'scsi_device_get', declared with attribute warn_unused_result [-Werror=unused-result] The original fix had more complex error unrolling, but as far as I can tell, this simpler version is now sufficient. Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value") Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function") Signed-off-by: Arnd Bergmann --- drivers/scsi/osd/osd_uld.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c index 8b9941a5687a..0e56f1eb05dc 100644 --- a/drivers/scsi/osd/osd_uld.c +++ b/drivers/scsi/osd/osd_uld.c @@ -454,7 +454,8 @@ static int osd_probe(struct device *dev) /* hold one more reference to the scsi_device that will get released * in __release, in case a logout is happening while fs is mounted */ - scsi_device_get(scsi_device); + if (scsi_device_get(scsi_device)) + goto err_retract_minor; osd_dev_init(&oud->od, scsi_device); /* allocate a disk and set it up */ -- 2.9.0
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 10:32 AM, Jerome Glisse wrote: > On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote: >> On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: >> > >> > >> > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: >> >> I was thinking only this one would be supported with a core code >> >> helper.. >> > >> > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a >> > type flag to the dev_pagemap structure which would be very useful to us. >> > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages. >> > Then, potentially, we could add a dma_map callback to the structure >> > (possibly unioned with an hmm field). The dev_ops providers would then >> > just need to do something like this (enclosed in a helper): >> > >> > if (is_zone_device_page(page)) { >> > pgmap = get_dev_pagemap(page_to_pfn(page)); >> > if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P || >> > !pgmap->dma_map) >> > return 0; >> > >> > dma_addr = pgmap->dma_map(dev, pgmap->dev, page); >> > put_dev_pagemap(pgmap); >> > if (!dma_addr) >> > return 0; >> > ... >> > } >> > >> > The pci_enable_p2p_bar function would then just need to call >> > devm_memremap_pages with the dma_map callback set to a function that >> > does the segment check and the offset calculation. >> > >> > Thoughts? >> > >> > @Jerome: my feedback to you would be that your patch assumes all users >> > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more >> > useful if it was generic. My suggestion would be to have the caller >> > allocate the dev_pagemap structure, populate it and pass it into >> > devm_memremap_pages. Given that pretty much everything in that structure >> > are already arguments to that function, I feel like this makes sense. >> > This should also help to unify hmm_devmem_pages_create and >> > devm_memremap_pages which look very similar to each other. >> >> I like that change. Also the types should describe the memory relative >> to its relationship to struct page, not whether it is persistent or >> not. I would consider volatile and persistent memory that is attached >> to the cpu memory controller and i/o coherent as the same type of >> memory. DMA incoherent ranges like P2P and HMM should get their own >> types. > > Dan you asked me to not use devm_memremap_pages() because you didn't > want to have HMM memory in the pgmap_radix, did you change opinion > on that ? :) No, not quite ;-). I still don't think we should require the non-HMM to pass NULL for all the HMM arguments. What I like about Logan's proposal is to have a separate create and register steps dev_pagemap. That way call paths that don't care about HMM specifics can just turn around and register the vanilla dev_pagemap. > Note i won't make any change now on that front but if it make sense > i am happy to do it as separate patchset on top of HMM. > > Also i don't want p2pmem to be an exclusive or with HMM, we will want > GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support > this too. Yes, this makes sense I think we really just want to distinguish host memory or not in terms of the dev_pagemap type.
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote: > On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: > > > > > > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: > >> I was thinking only this one would be supported with a core code > >> helper.. > > > > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a > > type flag to the dev_pagemap structure which would be very useful to us. > > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages. > > Then, potentially, we could add a dma_map callback to the structure > > (possibly unioned with an hmm field). The dev_ops providers would then > > just need to do something like this (enclosed in a helper): > > > > if (is_zone_device_page(page)) { > > pgmap = get_dev_pagemap(page_to_pfn(page)); > > if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P || > > !pgmap->dma_map) > > return 0; > > > > dma_addr = pgmap->dma_map(dev, pgmap->dev, page); > > put_dev_pagemap(pgmap); > > if (!dma_addr) > > return 0; > > ... > > } > > > > The pci_enable_p2p_bar function would then just need to call > > devm_memremap_pages with the dma_map callback set to a function that > > does the segment check and the offset calculation. > > > > Thoughts? > > > > @Jerome: my feedback to you would be that your patch assumes all users > > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more > > useful if it was generic. My suggestion would be to have the caller > > allocate the dev_pagemap structure, populate it and pass it into > > devm_memremap_pages. Given that pretty much everything in that structure > > are already arguments to that function, I feel like this makes sense. > > This should also help to unify hmm_devmem_pages_create and > > devm_memremap_pages which look very similar to each other. > > I like that change. Also the types should describe the memory relative > to its relationship to struct page, not whether it is persistent or > not. I would consider volatile and persistent memory that is attached > to the cpu memory controller and i/o coherent as the same type of > memory. DMA incoherent ranges like P2P and HMM should get their own > types. Dan you asked me to not use devm_memremap_pages() because you didn't want to have HMM memory in the pgmap_radix, did you change opinion on that ? :) Note i won't make any change now on that front but if it make sense i am happy to do it as separate patchset on top of HMM. Also i don't want p2pmem to be an exclusive or with HMM, we will want GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support this too. I do believe it is easier to special case in ZONE_DEVICE into existing dma_ops for each architecture. For x86 i think there is only 3 different set of dma_ops to modify. For other arch i guess there is even less. But in all the case i think p2pmem should stay out of memory management business. If some set of device do not have memory management it is better to propose helper to do that as part of the subsystem to which those devices belong. Just wanted to reiterate that point. Cheers, Jérôme
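A rough sketch of what Jérôme's "special case ZONE_DEVICE into existing dma_ops" might look like inside an arch ->map_page() op for the trivial direct-mapped case (p2p_translate() and the overall shape are hypothetical):

static dma_addr_t arch_dma_map_page(struct device *dev, struct page *page,
				    unsigned long offset, size_t size,
				    enum dma_data_direction dir,
				    unsigned long attrs)
{
	/* hypothetical: P2P ZONE_DEVICE pages translate to a peer's
	 * BAR address rather than a host physical address */
	if (is_zone_device_page(page))
		return p2p_translate(dev, page) + offset;

	return page_to_phys(page) + offset;	/* direct-mapped case */
}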
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 10:48:51AM -0600, Logan Gunthorpe wrote: > The pci_enable_p2p_bar function would then just need to call > devm_memremap_pages with the dma_map callback set to a function that > does the segment check and the offset calculation. I don't see a use for the dma_map function pointer at this point.. It doesn't make a lot of sense for the completer of the DMA to provide a mapping op, the mapping process is *path* specific, not specific to a completer/initiator. So, I would suggest more like this: static inline struct device *get_p2p_src(struct page *page) { struct device *res; struct dev_pagemap *pgmap; if (!is_zone_device_page(page)) return NULL; pgmap = get_dev_pagemap(page_to_pfn(page), NULL); if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P) /* For now ZONE_DEVICE memory that is not P2P is assumed to be configured for DMA the same as CPU memory. */ return ERR_PTR(-EINVAL); res = pgmap->dev; device_get(res); put_dev_pagemap(pgmap); return res; } dma_addr_t pci_p2p_same_segment(struct device *initiator, struct device *completer, struct page *page) { if (! PCI initiator & completer) return ERROR; if (!same segment initiator & completer) return ERROR; // Translate page directly to the value programmed into the BAR return (Completer's PCI BAR base address) + (offset of page within BAR); } // dma_sg_map for (each sgl) { struct page *page = sg_page(s); struct device *p2p_src = get_p2p_src(page); if (IS_ERR(p2p_src)) // fail dma_sg if (p2p_src) { bool needs_iommu = false; pa = pci_p2p_same_segment(dev, p2p_src, page); if (pa == ERROR) pa = arch_p2p_cross_segment(dev, p2p_src, page, &needs_iommu); device_put(p2p_src); if (pa == ERROR) // fail if (!needs_iommu) { // Insert PA directly into the result SGL sg++; continue; } } else // CPU memory pa = page_to_phys(page); To me it looks like the code duplication across the iommu stuff comes from just duplicating the basic iommu algorithm in every driver. To clean that up I think someone would need to hoist the overall sgl loop and use more ops callbacks eg allocate_iommu_range, assign_page_to_range, dealloc_range, etc. This is a problem p2p makes worse, but isn't directly causing :\ Jason
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe wrote: > > > On 19/04/17 09:55 AM, Jason Gunthorpe wrote: >> I was thinking only this one would be supported with a core code >> helper.. > > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a > type flag to the dev_pagemap structure which would be very useful to us. > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages. > Then, potentially, we could add a dma_map callback to the structure > (possibly unioned with an hmm field). The dev_ops providers would then > just need to do something like this (enclosed in a helper): > > if (is_zone_device_page(page)) { > pgmap = get_dev_pagemap(page_to_pfn(page)); > if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P || > !pgmap->dma_map) > return 0; > > dma_addr = pgmap->dma_map(dev, pgmap->dev, page); > put_dev_pagemap(pgmap); > if (!dma_addr) > return 0; > ... > } > > The pci_enable_p2p_bar function would then just need to call > devm_memremap_pages with the dma_map callback set to a function that > does the segment check and the offset calculation. > > Thoughts? > > @Jerome: my feedback to you would be that your patch assumes all users > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more > useful if it was generic. My suggestion would be to have the caller > allocate the dev_pagemap structure, populate it and pass it into > devm_memremap_pages. Given that pretty much everything in that structure > are already arguments to that function, I feel like this makes sense. > This should also help to unify hmm_devmem_pages_create and > devm_memremap_pages which look very similar to each other. I like that change. Also the types should describe the memory relative to its relationship to struct page, not whether it is persistent or not. I would consider volatile and persistent memory that is attached to the cpu memory controller and i/o coherent as the same type of memory. DMA incoherent ranges like P2P and HMM should get their own types.
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On 19/04/17 09:55 AM, Jason Gunthorpe wrote: > I was thinking only this one would be supported with a core code > helper.. Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a type flag to the dev_pagemap structure which would be very useful to us. We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages. Then, potentially, we could add a dma_map callback to the structure (possibly unioned with an hmm field). The dev_ops providers would then just need to do something like this (enclosed in a helper): if (is_zone_device_page(page)) { pgmap = get_dev_pagemap(page_to_pfn(page)); if (!pgmap || pgmap->type != MEMORY_DEVICE_P2P || !pgmap->dma_map) return 0; dma_addr = pgmap->dma_map(dev, pgmap->dev, page); put_dev_pagemap(pgmap); if (!dma_addr) return 0; ... } The pci_enable_p2p_bar function would then just need to call devm_memremap_pages with the dma_map callback set to a function that does the segment check and the offset calculation. Thoughts? @Jerome: my feedback to you would be that your patch assumes all users of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more useful if it was generic. My suggestion would be to have the caller allocate the dev_pagemap structure, populate it and pass it into devm_memremap_pages. Given that pretty much everything in that structure are already arguments to that function, I feel like this makes sense. This should also help to unify hmm_devmem_pages_create and devm_memremap_pages which look very similar to each other. Logan
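A sketch of the caller-allocated flow Logan proposes above; the one-argument devm_memremap_pages() signature is hypothetical (today the function takes the resource, ref and altmap as separate arguments), as are MEMORY_DEVICE_P2P and the dma_map callback:

	struct dev_pagemap *pgmap = &p2p->pgmap;	/* embedded in the driver's struct */

	pgmap->res = &p2p->bar_res;		/* resource describing the BAR */
	pgmap->ref = &p2p->ref;
	pgmap->type = MEMORY_DEVICE_P2P;	/* proposed new type */
	pgmap->dma_map = pci_p2p_dma_map;	/* proposed callback: segment check
						   plus BAR offset calculation */

	addr = devm_memremap_pages(&pdev->dev, pgmap);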
Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
On Wed, Apr 19, 2017 at 11:20:06AM +1000, Benjamin Herrenschmidt wrote: > That helper wouldn't perform the actual iommu mapping. It would simply > return something along the lines of: > > - "use that alternate bus address and don't map in the iommu" I was thinking only this one would be supported with a core code helper.. > - "use that alternate bus address and do map in the iommu" > - "proceed as normal" .. because I don't have an idea how a core code helper could figure out what the platform needs for the above two .. I think many of the iommu platforms will need to construct their own bus address in the iommu, and we already have easy access to the CPU address. Presumably once a transaction passes through the iommu it needs to be using the CPU address for the target bar, otherwise things are going to be ambiguous.. Jason
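The three outcomes Benjamin lists could be expressed as a tri-state result from such a helper; a purely hypothetical sketch (none of these names exist in the tree):

enum p2p_map_hint {
	P2P_MAP_BUS_ADDR,		/* use the alternate bus address, skip the iommu */
	P2P_MAP_BUS_ADDR_IOMMU,		/* use the alternate bus address, map it in the iommu */
	P2P_MAP_NORMAL,			/* proceed as for regular system memory */
};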
Re: RFC: drop the T10 OSD code and its users
On Wed, Apr 12 2017, Christoph Hellwig wrote: > The only real user of the T10 OSD protocol, the pNFS object layout > driver, never went to the point of having shipping products, and the > other two users (osdblk and exofs) were simple examples of its usage. > > The code has been mostly unmaintained for years and is getting in the > way of block / SCSI changes, so I think it's finally time to drop it. > > These patches are against Jens' block for-next tree as that already I've added 1/4 for now, as that one should obviously go. If we have consensus to kill the rest, I'm all for it. -- Jens Axboe
Re: [PATCH] virtio_scsi: Always try to read VPD pages
On 13/04/2017 15:39, Stefan Hajnoczi wrote: > On Thu, Apr 13, 2017 at 12:13:00PM +1000, David Gibson wrote: >> @@ -705,6 +706,28 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc) >> return virtscsi_tmf(vscsi, cmd); >> } >> >> +static int virtscsi_device_alloc(struct scsi_device *sdevice) >> +{ >> +/* >> + * Passed through SCSI targets (e.g. with qemu's 'scsi-block') >> + * may have transfer limits which come from the host SCSI >> + * controller something on the host side other than the target > > s/controller something/controller or something/ ? Acked-by: Paolo Bonzini with this change. Paolo
Re: [PATCH 26/27] scsi: sd: Separate zeroout and discard command choices
On 05/04/2017 19:21, Christoph Hellwig wrote: > +static ssize_t > +zeroing_mode_store(struct device *dev, struct device_attribute *attr, > +const char *buf, size_t count) > +{ > + struct scsi_disk *sdkp = to_scsi_disk(dev); > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EACCES; > + > + > + if (!strncmp(buf, zeroing_mode[SD_ZERO_WRITE], 20)) > + sdkp->zeroing_mode = SD_ZERO_WRITE; > + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS], 20)) > + sdkp->zeroing_mode = SD_ZERO_WS; > + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS16_UNMAP], 20)) > + sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP; > + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS10_UNMAP], 20)) > + sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP; Should this be conditional on lbprz, lbpws, lbpws10 and max_ws_blocks? Thanks, Paolo
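For illustration, the kind of guard Paolo is asking about might reuse the lbprz flag and max_ws_blocks already cached in struct scsi_disk; a sketch only, not part of the posted patch:

	/* e.g. refuse the WRITE SAME(16) with UNMAP mode on targets
	 * that never advertised the capability: */
	if (sdkp->zeroing_mode == SD_ZERO_WS16_UNMAP &&
	    (!sdkp->lbprz || !sdkp->max_ws_blocks))
		return -EINVAL;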
Re: [PATCH 1/4] block: remove the osdblk driver
On 04/12/2017 07:01 PM, Christoph Hellwig wrote: > This was just a proof of concept user for the SCSI OSD library, and > never had any real users. > > Signed-off-by: Christoph Hellwig Yes please remove this driver ACK-by Boaz Harrosh > --- > drivers/block/Kconfig | 16 -- > drivers/block/Makefile | 1 - > drivers/block/osdblk.c | 693 > - > 3 files changed, 710 deletions(-) > delete mode 100644 drivers/block/osdblk.c > > diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig > index a1c2e816128f..58e24c354933 100644 > --- a/drivers/block/Kconfig > +++ b/drivers/block/Kconfig > @@ -312,22 +312,6 @@ config BLK_DEV_SKD > > Use device /dev/skd$N amd /dev/skd$Np$M. > > -config BLK_DEV_OSD > - tristate "OSD object-as-blkdev support" > - depends on SCSI_OSD_ULD > - ---help--- > - Saying Y or M here will allow the exporting of a single SCSI > - OSD (object-based storage) object as a Linux block device. > - > - For example, if you create a 2G object on an OSD device, > - you can then use this module to present that 2G object as > - a Linux block device. > - > - To compile this driver as a module, choose M here: the > - module will be called osdblk. > - > - If unsure, say N. > - > config BLK_DEV_SX8 > tristate "Promise SATA SX8 support" > depends on PCI > diff --git a/drivers/block/Makefile b/drivers/block/Makefile > index b12c772bbeb3..42e8cee1cbc7 100644 > --- a/drivers/block/Makefile > +++ b/drivers/block/Makefile > @@ -22,7 +22,6 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o > obj-$(CONFIG_MG_DISK)+= mg_disk.o > obj-$(CONFIG_SUNVDC) += sunvdc.o > obj-$(CONFIG_BLK_DEV_SKD)+= skd.o > -obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o > > obj-$(CONFIG_BLK_DEV_UMEM) += umem.o > obj-$(CONFIG_BLK_DEV_NBD)+= nbd.o > diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c > deleted file mode 100644 > index 8127b8201a01.. > --- a/drivers/block/osdblk.c > +++ /dev/null > @@ -1,693 +0,0 @@ > - > -/* > - osdblk.c -- Export a single SCSI OSD object as a Linux block device > - > - > - Copyright 2009 Red Hat, Inc. > - > - This program is free software; you can redistribute it and/or modify > - it under the terms of the GNU General Public License as published by > - the Free Software Foundation. > - > - This program is distributed in the hope that it will be useful, > - but WITHOUT ANY WARRANTY; without even the implied warranty of > - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > - GNU General Public License for more details. > - > - You should have received a copy of the GNU General Public License > - along with this program; see the file COPYING. If not, write to > - the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. > - > - > - Instructions for use > - > - > - 1) Map a Linux block device to an existing OSD object. > - > - In this example, we will use partition id 1234, object id 5678, > - OSD device /dev/osd1. > - > - $ echo "1234 5678 /dev/osd1" > /sys/class/osdblk/add > - > - > - 2) List all active blkdev<->object mappings. > - > - In this example, we have performed step #1 twice, creating two blkdevs, > - mapped to two separate OSD objects. > - > - $ cat /sys/class/osdblk/list > - 0 174 1234 5678 /dev/osd1 > - 1 179 1994 897123 /dev/osd0 > - > - The columns, in order, are: > - - blkdev unique id > - - blkdev assigned major > - - OSD object partition id > - - OSD object id > - - OSD device > - > - > - 3) Remove an active blkdev<->object mapping. > - > - In this example, we remove the mapping with blkdev unique id 1. 
> - > - $ echo 1 > /sys/class/osdblk/remove > - > - > - NOTE: The actual creation and deletion of OSD objects is outside the > scope > - of this driver. > - > - */ > - > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > -#include > - > -#define DRV_NAME "osdblk" > -#define PFX DRV_NAME ": " > - > -/* #define _OSDBLK_DEBUG */ > -#ifdef _OSDBLK_DEBUG > -#define OSDBLK_DEBUG(fmt, a...) \ > - printk(KERN_NOTICE "osdblk @%s:%d: " fmt, __func__, __LINE__, ##a) > -#else > -#define OSDBLK_DEBUG(fmt, a...) \ > - do { if (0) printk(fmt, ##a); } while (0) > -#endif > - > -MODULE_AUTHOR("Jeff Garzik "); > -MODULE_DESCRIPTION("block device inside an OSD object osdblk.ko"); > -MODULE_LICENSE("GPL"); > - > -struct osdblk_device; > - > -enum { > - OSDBLK_MINORS_PER_MAJOR = 256, /* max minors per blkdev */ > - OSDBLK_MAX_REQ = 32, /* max parallel requests */ > - OSDBLK_OP_TIMEOUT = 4 * 60, /* sync OSD req timeout */ > -}; > - > -struct osdblk_request { > - struct request *rq;/* blk layer request */ > - struct bio