
Re: [PATCH 2/2] block: Improve error handling verbosity

2017-04-19 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/2] scsi: convert unrecovered read error to -EILSEQ

2017-04-19 Thread Christoph Hellwig
Looks like I won't get the major error status changes into 4.12,
so let's go with these patches for now:

Reviewed-by: Christoph Hellwig 


Re: [PATCH 08/23] scsi: introduce a result field in struct scsi_request

2017-04-19 Thread Christoph Hellwig
On Wed, Apr 19, 2017 at 09:43:00PM -0400, Martin K. Petersen wrote:
> Really wish we could just obliterate drivers/ide.

Same here.  It's been _the_ major pain point for doing block layer
heavy lifting for the last one or two years.


Re: [PATCH 02/23] block: remove the blk_execute_rq return value

2017-04-19 Thread h...@lst.de
On Wed, Apr 19, 2017 at 09:07:45PM +, Bart Van Assche wrote:
> > +   blk_execute_rq(or->request->q, NULL, or->request, 0);
> > +   error = or->request ? -EIO : 0;
> 
> Hello Christoph,
> 
> Did you perhaps intend or->request->errors instead of or->request?

Yes, thanks.


[PATCH-v1 22/22] lpfc revision 11.2.0.12

2017-04-19 Thread jsmart2021
From: James Smart 

Update lpfc version to reflect this set of changes.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_version.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_version.h b/drivers/scsi/lpfc/lpfc_version.h
index d4e95e2..1c26dc6 100644
--- a/drivers/scsi/lpfc/lpfc_version.h
+++ b/drivers/scsi/lpfc/lpfc_version.h
@@ -20,7 +20,7 @@
  * included with this package. *
  ***/
 
-#define LPFC_DRIVER_VERSION "11.2.0.10"
+#define LPFC_DRIVER_VERSION "11.2.0.12"
 #define LPFC_DRIVER_NAME   "lpfc"
 
 /* Used for SLI 2/3 */
-- 
2.1.0



[PATCH-v1 21/22] Fix Express lane queue creation.

2017-04-19 Thread jsmart2021
From: James Smart 

The older sli4 adapters only supported the 64 byte WQE entry size.
The newer adapters (fw) support both 64 and 128 byte WQE entry sizes.
The Express lane WQ was not being created with the 128 byte WQE size
when that size was supported.

Not having the right WQE size created for the express lane work queue
caused the firmware to overwrite the LUN identifier in the FCP header.

This patch correctly creates the express lane work queue with the
supported size.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index e084a66..7d3b199 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12040,6 +12040,7 @@ int
 lpfc_fof_queue_create(struct lpfc_hba *phba)
 {
struct lpfc_queue *qdesc;
+   uint32_t wqesize;
 
/* Create FOF EQ */
qdesc = lpfc_sli4_queue_alloc(phba, phba->sli4_hba.eq_esize,
@@ -12060,8 +12061,11 @@ lpfc_fof_queue_create(struct lpfc_hba *phba)
phba->sli4_hba.oas_cq = qdesc;
 
/* Create OAS WQ */
-   qdesc = lpfc_sli4_queue_alloc(phba, phba->sli4_hba.wq_esize,
+   wqesize = (phba->fcp_embed_io) ?
+   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
+   qdesc = lpfc_sli4_queue_alloc(phba, wqesize,
  phba->sli4_hba.wq_ecount);
+
if (!qdesc)
goto out_error;
 
-- 
2.1.0



[PATCH-v1 14/22] Remove hba lock from NVMET issue WQE.

2017-04-19 Thread jsmart2021
From: James Smart 

An unnecessary lock is taken; the ring lock should be sufficient to
protect the work queue submission.

This was noticed during performance testing. The hbalock is not
needed to issue io to the nvme work queue.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvmet.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index c8a4941..5a994a8 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -517,7 +517,6 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
container_of(rsp, struct lpfc_nvmet_rcv_ctx, ctx.fcp_req);
struct lpfc_hba *phba = ctxp->phba;
struct lpfc_iocbq *nvmewqeq;
-   unsigned long iflags;
int rc, id;
 
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
@@ -571,10 +570,7 @@ lpfc_nvmet_xmt_fcp_op(struct nvmet_fc_target_port *tgtport,
lpfc_nvmeio_data(phba, "NVMET FCP CMND: xri x%x op x%x len x%x\n",
 ctxp->oxid, rsp->op, rsp->rsplen);
 
-   /* For now we take hbalock */
-   spin_lock_irqsave(&phba->hbalock, iflags);
rc = lpfc_sli4_issue_wqe(phba, LPFC_FCP_RING, nvmewqeq);
-   spin_unlock_irqrestore(&phba->hbalock, iflags);
if (rc == WQE_SUCCESS) {
ctxp->flag |= LPFC_NVMET_IO_INP;
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
-- 
2.1.0



[PATCH-v1 08/22] Remove NULL ptr check before kfree.

2017-04-19 Thread jsmart2021
From: James Smart 

The check for a NULL ptr is not necessary; kfree() checks for NULL itself.

Remove the NULL ptr checks.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_debugfs.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 913eed8..76857e2 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -5700,10 +5700,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
 #ifdef CONFIG_SCSI_LPFC_DEBUG_FS
struct lpfc_hba   *phba = vport->phba;
 
-   if (vport->disc_trc) {
-   kfree(vport->disc_trc);
-   vport->disc_trc = NULL;
-   }
+   kfree(vport->disc_trc);
+   vport->disc_trc = NULL;
 
debugfs_remove(vport->debug_disc_trc); /* discovery_trace */
vport->debug_disc_trc = NULL;
@@ -5770,10 +5768,8 @@ lpfc_debugfs_terminate(struct lpfc_vport *vport)
debugfs_remove(phba->debug_readRef); /* readRef */
phba->debug_readRef = NULL;
 
-   if (phba->slow_ring_trc) {
-   kfree(phba->slow_ring_trc);
-   phba->slow_ring_trc = NULL;
-   }
+   kfree(phba->slow_ring_trc);
+   phba->slow_ring_trc = NULL;
 
/* slow_ring_trace */
debugfs_remove(phba->debug_slow_ring_trc);
-- 
2.1.0



[PATCH-v1 04/22] Fix log message in completion path.

2017-04-19 Thread jsmart2021
From: James Smart 

In the lpfc_nvme_io_cmd_wqe_cmpl routine the driver was printing two
pointers and the DID for the rport whenever an IO completed on a node
that had transitioned to a non-active state.

There is no need to print the node pointer address for a node that
is not active; the DID should be enough to debug.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 2a87428..bc6946f 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -787,9 +787,8 @@ lpfc_nvme_io_cmd_wqe_cmpl(struct lpfc_hba *phba, struct 
lpfc_iocbq *pwqeIn,
ndlp = rport->ndlp;
if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
-"6061 rport %p, ndlp %p, DID x%06x ndlp "
-"not ready.\n",
-rport, ndlp, rport->remoteport->port_id);
+"6061 rport %p,  DID x%06x node not ready.\n",
+rport, rport->remoteport->port_id);
 
ndlp = lpfc_findnode_did(vport, rport->remoteport->port_id);
if (!ndlp) {
-- 
2.1.0



[PATCH-v1 07/22] Remove unused defines for NVME PostBuf.

2017-04-19 Thread jsmart2021
From: James Smart 

These defines for the posting of buffers for the nvmet target were not used.

Removing the unused defines.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 6277796..2582f46 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -22,9 +22,6 @@
  /
 
 #define LPFC_NVME_DEFAULT_SEGS (64 + 1)/* 256K IOs */
-#define LPFC_NVMET_MIN_POSTBUF 16
-#define LPFC_NVMET_DEFAULT_POSTBUF 1024
-#define LPFC_NVMET_MAX_POSTBUF 4096
 #define LPFC_NVME_WQSIZE   256
 
 #define LPFC_NVME_ERSP_LEN 0x20
-- 
2.1.0



[PATCH-v1 16/22] Fix crash after issuing lip reset

2017-04-19 Thread jsmart2021
From: James Smart 

When an RPI is not available, the driver sends a WQE with an invalid RPI
value, which is rejected by the HBA:
lpfc :82:00.3: 1:3154 BLS ABORT RSP failed, data:  x3/xa0320008
and
lpfc :2753 PLOGI failure DID:FA Status:x3/xa0240008

In this case, the driver accesses the rpi_ids array out of bounds.

Fix:
Check the return value of lpfc_sli4_alloc_rpi(). Do not allocate an
lpfc_nodelist entry if an RPI is not available.

When an RPI is not available, we will get discovery timeouts and
command drops for some of the vports, as seen below.


lpfc :0273 Unexpected discovery timeout, vport State x0
lpfc :0230 Unexpected timeout, hba link state x5
lpfc :0111 Dropping received ELS cmd Data: x0 xc90c55 x0

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_bsg.c |  4 
 drivers/scsi/lpfc/lpfc_crtn.h|  2 +-
 drivers/scsi/lpfc/lpfc_els.c | 35 
 drivers/scsi/lpfc/lpfc_hbadisc.c | 47 +-
 drivers/scsi/lpfc/lpfc_init.c| 49 +---
 drivers/scsi/lpfc/lpfc_sli.c |  3 +--
 drivers/scsi/lpfc/lpfc_vport.c   |  3 +--
 7 files changed, 79 insertions(+), 64 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c
index 18157d2..a1686c2 100644
--- a/drivers/scsi/lpfc/lpfc_bsg.c
+++ b/drivers/scsi/lpfc/lpfc_bsg.c
@@ -2486,6 +2486,10 @@ static int lpfcdiag_loop_self_reg(struct lpfc_hba *phba, 
uint16_t *rpi)
mbox, *rpi);
else {
*rpi = lpfc_sli4_alloc_rpi(phba);
+   if (*rpi == LPFC_RPI_ALLOC_ERROR) {
+   mempool_free(mbox, phba->mbox_mem_pool);
+   return -EBUSY;
+   }
status = lpfc_reg_rpi(phba, phba->pport->vpi,
phba->pport->fc_myDID,
(uint8_t *)&phba->pport->fc_sparam,
diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index 54e6ac4..d859aff 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -99,7 +99,7 @@ void lpfc_issue_reg_vpi(struct lpfc_hba *, struct lpfc_vport 
*);
 
 int lpfc_check_sli_ndlp(struct lpfc_hba *, struct lpfc_sli_ring *,
struct lpfc_iocbq *, struct lpfc_nodelist *);
-void lpfc_nlp_init(struct lpfc_vport *, struct lpfc_nodelist *, uint32_t);
+struct lpfc_nodelist *lpfc_nlp_init(struct lpfc_vport *vport, uint32_t did);
 struct lpfc_nodelist *lpfc_nlp_get(struct lpfc_nodelist *);
 int  lpfc_nlp_put(struct lpfc_nodelist *);
 int  lpfc_nlp_not_used(struct lpfc_nodelist *ndlp);
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d6e0f58..ffdcac0 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -895,10 +895,9 @@ lpfc_cmpl_els_flogi_nport(struct lpfc_vport *vport, struct 
lpfc_nodelist *ndlp,
 * Cannot find existing Fabric ndlp, so allocate a
 * new one
 */
-   ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+   ndlp = lpfc_nlp_init(vport, PT2PT_RemoteID);
if (!ndlp)
goto fail;
-   lpfc_nlp_init(vport, ndlp, PT2PT_RemoteID);
} else if (!NLP_CHK_NODE_ACT(ndlp)) {
ndlp = lpfc_enable_node(vport, ndlp,
NLP_STE_UNUSED_NODE);
@@ -1364,7 +1363,6 @@ lpfc_els_abort_flogi(struct lpfc_hba *phba)
 int
 lpfc_initial_flogi(struct lpfc_vport *vport)
 {
-   struct lpfc_hba *phba = vport->phba;
struct lpfc_nodelist *ndlp;
 
vport->port_state = LPFC_FLOGI;
@@ -1374,10 +1372,9 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp) {
/* Cannot find existing Fabric ndlp, so allocate a new one */
-   ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+   ndlp = lpfc_nlp_init(vport, Fabric_DID);
if (!ndlp)
return 0;
-   lpfc_nlp_init(vport, ndlp, Fabric_DID);
/* Set the node type */
ndlp->nlp_type |= NLP_FABRIC;
/* Put ndlp onto node list */
@@ -1418,17 +1415,15 @@ lpfc_initial_flogi(struct lpfc_vport *vport)
 int
 lpfc_initial_fdisc(struct lpfc_vport *vport)
 {
-   struct lpfc_hba *phba = vport->phba;
struct lpfc_nodelist *ndlp;
 
/* First look for the Fabric ndlp */
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp) {
/* Cannot find existing Fabric ndlp, so allocate a new one */
-   ndlp = mempool_alloc(phba->nlp_mem_pool, GFP_KERNEL);
+   ndlp = lpfc_nlp_init(vport, Fabric_DID);
if (!ndlp)
return 0;
-   lpfc_nlp_in

[PATCH-v1 06/22] Fix spelling in comments.

2017-04-19 Thread jsmart2021
From: James Smart 

Comment should have said Repost.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_sli.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1c9fa45..7087c55 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -6338,7 +6338,7 @@ lpfc_sli4_get_allocated_extnts(struct lpfc_hba *phba, 
uint16_t type,
 }
 
 /**
- * lpfc_sli4_repost_sgl_list - Repsot the buffers sgl pages as block
+ * lpfc_sli4_repost_sgl_list - Repost the buffers sgl pages as block
  * @phba: pointer to lpfc hba data structure.
  * @pring: Pointer to driver SLI ring object.
  * @sgl_list: linked link of sgl buffers to post
-- 
2.1.0



[PATCH-v1 01/22] Standardize nvme SGL segment count

2017-04-19 Thread jsmart2021
From: James Smart 

Standardize default SGL segment count for nvme target and initiator

The driver needs to make them the same for clarity.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.h  | 4 +---
 drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
 drivers/scsi/lpfc/lpfc_nvmet.h | 4 +---
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.h b/drivers/scsi/lpfc/lpfc_nvme.h
index 1347deb..6277796 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.h
+++ b/drivers/scsi/lpfc/lpfc_nvme.h
@@ -21,9 +21,7 @@
  * included with this package. *
  /
 
-#define LPFC_NVME_MIN_SEGS 16
-#define LPFC_NVME_DEFAULT_SEGS 66  /* 256K IOs - 64 + 2 */
-#define LPFC_NVME_MAX_SEGS 510
+#define LPFC_NVME_DEFAULT_SEGS (64 + 1)/* 256K IOs */
 #define LPFC_NVMET_MIN_POSTBUF 16
 #define LPFC_NVMET_DEFAULT_POSTBUF 1024
 #define LPFC_NVMET_MAX_POSTBUF 4096
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index d488c33..c8a4941 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -709,7 +709,7 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
pinfo.port_id = vport->fc_myDID;
 
lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt;
+   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
lpfc_tgttemplate.target_features = NVMET_FCTGTFEAT_READDATA_RSP |
   NVMET_FCTGTFEAT_NEEDS_CMD_CPUSCHED |
   NVMET_FCTGTFEAT_CMD_IN_ISR |
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.h b/drivers/scsi/lpfc/lpfc_nvmet.h
index 02735fc..d8bac4c 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.h
+++ b/drivers/scsi/lpfc/lpfc_nvmet.h
@@ -21,9 +21,7 @@
  * included with this package. *
  /
 
-#define LPFC_NVMET_MIN_SEGS16
-#define LPFC_NVMET_DEFAULT_SEGS64  /* 256K IOs */
-#define LPFC_NVMET_MAX_SEGS510
+#define LPFC_NVMET_DEFAULT_SEGS(64 + 1)/* 256K IOs */
 #define LPFC_NVMET_SUCCESS_LEN 12
 
 /* Used for NVME Target */
-- 
2.1.0



[PATCH-v1 12/22] Fix driver usage of 128B WQEs when WQ_CREATE is V1.

2017-04-19 Thread jsmart2021
From: James Smart 

There are two versions of the structure for queue creation and setup that
the driver shares with FW. The driver was treating it only as version 0.

Verify WQ_CREATE with 128B WQEs in V0 and V1.

Code review of another bug showed the driver passing
128B WQEs and 8 pages in WQ_CREATE with V0.
Code inspection/instrumentation showed that the driver
uses V0 in WQ_CREATE, and if the caller passes a queue->entry_size
of 128B, the driver sets the hdr_version to V1, so all is good.
When the V1 WQ_CREATE was tested, the mailbox command failed, causing
the driver to unload.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_sli.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index d606944..1519fdf 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -14741,6 +14741,9 @@ lpfc_wq_create(struct lpfc_hba *phba, struct lpfc_queue 
*wq,
case LPFC_Q_CREATE_VERSION_1:
bf_set(lpfc_mbx_wq_create_wqe_count, &wq_create->u.request_1,
   wq->entry_count);
+   bf_set(lpfc_mbox_hdr_version, &shdr->request,
+  LPFC_Q_CREATE_VERSION_1);
+
switch (wq->entry_size) {
default:
case 64:
-- 
2.1.0



[PATCH-v1 20/22] Update ABORT processing for NVMET.

2017-04-19 Thread jsmart2021
From: James Smart 

The driver with nvme had this routine stubbed.

Right now XRI_ABORTED_CQE is not handled and the FC NVMET
Transport has a new API for the driver.

Missing code path, new NVME abort API
Update ABORT processing for NVMET

There are 3 new FC NVMET Transport API/template routines for NVMET:

lpfc_nvmet_xmt_fcp_release
This NVMET template callback routine is called to release the context
associated with an IO. This routine is ALWAYS called last, even
if the IO was aborted or completed in error.

lpfc_nvmet_xmt_fcp_abort
This NVMET template callback routine is called to abort an exchange
that has an IO in progress.

nvmet_fc_rcv_fcp_req
When the lpfc driver receives an ABTS, this NVME FC transport layer
callback routine is called. For this case there are 2 paths thru the
driver: the driver either has an outstanding exchange / context for the
XRI to be aborted or it does not. If not, a BA_RJT is issued; otherwise
a BA_ACC is issued.

NVMET Driver abort paths:

There are 2 paths for aborting an IO. The first is that we receive an IO
and decide not to process it because of a lack of resources. An
unsolicited ABTS is immediately sent back to the initiator as a response.
lpfc_nvmet_unsol_fcp_buffer
lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)

The second is that we sent the IO up to the NVMET transport layer to
process, and for some reason the NVME Transport layer decided to abort
the IO before it completed all its phases. For this case there are 2
paths thru the driver: the driver either has an outstanding
TSEND/TRECEIVE/TRSP WQE, or no outstanding WQEs are present for the
exchange / context.
lpfc_nvmet_xmt_fcp_abort
if (LPFC_NVMET_IO_INP)
lpfc_nvmet_sol_fcp_issue_abort  (ABORT_WQE)
lpfc_nvmet_sol_fcp_abort_cmp
else
lpfc_nvmet_unsol_fcp_issue_abort
lpfc_nvmet_unsol_issue_abort  (XMIT_SEQUENCE_WQE)
lpfc_nvmet_unsol_fcp_abort_cmp

Context flags:
LPFC_NVMET_IOP - this flag signifies an IO is in progress on the exchange.
LPFC_NVMET_XBUSY - this flag indicates the IO completed but the firmware
is still busy with the corresponding exchange. The exchange should not be
reused until after an XRI_ABORTED_CQE is received for that exchange.
LPFC_NVMET_ABORT_OP - this flag signifies an ABORT_WQE was issued on the
exchange.
LPFC_NVMET_CTX_RLS - this flag signifies a context free was requested,
but it is deferred due to an XBUSY or ABORT in progress.

A ctxlock is added to the context structure; it is used whenever these
flags are set/read within the context of an IO.
The LPFC_NVMET_CTX_RLS flag is only set in the defer_release routine when
the transport has resolved all IO associated with the buffer. The flag is
cleared when the CTX is associated with a new IO.

An exchange can have both an LPFC_NVMET_XBUSY and an LPFC_NVMET_ABORT_OP
condition active simultaneously. Both conditions must complete before the
exchange is freed.
When the abort callback (lpfc_nvmet_xmt_fcp_abort) is invoked:
If there is an outstanding IO, the driver will issue an ABORT_WQE. This
should result in 3 completions for the exchange:
1) IO cmpl with XB bit set
2) Abort WQE cmpl
3) XRI_ABORTED_CQE cmpl
For this scenario, after completion #1, the NVMET Transport IO rsp
callback is called.  After completion #2, no action is taken with respect
to the exchange / context.  After completion #3, the exchange context is
free for re-use on another IO.

If there is no outstanding activity on the exchange, the driver will send
an ABTS to the Initiator. Upon completion of this WQE, the exchange /
context is freed for re-use on another IO.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_crtn.h|   7 +
 drivers/scsi/lpfc/lpfc_debugfs.c |  55 --
 drivers/scsi/lpfc/lpfc_hw4.h |   3 +
 drivers/scsi/lpfc/lpfc_init.c|  52 +++---
 drivers/scsi/lpfc/lpfc_mbox.c|   7 +-
 drivers/scsi/lpfc/lpfc_nvme.c|  45 +++--
 drivers/scsi/lpfc/lpfc_nvmet.c   | 349 +++
 drivers/scsi/lpfc/lpfc_nvmet.h   |  10 +-
 drivers/scsi/lpfc/lpfc_sli.c |   7 +-
 drivers/scsi/lpfc/lpfc_sli4.h|   2 +-
 10 files changed, 412 insertions(+), 125 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_crtn.h b/drivers/scsi/lpfc/lpfc_crtn.h
index d859aff..24da922 100644
--- a/drivers/scsi/lpfc/lpfc_crtn.h
+++ b/drivers/scsi/lpfc/lpfc_crtn.h
@@ -24,6 +24,7 @@ typedef int (*node_filter)(struct lpfc_nodelist *, void *);
 
 struct fc_rport;
 struct fc_frame_header;
+struct lpfc_nvmet_rcv_ctx;
 void lpfc_down_link(struct lpfc_hba *, LPFC_MBOXQ_t *);
 void lpfc_sli_read_link_ste(struct lpfc_hba *);
 void lpfc_dump_mem(struct lpfc_hba *, LPFC_MBOXQ_t *, uint16_t, uint16_t);
@@ -245,6 +246,10 @@ struct hbq_dmabuf *lpfc_sli4_rb_alloc(struct lpfc_hba *);
 void lpfc_sli4_rb_free(struct lpfc_hba *, struct hbq_dmabuf *);
 struct rqb_dmabuf *lpfc_sli4_nvmet_alloc(struct lpfc_hba *phba);
 void lpfc_sli4_nvmet_free(struct lpfc_hba *phba, struct rqb_

[PATCH-v1 17/22] Fix max_sgl_segments settings for NVME / NVMET

2017-04-19 Thread jsmart2021
From: James Smart 

Cannot set NVME segment counts to a large number

The existing module parameter lpfc_sg_seg_cnt is used for both
SCSI and NVME.

Limit the module parameter lpfc_sg_seg_cnt to 128 with the
default being 64 for both NVME and NVMET, assuming NVME is enabled in the
driver for that port. The driver will set max_sgl_segments in the
NVME/NVMET template to lpfc_sg_seg_cnt + 1.

Looks good,
Reviewed-by: Johannes Thumshirn 

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc.h   |  3 ++-
 drivers/scsi/lpfc/lpfc_nvme.c  | 18 ++
 drivers/scsi/lpfc/lpfc_nvmet.c | 19 +++
 3 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 257bbdd..0fc1450 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -56,7 +56,7 @@ struct lpfc_sli2_slim;
 #define LPFC_MAX_SG_SEG_CNT4096/* sg element count per scsi cmnd */
 #define LPFC_MAX_SGL_SEG_CNT   512 /* SGL element count per scsi cmnd */
 #define LPFC_MAX_BPL_SEG_CNT   4096/* BPL element count per scsi cmnd */
-#define LPFC_MIN_NVME_SEG_CNT  254
+#define LPFC_MAX_NVME_SEG_CNT  128 /* max SGL element cnt per NVME cmnd */
 
 #define LPFC_MAX_SGE_SIZE   0x8000 /* Maximum data allowed in a SGE */
 #define LPFC_IOCB_LIST_CNT 2250/* list of IOCBs for fast-path usage. */
@@ -781,6 +781,7 @@ struct lpfc_hba {
uint32_t cfg_nvmet_fb_size;
uint32_t cfg_total_seg_cnt;
uint32_t cfg_sg_seg_cnt;
+   uint32_t cfg_nvme_seg_cnt;
uint32_t cfg_sg_dma_buf_size;
uint64_t cfg_soft_wwnn;
uint64_t cfg_soft_wwpn;
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index efe1242..1a83ade 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1113,12 +1113,12 @@ lpfc_nvme_prep_io_dma(struct lpfc_vport *vport,
 
first_data_sgl = sgl;
lpfc_ncmd->seg_cnt = nCmd->sg_cnt;
-   if (lpfc_ncmd->seg_cnt > phba->cfg_sg_seg_cnt) {
+   if (lpfc_ncmd->seg_cnt > phba->cfg_nvme_seg_cnt) {
lpfc_printf_log(phba, KERN_ERR, LOG_NVME_IOERR,
"6058 Too many sg segments from "
"NVME Transport.  Max %d, "
"nvmeIO sg_cnt %d\n",
-   phba->cfg_sg_seg_cnt,
+   phba->cfg_nvme_seg_cnt,
lpfc_ncmd->seg_cnt);
lpfc_ncmd->seg_cnt = 0;
return 1;
@@ -2157,8 +2157,18 @@ lpfc_nvme_create_localport(struct lpfc_vport *vport)
nfcp_info.node_name = wwn_to_u64(vport->fc_nodename.u.wwn);
nfcp_info.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
 
-   /* For now need + 1 to get around NVME transport logic */
-   lpfc_nvme_template.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
+   /* Limit to LPFC_MAX_NVME_SEG_CNT.
+* For now need + 1 to get around NVME transport logic.
+*/
+   if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+   lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME | LOG_INIT,
+"6300 Reducing sg segment cnt to %d\n",
+LPFC_MAX_NVME_SEG_CNT);
+   phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+   } else {
+   phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+   }
+   lpfc_nvme_template.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
lpfc_nvme_template.max_hw_queues = phba->cfg_nvme_io_channel;
 
/* localport is allocated from the stack, but the registration
diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c
index 8348bb5..d0dda42 100644
--- a/drivers/scsi/lpfc/lpfc_nvmet.c
+++ b/drivers/scsi/lpfc/lpfc_nvmet.c
@@ -704,8 +704,19 @@ lpfc_nvmet_create_targetport(struct lpfc_hba *phba)
pinfo.port_name = wwn_to_u64(vport->fc_portname.u.wwn);
pinfo.port_id = vport->fc_myDID;
 
+   /* Limit to LPFC_MAX_NVME_SEG_CNT.
+* For now need + 1 to get around NVME transport logic.
+*/
+   if (phba->cfg_sg_seg_cnt > LPFC_MAX_NVME_SEG_CNT) {
+   lpfc_printf_log(phba, KERN_INFO, LOG_NVME | LOG_INIT,
+   "6400 Reducing sg segment cnt to %d\n",
+   LPFC_MAX_NVME_SEG_CNT);
+   phba->cfg_nvme_seg_cnt = LPFC_MAX_NVME_SEG_CNT;
+   } else {
+   phba->cfg_nvme_seg_cnt = phba->cfg_sg_seg_cnt;
+   }
+   lpfc_tgttemplate.max_sgl_segments = phba->cfg_nvme_seg_cnt + 1;
lpfc_tgttemplate.max_hw_queues = phba->cfg_nvme_io_channel;
-   lpfc_tgttemplate.max_sgl_segments = phba->cfg_sg_seg_cnt + 1;
lpfc_tgttemplate.target_features = NVMET_FCTGTFEAT_READDATA_RSP |
  

[PATCH-v1 15/22] Fix driver load issues when MRQ=8

2017-04-19 Thread jsmart2021
From: James Smart 

The symptom is that the driver will fail to log in to the fabric.
The reason is that it is out of iocb resources.

There is a one-to-one relationship between MRQs
(receive buffers for NVMET-FC) and iocbs, and the default number of
IOCBs was not accounting for the number of MRQs being created.

This fix aligns the number of MRQ resources with the total resources so
that fabric events can be handled when needed.

Also, initialize the ctxlock on FCP commands, NOT LS commands.
And modify log messages so that the log output can be correlated with
the analyzer trace.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c  | 12 
 drivers/scsi/lpfc/lpfc_nvme.c  | 23 ---
 drivers/scsi/lpfc/lpfc_nvmet.c |  6 +++---
 3 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index cca7f81..88977b8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -11061,7 +11061,7 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const 
struct pci_device_id *pid)
struct lpfc_hba   *phba;
struct lpfc_vport *vport = NULL;
struct Scsi_Host  *shost = NULL;
-   int error;
+   int error, cnt;
uint32_t cfg_mode, intr_mode;
 
/* Allocate memory for HBA structure */
@@ -11095,11 +11095,15 @@ lpfc_pci_probe_one_s4(struct pci_dev *pdev, const 
struct pci_device_id *pid)
goto out_unset_pci_mem_s4;
}
 
+   cnt = phba->cfg_iocb_cnt * 1024;
+   if (phba->nvmet_support)
+   cnt += (phba->cfg_nvmet_mrq_post * phba->cfg_nvmet_mrq);
+
/* Initialize and populate the iocb list per host */
lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
-   "2821 initialize iocb list %d.\n",
-   phba->cfg_iocb_cnt*1024);
-   error = lpfc_init_iocb_list(phba, phba->cfg_iocb_cnt*1024);
+   "2821 initialize iocb list %d total %d\n",
+   phba->cfg_iocb_cnt, cnt);
+   error = lpfc_init_iocb_list(phba, cnt);
 
if (error) {
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index f23496b..efe1242 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -1494,6 +1494,7 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
 "io buffer.  Skipping abort req.\n");
return;
}
+   nvmereq_wqe = &lpfc_nbuf->cur_iocbq;
 
/*
 * The lpfc_nbuf and the mapped nvme_fcreq in the driver's
@@ -1507,20 +1508,19 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6143 NVME req mismatch: "
 "lpfc_nbuf %p nvmeCmd %p, "
-"pnvme_fcreq %p.  Skipping Abort\n",
+"pnvme_fcreq %p.  Skipping Abort xri x%x\n",
 lpfc_nbuf, lpfc_nbuf->nvmeCmd,
-pnvme_fcreq);
+pnvme_fcreq, nvmereq_wqe->sli4_xritag);
return;
}
 
/* Don't abort IOs no longer on the pending queue. */
-   nvmereq_wqe = &lpfc_nbuf->cur_iocbq;
if (!(nvmereq_wqe->iocb_flag & LPFC_IO_ON_TXCMPLQ)) {
spin_unlock_irqrestore(&phba->hbalock, flags);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6142 NVME IO req %p not queued - skipping "
-"abort req\n",
-pnvme_fcreq);
+"abort req xri x%x\n",
+pnvme_fcreq, nvmereq_wqe->sli4_xritag);
return;
}
 
@@ -1534,8 +1534,9 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6144 Outstanding NVME I/O Abort Request "
 "still pending on nvme_fcreq %p, "
-"lpfc_ncmd %p\n",
-pnvme_fcreq, lpfc_nbuf);
+"lpfc_ncmd %p xri x%x\n",
+pnvme_fcreq, lpfc_nbuf,
+nvmereq_wqe->sli4_xritag);
return;
}
 
@@ -1544,8 +1545,8 @@ lpfc_nvme_fcp_abort(struct nvme_fc_local_port 
*pnvme_lport,
spin_unlock_irqrestore(&phba->hbalock, flags);
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME,
 "6136 No available abort wqes. Skipping "
-"Abts req for nvme_fcreq %p.\n",
-pnvme_f

[PATCH-v1 13/22] Fix nvme initiator handling when not enabled.

2017-04-19 Thread jsmart2021
From: James Smart 

Fix nvme initiator handling when CONFIG_LPFC_NVME_INITIATOR is not enabled.

With updated nvme upstream driver sources, loading
the driver with nvme enabled resulted in this Oops.

 BUG: unable to handle kernel NULL pointer dereference at 0018
 IP: lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 PGD 0
 Oops:  [#1] SMP
 CPU: 0 PID: 10256 Comm: lpfc_worker_0 Tainted
 Hardware name: ...
 task: 881028191c40 task.stack: 880ffdf0
 RIP: 0010:lpfc_nvme_update_localport+0x23/0xd0 [lpfc]
 RSP: 0018:880ffdf03c20 EFLAGS: 00010202

Cause: As the initiator driver completes discovery at different stages,
it calls lpfc_nvme_update_localport to hint that the DID and role may have
changed.  In the implementation of lpfc_nvme_update_localport, the driver
was not validating the localport or the lport during the execution
of the update_localport routine.  With the recent upstream additions to
the driver, the create_localport routine didn't run and so the localport
was NULL causing the page-fault Oops.

Fix: Add the CONFIG_LPFC_NVME_INITIATOR preprocessor inclusions to
lpfc_nvme_update_localport to turn off all routine processing when
the running kernel does not have NVME configured.  Add NULL pointer
checks on the localport and lport in lpfc_nvme_update_localport and
dump messages if they are NULL and just exit.
Also one alignment issue was fixed.
This replaces the ifdef with the IS_ENABLED macro.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_attr.c |  2 +-
 drivers/scsi/lpfc/lpfc_nvme.c | 15 +--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 22819af..4b32d02 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -3335,7 +3335,7 @@ LPFC_ATTR_R(enable_fc4_type, LPFC_ENABLE_FCP,
  * percentage will go to NVME.
  */
 LPFC_ATTR_R(xri_split, 50, 10, 90,
-"Division of XRI resources between SCSI and NVME");
+   "Division of XRI resources between SCSI and NVME");
 
 /*
 # lpfc_log_verbose: Only turn this flag on if you are willing to risk being
diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index bc6946f..f23496b 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2263,12 +2263,23 @@ lpfc_nvme_destroy_localport(struct lpfc_vport *vport)
 void
 lpfc_nvme_update_localport(struct lpfc_vport *vport)
 {
+#if (IS_ENABLED(CONFIG_NVME_FC))
struct nvme_fc_local_port *localport;
struct lpfc_nvme_lport *lport;
 
localport = vport->localport;
+   if (!localport) {
+   lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+"6710 Update NVME fail. No localport\n");
+   return;
+   }
lport = (struct lpfc_nvme_lport *)localport->private;
-
+   if (!lport) {
+   lpfc_printf_vlog(vport, KERN_WARNING, LOG_NVME,
+"6171 Update NVME fail. localP %p, No lport\n",
+localport);
+   return;
+   }
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME,
 "6012 Update NVME lport %p did x%x\n",
 localport, vport->fc_myDID);
@@ -2282,7 +2293,7 @@ lpfc_nvme_update_localport(struct lpfc_vport *vport)
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 "6030 bound lport %p to DID x%06x\n",
 lport, localport->port_id);
-
+#endif
 }
 
 int
-- 
2.1.0



[PATCH-v1 11/22] Fix driver unload/reload operation.

2017-04-19 Thread jsmart2021
From: James Smart 

There are a couple of different load/unload issues fixed by this patch.
One of the issues was reported by Junichi Nomura; a patch was submitted
by Johannes Thumshirn which did fix one of the problems, but the fix in
this patch separates the pring free from the queue free and does not set
the parameter passed in to NULL.

issues:
(1) driver could not be unloaded and reloaded without some Oops or
 Panic occurring.
(2) The driver was panicking because of a corruption in the Memory
Manager when the iocb list was getting allocated.

Root cause for the memory corruption was a double free of the Work Queue
ring pointer memory - Freed once in the lpfc_sli4_queue_free when the CQ
was destroyed and again in lpfc_sli4_queue_free when the WQ was destroyed.

The pring free and the queue free were separated; the pring free was moved
to the wq destroy routine because it is a better logical fit to delete the
ring with the wq.

checkpatch flagged several alignment issues that are also corrected
by this patch.

The mboxq lists were never initialized correctly before being used by the
driver; this patch corrects that issue.

Reported-by: Junichi Nomura 
Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c | 46 +++
 drivers/scsi/lpfc/lpfc_sli.c  |  7 ++-
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index b56da01..cca7f81 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -5815,6 +5815,12 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
INIT_LIST_HEAD(&phba->sli4_hba.lpfc_vfi_blk_list);
INIT_LIST_HEAD(&phba->lpfc_vpi_blk_list);
 
+   /* Initialize mboxq lists. If the early init routines fail
+* these lists need to be correctly initialized.
+*/
+   INIT_LIST_HEAD(&phba->sli.mboxq);
+   INIT_LIST_HEAD(&phba->sli.mboxq_cmpl);
+
/* initialize optic_state to 0xFF */
phba->sli4_hba.lnk_info.optic_state = 0xff;
 
@@ -5880,6 +5886,7 @@ lpfc_sli4_driver_resource_setup(struct lpfc_hba *phba)
"READ_NV, mbxStatus x%x\n",
bf_get(lpfc_mqe_command, &mboxq->u.mqe),
bf_get(lpfc_mqe_status, &mboxq->u.mqe));
+   mempool_free(mboxq, phba->mbox_mem_pool);
rc = -EIO;
goto out_free_bsmbx;
}
@@ -7805,7 +7812,7 @@ lpfc_alloc_fcp_wq_cq(struct lpfc_hba *phba, int wqidx)
 
/* Create Fast Path FCP WQs */
wqesize = (phba->fcp_embed_io) ?
-   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
+   LPFC_WQE128_SIZE : phba->sli4_hba.wq_esize;
qdesc = lpfc_sli4_queue_alloc(phba, wqesize, phba->sli4_hba.wq_ecount);
if (!qdesc) {
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
@@ -7836,7 +7843,7 @@ int
 lpfc_sli4_queue_create(struct lpfc_hba *phba)
 {
struct lpfc_queue *qdesc;
-   int idx, io_channel, max;
+   int idx, io_channel;
 
/*
 * Create HBA Record arrays.
@@ -7997,15 +8004,6 @@ lpfc_sli4_queue_create(struct lpfc_hba *phba)
if (lpfc_alloc_nvme_wq_cq(phba, idx))
goto out_error;
 
-   /* allocate MRQ CQs */
-   max = phba->cfg_nvme_io_channel;
-   if (max < phba->cfg_nvmet_mrq)
-   max = phba->cfg_nvmet_mrq;
-
-   for (idx = 0; idx < max; idx++)
-   if (lpfc_alloc_nvme_wq_cq(phba, idx))
-   goto out_error;
-
if (phba->nvmet_support) {
for (idx = 0; idx < phba->cfg_nvmet_mrq; idx++) {
qdesc = lpfc_sli4_queue_alloc(phba,
@@ -8227,11 +8225,11 @@ lpfc_sli4_queue_destroy(struct lpfc_hba *phba)
 
/* Release FCP cqs */
lpfc_sli4_release_queues(&phba->sli4_hba.fcp_cq,
-   phba->cfg_fcp_io_channel);
+phba->cfg_fcp_io_channel);
 
/* Release FCP wqs */
lpfc_sli4_release_queues(&phba->sli4_hba.fcp_wq,
-   phba->cfg_fcp_io_channel);
+phba->cfg_fcp_io_channel);
 
/* Release FCP CQ mapping array */
lpfc_sli4_release_queue_map(&phba->sli4_hba.fcp_cq_map);
@@ -8577,15 +8575,15 @@ lpfc_sli4_queue_setup(struct lpfc_hba *phba)
lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
"0528 %s not allocated\n",
phba->sli4_hba.mbx_cq ?
-   "Mailbox WQ" : "Mailbox CQ");
+   "Mailbox WQ" : "Mailbox CQ");
rc = -ENOMEM;
goto out_destroy;
}
 
rc = lpfc_create_wq_cq(phba, phba->sli4_hba.hba_eq[0

[PATCH-v1 05/22] Add debug messages for nvme/fcp resource allocation.

2017-04-19 Thread jsmart2021
From: James Smart 

The xri resources are split into pools for NVME and FCP IO when NVME is
enabled. There was no message in the log that identified this allocation.

Add a debug message to log the XRI split.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_init.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 6cc561b..b56da01 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3508,6 +3508,12 @@ lpfc_sli4_scsi_sgl_update(struct lpfc_hba *phba)
spin_unlock(&phba->scsi_buf_list_put_lock);
spin_unlock_irq(&phba->scsi_buf_list_get_lock);
 
+   lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
+   "6060 Current allocated SCSI xri-sgl count:%d, "
+   "maximum  SCSI xri count:%d (split:%d)\n",
+   phba->sli4_hba.scsi_xri_cnt,
+   phba->sli4_hba.scsi_xri_max, phba->cfg_xri_split);
+
if (phba->sli4_hba.scsi_xri_cnt > phba->sli4_hba.scsi_xri_max) {
/* max scsi xri shrinked below the allocated scsi buffers */
scsi_xri_cnt = phba->sli4_hba.scsi_xri_cnt -
-- 
2.1.0



[PATCH-v1 09/22] Fix extra line print in rqpair debug print.

2017-04-19 Thread jsmart2021
From: James Smart 

An extra blank line was being added to the rqpair printing.

Remove the extra line feed.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_debugfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 76857e2..55a8d8f 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -3128,8 +3128,6 @@ __lpfc_idiag_print_rqpair(struct lpfc_queue *qp, struct lpfc_queue *datqp,
datqp->queue_id, datqp->entry_count,
datqp->entry_size, datqp->host_index,
datqp->hba_index);
-   len +=  snprintf(pbuffer + len, LPFC_QUE_INFO_GET_BUF_SIZE - len, "\n");
-
return len;
 }
 
-- 
2.1.0



[PATCH-v1 18/22] Add Fabric assigned WWN support.

2017-04-19 Thread jsmart2021
From: James Smart 

Adding support for Fabric assigned WWPN and WWNN.

Firmware sends the first FLOGI to the fabric with vendor version changes.
On link up, the driver gets an updated service parameter with the FAWWN
assigned port name.  The driver sends a second FLOGI with the updated
fawwpn and modifies the vport->fc_portname in the driver.

Note:
Soft wwpn will not be allowed when fawwpn is enabled.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc.h |  2 ++
 drivers/scsi/lpfc/lpfc_attr.c|  8 
 drivers/scsi/lpfc/lpfc_els.c |  8 +---
 drivers/scsi/lpfc/lpfc_hbadisc.c | 22 +-
 drivers/scsi/lpfc/lpfc_hw.h  |  3 +++
 drivers/scsi/lpfc/lpfc_hw4.h |  1 +
 drivers/scsi/lpfc/lpfc_init.c| 31 ---
 7 files changed, 64 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 0fc1450..6d7840b 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -474,6 +474,8 @@ struct lpfc_vport {
unsigned long rcv_buffer_time_stamp;
uint32_t vport_flag;
 #define STATIC_VPORT   1
+#define FAWWPN_SET 2
+#define FAWWPN_PARAM_CHG   4
 
uint16_t fdmi_num_disc;
uint32_t fdmi_hba_mask;
diff --git a/drivers/scsi/lpfc/lpfc_attr.c b/drivers/scsi/lpfc/lpfc_attr.c
index 4b32d02..e9d956d 100644
--- a/drivers/scsi/lpfc/lpfc_attr.c
+++ b/drivers/scsi/lpfc/lpfc_attr.c
@@ -2292,6 +2292,8 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct device_attribute *attr,
struct lpfc_vport *vport = (struct lpfc_vport *) shost->hostdata;
struct lpfc_hba   *phba = vport->phba;
unsigned int cnt = count;
+   uint8_t vvvl = vport->fc_sparam.cmn.valid_vendor_ver_level;
+   u32 *fawwpn_key = (uint32_t *)&vport->fc_sparam.un.vendorVersion[0];
 
/*
 * We're doing a simple sanity check for soft_wwpn setting.
@@ -2305,6 +2307,12 @@ lpfc_soft_wwn_enable_store(struct device *dev, struct device_attribute *attr,
 * here. The intent is to protect against the random user or
 * application that is just writing attributes.
 */
+   if (vvvl == 1 && (cpu_to_be32(*fawwpn_key)) == FAPWWN_KEY_VENDOR) {
+   lpfc_printf_log(phba, KERN_ERR, LOG_INIT,
+"0051 "LPFC_DRIVER_NAME" soft wwpn can not"
+" be enabled: fawwpn is enabled\n");
+   return -EINVAL;
+   }
 
/* count may include a LF at end of string */
if (buf[cnt-1] == '\n')
diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index ffdcac0..35fc260 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -603,9 +603,11 @@ lpfc_check_clean_addr_bit(struct lpfc_vport *vport,
memcmp(&vport->fabric_portname, &sp->portName,
sizeof(struct lpfc_name)) ||
memcmp(&vport->fabric_nodename, &sp->nodeName,
-   sizeof(struct lpfc_name)))
+   sizeof(struct lpfc_name)) ||
+   (vport->vport_flag & FAWWPN_PARAM_CHG)) {
fabric_param_changed = 1;
-
+   vport->vport_flag &= ~FAWWPN_PARAM_CHG;
+   }
/*
 * Word 1 Bit 31 in common service parameter is overloaded.
 * Word 1 Bit 31 in FLOGI request is multiple NPort request
@@ -8755,7 +8757,7 @@ lpfc_issue_els_fdisc(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp,
pcmd += sizeof(uint32_t); /* Node Name */
pcmd += sizeof(uint32_t); /* Node Name */
memcpy(pcmd, &vport->fc_nodename, 8);
-
+   memset(sp->un.vendorVersion, 0, sizeof(sp->un.vendorVersion));
lpfc_set_disctmo(vport);
 
phba->fc_stat.elsXmitFDISC++;
diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c
index d313dde..e824ec1 100644
--- a/drivers/scsi/lpfc/lpfc_hbadisc.c
+++ b/drivers/scsi/lpfc/lpfc_hbadisc.c
@@ -3002,6 +3002,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
MAILBOX_t *mb = &pmb->u.mb;
struct lpfc_dmabuf *mp = (struct lpfc_dmabuf *) pmb->context1;
struct lpfc_vport  *vport = pmb->vport;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
struct serv_parm *sp = &vport->fc_sparam;
uint32_t ed_tov;
 
@@ -3031,6 +3032,7 @@ lpfc_mbx_cmpl_read_sparam(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
}
 
lpfc_update_vport_wwn(vport);
+   fc_host_port_name(shost) = wwn_to_u64(vport->fc_portname.u.wwn);
if (vport->port_type == LPFC_PHYSICAL_PORT) {
memcpy(&phba->wwnn, &vport->fc_nodename, sizeof(phba->wwnn));
memcpy(&phba->wwpn, &vport->fc_portname, sizeof(phba->wwnn));
@@ -3309,6 +3311,7 @@ lpfc_mbx_cmpl_read_topology(struct lpfc_hba *phba, LPFC_MBOXQ_t *pmb)
struct lpfc_sli_ring *pring;
MAILBOX_t *mb = &pmb->u.mb;
struct lpfc_dmabuf *

[PATCH-v1 19/22] Fix implicit logo and RSCN handling for NVMET

2017-04-19 Thread jsmart2021
From: James Smart 

NVMET didn't have any RSCN handling at all and
would not execute an implicit LOGO when receiving a PLOGI
from an rport that NVMET had in state UNMAPPED.

Clean up the logic in lpfc_nlp_state_cleanup for
initiators (FCP and NVME). NVMET should not respond to
RSCN, including allocating new ndlps, so this code was
conditionalized on nvmet_support being true.  The check
for NLP_RCV_PLOGI in lpfc_setup_disc_node was moved
below the check for nvmet_support to allow NVMET
to recover initiator nodes correctly.  The implicit
logo was introduced in lpfc_rcv_plogi when NVMET gets
a PLOGI on an ndlp in UNMAPPED state.  The RSCN handling
was modified to not respond to an RSCN in NVMET.  Instead,
NVMET sends a GID_FT and determines whether an NVME initiator
it has in UNMAPPED state is no longer in the zone membership.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_ct.c| 68 +-
 drivers/scsi/lpfc/lpfc_disc.h  |  1 +
 drivers/scsi/lpfc/lpfc_els.c   | 23 ++---
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 62 ++
 drivers/scsi/lpfc/lpfc_nportdisc.c |  8 +++--
 5 files changed, 110 insertions(+), 52 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_ct.c b/drivers/scsi/lpfc/lpfc_ct.c
index d3e9af9..1487406 100644
--- a/drivers/scsi/lpfc/lpfc_ct.c
+++ b/drivers/scsi/lpfc/lpfc_ct.c
@@ -537,19 +537,53 @@ lpfc_prep_node_fc4type(struct lpfc_vport *vport, uint32_t Did, uint8_t fc4_type)
}
 }
 
+static void
+lpfc_ns_rsp_audit_did(struct lpfc_vport *vport, uint32_t Did, uint8_t fc4_type)
+{
+   struct lpfc_hba *phba = vport->phba;
+   struct lpfc_nodelist *ndlp = NULL;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+
+   /*
+* To conserve rpi's, filter out addresses for other
+* vports on the same physical HBAs.
+*/
+   if (Did != vport->fc_myDID &&
+   (!lpfc_find_vport_by_did(phba, Did) ||
+vport->cfg_peer_port_login)) {
+   if (!phba->nvmet_support) {
+   /* FCPI/NVMEI path. Process Did */
+   lpfc_prep_node_fc4type(vport, Did, fc4_type);
+   return;
+   }
+   /* NVMET path.  NVMET only cares about NVMEI nodes. */
+   list_for_each_entry(ndlp, &vport->fc_nodes, nlp_listp) {
+   if (ndlp->nlp_type != NLP_NVME_INITIATOR ||
+   ndlp->nlp_state != NLP_STE_UNMAPPED_NODE)
+   continue;
+   spin_lock_irq(shost->host_lock);
+   if (ndlp->nlp_DID == Did)
+   ndlp->nlp_flag &= ~NLP_NVMET_RECOV;
+   else
+   ndlp->nlp_flag |= NLP_NVMET_RECOV;
+   spin_unlock_irq(shost->host_lock);
+   }
+   }
+}
+
 static int
 lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
uint32_t Size)
 {
-   struct lpfc_hba  *phba = vport->phba;
struct lpfc_sli_ct_request *Response =
(struct lpfc_sli_ct_request *) mp->virt;
-   struct lpfc_nodelist *ndlp = NULL;
struct lpfc_dmabuf *mlast, *next_mp;
uint32_t *ctptr = (uint32_t *) & Response->un.gid.PortType;
uint32_t Did, CTentry;
int Cnt;
struct list_head head;
+   struct Scsi_Host *shost = lpfc_shost_from_vport(vport);
+   struct lpfc_nodelist *ndlp = NULL;
 
lpfc_set_disctmo(vport);
vport->num_disc_nodes = 0;
@@ -574,19 +608,7 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
/* Get next DID from NameServer List */
CTentry = *ctptr++;
Did = ((be32_to_cpu(CTentry)) & Mask_DID);
-
-   ndlp = NULL;
-
-   /*
-* Check for rscn processing or not
-* To conserve rpi's, filter out addresses for other
-* vports on the same physical HBAs.
-*/
-   if ((Did != vport->fc_myDID) &&
-   ((lpfc_find_vport_by_did(phba, Did) == NULL) ||
-vport->cfg_peer_port_login))
-   lpfc_prep_node_fc4type(vport, Did, fc4_type);
-
+   lpfc_ns_rsp_audit_did(vport, Did, fc4_type);
if (CTentry & (cpu_to_be32(SLI_CT_LAST_ENTRY)))
goto nsout1;
 
@@ -596,6 +618,22 @@ lpfc_ns_rsp(struct lpfc_vport *vport, struct lpfc_dmabuf *mp, uint8_t fc4_type,
 
}
 
+   /* All GID_FT entries processed.  If the driver is running in
+* in target mode, put impacted nodes into recovery and drop
+* the RPI to flush outstanding IO.
+*/
+   if (vport->p

[PATCH-v1 10/22] Fix PRLI ACC rsp for NVME

2017-04-19 Thread jsmart2021
From: James Smart 

The PRLI ACC from the target is FCP oriented.

Word 0 was wrong. This was noticed by another nvmet-fc vendor that
was testing the lpfc nvme-fc initiator with their target.

Results were verified with an analyzer.

PRLI
BC B5 56 56  22 61 04 00  00 61 00 00  01 29 00 00  20 00 00 00
00 10 FF FF  00 00 00 00  20 14 00 18  28 00 00 00  00 00 00 00
00 00 00 00  00 00 00 20  00 00 00 00  9C D8 DA C9  BC 95 75 75

ACC
BC B5 56 56  23 61 00 00  00 61 04 00  01 98 00 00  30 00 00 00
00 10 00 18  00 00 00 00  02 14 00 18  28 00 01 00  00 00 00 00
00 00 00 00  00 00 00 18  00 00 00 00  B0 6B 07 57  BC B5 75 75

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_els.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index d9c61d0..d6e0f58 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -4403,7 +4403,7 @@ lpfc_els_rsp_prli_acc(struct lpfc_vport *vport, struct lpfc_iocbq *oldiocb,
pcmd = (uint8_t *) (((struct lpfc_dmabuf *) elsiocb->context2)->virt);
memset(pcmd, 0, cmdsize);
 
-   *((uint32_t *) (pcmd)) = (ELS_CMD_ACC | (ELS_CMD_PRLI & ~ELS_RSP_MASK));
+   *((uint32_t *)(pcmd)) = elsrspcmd;
pcmd += sizeof(uint32_t);
 
/* For PRLI, remainder of payload is PRLI parameter page */
-- 
2.1.0



[PATCH-v1 02/22] Fix nvme unregister port timeout.

2017-04-19 Thread jsmart2021
From: James Smart 

During some link event testing it was observed that
wait_for_completion_timeout in lpfc_nvme_unregister_port
was timing out all the time.

The initiator is claiming the nvme_fc_unregister_remoteport upcall is
not completing the unregister in the time allotted:
[ 2186.151317] lpfc :07:00.0: 0:(0):6169 Unreg nvme wait failed 0

wait_for_completion_timeout returns 0 when the wait has
been outstanding for the jiffies passed by the caller.  In this error
message, the nvme initiator passed the value 5 - meaning 5 jiffies -
which is just wrong.

Calculate 5 seconds in jiffies and pass that value
from the current jiffies.

The log message for the unregister timeout was also reduced,
because on a timeout the return value is always 0 and printing
it adds nothing.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 0024de1..a39d72c 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -2409,6 +2409,7 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
struct lpfc_nvme_lport *lport;
struct lpfc_nvme_rport *rport;
struct nvme_fc_remote_port *remoteport;
+   unsigned long wait_tmo;
 
localport = vport->localport;
 
@@ -2451,11 +2452,12 @@ lpfc_nvme_unregister_port(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp)
 * before proceeding.  This guarantees the transport and driver
 * have completed the unreg process.
 */
-   ret = wait_for_completion_timeout(&rport->rport_unreg_done, 5);
+   wait_tmo = msecs_to_jiffies(5000);
+   ret = wait_for_completion_timeout(&rport->rport_unreg_done,
+ wait_tmo);
if (ret == 0) {
lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
-"6169 Unreg nvme wait failed %d\n",
-ret);
+"6169 Unreg nvme wait timeout\n");
}
}
return;
-- 
2.1.0



[PATCH-v1 03/22] Fix rejected nvme LS Req.

2017-04-19 Thread jsmart2021
From: James Smart 

In this case, the NVME initiator is sending an LS REQ command on an NDLP
that is not MAPPED.  The FW rejects it.

 The lpfc_nvme_ls_req routine checks for a NULL ndlp pointer
but does not check the NDLP state.  This allows the routine
to send an LS IO when the ndlp is disconnected.

 Check the ndlp for NULL, actual node, Target and MAPPED
or Initiator and UNMAPPED. This avoids Fabric nodes getting
the Create Association or Create Connection commands.  Initiators
are free to Reject either Create.
Some of the message numbers in lpfc_nvme_ls_req were also changed because
they were already used in other log messages.

Signed-off-by: Dick Kennedy 
Signed-off-by: James Smart 
---
 drivers/scsi/lpfc/lpfc_nvme.c | 25 -
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index a39d72c..2a87428 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -417,11 +417,26 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
vport = lport->vport;
 
ndlp = lpfc_findnode_did(vport, pnvme_rport->port_id);
-   if (!ndlp) {
-   lpfc_printf_vlog(vport, KERN_ERR, LOG_NVME_DISC,
-"6043 Could not find node for DID %x\n",
+   if (!ndlp || !NLP_CHK_NODE_ACT(ndlp)) {
+   lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+"6051 DID x%06x not an active rport.\n",
 pnvme_rport->port_id);
-   return 1;
+   return -ENODEV;
+   }
+
+   /* The remote node has to be a mapped nvme target or an
+* unmapped nvme initiator or it's an error.
+*/
+   if (((ndlp->nlp_type & NLP_NVME_TARGET) &&
+(ndlp->nlp_state != NLP_STE_MAPPED_NODE)) ||
+   ((ndlp->nlp_type & NLP_NVME_INITIATOR) &&
+(ndlp->nlp_state != NLP_STE_UNMAPPED_NODE))) {
+   lpfc_printf_vlog(vport, KERN_ERR, LOG_NODE | LOG_NVME_IOERR,
+"6088 DID x%06x not ready for "
+"IO. State x%x, Type x%x\n",
+pnvme_rport->port_id,
+ndlp->nlp_state, ndlp->nlp_type);
+   return -ENODEV;
}
bmp = kmalloc(sizeof(struct lpfc_dmabuf), GFP_KERNEL);
if (!bmp) {
@@ -456,7 +471,7 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
 
/* Expand print to include key fields. */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
-"6051 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
+"6149 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
 "rsplen:%d %pad %pad\n",
 pnvme_lport, pnvme_rport,
 pnvme_lsreq, pnvme_lsreq->rqstlen,
-- 
2.1.0



[PATCH-v1 00/22] lpfc updates for 11.2.0.12

2017-04-19 Thread jsmart2021
From: James Smart 

This patch set provides a number of bug fixes, code cleanups, and error
handling improvements, mostly in the nvme area of lpfc.

There is one new feature in the series to add Fabric Assigned WWN support.

The patches were cut against the linux-block tree, for-4.12/block branch,
and are intended to be pulled in through that tree rather than the scsi trees.

The patches are dependent on the FC nvme/nvmet patches from the following 2
series:
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009250.html
http://lists.infradead.org/pipermail/linux-nvme/2017-April/009256.html

James Smart (22):
  Standardize nvme SGL segment count
  Fix nvme unregister port timeout.
  Fix rejected nvme LS Req.
  Fix log message in completion path.
  Add debug messages for nvme/fcp resource allocation.
  Fix spelling in comments.
  Remove unused defines for NVME PostBuf.
  Remove NULL ptr check before kfree.
  Fix extra line print in rqpair debug print.
  Fix PRLI ACC rsp for NVME
  Fix driver unload/reload operation.
  Fix driver usage of 128B WQEs when WQ_CREATE is V1.
  Fix nvme initiator handling when not enabled.
  Remove hba lock from NVMET issue WQE.
  Fix driver load issues when MRQ=8
  Fix crash after issuing lip reset
  Fix max_sgl_segments settings for NVME / NVMET
  Add Fabric assigned WWN support.
  Fix implicit logo and RSCN handling for NVMET
  Update ABORT processing for NVMET.
  Fix Express lane queue creation.
  lpfc revison 11.2.0.12

 drivers/scsi/lpfc/lpfc.h   |   5 +-
 drivers/scsi/lpfc/lpfc_attr.c  |  10 +-
 drivers/scsi/lpfc/lpfc_bsg.c   |   4 +
 drivers/scsi/lpfc/lpfc_crtn.h  |   9 +-
 drivers/scsi/lpfc/lpfc_ct.c|  68 +--
 drivers/scsi/lpfc/lpfc_debugfs.c   |  69 ---
 drivers/scsi/lpfc/lpfc_disc.h  |   1 +
 drivers/scsi/lpfc/lpfc_els.c   |  68 +++
 drivers/scsi/lpfc/lpfc_hbadisc.c   | 131 +
 drivers/scsi/lpfc/lpfc_hw.h|   3 +
 drivers/scsi/lpfc/lpfc_hw4.h   |   4 +
 drivers/scsi/lpfc/lpfc_init.c  | 202 +---
 drivers/scsi/lpfc/lpfc_mbox.c  |   7 +-
 drivers/scsi/lpfc/lpfc_nportdisc.c |   8 +-
 drivers/scsi/lpfc/lpfc_nvme.c  | 139 ++
 drivers/scsi/lpfc/lpfc_nvme.h  |   7 +-
 drivers/scsi/lpfc/lpfc_nvmet.c | 378 -
 drivers/scsi/lpfc/lpfc_nvmet.h |  14 +-
 drivers/scsi/lpfc/lpfc_sli.c   |  22 ++-
 drivers/scsi/lpfc/lpfc_sli4.h  |   2 +-
 drivers/scsi/lpfc/lpfc_version.h   |   2 +-
 drivers/scsi/lpfc/lpfc_vport.c |   3 +-
 22 files changed, 814 insertions(+), 342 deletions(-)

-- 
2.1.0



Re: [PATCH] ibmvscsis: Do not send aborted task response

2017-04-19 Thread Martin K. Petersen
"Bryant G. Ly"  writes:

> The driver is sending a response to the aborted task response along
> with LIO sending the tmr response.  ibmvscsis_tgt does not send the
> response to the client until release_cmd time. The reason for this is
> that if we did it at queue_status time, the client would be free to
> reuse the tag for that command, but we're still using the tag until
> the command is released at release_cmd time, so we chose to delay
> sending the response until then. That then caused this issue, because
> release_cmd is always called, even if queue_status is not.
>
> SCSI spec says that the initiator that sends the abort task TM NEVER
> gets a response to the aborted op and with the current code it will
> send a response. Thus this fix will remove that response if the TAS
> bit is set.

Somebody please review!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH V2] scsi: mpt3sas: remove redundant wmb

2017-04-19 Thread Martin K. Petersen
Sinan Kaya  writes:

> Due to relaxed ordering requirements on multiple architectures,
> drivers are required to use wmb/rmb/mb combinations when they need to
> guarantee observability between the memory and the HW.
>
> The mpt3sas driver is already using wmb() for this purpose.  However,
> it issues a writel following wmb(). The writel() function on arm/arm64
> architectures has an embedded wmb() call inside.
>
> This results in unnecessary performance loss and code duplication.
>
> writel already guarantees ordering for both cpu and bus; we don't need
> an additional wmb().

Broadcom folks, please review!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: commit 1b6ac5e3f "fnic: Using rport->dd_data to check rport online instead of rport_lookup."

2017-04-19 Thread Martin K. Petersen
Joe Jin  writes:

> My customer hit below error when issue LIP to fnic controller:
>
> [94702.898408] sd 2:0:4:1: [sdx] tag#1 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [94702.898416] sd 2:0:4:1: [sdx] tag#1 CDB: Write(10) 2a 00 04 56 c0 08 00 00 08 00
> [94702.898420] blk_update_request: I/O error, dev sdx, sector 72794120
> [94702.898455] sd 2:0:4:10: [sdy] tag#2 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
> [94702.898459] sd 2:0:4:10: [sdy] tag#2 CDB: Read(10) 28 00 00 00 08 a8 00 00 02 00
> [94702.898462] blk_update_request: I/O error, dev sdy, sector 2216
>
> Looking at all the fnic changes, I found this was caused by commit
> 1b6ac5e3f "fnic: Using rport->dd_data to check rport online instead of
> rport_lookup."  The question is: why was the state check changed from
> RPORT_ST_DELETE to RPORT_ST_READY here?

Satish?

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] [SCSI] aic7xxx: fix order of arguments in function prototype

2017-04-19 Thread Martin K. Petersen
Colin King  writes:

> The vendor/device and subvendor/subdevice arguments to the function
> prototype ahc_9005_subdevinfo_valid are in the wrong order and need to
> be swapped to fix this.  Detected with PVS-Studio studio.

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCHv6 8/8] scsi: inline command aborts

2017-04-19 Thread Martin K. Petersen
Bart Van Assche  writes:

> On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
>> The block layer always calls the timeout function from a workqueue
>> context, so there is no need to have yet another workqueue for
>> running command aborts.
>> 
>> [ ... ]
>> @@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>>  rtn = host->hostt->eh_timed_out(scmd);
>>  
>>  if (rtn == BLK_EH_NOT_HANDLED) {
>> -if (scsi_abort_command(scmd) != SUCCESS) {
>> +int ret;
>> +
>> +ret = scsi_abort_command(scmd);
>> +if (ret == FAILED) {
>>  set_host_byte(scmd, DID_TIME_OUT);
>>  scsi_eh_scmd_add(scmd);
>> -}
>> +} else if (ret == FAST_IO_FAIL)
>> +rtn = BLK_EH_RESET_TIMER;
>>  }
>
> Has this patch been tested with the traditional block layer? For the
> traditional block layer scsi_times_out() is called with the queue lock
> held. Does this patch cause .eh_abort_handler(), a function that may
> sleep, to be called with the queue lock held?

Hannes: Ping!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] bfa: remove bfa_module_s madness

2017-04-19 Thread Martin K. Petersen
Christoph Hellwig  writes:

> Just call the functions directly and remove a giant pile of
> boilerplate code.

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: qedi,qedf: Use designated initializers

2017-04-19 Thread Martin K. Petersen

Kees,

> Prepare to mark sensitive kernel structures for randomization by
> making sure they're using designated initializers. These were
> identified during allyesconfig builds of x86, arm, and arm64, with
> most initializer fixes extracted from grsecurity.
>
> For these cases, terminate the list with { }, which will be
> zero-filled, instead of undesignated NULLs.

Applied to 4.12/scsi-queue. Thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 08/23] scsi: introduce a result field in struct scsi_request

2017-04-19 Thread Martin K. Petersen
Christoph Hellwig  writes:

> This passes on the scsi_cmnd result field to users of passthrough
> requests.  Currently we abuse req->errors for this purpose, but that
> field will go away in its current form.

Reviewed-by: Martin K. Petersen 

> Note that the old IDE code abuses the errors field in very creative
> ways and stores all kinds of different values in it.  I didn't dare
> to touch this magic, so the abuses are brought forward 1:1.

Really wish we could just obliterate drivers/ide.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 26/27] scsi: sd: Separate zeroout and discard command choices

2017-04-19 Thread Martin K. Petersen

Paolo,

> Should this be conditional on lbprz, lbpws, lbpws10 and max_ws_blocks?

It is intentional that things can be overridden from userland for
devices that report the "wrong" thing. We do the same for discard so
people can set up udev rules.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 3:55 PM, Logan Gunthorpe  wrote:
>
>
> On 19/04/17 02:48 PM, Jason Gunthorpe wrote:
>> On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote:
>>
 But.. it could point to a GPU and the GPU struct device could have a
 proxy dma_ops like Dan pointed out.
>>>
>>> Seems a bit awkward to me that in order for the intended use case, you
>>> have to proxy the dma_ops. I'd probably still suggest throwing a couple
>>> ops for things like this in the dev_pagemap.
>>
>> Another option is adding a new 'struct completer_dma_ops *' to struct
>> device for this use case.
>>
>> Seems like a waste to expand dev_pagemap when we only need a unique
>> value per struct device?
>
> I feel like expanding dev_pagemap has a much lower impact than expanding
> struct device... dev_pagemap is only one instance per zone device region
> so expanding it shouldn't be a huge issue. Expanding struct device means
> every device struct in the system gets bigger.

Especially since we expect a very small subset of devices will ever support p2p.


Re: [PATCH v3 8/8] scsi: Implement blk_mq_ops.show_rq()

2017-04-19 Thread Bart Van Assche
On Wed, 2017-04-19 at 19:25 -0400, Martin K. Petersen wrote:
> Bart Van Assche  writes:
> > Show the SCSI CDB, .eh_eflags and .result for pending SCSI commands
> > in /sys/kernel/debug/block/*/mq/*/dispatch and */rq_list.
> 
> SCSI_EH_CANCEL_CMD is no more (as of a06586325f37).

Thanks Martin. I will remove that flag when I repost this patch series.

Bart.

Re: [PATCH] scsi: osd_uld: fix mismerge

2017-04-19 Thread Stephen Rothwell
Hi Bart,

On Wed, 19 Apr 2017 17:46:43 + Bart Van Assche  
wrote:
>
> On Wed, 2017-04-19 at 19:44 +0200, Arnd Bergmann wrote:
> > Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value")
> > Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function")  
> 
> Both these patches are fine as far as I can tell. Shouldn't the
> "Fixes" tag refer to the merge commit instead of to the commits
> that have been merged?

Unfortunately, the merge commit is redone every day and will be redone
by Linus during the merge window.  This fix should be incorporated into
the merge commit itself.  I should have done it originally, but the
conflict was such a mess that I took the easy way out :-)
-- 
Cheers,
Stephen Rothwell


Re: [PATCH] scsi: osd_uld: fix mismerge

2017-04-19 Thread Stephen Rothwell
Hi Arnd,

On Wed, 19 Apr 2017 19:44:01 +0200 Arnd Bergmann  wrote:
>
> A mismerge between two branches in linux-next reintroduced a warning that was
> previously resolved:
> 
> drivers/scsi/osd/osd_uld.c: In function 'osd_probe':
> drivers/scsi/osd/osd_uld.c:457:2: error: ignoring return value of 
> 'scsi_device_get', declared with attribute warn_unused_result 
> [-Werror=unused-result]
> 
> The original fix had more complex error unrolling, but as far as I
> can tell, this simpler version is now sufficient.
> 
> Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value")
> Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/scsi/osd/osd_uld.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index 8b9941a5687a..0e56f1eb05dc 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -454,7 +454,8 @@ static int osd_probe(struct device *dev)
>   /* hold one more reference to the scsi_device that will get released
>* in __release, in case a logout is happening while fs is mounted
>*/
> - scsi_device_get(scsi_device);
> + if (scsi_device_get(scsi_device))
> + goto err_retract_minor;
>   osd_dev_init(&oud->od, scsi_device);
>  
>   /* allocate a disk and set it up */

I will add this as a merge fix patch for the merge of the char-misc and
scsi trees today.  Someone needs to let Linus know when these trees are
merged.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v3 8/8] scsi: Implement blk_mq_ops.show_rq()

2017-04-19 Thread Martin K. Petersen
Bart Van Assche  writes:

Bart,

> Show the SCSI CDB, .eh_eflags and .result for pending SCSI commands
> in /sys/kernel/debug/block/*/mq/*/dispatch and */rq_list.

SCSI_EH_CANCEL_CMD is no more (as of a06586325f37).

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 0/2] scsi: storvsc: Add support for FC rport

2017-04-19 Thread Martin K. Petersen

Cathy,

> The updated patch set provides a way for drivers ( specifically
> storvsc in this case ) that expose virturalized fc devices
> but that do not expose rports to be manually scanned. This is
> done via creating a pseudo rport in storvsc and a
> corresponding dummy initiator rport role in the fc transport.

Applied to 4.12/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] virtio_scsi: Always try to read VPD pages

2017-04-19 Thread Martin K. Petersen

David,

> Passed through SCSI targets may have transfer limits which come from
> the host SCSI controller or something on the host side other than the
> target itself.
>
> To make this work properly, the hypervisor can adjust the target's VPD
> information to advertise these limits.  But for that to work, the
> guest has to look at the VPD pages, which we won't do by default if it
> is an SPC-2 device, even if it does actually support it.

Applied to 4.12/scsi-queue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 02:48 PM, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote:
> 
>>> But.. it could point to a GPU and the GPU struct device could have a
>>> proxy dma_ops like Dan pointed out.
>>
>> Seems a bit awkward to me that in order for the intended use case, you
>> have to proxy the dma_ops. I'd probably still suggest throwing a couple
>> ops for things like this in the dev_pagemap.
> 
> Another option is adding a new 'struct completer_dma_ops *' to struct
> device for this use case.
> 
> Seems like a waste to expand dev_pagemap when we only need a unique
> value per struct device?

I feel like expanding dev_pagemap has a much lower impact than expanding
struct device... dev_pagemap is only one instance per zone device region
so expanding it shouldn't be a huge issue. Expanding struct device means
every device struct in the system gets bigger.

Logan



Re: [PATCH 23/23] block: remove the errors field from struct request

2017-04-19 Thread Bart Van Assche
On Wed, 2017-04-19 at 21:27 +0200, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Roger Pau Monné 
> Reviewed-by: Konrad Rzeszutek Wilk 

Reviewed-by: Bart Van Assche 

Re: [PATCH 02/23] block: remove the blk_execute_rq return value

2017-04-19 Thread Bart Van Assche
On Wed, 2017-04-19 at 21:26 +0200, Christoph Hellwig wrote:
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -489,7 +489,10 @@ static void _set_error_resid(struct osd_request *or, 
> struct request *req,
>  
>  int osd_execute_request(struct osd_request *or)
>  {
> - int error = blk_execute_rq(or->request->q, NULL, or->request, 0);
> + int error;
> +
> + blk_execute_rq(or->request->q, NULL, or->request, 0);
> + error = or->request ? -EIO : 0;

Hello Christoph,

Did you perhaps intend or->request->errors instead of or->request?

Otherwise this patch looks fine to me.

Bart.

Re: [PATCH 01/23] pd: don't check blk_execute_rq return value.

2017-04-19 Thread Bart Van Assche
On Wed, 2017-04-19 at 21:26 +0200, Christoph Hellwig wrote:
> The driver never sets req->errors, so blk_execute_rq will always return 0.

Reviewed-by: Bart Van Assche 

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 01:41:49PM -0600, Logan Gunthorpe wrote:

> > But.. it could point to a GPU and the GPU struct device could have a
> > proxy dma_ops like Dan pointed out.
> 
> Seems a bit awkward to me that in order for the intended use case, you
> have to proxy the dma_ops. I'd probably still suggest throwing a couple
> ops for things like this in the dev_pagemap.

Another option is adding a new 'struct completer_dma_ops *' to struct
device for this use case.

Seems like a waste to expand dev_pagemap when we only need a unique
value per struct device?

Jason


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 01:31 PM, Jason Gunthorpe wrote:
> Try it with VT-D turned on. It shouldn't work or there is a notable
> security hole in your platform..

Ah, ok.

>>>  const struct dma_map_ops *comp_ops = get_dma_ops(completer);
>>>  const struct dma_map_ops *init_ops = get_dma_ops(initiator);
>>
>> So, in this case, what device does the completer point to? The PCI
>> device or a more specific GPU device? If it's the former, who's
>> responsible for setting the new dma_ops? Typically the dma_ops are arch
>> specific but now you'd be adding ones that are tied to hmm or the gpu.
> 
> Donno, that is for GPU folks to figure out :)
> 
> But.. it could point to a GPU and the GPU struct device could have a
> proxy dma_ops like Dan pointed out.

Seems a bit awkward to me that in order for the intended use case, you
have to proxy the dma_ops. I'd probably still suggest throwing a couple
ops for things like this in the dev_pagemap.

>> It appears to me like it's calculating the DMA address, and the check is
>> just a side requirement. It reads as though it's only doing the check.
> 
> pci_p2p_same_segment_get_pa() then?

Ok, I think that's a bit clearer.

Logan


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 01:02:49PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 19/04/17 12:32 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote:
> > Not entirely, it would have to call through the whole process
> > including the arch_p2p_cross_segment()..
> 
> Hmm, yes. Though it's still not clear what, if anything,
> arch_p2p_cross_segment would be doing.

Sets up the iommu for arches that place a iommu between the pci root
port and other pci root ports.

> In my experience, if you are going between host bridges, the CPU
> address (or PCI address -- I'm not sure which seeing they are the
> same on my system) would still work fine

Try it with VT-D turned on. It shouldn't work or there is a notable
security hole in your platform..

> >  const struct dma_map_ops *comp_ops = get_dma_ops(completer);
> >  const struct dma_map_ops *init_ops = get_dma_ops(initiator);
>
> So, in this case, what device does the completer point to? The PCI
> device or a more specific GPU device? If it's the former, who's
> responsible for setting the new dma_ops? Typically the dma_ops are arch
> specific but now you'd be adding ones that are tied to hmm or the gpu.

Donno, that is for GPU folks to figure out :)

But.. it could point to a GPU and the GPU struct device could have a
proxy dma_ops like Dan pointed out.

> >> I'm not sure I like the name pci_p2p_same_segment. It reads as though
> >> it's only checking if the devices are not the same segment.
> > 
> > Well, that is exactly what it is doing. If it succeeds then the caller
> > knows the DMA will not flow outside the segment and no iommu setup/etc
> > is required.
> 
> It appears to me like it's calculating the DMA address, and the check is
> just a side requirement. It reads as though it's only doing the check.

pci_p2p_same_segment_get_pa() then?

Jason


[PATCH 23/23] block: remove the errors field from struct request

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Roger Pau Monné 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 block/blk-core.c | 14 +-
 block/blk-exec.c |  3 +--
 block/blk-mq.c   | 10 +++---
 block/blk-timeout.c  |  1 -
 include/linux/blkdev.h   |  2 --
 include/trace/events/block.h | 17 +++--
 kernel/trace/blktrace.c  | 26 --
 7 files changed, 24 insertions(+), 49 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 8654aa0cef6d..48f522a02f54 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1633,7 +1633,6 @@ void init_request_from_bio(struct request *req, struct 
bio *bio)
if (bio->bi_opf & REQ_RAHEAD)
req->cmd_flags |= REQ_FAILFAST_MASK;
 
-   req->errors = 0;
req->__sector = bio->bi_iter.bi_sector;
blk_rq_set_prio(req, rq_ioc(bio));
if (ioprio_valid(bio_prio(bio)))
@@ -2567,22 +2566,11 @@ bool blk_update_request(struct request *req, int error, 
unsigned int nr_bytes)
 {
int total_bytes;
 
-   trace_block_rq_complete(req->q, req, nr_bytes);
+   trace_block_rq_complete(req, error, nr_bytes);
 
if (!req->bio)
return false;
 
-   /*
-* For fs requests, rq is just carrier of independent bio's
-* and each partial completion should be handled separately.
-* Reset per-request error on each partial completion.
-*
-* TODO: tj: This is too subtle.  It would be better to let
-* low level drivers do what they see fit.
-*/
-   if (!blk_rq_is_passthrough(req))
-   req->errors = 0;
-
if (error && !blk_rq_is_passthrough(req) &&
!(req->rq_flags & RQF_QUIET)) {
char *error_type;
diff --git a/block/blk-exec.c b/block/blk-exec.c
index afa383248c7c..a9451e3b8587 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -69,8 +69,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
gendisk *bd_disk,
 
if (unlikely(blk_queue_dying(q))) {
rq->rq_flags |= RQF_QUIET;
-   rq->errors = -ENXIO;
-   __blk_end_request_all(rq, rq->errors);
+   __blk_end_request_all(rq, -ENXIO);
spin_unlock_irq(q->queue_lock);
return;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1f98db082da3..f5e3cf153ad7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -213,7 +213,6 @@ void blk_mq_rq_ctx_init(struct request_queue *q, struct 
blk_mq_ctx *ctx,
 #endif
rq->special = NULL;
/* tag was already set */
-   rq->errors = 0;
rq->extra_len = 0;
 
INIT_LIST_HEAD(&rq->timeout_list);
@@ -624,8 +623,7 @@ void blk_mq_abort_requeue_list(struct request_queue *q)
 
rq = list_first_entry(&rq_list, struct request, queuelist);
list_del_init(&rq->queuelist);
-   rq->errors = -EIO;
-   blk_mq_end_request(rq, rq->errors);
+   blk_mq_end_request(rq, -EIO);
}
 }
 EXPORT_SYMBOL(blk_mq_abort_requeue_list);
@@ -1032,8 +1030,7 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, 
struct list_head *list)
pr_err("blk-mq: bad return on queue: %d\n", ret);
case BLK_MQ_RQ_QUEUE_ERROR:
errors++;
-   rq->errors = -EIO;
-   blk_mq_end_request(rq, rq->errors);
+   blk_mq_end_request(rq, -EIO);
break;
}
 
@@ -1484,8 +1481,7 @@ static void __blk_mq_try_issue_directly(struct request 
*rq, blk_qc_t *cookie,
 
if (ret == BLK_MQ_RQ_QUEUE_ERROR) {
*cookie = BLK_QC_T_NONE;
-   rq->errors = -EIO;
-   blk_mq_end_request(rq, rq->errors);
+   blk_mq_end_request(rq, -EIO);
return;
}
 
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index a30441a200c0..cbff183f3d9f 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -89,7 +89,6 @@ static void blk_rq_timed_out(struct request *req)
ret = q->rq_timed_out_fn(req);
switch (ret) {
case BLK_EH_HANDLED:
-   /* Can we use req->errors here? */
__blk_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e8006abce8b6..b9b958a3bd08 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -220,8 +220,6 @@ struct request {
 
void *special;  /* opaque pointer available for LLD use */
 
-   int errors;
-
unsigned int extra_len; /* length of alignment and padding */
 
unsigned long deadline;
diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index 99ed69fad041..d0dbe60d8a6d 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h

[PATCH 21/23] swim3: remove (commented out) printing of req->errors

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/swim3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c
index 61b3ffa4f458..ba4809c9bdba 100644
--- a/drivers/block/swim3.c
+++ b/drivers/block/swim3.c
@@ -343,8 +343,8 @@ static void start_request(struct floppy_state *fs)
  req->rq_disk->disk_name, req->cmd,
  (long)blk_rq_pos(req), blk_rq_sectors(req),
  bio_data(req->bio));
-   swim3_dbg("   errors=%d current_nr_sectors=%u\n",
- req->errors, blk_rq_cur_sectors(req));
+   swim3_dbg("   current_nr_sectors=%u\n",
+ blk_rq_cur_sectors(req));
 #endif
 
if (blk_rq_pos(req) >= fs->total_secs) {
-- 
2.11.0



[PATCH 20/23] ataflop: switch from req->errors to req->error_count

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/ataflop.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index 2104b1b4ccda..fa69ecd52cb5 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -617,12 +617,12 @@ static void fd_error( void )
if (!fd_request)
return;
 
-   fd_request->errors++;
-   if (fd_request->errors >= MAX_ERRORS) {
+   fd_request->error_count++;
+   if (fd_request->error_count >= MAX_ERRORS) {
printk(KERN_ERR "fd%d: too many errors.\n", SelectedDrive );
fd_end_request_cur(-EIO);
}
-   else if (fd_request->errors == RECALIBRATE_ERRORS) {
+   else if (fd_request->error_count == RECALIBRATE_ERRORS) {
printk(KERN_WARNING "fd%d: recalibrating\n", SelectedDrive );
if (SelectedDrive != -1)
SUD.track = -1;
@@ -1386,7 +1386,7 @@ static void setup_req_params( int drive )
ReqData = ReqBuffer + 512 * ReqCnt;
 
if (UseTrackbuffer)
-   read_track = (ReqCmd == READ && fd_request->errors == 0);
+   read_track = (ReqCmd == READ && fd_request->error_count == 0);
else
read_track = 0;
 
@@ -1409,8 +1409,10 @@ static struct request *set_next_request(void)
fdc_queue = 0;
if (q) {
rq = blk_fetch_request(q);
-   if (rq)
+   if (rq) {
+   rq->error_count = 0;
break;
+   }
}
} while (fdc_queue != old_pos);
 
-- 
2.11.0



[PATCH 18/23] block: add a error_count field to struct request

2017-04-19 Thread Christoph Hellwig
This is for the legacy floppy and ataflop drivers that currently abuse
->errors for this purpose.  It's stashed away in a union to not grow
the struct size, the other fields are either used by modern drivers
for different purposes or by the I/O scheduler before queuing the I/O
to drivers.

Signed-off-by: Christoph Hellwig 
---
 include/linux/blkdev.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d8bcd51b8a47..e8006abce8b6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -175,6 +175,7 @@ struct request {
struct rb_node rb_node; /* sort/lookup */
struct bio_vec special_vec;
void *completion_data;
+   int error_count; /* for legacy drivers, don't use */
};
 
/*
-- 
2.11.0



[PATCH 17/23] blk-mq: simplify __blk_mq_complete_request

2017-04-19 Thread Christoph Hellwig
Merge blk_mq_ipi_complete_request and blk_mq_stat_add into their only
caller.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c | 25 -
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0c19df66f5a0..1f98db082da3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -406,12 +406,19 @@ static void __blk_mq_complete_request_remote(void *data)
rq->q->softirq_done_fn(rq);
 }
 
-static void blk_mq_ipi_complete_request(struct request *rq)
+static void __blk_mq_complete_request(struct request *rq)
 {
struct blk_mq_ctx *ctx = rq->mq_ctx;
bool shared = false;
int cpu;
 
+   if (rq->internal_tag != -1)
+   blk_mq_sched_completed_request(rq);
+   if (rq->rq_flags & RQF_STATS) {
+   blk_mq_poll_stats_start(rq->q);
+   blk_stat_add(rq);
+   }
+
if (!test_bit(QUEUE_FLAG_SAME_COMP, &rq->q->queue_flags)) {
rq->q->softirq_done_fn(rq);
return;
@@ -432,22 +439,6 @@ static void blk_mq_ipi_complete_request(struct request *rq)
put_cpu();
 }
 
-static void blk_mq_stat_add(struct request *rq)
-{
-   if (rq->rq_flags & RQF_STATS) {
-   blk_mq_poll_stats_start(rq->q);
-   blk_stat_add(rq);
-   }
-}
-
-static void __blk_mq_complete_request(struct request *rq)
-{
-   if (rq->internal_tag != -1)
-   blk_mq_sched_completed_request(rq);
-   blk_mq_stat_add(rq);
-   blk_mq_ipi_complete_request(rq);
-}
-
 /**
  * blk_mq_complete_request - end I/O on a request
  * @rq:the request being processed
-- 
2.11.0



[PATCH 19/23] floppy: switch from req->errors to req->error_count

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/floppy.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index ce102ec47ef2..60d4c7653178 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -2805,8 +2805,10 @@ static int set_next_request(void)
fdc_queue = 0;
if (q) {
current_req = blk_fetch_request(q);
-   if (current_req)
+   if (current_req) {
+   current_req->error_count = 0;
break;
+   }
}
} while (fdc_queue != old_pos);
 
@@ -2866,7 +2868,7 @@ static void redo_fd_request(void)
_floppy = floppy_type + DP->autodetect[DRS->probed_format];
} else
probing = 0;
-   errors = &(current_req->errors);
+   errors = &(current_req->error_count);
tmp = make_raw_rw_request();
if (tmp < 2) {
request_done(tmp);
-- 
2.11.0



[PATCH 12/23] dm mpath: don't check for req->errors

2017-04-19 Thread Christoph Hellwig
We'll get all proper errors reported through ->end_io and ->errors will
go away soon.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-mpath.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
index ab55955ed704..2950b145443d 100644
--- a/drivers/md/dm-mpath.c
+++ b/drivers/md/dm-mpath.c
@@ -1492,7 +1492,7 @@ static int do_end_io(struct multipath *m, struct request 
*clone,
 */
int r = DM_ENDIO_REQUEUE;
 
-   if (!error && !clone->errors)
+   if (!error)
return 0;   /* I/O complete */
 
if (noretry_error(error))
-- 
2.11.0



[PATCH 16/23] blk-mq: remove the error argument to blk_mq_complete_request

2017-04-19 Thread Christoph Hellwig
Now that all drivers that call blk_mq_complete_request have a
->complete callback we can remove the direct call to blk_mq_end_request,
as well as the error argument to blk_mq_complete_request.

Signed-off-by: Christoph Hellwig 
---
 block/blk-mq.c| 15 +++
 drivers/block/loop.c  |  4 ++--
 drivers/block/mtip32xx/mtip32xx.c |  4 ++--
 drivers/block/nbd.c   |  4 ++--
 drivers/block/null_blk.c  |  2 +-
 drivers/block/virtio_blk.c|  2 +-
 drivers/block/xen-blkfront.c  |  2 +-
 drivers/md/dm-rq.c|  2 +-
 drivers/nvme/host/core.c  |  2 +-
 drivers/nvme/host/nvme.h  |  2 +-
 drivers/scsi/scsi_lib.c   |  2 +-
 include/linux/blk-mq.h|  2 +-
 12 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e2ef7b460924..0c19df66f5a0 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -442,17 +442,10 @@ static void blk_mq_stat_add(struct request *rq)
 
 static void __blk_mq_complete_request(struct request *rq)
 {
-   struct request_queue *q = rq->q;
-
if (rq->internal_tag != -1)
blk_mq_sched_completed_request(rq);
-
blk_mq_stat_add(rq);
-
-   if (!q->softirq_done_fn)
-   blk_mq_end_request(rq, rq->errors);
-   else
-   blk_mq_ipi_complete_request(rq);
+   blk_mq_ipi_complete_request(rq);
 }
 
 /**
@@ -463,16 +456,14 @@ static void __blk_mq_complete_request(struct request *rq)
  * Ends all I/O on a request. It does not handle partial completions.
  * The actual completion happens out-of-order, through a IPI handler.
  **/
-void blk_mq_complete_request(struct request *rq, int error)
+void blk_mq_complete_request(struct request *rq)
 {
struct request_queue *q = rq->q;
 
if (unlikely(blk_should_fake_timeout(q)))
return;
-   if (!blk_mark_rq_complete(rq)) {
-   rq->errors = error;
+   if (!blk_mark_rq_complete(rq))
__blk_mq_complete_request(rq);
-   }
 }
 EXPORT_SYMBOL(blk_mq_complete_request);
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 86351b3f7350..994403efee19 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -465,7 +465,7 @@ static void lo_rw_aio_complete(struct kiocb *iocb, long 
ret, long ret2)
struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
 
cmd->ret = ret;
-   blk_mq_complete_request(cmd->rq, 0);
+   blk_mq_complete_request(cmd->rq);
 }
 
 static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
@@ -1685,7 +1685,7 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
/* complete non-aio request */
if (!cmd->use_aio || ret) {
cmd->ret = ret ? -EIO : 0;
-   blk_mq_complete_request(cmd->rq, 0);
+   blk_mq_complete_request(cmd->rq);
}
 }
 
diff --git a/drivers/block/mtip32xx/mtip32xx.c 
b/drivers/block/mtip32xx/mtip32xx.c
index 7406de29db58..66a6bd83faae 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -242,7 +242,7 @@ static void mtip_async_complete(struct mtip_port *port,
rq = mtip_rq_from_tag(dd, tag);
 
cmd->status = status;
-   blk_mq_complete_request(rq, 0);
+   blk_mq_complete_request(rq);
 }
 
 /*
@@ -4109,7 +4109,7 @@ static void mtip_no_dev_cleanup(struct request *rq, void 
*data, bool reserv)
 
if (likely(!reserv)) {
cmd->status = -ENODEV;
-   blk_mq_complete_request(rq, 0);
+   blk_mq_complete_request(rq);
} else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) {
 
cmd = mtip_cmd_from_tag(dd, MTIP_TAG_INTERNAL);
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 09a74a66beb1..d387bef07fcc 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -635,7 +635,7 @@ static void recv_work(struct work_struct *work)
break;
}
 
-   blk_mq_complete_request(blk_mq_rq_from_pdu(cmd), 0);
+   blk_mq_complete_request(blk_mq_rq_from_pdu(cmd));
}
atomic_dec(&config->recv_threads);
wake_up(&config->recv_wq);
@@ -651,7 +651,7 @@ static void nbd_clear_req(struct request *req, void *data, 
bool reserved)
return;
cmd = blk_mq_rq_to_pdu(req);
cmd->status = -EIO;
-   blk_mq_complete_request(req, 0);
+   blk_mq_complete_request(req);
 }
 
 static void nbd_clear_que(struct nbd_device *nbd)
diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 24ca85a70fd8..c27cccec368b 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -281,7 +281,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd)
case NULL_IRQ_SOFTIRQ:
switch (queue_mode)  {
case NULL_Q_MQ:
-   blk_mq_complete_request(cmd->rq, 0);

[PATCH 15/23] xen-blkfront: don't use req->errors

2017-04-19 Thread Christoph Hellwig
xen-blkfront is the last user of rq->errors for passing errors back to
blk-mq, and I'd like to get rid of that.  In the longer run the driver
should be moving more of the completion processing into .complete, but
this is the minimal change to move forward for now.

Signed-off-by: Christoph Hellwig 
Acked-by: Roger Pau Monné 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/block/xen-blkfront.c | 36 +---
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index abed296ce605..57866355c060 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -115,6 +115,15 @@ struct split_bio {
atomic_t pending;
 };
 
+struct blkif_req {
+   int error;
+};
+
+static inline struct blkif_req *blkif_req(struct request *rq)
+{
+   return blk_mq_rq_to_pdu(rq);
+}
+
 static DEFINE_MUTEX(blkfront_mutex);
 static const struct block_device_operations xlvbd_block_fops;
 
@@ -907,8 +916,14 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static void blkif_complete_rq(struct request *rq)
+{
+   blk_mq_end_request(rq, blkif_req(rq)->error);
+}
+
 static const struct blk_mq_ops blkfront_mq_ops = {
.queue_rq = blkif_queue_rq,
+   .complete = blkif_complete_rq,
 };
 
 static void blkif_set_queue_limits(struct blkfront_info *info)
@@ -969,7 +984,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 
sector_size,
info->tag_set.queue_depth = BLK_RING_SIZE(info);
info->tag_set.numa_node = NUMA_NO_NODE;
info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
-   info->tag_set.cmd_size = 0;
+   info->tag_set.cmd_size = sizeof(struct blkif_req);
info->tag_set.driver_data = info;
 
if (blk_mq_alloc_tag_set(&info->tag_set))
@@ -1543,7 +1558,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
unsigned long flags;
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
struct blkfront_info *info = rinfo->dev_info;
-   int error;
 
if (unlikely(info->connected != BLKIF_STATE_CONNECTED))
return IRQ_HANDLED;
@@ -1587,37 +1601,36 @@ static irqreturn_t blkif_interrupt(int irq, void 
*dev_id)
continue;
}
 
-   error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO;
+   blkif_req(req)->error = (bret->status == BLKIF_RSP_OKAY) ? 0 : 
-EIO;
switch (bret->operation) {
case BLKIF_OP_DISCARD:
if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
struct request_queue *rq = info->rq;
printk(KERN_WARNING "blkfront: %s: %s op 
failed\n",
   info->gd->disk_name, 
op_name(bret->operation));
-   error = -EOPNOTSUPP;
+   blkif_req(req)->error = -EOPNOTSUPP;
info->feature_discard = 0;
info->feature_secdiscard = 0;
queue_flag_clear(QUEUE_FLAG_DISCARD, rq);
queue_flag_clear(QUEUE_FLAG_SECERASE, rq);
}
-   blk_mq_complete_request(req, error);
break;
case BLKIF_OP_FLUSH_DISKCACHE:
case BLKIF_OP_WRITE_BARRIER:
if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) {
printk(KERN_WARNING "blkfront: %s: %s op 
failed\n",
   info->gd->disk_name, 
op_name(bret->operation));
-   error = -EOPNOTSUPP;
+   blkif_req(req)->error = -EOPNOTSUPP;
}
if (unlikely(bret->status == BLKIF_RSP_ERROR &&
 rinfo->shadow[id].req.u.rw.nr_segments == 
0)) {
printk(KERN_WARNING "blkfront: %s: empty %s op 
failed\n",
   info->gd->disk_name, 
op_name(bret->operation));
-   error = -EOPNOTSUPP;
+   blkif_req(req)->error = -EOPNOTSUPP;
}
-   if (unlikely(error)) {
-   if (error == -EOPNOTSUPP)
-   error = 0;
+   if (unlikely(blkif_req(req)->error)) {
+   if (blkif_req(req)->error == -EOPNOTSUPP)
+   blkif_req(req)->error = 0;
info->feature_fua = 0;
info->feature_flush = 0;
xlvbd_flush(info);
@@ -1629,11 +1642,12 @@ stat

[PATCH 22/23] blktrace: remove the unused block_rq_abort tracepoint

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/trace/events/block.h | 44 ++--
 kernel/trace/blktrace.c  |  9 -
 2 files changed, 10 insertions(+), 43 deletions(-)

diff --git a/include/trace/events/block.h b/include/trace/events/block.h
index a88ed13446ff..99ed69fad041 100644
--- a/include/trace/events/block.h
+++ b/include/trace/events/block.h
@@ -61,7 +61,16 @@ DEFINE_EVENT(block_buffer, block_dirty_buffer,
TP_ARGS(bh)
 );
 
-DECLARE_EVENT_CLASS(block_rq_with_error,
+/**
+ * block_rq_requeue - place block IO request back on a queue
+ * @q: queue holding operation
+ * @rq: block IO operation request
+ *
+ * The block operation request @rq is being placed back into queue
+ * @q.  For some reason the request was not completed and needs to be
+ * put back in the queue.
+ */
+TRACE_EVENT(block_rq_requeue,
 
TP_PROTO(struct request_queue *q, struct request *rq),
 
@@ -94,39 +103,6 @@ DECLARE_EVENT_CLASS(block_rq_with_error,
 );
 
 /**
- * block_rq_abort - abort block operation request
- * @q: queue containing the block operation request
- * @rq: block IO operation request
- *
- * Called immediately after pending block IO operation request @rq in
- * queue @q is aborted. The fields in the operation request @rq
- * can be examined to determine which device and sectors the pending
- * operation would access.
- */
-DEFINE_EVENT(block_rq_with_error, block_rq_abort,
-
-   TP_PROTO(struct request_queue *q, struct request *rq),
-
-   TP_ARGS(q, rq)
-);
-
-/**
- * block_rq_requeue - place block IO request back on a queue
- * @q: queue holding operation
- * @rq: block IO operation request
- *
- * The block operation request @rq is being placed back into queue
- * @q.  For some reason the request was not completed and needs to be
- * put back in the queue.
- */
-DEFINE_EVENT(block_rq_with_error, block_rq_requeue,
-
-   TP_PROTO(struct request_queue *q, struct request *rq),
-
-   TP_ARGS(q, rq)
-);
-
-/**
  * block_rq_complete - block IO operation completed by device driver
  * @q: queue containing the block operation request
  * @rq: block operations request
diff --git a/kernel/trace/blktrace.c b/kernel/trace/blktrace.c
index b2058a7f94bd..9f3624dadb09 100644
--- a/kernel/trace/blktrace.c
+++ b/kernel/trace/blktrace.c
@@ -716,12 +716,6 @@ static void blk_add_trace_rq(struct request_queue *q, 
struct request *rq,
rq->cmd_flags, what, rq->errors, 0, NULL);
 }
 
-static void blk_add_trace_rq_abort(void *ignore,
-  struct request_queue *q, struct request *rq)
-{
-   blk_add_trace_rq(q, rq, blk_rq_bytes(rq), BLK_TA_ABORT);
-}
-
 static void blk_add_trace_rq_insert(void *ignore,
struct request_queue *q, struct request *rq)
 {
@@ -974,8 +968,6 @@ static void blk_register_tracepoints(void)
 {
int ret;
 
-   ret = register_trace_block_rq_abort(blk_add_trace_rq_abort, NULL);
-   WARN_ON(ret);
ret = register_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);
WARN_ON(ret);
ret = register_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
@@ -1028,7 +1020,6 @@ static void blk_unregister_tracepoints(void)
unregister_trace_block_rq_requeue(blk_add_trace_rq_requeue, NULL);
unregister_trace_block_rq_issue(blk_add_trace_rq_issue, NULL);
unregister_trace_block_rq_insert(blk_add_trace_rq_insert, NULL);
-   unregister_trace_block_rq_abort(blk_add_trace_rq_abort, NULL);
 
tracepoint_synchronize_unregister();
 }
-- 
2.11.0



[PATCH 10/23] null_blk: don't pass always-0 req->errors to blk_mq_complete_request

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/null_blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index f93906ff31e8..24ca85a70fd8 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -281,7 +281,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd)
case NULL_IRQ_SOFTIRQ:
switch (queue_mode)  {
case NULL_Q_MQ:
-   blk_mq_complete_request(cmd->rq, cmd->rq->errors);
+   blk_mq_complete_request(cmd->rq, 0);
break;
case NULL_Q_RQ:
blk_complete_request(cmd->rq);
-- 
2.11.0



[PATCH 14/23] mtip32xx: add a status field to struct mtip_cmd

2017-04-19 Thread Christoph Hellwig
Instead of using req->errors, which will go away.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/mtip32xx/mtip32xx.c | 16 +---
 drivers/block/mtip32xx/mtip32xx.h |  1 +
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 05e3e664ea1b..7406de29db58 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -241,7 +241,8 @@ static void mtip_async_complete(struct mtip_port *port,
 
rq = mtip_rq_from_tag(dd, tag);
 
-   blk_mq_complete_request(rq, status);
+   cmd->status = status;
+   blk_mq_complete_request(rq, 0);
 }
 
 /*
@@ -2910,18 +2911,19 @@ static void mtip_softirq_done_fn(struct request *rq)
if (unlikely(cmd->unaligned))
up(&dd->port->cmd_slot_unal);
 
-   blk_mq_end_request(rq, rq->errors);
+   blk_mq_end_request(rq, cmd->status);
 }
 
 static void mtip_abort_cmd(struct request *req, void *data,
bool reserved)
 {
+   struct mtip_cmd *cmd = blk_mq_rq_to_pdu(req);
struct driver_data *dd = data;
 
dbg_printk(MTIP_DRV_NAME " Aborting request, tag = %d\n", req->tag);
 
clear_bit(req->tag, dd->port->cmds_to_issue);
-   req->errors = -EIO;
+   cmd->status = -EIO;
mtip_softirq_done_fn(req);
 }
 
@@ -3816,7 +3818,6 @@ static int mtip_queue_rq(struct blk_mq_hw_ctx *hctx,
if (likely(!ret))
return BLK_MQ_RQ_QUEUE_OK;
 
-   rq->errors = ret;
return BLK_MQ_RQ_QUEUE_ERROR;
 }
 
@@ -4106,9 +4107,10 @@ static void mtip_no_dev_cleanup(struct request *rq, void 
*data, bool reserv)
struct driver_data *dd = (struct driver_data *)data;
struct mtip_cmd *cmd;
 
-   if (likely(!reserv))
-   blk_mq_complete_request(rq, -ENODEV);
-   else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) {
+   if (likely(!reserv)) {
+   cmd->status = -ENODEV;
+   blk_mq_complete_request(rq, 0);
+   } else if (test_bit(MTIP_PF_IC_ACTIVE_BIT, &dd->port->flags)) {
 
cmd = mtip_cmd_from_tag(dd, MTIP_TAG_INTERNAL);
if (cmd->comp_func)
diff --git a/drivers/block/mtip32xx/mtip32xx.h b/drivers/block/mtip32xx/mtip32xx.h
index 7617888f7944..57b41528a824 100644
--- a/drivers/block/mtip32xx/mtip32xx.h
+++ b/drivers/block/mtip32xx/mtip32xx.h
@@ -352,6 +352,7 @@ struct mtip_cmd {
int retries; /* The number of retries left for this command. */
 
int direction; /* Data transfer direction */
+   int status;
 };
 
 /* Structure used to describe a port. */
-- 
2.11.0



[PATCH 07/23] virtio_blk: don't use req->errors

2017-04-19 Thread Christoph Hellwig
Remove passing req->errors (which at that point is always 0) to
blk_mq_complete_request, and rely on the virtio status code for the
serial number passthrough request.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 drivers/block/virtio_blk.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index dbc4e80680b1..8378ad480f77 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -175,19 +175,15 @@ static int virtblk_add_req(struct virtqueue *vq, struct 
virtblk_req *vbr,
 static inline void virtblk_request_done(struct request *req)
 {
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
-   int error = virtblk_result(vbr);
 
switch (req_op(req)) {
case REQ_OP_SCSI_IN:
case REQ_OP_SCSI_OUT:
virtblk_scsi_request_done(req);
break;
-   case REQ_OP_DRV_IN:
-   req->errors = (error != 0);
-   break;
}
 
-   blk_mq_end_request(req, error);
+   blk_mq_end_request(req, virtblk_result(vbr));
 }
 
 static void virtblk_done(struct virtqueue *vq)
@@ -205,7 +201,7 @@ static void virtblk_done(struct virtqueue *vq)
while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != 
NULL) {
struct request *req = blk_mq_rq_from_pdu(vbr);
 
-   blk_mq_complete_request(req, req->errors);
+   blk_mq_complete_request(req, 0);
req_done = true;
}
if (unlikely(virtqueue_is_broken(vq)))
@@ -311,7 +307,7 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
goto out;
 
blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
-   err = req->errors ? -EIO : 0;
+   err = virtblk_result(blk_mq_rq_to_pdu(req));
 out:
blk_put_request(req);
return err;
-- 
2.11.0



[PATCH 13/23] nbd: don't use req->errors

2017-04-19 Thread Christoph Hellwig
Add a nbd-specific field instead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Josef Bacik 
---
 drivers/block/nbd.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 6e592c2ba8fd..09a74a66beb1 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -115,6 +115,7 @@ struct nbd_cmd {
int index;
int cookie;
struct completion send_complete;
+   int status;
 };
 
 #if IS_ENABLED(CONFIG_DEBUG_FS)
@@ -244,16 +245,14 @@ static void nbd_size_set(struct nbd_device *nbd, loff_t 
blocksize,
nbd_size_update(nbd);
 }
 
-static void nbd_end_request(struct nbd_cmd *cmd)
+static void nbd_complete_rq(struct request *req)
 {
-   struct nbd_device *nbd = cmd->nbd;
-   struct request *req = blk_mq_rq_from_pdu(cmd);
-   int error = req->errors ? -EIO : 0;
+   struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
 
-   dev_dbg(nbd_to_dev(nbd), "request %p: %s\n", cmd,
-   error ? "failed" : "done");
+   dev_dbg(nbd_to_dev(cmd->nbd), "request %p: %s\n", cmd,
+   cmd->status ? "failed" : "done");
 
-   blk_mq_complete_request(req, error);
+   blk_mq_end_request(req, cmd->status);
 }
 
 /*
@@ -286,7 +285,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct 
request *req,
struct nbd_config *config;
 
if (!refcount_inc_not_zero(&nbd->config_refs)) {
-   req->errors = -EIO;
+   cmd->status = -EIO;
return BLK_EH_HANDLED;
}
 
@@ -331,7 +330,7 @@ static enum blk_eh_timer_return nbd_xmit_timeout(struct 
request *req,
"Connection timed out\n");
}
set_bit(NBD_TIMEDOUT, &config->runtime_flags);
-   req->errors = -EIO;
+   cmd->status = -EIO;
sock_shutdown(nbd);
nbd_config_put(nbd);
 
@@ -574,7 +573,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device 
*nbd, int index)
if (ntohl(reply.error)) {
dev_err(disk_to_dev(nbd->disk), "Other side returned error 
(%d)\n",
ntohl(reply.error));
-   req->errors = -EIO;
+   cmd->status = -EIO;
return cmd;
}
 
@@ -599,7 +598,7 @@ static struct nbd_cmd *nbd_read_stat(struct nbd_device 
*nbd, int index)
 */
if (nbd_disconnected(config) ||
config->num_connections <= 1) {
-   req->errors = -EIO;
+   cmd->status = -EIO;
return cmd;
}
return ERR_PTR(-EIO);
@@ -636,7 +635,7 @@ static void recv_work(struct work_struct *work)
break;
}
 
-   nbd_end_request(cmd);
+   blk_mq_complete_request(blk_mq_rq_from_pdu(cmd), 0);
}
atomic_dec(&config->recv_threads);
wake_up(&config->recv_wq);
@@ -651,8 +650,8 @@ static void nbd_clear_req(struct request *req, void *data, 
bool reserved)
if (!blk_mq_request_started(req))
return;
cmd = blk_mq_rq_to_pdu(req);
-   req->errors = -EIO;
-   nbd_end_request(cmd);
+   cmd->status = -EIO;
+   blk_mq_complete_request(req, 0);
 }
 
 static void nbd_clear_que(struct nbd_device *nbd)
@@ -740,7 +739,7 @@ static int nbd_handle_cmd(struct nbd_cmd *cmd, int index)
nbd_config_put(nbd);
return -EINVAL;
}
-   req->errors = 0;
+   cmd->status = 0;
 again:
nsock = config->socks[index];
mutex_lock(&nsock->tx_lock);
@@ -1408,6 +1407,7 @@ static int nbd_init_request(void *data, struct request 
*rq,
 
 static const struct blk_mq_ops nbd_mq_ops = {
.queue_rq   = nbd_queue_rq,
+   .complete   = nbd_complete_rq,
.init_request   = nbd_init_request,
.timeout= nbd_xmit_timeout,
 };
-- 
2.11.0



[PATCH 11/23] dm rq: don't pass irrelevant error code to blk_mq_complete_request

2017-04-19 Thread Christoph Hellwig
dm never uses rq->errors, so there is no need to pass an error argument
to blk_mq_complete_request.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-rq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index e60f1b6845be..1173be21f6f6 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -363,7 +363,7 @@ static void dm_complete_request(struct request *rq, int 
error)
if (!rq->q->mq_ops)
blk_complete_request(rq);
else
-   blk_mq_complete_request(rq, error);
+   blk_mq_complete_request(rq, 0);
 }
 
 /*
-- 
2.11.0



[PATCH 08/23] scsi: introduce a result field in struct scsi_request

2017-04-19 Thread Christoph Hellwig
This passes on the scsi_cmnd result field to users of passthrough
requests.  Currently we abuse req->errors for this purpose, but that
field will go away in its current form.

Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it.  I didn't dare
to touch this magic, so the abuses are brought forward 1:1.

Signed-off-by: Christoph Hellwig 
---
 block/bsg-lib.c|  8 
 block/bsg.c| 12 +--
 block/scsi_ioctl.c | 14 ++---
 drivers/block/cciss.c  | 42 +++---
 drivers/block/pktcdvd.c|  2 +-
 drivers/block/virtio_blk.c |  2 +-
 drivers/cdrom/cdrom.c  |  2 +-
 drivers/ide/ide-atapi.c| 10 -
 drivers/ide/ide-cd.c   | 20 +-
 drivers/ide/ide-cd_ioctl.c |  2 +-
 drivers/ide/ide-devsets.c  |  4 ++--
 drivers/ide/ide-dma.c  |  2 +-
 drivers/ide/ide-eh.c   | 36 
 drivers/ide/ide-floppy.c   | 10 -
 drivers/ide/ide-io.c   | 10 -
 drivers/ide/ide-ioctls.c   |  4 ++--
 drivers/ide/ide-park.c |  2 +-
 drivers/ide/ide-pm.c   |  8 
 drivers/ide/ide-tape.c |  4 ++--
 drivers/ide/ide-taskfile.c |  6 +++---
 drivers/scsi/osd/osd_initiator.c   |  2 +-
 drivers/scsi/osst.c|  2 +-
 drivers/scsi/qla2xxx/qla_bsg.c |  6 +++---
 drivers/scsi/scsi_lib.c| 15 +++---
 drivers/scsi/scsi_transport_sas.c  |  2 +-
 drivers/scsi/sg.c  |  2 +-
 drivers/scsi/st.c  |  6 +++---
 drivers/target/target_core_pscsi.c |  2 +-
 fs/nfsd/blocklayout.c  |  4 ++--
 include/linux/ide.h|  2 +-
 include/scsi/scsi_request.h|  1 +
 31 files changed, 122 insertions(+), 122 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index cd15f9dbb147..0a23dbba2d30 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -37,7 +37,7 @@ static void bsg_destroy_job(struct kref *kref)
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
 
-   blk_end_request_all(rq, rq->errors);
+   blk_end_request_all(rq, scsi_req(rq)->result);
 
put_device(job->dev);   /* release reference for the request */
 
@@ -74,7 +74,7 @@ void bsg_job_done(struct bsg_job *job, int result,
struct scsi_request *rq = scsi_req(req);
int err;
 
-   err = job->req->errors = result;
+   err = scsi_req(job->req)->result = result;
if (err < 0)
/* we're only returning the result field in the reply */
rq->sense_len = sizeof(u32);
@@ -177,7 +177,7 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
  * @q: request queue to manage
  *
  * On error the create_bsg_job function should return a -Exyz error value
- * that will be set to the req->errors.
+ * that will be set to ->result.
  *
  * Drivers/subsys should pass this to the queue init function.
  */
@@ -201,7 +201,7 @@ static void bsg_request_fn(struct request_queue *q)
 
ret = bsg_create_job(dev, req);
if (ret) {
-   req->errors = ret;
+   scsi_req(req)->result = ret;
blk_end_request_all(req, ret);
spin_lock_irq(q->queue_lock);
continue;
diff --git a/block/bsg.c b/block/bsg.c
index 74835dbf0c47..d9da1b613ced 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -391,13 +391,13 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, rq->errors);
+   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
/*
 * fill in all the output members
 */
-   hdr->device_status = rq->errors & 0xff;
-   hdr->transport_status = host_byte(rq->errors);
-   hdr->driver_status = driver_byte(rq->errors);
+   hdr->device_status = req->result & 0xff;
+   hdr->transport_status = host_byte(req->result);
+   hdr->driver_status = driver_byte(req->result);
hdr->info = 0;
if (hdr->device_status || hdr->transport_status || hdr->driver_status)
hdr->info |= SG_INFO_CHECK;
@@ -431,8 +431,8 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
 * just a protocol response (i.e. non negative), that gets
 * processed above.
 */
-   if (!ret && rq->errors < 0)
-   ret = rq->errors;
+   if (!ret && req->result < 0)
+   ret = req->result;
 
blk_rq_unmap_user(bio);
scsi_req_free_cmd(req);
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index b1

[PATCH 09/23] loop: zero-fill bio on the submitting cpu

2017-04-19 Thread Christoph Hellwig
In truth I've just audited which blk-mq drivers don't currently have a
complete callback, but I think this change is at least borderline useful.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Ming Lei 
---
 drivers/block/loop.c | 30 ++
 drivers/block/loop.h |  1 +
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3081d83d2ea3..86351b3f7350 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -445,32 +445,27 @@ static int lo_req_flush(struct loop_device *lo, struct 
request *rq)
return ret;
 }
 
-static inline void handle_partial_read(struct loop_cmd *cmd, long bytes)
+static void lo_complete_rq(struct request *rq)
 {
-   if (bytes < 0 || op_is_write(req_op(cmd->rq)))
-   return;
+   struct loop_cmd *cmd = blk_mq_rq_to_pdu(rq);
 
-   if (unlikely(bytes < blk_rq_bytes(cmd->rq))) {
+   if (unlikely(req_op(cmd->rq) == REQ_OP_READ && cmd->use_aio &&
+cmd->ret >= 0 && cmd->ret < blk_rq_bytes(cmd->rq))) {
struct bio *bio = cmd->rq->bio;
 
-   bio_advance(bio, bytes);
+   bio_advance(bio, cmd->ret);
zero_fill_bio(bio);
}
+
+   blk_mq_end_request(rq, cmd->ret < 0 ? -EIO : 0);
 }
 
 static void lo_rw_aio_complete(struct kiocb *iocb, long ret, long ret2)
 {
struct loop_cmd *cmd = container_of(iocb, struct loop_cmd, iocb);
-   struct request *rq = cmd->rq;
-
-   handle_partial_read(cmd, ret);
 
-   if (ret > 0)
-   ret = 0;
-   else if (ret < 0)
-   ret = -EIO;
-
-   blk_mq_complete_request(rq, ret);
+   cmd->ret = ret;
+   blk_mq_complete_request(cmd->rq, 0);
 }
 
 static int lo_rw_aio(struct loop_device *lo, struct loop_cmd *cmd,
@@ -1688,8 +1683,10 @@ static void loop_handle_cmd(struct loop_cmd *cmd)
ret = do_req_filebacked(lo, cmd->rq);
  failed:
/* complete non-aio request */
-   if (!cmd->use_aio || ret)
-   blk_mq_complete_request(cmd->rq, ret ? -EIO : 0);
+   if (!cmd->use_aio || ret) {
+   cmd->ret = ret ? -EIO : 0;
+   blk_mq_complete_request(cmd->rq, 0);
+   }
 }
 
 static void loop_queue_work(struct kthread_work *work)
@@ -1715,6 +1712,7 @@ static int loop_init_request(void *data, struct request 
*rq,
 static const struct blk_mq_ops loop_mq_ops = {
.queue_rq   = loop_queue_rq,
.init_request   = loop_init_request,
+   .complete   = lo_complete_rq,
 };
 
 static int loop_add(struct loop_device **l, int i)
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index fb2237c73e61..fecd3f97ef8c 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -70,6 +70,7 @@ struct loop_cmd {
struct request *rq;
struct list_head list;
bool use_aio;   /* use AIO interface to handle I/O */
+   long ret;
struct kiocb iocb;
 };
 
-- 
2.11.0



[PATCH 06/23] virtio: fix spelling of virtblk_scsi_request_done

2017-04-19 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
Reviewed-by: Bart Van Assche 
---
 drivers/block/virtio_blk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index eaf99022bdc6..dbc4e80680b1 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -111,7 +111,7 @@ static int virtblk_add_req_scsi(struct virtqueue *vq, 
struct virtblk_req *vbr,
return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
 }
 
-static inline void virtblk_scsi_reques_done(struct request *req)
+static inline void virtblk_scsi_request_done(struct request *req)
 {
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
struct virtio_blk *vblk = req->q->queuedata;
@@ -144,7 +144,7 @@ static inline int virtblk_add_req_scsi(struct virtqueue *vq,
 {
return -EIO;
 }
-static inline void virtblk_scsi_reques_done(struct request *req)
+static inline void virtblk_scsi_request_done(struct request *req)
 {
 }
 #define virtblk_ioctl  NULL
@@ -180,7 +180,7 @@ static inline void virtblk_request_done(struct request *req)
switch (req_op(req)) {
case REQ_OP_SCSI_IN:
case REQ_OP_SCSI_OUT:
-   virtblk_scsi_reques_done(req);
+   virtblk_scsi_request_done(req);
break;
case REQ_OP_DRV_IN:
req->errors = (error != 0);
-- 
2.11.0



[PATCH 03/23] nvme-fc: fix status code handling in nvme_fc_fcpio_done

2017-04-19 Thread Christoph Hellwig
nvme_complete_async_event expects the little endian status code
including the phase bit, and a new completion handler I plan to
introduce will do so as well.

Change the status variable into the little endian format with the
phase bit used in the NVMe CQE to fix / enable this.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 drivers/nvme/host/fc.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index fc42172c796a..aad7f9c0be32 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1147,7 +1147,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
struct nvme_fc_ctrl *ctrl = op->ctrl;
struct nvme_fc_queue *queue = op->queue;
struct nvme_completion *cqe = &op->rsp_iu.cqe;
-   u16 status = NVME_SC_SUCCESS;
+   __le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1);
 
/*
 * WARNING:
@@ -1182,9 +1182,9 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
sizeof(op->rsp_iu), DMA_FROM_DEVICE);
 
if (atomic_read(&op->state) == FCPOP_STATE_ABORTED)
-   status = NVME_SC_ABORT_REQ | NVME_SC_DNR;
+   status = cpu_to_le16((NVME_SC_ABORT_REQ | NVME_SC_DNR) << 1);
else if (freq->status)
-   status = NVME_SC_FC_TRANSPORT_ERROR;
+   status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1);
 
/*
 * For the linux implementation, if we have an unsuccesful
@@ -1212,7 +1212,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
 */
if (freq->transferred_length !=
be32_to_cpu(op->cmd_iu.data_len)) {
-   status = NVME_SC_FC_TRANSPORT_ERROR;
+   status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1);
goto done;
}
op->nreq.result.u64 = 0;
@@ -1229,15 +1229,15 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
freq->transferred_length ||
 op->rsp_iu.status_code ||
 op->rqno != le16_to_cpu(cqe->command_id))) {
-   status = NVME_SC_FC_TRANSPORT_ERROR;
+   status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1);
goto done;
}
op->nreq.result = cqe->result;
-   status = le16_to_cpu(cqe->status) >> 1;
+   status = cqe->status;
break;
 
default:
-   status = NVME_SC_FC_TRANSPORT_ERROR;
+   status = cpu_to_le16(NVME_SC_FC_TRANSPORT_ERROR << 1);
goto done;
}
 
@@ -1249,7 +1249,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
return;
}
 
-   blk_mq_complete_request(rq, status);
+   blk_mq_complete_request(rq, le16_to_cpu(status) >> 1);
 }
 
 static int
-- 
2.11.0



[PATCH 05/23] nvme: make nvme_error_status private

2017-04-19 Thread Christoph Hellwig
Currently it's used by the lightnvm passthrough ioctl, but we'd like to make
it private in preparation for block layer specific error codes.  Lightnvm
already returns the real NVMe status anyway, so I think we can just limit it
to returning -EIO for any status that is set.

This will need a careful audit from the lightnvm folks, though.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 3 +--
 drivers/nvme/host/lightnvm.c | 6 +++---
 drivers/nvme/host/nvme.h | 1 -
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c6f256d74b6b..805f250315ec 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -66,7 +66,7 @@ static DEFINE_SPINLOCK(dev_list_lock);
 
 static struct class *nvme_class;
 
-int nvme_error_status(struct request *req)
+static int nvme_error_status(struct request *req)
 {
switch (nvme_req(req)->status & 0x7ff) {
case NVME_SC_SUCCESS:
@@ -77,7 +77,6 @@ int nvme_error_status(struct request *req)
return -EIO;
}
 }
-EXPORT_SYMBOL_GPL(nvme_error_status);
 
 static inline bool nvme_req_needs_retry(struct request *req)
 {
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index f85e6e57d641..7f407d5f0095 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -599,7 +599,7 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
__le64 *metadata = NULL;
dma_addr_t metadata_dma;
DECLARE_COMPLETION_ONSTACK(wait);
-   int ret;
+   int ret = 0;
 
rq = nvme_alloc_request(q, (struct nvme_command *)vcmd, 0,
NVME_QID_ANY);
@@ -671,8 +671,8 @@ static int nvme_nvm_submit_user_cmd(struct request_queue *q,
 
if (nvme_req(rq)->flags & NVME_REQ_CANCELLED)
ret = -EINTR;
-   else
-   ret = nvme_error_status(rq);
+   else if (nvme_req(rq)->status & 0x7ff)
+   ret = -EIO;
if (result)
*result = nvme_req(rq)->status & 0x7ff;
if (status)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d7330f75632d..550037f5efea 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -254,7 +254,6 @@ static inline void nvme_end_request(struct request *req, 
__le16 status,
blk_mq_complete_request(req, 0);
 }
 
-int nvme_error_status(struct request *req);
 void nvme_complete_rq(struct request *req);
 void nvme_cancel_request(struct request *req, void *data, bool reserved);
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
-- 
2.11.0



[PATCH 04/23] nvme: split nvme status from block req->errors

2017-04-19 Thread Christoph Hellwig
We want our own clearly defined error field for NVMe passthrough commands,
and the request errors field is going away in its current form.

Just store the status and result fields in the nvme_request structure from
hardirq completion context (using a new helper) and then generate a
Linux errno for the block layer only when we actually need it.

Because we can't overload the status value with a negative error code
for cancelled commands, we now have a flags field in struct nvme_request
that contains a bit for this condition.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 drivers/nvme/host/core.c | 50 +++-
 drivers/nvme/host/fc.c   | 10 -
 drivers/nvme/host/lightnvm.c |  9 +---
 drivers/nvme/host/nvme.h | 33 +
 drivers/nvme/host/pci.c  | 11 +-
 drivers/nvme/host/rdma.c |  5 ++---
 drivers/nvme/target/loop.c   |  7 ++-
 7 files changed, 65 insertions(+), 60 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d33f829c3ab7..c6f256d74b6b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -66,11 +66,24 @@ static DEFINE_SPINLOCK(dev_list_lock);
 
 static struct class *nvme_class;
 
+int nvme_error_status(struct request *req)
+{
+   switch (nvme_req(req)->status & 0x7ff) {
+   case NVME_SC_SUCCESS:
+   return 0;
+   case NVME_SC_CAP_EXCEEDED:
+   return -ENOSPC;
+   default:
+   return -EIO;
+   }
+}
+EXPORT_SYMBOL_GPL(nvme_error_status);
+
 static inline bool nvme_req_needs_retry(struct request *req)
 {
if (blk_noretry_request(req))
return false;
-   if (req->errors & NVME_SC_DNR)
+   if (nvme_req(req)->status & NVME_SC_DNR)
return false;
if (jiffies - req->start_time >= req->timeout)
return false;
@@ -81,23 +94,13 @@ static inline bool nvme_req_needs_retry(struct request *req)
 
 void nvme_complete_rq(struct request *req)
 {
-   int error = 0;
-
-   if (unlikely(req->errors)) {
-   if (nvme_req_needs_retry(req)) {
-   nvme_req(req)->retries++;
-   blk_mq_requeue_request(req,
-   !blk_mq_queue_stopped(req->q));
-   return;
-   }
-
-   if (blk_rq_is_passthrough(req))
-   error = req->errors;
-   else
-   error = nvme_error_status(req->errors);
+   if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
+   nvme_req(req)->retries++;
+   blk_mq_requeue_request(req, !blk_mq_queue_stopped(req->q));
+   return;
}
 
-   blk_mq_end_request(req, error);
+   blk_mq_end_request(req, nvme_error_status(req));
 }
 EXPORT_SYMBOL_GPL(nvme_complete_rq);
 
@@ -114,7 +117,9 @@ void nvme_cancel_request(struct request *req, void *data, 
bool reserved)
status = NVME_SC_ABORT_REQ;
if (blk_queue_dying(req->q))
status |= NVME_SC_DNR;
-   blk_mq_complete_request(req, status);
+   nvme_req(req)->status = status;
+   blk_mq_complete_request(req, 0);
+
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
 
@@ -357,6 +362,7 @@ int nvme_setup_cmd(struct nvme_ns *ns, struct request *req,
 
if (!(req->rq_flags & RQF_DONTPREP)) {
nvme_req(req)->retries = 0;
+   nvme_req(req)->flags = 0;
req->rq_flags |= RQF_DONTPREP;
}
 
@@ -413,7 +419,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct 
nvme_command *cmd,
blk_execute_rq(req->q, NULL, req, at_head);
if (result)
*result = nvme_req(req)->result;
-   ret = req->errors;
+   if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
+   ret = -EINTR;
+   else
+   ret = nvme_req(req)->status;
  out:
blk_mq_free_request(req);
return ret;
@@ -498,7 +507,10 @@ int __nvme_submit_user_cmd(struct request_queue *q, struct 
nvme_command *cmd,
}
  submit:
blk_execute_rq(req->q, disk, req, 0);
-   ret = req->errors;
+   if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
+   ret = -EINTR;
+   else
+   ret = nvme_req(req)->status;
if (result)
*result = le32_to_cpu(nvme_req(req)->result.u32);
if (meta && !ret && !write) {
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index aad7f9c0be32..450733c8cd24 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -1148,6 +1148,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
struct nvme_fc_queue *queue = op->queue;
struct nvme_completion *cqe = &op->rsp_iu.cqe;
__le16 status = cpu_to_le16(NVME_SC_SUCCESS << 1);
+   union nvme_result result;
 
/*
 * WARNING:
@@ -1215,7 +1216,7 @@ nvme_f

[PATCH 02/23] block: remove the blk_execute_rq return value

2017-04-19 Thread Christoph Hellwig
The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.

Just let the callers figure out their error themselves.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Johannes Thumshirn 
---
 block/blk-exec.c | 8 +---
 block/scsi_ioctl.c   | 3 ++-
 drivers/block/virtio_blk.c   | 3 ++-
 drivers/cdrom/cdrom.c| 3 ++-
 drivers/ide/ide-atapi.c  | 3 ++-
 drivers/ide/ide-cd.c | 3 ++-
 drivers/ide/ide-cd_ioctl.c   | 3 ++-
 drivers/ide/ide-devsets.c| 4 ++--
 drivers/ide/ide-disk.c   | 3 +--
 drivers/ide/ide-ioctls.c | 7 ---
 drivers/ide/ide-park.c   | 3 ++-
 drivers/ide/ide-pm.c | 3 ++-
 drivers/ide/ide-taskfile.c   | 4 ++--
 drivers/scsi/osd/osd_initiator.c | 5 -
 fs/nfsd/blocklayout.c| 5 +++--
 include/linux/blkdev.h   | 2 +-
 16 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/block/blk-exec.c b/block/blk-exec.c
index 8cd0e9bc8dc8..afa383248c7c 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -92,11 +92,10 @@ EXPORT_SYMBOL_GPL(blk_execute_rq_nowait);
  *Insert a fully prepared request at the back of the I/O scheduler queue
  *for execution and wait for completion.
  */
-int blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
+void blk_execute_rq(struct request_queue *q, struct gendisk *bd_disk,
   struct request *rq, int at_head)
 {
DECLARE_COMPLETION_ONSTACK(wait);
-   int err = 0;
unsigned long hang_check;
 
rq->end_io_data = &wait;
@@ -108,10 +107,5 @@ int blk_execute_rq(struct request_queue *q, struct gendisk 
*bd_disk,
while (!wait_for_completion_io_timeout(&wait, hang_check * 
(HZ/2)));
else
wait_for_completion_io(&wait);
-
-   if (rq->errors)
-   err = -EIO;
-
-   return err;
 }
 EXPORT_SYMBOL(blk_execute_rq);
diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 82a43bb19967..b1352143f12f 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -547,7 +547,8 @@ static int __blk_send_generic(struct request_queue *q, 
struct gendisk *bd_disk,
scsi_req(rq)->cmd[0] = cmd;
scsi_req(rq)->cmd[4] = data;
scsi_req(rq)->cmd_len = 6;
-   err = blk_execute_rq(q, bd_disk, rq, 0);
+   blk_execute_rq(q, bd_disk, rq, 0);
+   err = rq->errors ? -EIO : 0;
blk_put_request(rq);
 
return err;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2d8290169271..eaf99022bdc6 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -310,7 +310,8 @@ static int virtblk_get_id(struct gendisk *disk, char 
*id_str)
if (err)
goto out;
 
-   err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+   blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+   err = req->errors ? -EIO : 0;
 out:
blk_put_request(req);
return err;
diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index 87739649eac2..308501730ab3 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -2218,7 +2218,8 @@ static int cdrom_read_cdda_bpc(struct cdrom_device_info 
*cdi, __u8 __user *ubuf,
rq->timeout = 60 * HZ;
bio = rq->bio;
 
-   if (blk_execute_rq(q, cdi->disk, rq, 0)) {
+   blk_execute_rq(q, cdi->disk, rq, 0);
+   if (rq->errors) {
struct request_sense *s = req->sense;
ret = -EIO;
cdi->last_sense = s->sense_key;
diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c
index feb30061123b..1524797e1776 100644
--- a/drivers/ide/ide-atapi.c
+++ b/drivers/ide/ide-atapi.c
@@ -107,7 +107,8 @@ int ide_queue_pc_tail(ide_drive_t *drive, struct gendisk 
*disk,
memcpy(scsi_req(rq)->cmd, pc->c, 12);
if (drive->media == ide_tape)
scsi_req(rq)->cmd[13] = REQ_IDETAPE_PC1;
-   error = blk_execute_rq(drive->queue, disk, rq, 0);
+   blk_execute_rq(drive->queue, disk, rq, 0);
+   error = rq->errors ? -EIO : 0;
 put_req:
blk_put_request(rq);
return error;
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 74f1b7dc03f7..95c40afa9120 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -452,7 +452,8 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char 
*cmd,
}
}
 
-   error = blk_execute_rq(drive->queue, info->disk, rq, 0);
+   blk_execute_rq(drive->queue, info->disk, rq, 0);
+   error = rq->errors ? -EIO : 0;
 
if (buffer)
*bufflen = scsi_req(rq)->resid_len;
diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c
index 9fcefbc8425e..f1ab726bd430 100644
--- a/drivers/ide/

[PATCH 01/23] pd: don't check blk_execute_rq return value.

2017-04-19 Thread Christoph Hellwig
The driver never sets req->errors, so blk_execute_rq will always return 0.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/paride/pd.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/paride/pd.c b/drivers/block/paride/pd.c
index b05e151c9b38..7d2402f90978 100644
--- a/drivers/block/paride/pd.c
+++ b/drivers/block/paride/pd.c
@@ -739,18 +739,15 @@ static int pd_special_command(struct pd_unit *disk,
  enum action (*func)(struct pd_unit *disk))
 {
struct request *rq;
-   int err = 0;
 
rq = blk_get_request(disk->gd->queue, REQ_OP_DRV_IN, __GFP_RECLAIM);
if (IS_ERR(rq))
return PTR_ERR(rq);
 
rq->special = func;
-
-   err = blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
-
+   blk_execute_rq(disk->gd->queue, disk->gd, rq, 0);
blk_put_request(rq);
-   return err;
+   return 0;
 }
 
 /* kernel glue structures */
-- 
2.11.0



kill req->errors V3

2017-04-19 Thread Christoph Hellwig
Currently the request structure has an errors field that is used in
various different ways.  The oldest drivers use it as an error count,
blk-mq and the generic timeout code assume that it holds a Linux
errno for block completions, and various drivers use it for internal
status values, often overwriting them with Linux errnos later,
that is unless they are processing passthrough requests in which
case they'll leave their errors in it.

This series kills the ->errors field and replaces it with new fields
in the drivers (or an odd hack in a union in struct request for
two bitrotting old drivers).

Also available in the following git tree:

git://git.infradead.org/users/hch/block.git request-errors

Gitweb:


http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/request-errors

Changes since V2:
 - reorder one patch to be earlier in the series
 - fix the argument to a bsg dprintk
 - fix a kerneldoc comment

Changes since V1:
 - rebased on top of the latest block for-next tree
 - fix error handling in nfsd blocklayout
 - dropped "scsi: fix fast-fail for non-passthrough requests"



Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 12:32 PM, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote:
> Not entirely, it would have to call through the whole process
> including the arch_p2p_cross_segment()..

Hmm, yes. Though it's still not clear what, if anything,
arch_p2p_cross_segment would be doing. In my experience, if you are
going between host bridges, the CPU address (or PCI address -- I'm not
sure which seeing they are the same on my system) would still work fine
-- it just _may_ be a bad idea because of performance.

Similarly if you are crossing via a QPI bus or similar, I'd expect the
CPU address to work fine. But here the performance is even less likely
to be any good.


>  // Try the arch specific helper
>const struct dma_map_ops *comp_ops = get_dma_ops(completer);
>const struct dma_map_ops *init_ops = get_dma_ops(initiator);

So, in this case, what device does the completer point to? The PCI
device or a more specific GPU device? If it's the former, who's
responsible for setting the new dma_ops? Typically the dma_ops are arch
specific but now you'd be adding ones that are tied to hmm or the gpu.

>> I'm not sure I like the name pci_p2p_same_segment. It reads as though
>> it's only checking if the devices are not the same segment.
> 
> Well, that is exactly what it is doing. If it succeeds then the caller
> knows the DMA will not flow outside the segment and no iommu setup/etc
> is required.

It appears to me like it's calculating the DMA address, and the check is
just a side requirement. It reads as though it's only doing the check.

Logan


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 11:41 AM, Logan Gunthorpe  wrote:
>
>
> On 19/04/17 12:30 PM, Dan Williams wrote:
>> Letting other users do the container_of() arrangement means that
>> struct page_map needs to become public and move into struct
>> dev_pagemap directly.
>
> Ah, yes, I got a bit turned around by that and failed to notice that
> page_map and dev_pagemap are different. Why is it that dev_pagemap
> contains pretty much the exact same information as page_map? The only
> thing gained that I can see is that the struct resource gains const
> protection...
>
>> ...I think that encapsulation loss is worth it for the gain of clearly
>> separating the HMM-case from the base case.
>
> Agreed.
>

Yeah, I forgot that dev_pagemap grew those fields, so we don't have
any real encapsulation today. So this would just be a pure cleanup to
kill struct page_map.


Re: [PATCH v3 3/4] sd: Make synchronize cache upon shutdown asynchronous

2017-04-19 Thread Bart Van Assche
On Tue, 2017-04-18 at 16:56 -0700, James Bottomley wrote:
> This means the combined 1/3 3/3 patch looks like this:
> [ ... ]

Hello James,

The two attached patches pass my tests. How would you like to proceed
with patch 1/2? Would you like to submit it yourself or is it OK for you
if I mention you as author and add your Signed-off-by?

Bart.From c395ce2aaf6d8a644311f4c55dfa6aa560a93240 Mon Sep 17 00:00:00 2001
From: Bart Van Assche 
Date: Tue, 28 Mar 2017 14:00:17 -0700
Subject: [PATCH 1/2] Introduce scsi_start_queue()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche 
Cc: Israel Rukshin 
Cc: Max Gurtovoy 
Cc: Hannes Reinecke 
Cc: Benjamin Block 
---
 drivers/scsi/scsi_lib.c  | 25 +++--
 drivers/scsi/scsi_priv.h |  1 +
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eecc005099b2..ffa6e61299a9 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2987,6 +2987,20 @@ scsi_internal_device_block(struct scsi_device *sdev, bool wait)
 }
 EXPORT_SYMBOL_GPL(scsi_internal_device_block);
  
+void scsi_start_queue(struct scsi_device *sdev)
+{
+	struct request_queue *q = sdev->request_queue;
+	unsigned long flags;
+
+	if (q->mq_ops) {
+		blk_mq_start_stopped_hw_queues(q, false);
+	} else {
+		spin_lock_irqsave(q->queue_lock, flags);
+		blk_start_queue(q);
+		spin_unlock_irqrestore(q->queue_lock, flags);
+	}
+}
+
 /**
  * scsi_internal_device_unblock - resume a device after a block request
  * @sdev:	device to resume
@@ -3007,9 +3021,6 @@ int
 scsi_internal_device_unblock(struct scsi_device *sdev,
 			 enum scsi_device_state new_state)
 {
-	struct request_queue *q = sdev->request_queue; 
-	unsigned long flags;
-
 	/*
 	 * Try to transition the scsi device to SDEV_RUNNING or one of the
 	 * offlined states and goose the device queue if successful.
@@ -3027,13 +3038,7 @@ scsi_internal_device_unblock(struct scsi_device *sdev,
 		 sdev->sdev_state != SDEV_OFFLINE)
 		return -EINVAL;
 
-	if (q->mq_ops) {
-		blk_mq_start_stopped_hw_queues(q, false);
-	} else {
-		spin_lock_irqsave(q->queue_lock, flags);
-		blk_start_queue(q);
-		spin_unlock_irqrestore(q->queue_lock, flags);
-	}
+	scsi_start_queue(sdev);
 
 	return 0;
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index f11bd102d6d5..c7629e31a75b 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -89,6 +89,7 @@ extern void scsi_run_host_queues(struct Scsi_Host *shost);
 extern void scsi_requeue_run_queue(struct work_struct *work);
 extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev);
 extern struct request_queue *scsi_mq_alloc_queue(struct scsi_device *sdev);
+extern void scsi_start_queue(struct scsi_device *sdev);
 extern int scsi_mq_setup_tags(struct Scsi_Host *shost);
 extern void scsi_mq_destroy_tags(struct Scsi_Host *shost);
 extern int scsi_init_queue(void);
-- 
2.12.2

From 540cd36d79a46c634368365d3f0f5b8adf4fd687 Mon Sep 17 00:00:00 2001
From: Bart Van Assche 
Date: Tue, 18 Apr 2017 10:11:02 -0700
Subject: [PATCH 2/2] Make __scsi_remove_device go straight from BLOCKED to DEL

If a device is blocked, make __scsi_remove_device() cause it to
transition to the DEL state. This means that all the commands
issued in .shutdown() will error in the mid-layer, thus making
the removal proceed without being stopped.

This patch is a slightly modified version of a patch from James
Bottomley.

Signed-off-by: Bart Van Assche 
Cc: James Bottomley 
Cc: Israel Rukshin 
Cc: Max Gurtovoy 
Cc: Hannes Reinecke 
Cc: Benjamin Block 
---
 drivers/scsi/scsi_lib.c   |  2 +-
 drivers/scsi/scsi_sysfs.c | 15 +--
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ffa6e61299a9..376cd1da102c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2611,7 +2611,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_QUIESCE:
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
-		case SDEV_BLOCK:
 			break;
 		default:
 			goto illegal;
@@ -2625,6 +2624,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
 		case SDEV_OFFLINE:
 		case SDEV_TRANSPORT_OFFLINE:
 		case SDEV_CANCEL:
+		case SDEV_BLOCK:
 		case SDEV_CREATED_BLOCK:
 			break;
 		default:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07b1d47..732f1873f2fb 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1282,8 +1282,19 @@ void __scsi_remove_device(struct scsi_device *sdev)
 		return;
 
 	if (sdev->is_visible) {
-		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
-			return;
+		/*
+		 * If blocked, we go straight to DEL and restart the queue so
+		 * any commands issued during driver shutdown (like sync
+		 * cache) are errored immediately.
+		 */
+		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
+			if (scsi_dev

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 12:30 PM, Dan Williams wrote:
> Letting other users do the container_of() arrangement means that
> struct page_map needs to become public and move into struct
> dev_pagemap directly.

Ah, yes, I got a bit turned around by that and failed to notice that
page_map and dev_pagemap are different. Why is it that dev_pagemap
contains pretty much the exact same information as page_map? The only
thing gained that I can see is that the struct resource gains const
protection...

> ...I think that encapsulation loss is worth it for the gain of clearly
> separating the HMM-case from the base case.

Agreed.

Logan


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 12:01:39PM -0600, Logan Gunthorpe wrote:

> I'm just spit balling here but if HMM wanted to use unaddressable memory
> as a DMA target, it could set that function to create a window in the gpu
> memory, then call the pci_p2p_same_segment and return the result as the
> dma address.

Not entirely, it would have to call through the whole process
including the arch_p2p_cross_segment()..

Maybe we can start down the road of using ops for more iommu steps
with something like this as the helper:

dma_addr_t dma_map_pa(struct device *initiator, struct page *page,
  void *data)
{
 struct device *completer = get_p2p_completer(page);
 dma_addr_t pa;

 if (IS_ERR(completer))
   return SYSTEM_MEMORY;
   // Or maybe ?
   return init_ops->dma_map_pa(..);

 // Try the generic method
 pa = pci_p2p_same_segment(dev, p2p_src, page);
 if (pa != ERROR)
   goto out;

 // Try the arch specific helper
 const struct dma_map_ops *comp_ops = get_dma_ops(completer);
 const struct dma_map_ops *init_ops = get_dma_ops(initiator);

 /* FUTURE: Let something translate a HMM page into a DMA'ble
page, eg by mapping it into a GPU window. Maybe this
callback lives in devmap ? */
 page = comp_ops->translate_dma_page(completer, page);

 /* New dma_map_op is the same as arch_p2p_cross_segment in
prior version. Any arch specific data needed to program
the iommu flows through data */
 pa = init_ops->p2p_cross_segment_map(completer, initiator, page, data);

out:
 device_put(completer);
 return pa;
}

// map_sg op:
for (each sgl) {
struct page *page = sg_page(s);
struct arch_iommu_data data = {}; // pass through to ops->p2p_cross_segment
dma_addr_t pa;

pa = dma_map_pa(dev, page, &data)
if (pa == ERROR)
// fail

if (!data.needs_iommu) {
// Insert PA directly into the result SGL
sg++;
continue;
}

// Else pa & data describe how to setup the iommu
}

> > dma_addr_t pci_p2p_same_segment(struct device *initator,
> > struct device *completer,
> > struct page *page)
> 
> I'm not sure I like the name pci_p2p_same_segment. It reads as though
> it's only checking if the devices are not the same segment.

Well, that is exactly what it is doing. If it succeeds then the caller
knows the DMA will not flow outside the segment and no iommu setup/etc
is required.

That function cannot be expanded to include generic cross-segment
traffic, a new function would be needed..

Jason


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 11:19 AM, Logan Gunthorpe  wrote:
>
>
> On 19/04/17 12:11 PM, Logan Gunthorpe wrote:
>>
>>
>> On 19/04/17 11:41 AM, Dan Williams wrote:
>>> No, not quite ;-). I still don't think we should require the non-HMM users
>>> to pass NULL for all the HMM arguments. What I like about Logan's
>>> proposal is to have separate create and register steps for dev_pagemap.
>>> That way call paths that don't care about HMM specifics can just turn
>>> around and register the vanilla dev_pagemap.
>>
>> Would you necessarily even need a create step? I was thinking more along
>> the lines that struct dev_pagemap _could_ just be a member in another
>> structure. The caller would set the attributes they needed and pass it
>> to devm_memremap. (Similar to how we commonly do things with struct
>> device, et al). Potentially, that could also get rid of the need for the
>> *data pointer HMM is using to get back the struct hmm_devmem seeing
>> container_of could be used instead.
>
> Also, now that I've thought about it a little more, it _may_ be that
> many or all of the hmm specific fields in dev_pagemap could move to a
> containing struct too...

Yes, that's already how we handle struct page_map, it's an internal
implementation detail of devm_memremap_pages().

Letting other users do the container_of() arrangement means that
struct page_map needs to become public and move into struct
dev_pagemap directly.

...I think that encapsulation loss is worth it for the gain of clearly
separating the HMM-case from the base case.


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 12:11 PM, Logan Gunthorpe wrote:
> 
> 
> On 19/04/17 11:41 AM, Dan Williams wrote:
>> No, not quite ;-). I still don't think we should require the non-HMM users
>> to pass NULL for all the HMM arguments. What I like about Logan's
>> proposal is to have separate create and register steps for dev_pagemap.
>> That way call paths that don't care about HMM specifics can just turn
>> around and register the vanilla dev_pagemap.
> 
> Would you necessarily even need a create step? I was thinking more along
> the lines that struct dev_pagemap _could_ just be a member in another
> structure. The caller would set the attributes they needed and pass it
> to devm_memremap. (Similar to how we commonly do things with struct
> device, et al). Potentially, that could also get rid of the need for the
> *data pointer HMM is using to get back the struct hmm_devmem seeing
> container_of could be used instead.

Also, now that I've thought about it a little more, it _may_ be that
many or all of the hmm specific fields in dev_pagemap could move to a
containing struct too...

Logan


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 11:41 AM, Dan Williams wrote:
> No, not quite ;-). I still don't think we should require the non-HMM users
> to pass NULL for all the HMM arguments. What I like about Logan's
> proposal is to have separate create and register steps for dev_pagemap.
> That way call paths that don't care about HMM specifics can just turn
> around and register the vanilla dev_pagemap.

Would you necessarily even need a create step? I was thinking more along
the lines that struct dev_pagemap _could_ just be a member in another
structure. The caller would set the attributes they needed and pass it
to devm_memremap. (Similar to how we commonly do things with struct
device, et al). Potentially, that could also get rid of the need for the
*data pointer HMM is using to get back the struct hmm_devmem seeing
container_of could be used instead.

>> Note i won't make any change now on that front but if it makes sense
>> i am happy to do it as separate patchset on top of HMM.
>>
>> Also i don't want p2pmem to be an exclusive or with HMM, we will want
>> GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support
>> this too.
> 
> Yes, this makes sense. I think we really just want to distinguish host
> memory or not in terms of the dev_pagemap type.

That depends though. If unaddressable memory needs different steps to
get to the DMA address (ie if it has to setup a gpu window) then we may
still need a way to distinguish the two types of non-host memory.

Logan



Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 11:14 AM, Jason Gunthorpe wrote:
> I don't see a use for the dma_map function pointer at this point..

Yes, it is kind of like designing for the future. I just find it a
little odd calling the pci functions in the iommu.


> It doesn't make alot of sense for the completor of the DMA to provide
> a mapping op, the mapping process is *path* specific, not specific to
> a completer/initiator.

I'm just spit balling here but if HMM wanted to use unaddressable memory
as a DMA target, it could set that function to create a window in the gpu
memory, then call the pci_p2p_same_segment and return the result as the
dma address.


> dma_addr_t pci_p2p_same_segment(struct device *initator,
> struct device *completer,
>   struct page *page)

I'm not sure I like the name pci_p2p_same_segment. It reads as though
it's only checking if the devices are not the same segment. It also may
be that, in the future, it supports devices on different segments. I'd
call it more like pci_p2p_dma_map.


> for (each sgl) {

Thanks, this code fleshes things out nicely


> To me it looks like the code duplication across the iommu stuff comes
> from just duplicating the basic iommu algorithm in every driver.

Yes, this is true.

> To clean that up I think someone would need to hoist the overall sgl
> loop and use more ops callbacks eg allocate_iommu_range,
> assign_page_to_range, dealloc_range, etc. This is a problem p2p makes
> worse, but isn't directly causing :\

Yup.

Logan


Re: blk-mq: remove the error argument to blk_mq_complete_request

2017-04-19 Thread h...@lst.de
On Tue, Apr 18, 2017 at 10:50:18PM +, Bart Van Assche wrote:
> On Tue, 2017-04-18 at 08:52 -0700, Christoph Hellwig wrote:
> > Now that we always have a ->complete callback we can remove the direct
> > call to blk_mq_end_request, as well as the error argument to
> > blk_mq_complete_request.
> 
> Hello Christoph,
> 
> Please add a runtime check that issues a warning early if a .complete
> callback function is missing.
> 
> Are you sure that all blk-mq drivers have a .complete callback? In the
> request-errors branch of your block git tree I found the following:

I think the problem is that my above changelog was wrong - we only
require the complete_rq method if the driver calls the
blk_mq_complete_request function.  Both rbd and ubiblock don't call it in
the tree.  They probably should, but that's work for a separate
series.


Re: [PATCH] scsi: osd_uld: fix mismerge

2017-04-19 Thread Bart Van Assche
On Wed, 2017-04-19 at 19:44 +0200, Arnd Bergmann wrote:
> Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value")
> Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function")

Hello Arnd,

Both these patches are fine as far as I can tell. Shouldn't the
"Fixes" tag refer to the merge commit instead of to the commits
that have been merged?

Bart.

[PATCH] scsi: osd_uld: fix mismerge

2017-04-19 Thread Arnd Bergmann
A mismerge between two branches in linux-next reintroduced a warning that was
previously resolved:

drivers/scsi/osd/osd_uld.c: In function 'osd_probe':
drivers/scsi/osd/osd_uld.c:457:2: error: ignoring return value of 
'scsi_device_get', declared with attribute warn_unused_result 
[-Werror=unused-result]

The original fix had more complex error unrolling, but as far as I
can tell, this simpler version is now sufficient.

Fixes: c02465fa13b6 ("scsi: osd_uld: Check scsi_device_get() return value")
Fixes: ac1ddc584e98 ("scsi: utilize new cdev_device_add helper function")
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/osd/osd_uld.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
index 8b9941a5687a..0e56f1eb05dc 100644
--- a/drivers/scsi/osd/osd_uld.c
+++ b/drivers/scsi/osd/osd_uld.c
@@ -454,7 +454,8 @@ static int osd_probe(struct device *dev)
/* hold one more reference to the scsi_device that will get released
 * in __release, in case a logout is happening while fs is mounted
 */
-   scsi_device_get(scsi_device);
+   if (scsi_device_get(scsi_device))
+   goto err_retract_minor;
osd_dev_init(&oud->od, scsi_device);
 
/* allocate a disk and set it up */
-- 
2.9.0



Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 10:32 AM, Jerome Glisse  wrote:
> On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote:
>> On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe  wrote:
>> >
>> >
>> > On 19/04/17 09:55 AM, Jason Gunthorpe wrote:
>> >> I was thinking only this one would be supported with a core code
>> >> helper..
>> >
>> > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a
>> > type flag to the dev_pagemap structure which would be very useful to us.
>> > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages.
>> > Then, potentially, we could add a dma_map callback to the structure
>> > (possibly unioned with an hmm field). The dev_ops providers would then
>> > just need to do something like this (enclosed in a helper):
>> >
>> > if (is_zone_device_page(page)) {
>> > pgmap = get_dev_pagemap(page_to_pfn(page));
>> > if (!pgmap || pgmap->type !=  MEMORY_DEVICE_P2P ||
>> > !pgmap->dma_map)
>> > return 0;
>> >
>> > dma_addr = pgmap->dma_map(dev, pgmap->dev, page);
>> > put_dev_pagemap(pgmap);
>> > if (!dma_addr)
>> > return 0;
>> > ...
>> > }
>> >
>> > The pci_enable_p2p_bar function would then just need to call
>> > devm_memremap_pages with the dma_map callback set to a function that
>> > does the segment check and the offset calculation.
>> >
>> > Thoughts?
>> >
>> > @Jerome: my feedback to you would be that your patch assumes all users
>> > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more
>> > useful if it was generic. My suggestion would be to have the caller
>> > allocate the dev_pagemap structure, populate it and pass it into
>> > devm_memremap_pages. Given that pretty much everything in that structure
>> > is already an argument to that function, I feel like this makes sense.
>> > This should also help to unify hmm_devmem_pages_create and
>> > devm_memremap_pages which look very similar to each other.
>>
>> I like that change. Also the types should describe the memory relative
>> to its relationship to struct page, not whether it is persistent or
>> not. I would consider volatile and persistent memory that is attached
>> to the cpu memory controller and i/o coherent as the same type of
>> memory. DMA incoherent ranges like P2P and HMM should get their own
>> types.
>
> Dan you asked me to not use devm_memremap_pages() because you didn't
> want to have HMM memory in the pgmap_radix, did you change opinion
> on that ? :)

No, not quite ;-). I still don't think we should require the non-HMM users
to pass NULL for all the HMM arguments. What I like about Logan's
proposal is to have separate create and register steps for dev_pagemap.
That way call paths that don't care about HMM specifics can just turn
around and register the vanilla dev_pagemap.

> Note i won't make any change now on that front but if it makes sense
> i am happy to do it as separate patchset on top of HMM.
>
> Also i don't want p2pmem to be an exclusive or with HMM, we will want
> GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support
> this too.

Yes, this makes sense. I think we really just want to distinguish host
memory or not in terms of the dev_pagemap type.


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jerome Glisse
On Wed, Apr 19, 2017 at 10:01:23AM -0700, Dan Williams wrote:
> On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe  wrote:
> >
> >
> > On 19/04/17 09:55 AM, Jason Gunthorpe wrote:
> >> I was thinking only this one would be supported with a core code
> >> helper..
> >
> > Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a
> > type flag to the dev_pagemap structure which would be very useful to us.
> > We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages.
> > Then, potentially, we could add a dma_map callback to the structure
> > (possibly unioned with an hmm field). The dev_ops providers would then
> > just need to do something like this (enclosed in a helper):
> >
> > if (is_zone_device_page(page)) {
> > pgmap = get_dev_pagemap(page_to_pfn(page));
> > if (!pgmap || pgmap->type !=  MEMORY_DEVICE_P2P ||
> > !pgmap->dma_map)
> > return 0;
> >
> > dma_addr = pgmap->dma_map(dev, pgmap->dev, page);
> > put_dev_pagemap(pgmap);
> > if (!dma_addr)
> > return 0;
> > ...
> > }
> >
> > The pci_enable_p2p_bar function would then just need to call
> > devm_memremap_pages with the dma_map callback set to a function that
> > does the segment check and the offset calculation.
> >
> > Thoughts?
> >
> > @Jerome: my feedback to you would be that your patch assumes all users
> > of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more
> > useful if it was generic. My suggestion would be to have the caller
> > allocate the dev_pagemap structure, populate it and pass it into
> > devm_memremap_pages. Given that pretty much everything in that structure
> > is already an argument to that function, I feel like this makes sense.
> > This should also help to unify hmm_devmem_pages_create and
> > devm_memremap_pages which look very similar to each other.
> 
> I like that change. Also the types should describe the memory relative
> to its relationship to struct page, not whether it is persistent or
> not. I would consider volatile and persistent memory that is attached
> to the cpu memory controller and i/o coherent as the same type of
> memory. DMA incoherent ranges like P2P and HMM should get their own
> types.

Dan you asked me to not use devm_memremap_pages() because you didn't
want to have HMM memory in the pgmap_radix, did you change opinion
on that ? :)

Note i won't make any change now on that front but if it makes sense
i am happy to do it as separate patchset on top of HMM.

Also i don't want p2pmem to be an exclusive or with HMM, we will want
GPU to do peer to peer DMA and thus HMM ZONE_DEVICE pages to support
this too.

I do believe it is easier to special case in ZONE_DEVICE into existing
dma_ops for each architecture. For x86 i think there are only 3 different
sets of dma_ops to modify. For other arch i guess there are even fewer.

But in all the case i think p2pmem should stay out of memory management
business. If some set of device do not have memory management it is
better to propose helper to do that as part of the subsystem to which
those devices belong. Just wanted to reiterate that point.

Cheers,
Jérôme


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 10:48:51AM -0600, Logan Gunthorpe wrote:
> The pci_enable_p2p_bar function would then just need to call
> devm_memremap_pages with the dma_map callback set to a function that
> does the segment check and the offset calculation.

I don't see a use for the dma_map function pointer at this point..

It doesn't make alot of sense for the completor of the DMA to provide
a mapping op, the mapping process is *path* specific, not specific to
a completer/initiator.

So, I would suggest more like this:

static inline struct device *get_p2p_src(struct page *page)
{
struct device *res;
struct dev_pagemap *pgmap;

if (!is_zone_device_page(page))
 return NULL;

pgmap = get_dev_pagemap(page_to_pfn(page), NULL);
if (!pgmap || pgmap->type !=  MEMORY_DEVICE_P2P)
/* For now ZONE_DEVICE memory that is not P2P is
   assumed to be configured for DMA the same as CPU
   memory. */
return ERR_PTR(-EINVAL);
res = pgmap->dev;
device_get(res);
put_dev_pagemap(pgmap);
return res;
}

dma_addr_t pci_p2p_same_segment(struct device *initator,
struct device *completer,
struct page *page)
{
   if (! PCI initiator & completer)
   return ERROR;
   if (!same segment initiator & completer)
   return ERROR;

   // Translate page directly to the value programmed into the BAR
   return (Completer's PCI BAR base address) + (offset of page within BAR);
}

// dma_sg_map

for (each sgl) {
struct page *page = sg_page(s);
struct device *p2p_src = get_p2p_src(page);

if (IS_ERR(p2p_src))
// fail dma_sg

if (p2p_src) {
bool needs_iommu = false;

pa = pci_p2p_same_segment(dev, p2p_src, page);
if (pa == ERROR)
pa = arch_p2p_cross_segment(dev, p2p_src, page, &needs_iommu);

device_put(p2p_src);

if (pa == ERROR)
// fail

if (!needs_iommu) {  
// Insert PA directly into the result SGL
sg++;
continue;
}
}
else
// CPU memory
pa = page_to_phys(page);


To me it looks like the code duplication across the iommu stuff comes
from just duplicating the basic iommu algorithm in every driver.

To clean that up I think someone would need to hoist the overall sgl
loop and use more ops callbacks eg allocate_iommu_range,
assign_page_to_range, dealloc_range, etc. This is a problem p2p makes
worse, but isn't directly causing :\

Jason


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Dan Williams
On Wed, Apr 19, 2017 at 9:48 AM, Logan Gunthorpe  wrote:
>
>
> On 19/04/17 09:55 AM, Jason Gunthorpe wrote:
>> I was thinking only this one would be supported with a core code
>> helper..
>
> Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a
> type flag to the dev_pagemap structure which would be very useful to us.
> We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages.
> Then, potentially, we could add a dma_map callback to the structure
> (possibly unioned with an hmm field). The dev_ops providers would then
> just need to do something like this (enclosed in a helper):
>
> if (is_zone_device_page(page)) {
> pgmap = get_dev_pagemap(page_to_pfn(page));
> if (!pgmap || pgmap->type !=  MEMORY_DEVICE_P2P ||
> !pgmap->dma_map)
> return 0;
>
> dma_addr = pgmap->dma_map(dev, pgmap->dev, page);
> put_dev_pagemap(pgmap);
> if (!dma_addr)
> return 0;
> ...
> }
>
> The pci_enable_p2p_bar function would then just need to call
> devm_memremap_pages with the dma_map callback set to a function that
> does the segment check and the offset calculation.
>
> Thoughts?
>
> @Jerome: my feedback to you would be that your patch assumes all users
> of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more
> useful if it was generic. My suggestion would be to have the caller
> allocate the dev_pagemap structure, populate it and pass it into
> devm_memremap_pages. Given that pretty much everything in that structure
> are already arguments to that function, I feel like this makes sense.
> This should also help to unify hmm_devmem_pages_create and
> devm_memremap_pages which look very similar to each other.

I like that change. Also the types should describe the memory relative
to its relationship to struct page, not whether it is persistent or
not. I would consider volatile and persistent memory that is attached
to the cpu memory controller and i/o coherent as the same type of
memory. DMA incoherent ranges like P2P and HMM should get their own
types.


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Logan Gunthorpe


On 19/04/17 09:55 AM, Jason Gunthorpe wrote:
> I was thinking only this one would be supported with a core code
> helper..

Pivoting slightly: I was looking at how HMM uses ZONE_DEVICE. They add a
type flag to the dev_pagemap structure which would be very useful to us.
We could add another MEMORY_DEVICE_P2P type to distinguish p2p pages.
Then, potentially, we could add a dma_map callback to the structure
(possibly unioned with an hmm field). The dev_ops providers would then
just need to do something like this (enclosed in a helper):

if (is_zone_device_page(page)) {
pgmap = get_dev_pagemap(page_to_pfn(page));
if (!pgmap || pgmap->type !=  MEMORY_DEVICE_P2P ||
!pgmap->dma_map)
return 0;

dma_addr = pgmap->dma_map(dev, pgmap->dev, page);
put_dev_pagemap(pgmap);
if (!dma_addr)
return 0;
...
}

The pci_enable_p2p_bar function would then just need to call
devm_memremap_pages with the dma_map callback set to a function that
does the segment check and the offset calculation.

Thoughts?

@Jerome: my feedback to you would be that your patch assumes all users
of devm_memremap_pages are MEMORY_DEVICE_PERSISTENT. It would be more
useful if it was generic. My suggestion would be to have the caller
allocate the dev_pagemap structure, populate it and pass it into
devm_memremap_pages. Given that pretty much everything in that structure
is already an argument to that function, I feel like this makes sense.
This should also help to unify hmm_devmem_pages_create and
devm_memremap_pages which look very similar to each other.

Logan


Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-19 Thread Jason Gunthorpe
On Wed, Apr 19, 2017 at 11:20:06AM +1000, Benjamin Herrenschmidt wrote:

> That helper wouldn't perform the actual iommu mapping. It would simply
> return something along the lines of:
> 
>  - "use that alternate bus address and don't map in the iommu"

I was thinking only this one would be supported with a core code
helper..

>  - "use that alternate bus address and do map in the iommu"
>  - "proceed as normal"

.. because I don't have an idea how a core code helper could figure
out what the platform needs for the above two ..

I think many of the iommu platforms will need to construct their own
bus address in the iommu, and we already have easy access to the CPU
address.

Presumably once a transaction passes through the iommu it needs to be
using the CPU address for the target bar, otherwise things are going
to be ambiguous..

Jason


Re: RFC: drop the T10 OSD code and its users

2017-04-19 Thread Jens Axboe
On Wed, Apr 12 2017, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and the
> other two users (osdblk and exofs) were simple examples of its usage.
> 
> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 
> These patches are against Jens' block for-next tree as that already

I've added 1/4 for now, as that one should obviously go. If we have
consensus to kill the rest, I'm all for it.

-- 
Jens Axboe



Re: [PATCH] virtio_scsi: Always try to read VPD pages

2017-04-19 Thread Paolo Bonzini


On 13/04/2017 15:39, Stefan Hajnoczi wrote:
> On Thu, Apr 13, 2017 at 12:13:00PM +1000, David Gibson wrote:
>> @@ -705,6 +706,28 @@ static int virtscsi_device_reset(struct scsi_cmnd *sc)
>>  return virtscsi_tmf(vscsi, cmd);
>>  }
>>  
>> +static int virtscsi_device_alloc(struct scsi_device *sdevice)
>> +{
>> +/*
>> + * Passed through SCSI targets (e.g. with qemu's 'scsi-block')
>> + * may have transfer limits which come from the host SCSI
>> + * controller something on the host side other than the target
> 
> s/controller something/controller or something/ ?

Acked-by: Paolo Bonzini 

with this change.

Paolo


Re: [PATCH 26/27] scsi: sd: Separate zeroout and discard command choices

2017-04-19 Thread Paolo Bonzini


On 05/04/2017 19:21, Christoph Hellwig wrote:
> +static ssize_t
> +zeroing_mode_store(struct device *dev, struct device_attribute *attr,
> +const char *buf, size_t count)
> +{
> + struct scsi_disk *sdkp = to_scsi_disk(dev);
> +
> + if (!capable(CAP_SYS_ADMIN))
> + return -EACCES;
> +
> +
> + if (!strncmp(buf, zeroing_mode[SD_ZERO_WRITE], 20))
> + sdkp->zeroing_mode = SD_ZERO_WRITE;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS16_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS16_UNMAP;
> + else if (!strncmp(buf, zeroing_mode[SD_ZERO_WS10_UNMAP], 20))
> + sdkp->zeroing_mode = SD_ZERO_WS10_UNMAP;

Should this be conditional on lbprz, lbpws, lbpws10 and max_ws_blocks?

Thanks,

Paolo


Re: [PATCH 1/4] block: remove the osdblk driver

2017-04-19 Thread Boaz Harrosh
On 04/12/2017 07:01 PM, Christoph Hellwig wrote:
> This was just a proof of concept user for the SCSI OSD library, and
> never had any real users.
> 
> Signed-off-by: Christoph Hellwig 

Yes please remove this driver

ACK-by Boaz Harrosh 

> ---
>  drivers/block/Kconfig  |  16 --
>  drivers/block/Makefile |   1 -
>  drivers/block/osdblk.c | 693 -
>  3 files changed, 710 deletions(-)
>  delete mode 100644 drivers/block/osdblk.c
> 
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index a1c2e816128f..58e24c354933 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -312,22 +312,6 @@ config BLK_DEV_SKD
>  
>   Use device /dev/skd$N amd /dev/skd$Np$M.
>  
> -config BLK_DEV_OSD
> - tristate "OSD object-as-blkdev support"
> - depends on SCSI_OSD_ULD
> - ---help---
> -   Saying Y or M here will allow the exporting of a single SCSI
> -   OSD (object-based storage) object as a Linux block device.
> -
> -   For example, if you create a 2G object on an OSD device,
> -   you can then use this module to present that 2G object as
> -   a Linux block device.
> -
> -   To compile this driver as a module, choose M here: the
> -   module will be called osdblk.
> -
> -   If unsure, say N.
> -
>  config BLK_DEV_SX8
>   tristate "Promise SATA SX8 support"
>   depends on PCI
> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> index b12c772bbeb3..42e8cee1cbc7 100644
> --- a/drivers/block/Makefile
> +++ b/drivers/block/Makefile
> @@ -22,7 +22,6 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
>  obj-$(CONFIG_MG_DISK)+= mg_disk.o
>  obj-$(CONFIG_SUNVDC) += sunvdc.o
>  obj-$(CONFIG_BLK_DEV_SKD)+= skd.o
> -obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o
>  
>  obj-$(CONFIG_BLK_DEV_UMEM)   += umem.o
>  obj-$(CONFIG_BLK_DEV_NBD)+= nbd.o
> diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> deleted file mode 100644
> index 8127b8201a01..
> --- a/drivers/block/osdblk.c
> +++ /dev/null
> @@ -1,693 +0,0 @@
> -
> -/*
> -   osdblk.c -- Export a single SCSI OSD object as a Linux block device
> -
> -
> -   Copyright 2009 Red Hat, Inc.
> -
> -   This program is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU General Public License as published by
> -   the Free Software Foundation.
> -
> -   This program is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU General Public License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with this program; see the file COPYING.  If not, write to
> -   the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
> -
> -
> -   Instructions for use
> -   
> -
> -   1) Map a Linux block device to an existing OSD object.
> -
> -  In this example, we will use partition id 1234, object id 5678,
> -  OSD device /dev/osd1.
> -
> -  $ echo "1234 5678 /dev/osd1" > /sys/class/osdblk/add
> -
> -
> -   2) List all active blkdev<->object mappings.
> -
> -  In this example, we have performed step #1 twice, creating two blkdevs,
> -  mapped to two separate OSD objects.
> -
> -  $ cat /sys/class/osdblk/list
> -  0 174 1234 5678 /dev/osd1
> -  1 179 1994 897123 /dev/osd0
> -
> -  The columns, in order, are:
> -  - blkdev unique id
> -  - blkdev assigned major
> -  - OSD object partition id
> -  - OSD object id
> -  - OSD device
> -
> -
> -   3) Remove an active blkdev<->object mapping.
> -
> -  In this example, we remove the mapping with blkdev unique id 1.
> -
> -  $ echo 1 > /sys/class/osdblk/remove
> -
> -
> -   NOTE:  The actual creation and deletion of OSD objects is outside the scope
> -   of this driver.
> -
> - */
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define DRV_NAME "osdblk"
> -#define PFX DRV_NAME ": "
> -
> -/* #define _OSDBLK_DEBUG */
> -#ifdef _OSDBLK_DEBUG
> -#define OSDBLK_DEBUG(fmt, a...) \
> - printk(KERN_NOTICE "osdblk @%s:%d: " fmt, __func__, __LINE__, ##a)
> -#else
> -#define OSDBLK_DEBUG(fmt, a...) \
> - do { if (0) printk(fmt, ##a); } while (0)
> -#endif
> -
> -MODULE_AUTHOR("Jeff Garzik ");
> -MODULE_DESCRIPTION("block device inside an OSD object osdblk.ko");
> -MODULE_LICENSE("GPL");
> -
> -struct osdblk_device;
> -
> -enum {
> - OSDBLK_MINORS_PER_MAJOR = 256,  /* max minors per blkdev */
> - OSDBLK_MAX_REQ  = 32,   /* max parallel requests */
> - OSDBLK_OP_TIMEOUT   = 4 * 60,   /* sync OSD req timeout */
> -};
> -
> -struct osdblk_request {
> - struct request  *rq;/* blk layer request */
> - struct bio