[Bug 188061] On quad port QLE2564 can't add in target only 2 ports

2017-03-02 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=188061

--- Comment #5 from Anthony (anthony.blood...@gmail.com) ---
I've installed another 4-port card and got same issue - port names have
variable part at highest bits. On 2-port card port names have variable part at
lower bits:
port_name   = "0x2124ff3bcb3e"
port_name   = "0x2124ff3bcb3f"
I'm trying to change a port name in NVRAM with FLASUTIL.EXE with no luck.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[PATCH] qla2xxx: Fix ql_dump_buffer

2017-03-02 Thread Joe Perches
Recent printk changes for KERN_CONT cause this logging to be
defectively emitted on multiple lines.  Fix it.

Also reduces object size a trivial amount.

$ size drivers/scsi/qla2xxx/qla_dbg.o*
   textdata bss dec hex filename
  39125   0   0   3912598d5 drivers/scsi/qla2xxx/qla_dbg.o.new
  39164   0   0   3916498fc drivers/scsi/qla2xxx/qla_dbg.o.old

Signed-off-by: Joe Perches 
---
 drivers/scsi/qla2xxx/qla_dbg.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/qla2xxx/qla_dbg.c b/drivers/scsi/qla2xxx/qla_dbg.c
index 21d9fb7fc887..51b4179469d1 100644
--- a/drivers/scsi/qla2xxx/qla_dbg.c
+++ b/drivers/scsi/qla2xxx/qla_dbg.c
@@ -2707,13 +2707,9 @@ ql_dump_buffer(uint32_t level, scsi_qla_host_t *vha, 
int32_t id,
"%-+5d  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F\n", size);
ql_dbg(level, vha, id,
"- ---\n");
-   for (cnt = 0; cnt < size; cnt++, buf++) {
-   if (cnt % 16 == 0)
-   ql_dbg(level, vha, id, "%04x:", cnt & ~0xFU);
-   printk(" %02x", *buf);
-   if (cnt % 16 == 15)
-   printk("\n");
+   for (cnt = 0; cnt < size; cnt += 16) {
+   ql_dbg(level, vha, id, "%04x: ", cnt);
+   print_hex_dump(KERN_CONT, "", DUMP_PREFIX_NONE, 16, 1,
+  buf + cnt, min(16U, size - cnt), false);
}
-   if (cnt % 16 != 0)
-   printk("\n");
 }
-- 
2.10.0.rc2.1.g053435c



[PATCH] aacraid: Fix typo in blink status

2017-03-02 Thread Raghava Aditya Renukunta
The return status of the adapter check on KERNEL_PANIC is supposed
to be the upper 16 bits of the OMR status register.

Fixes: c421530bf848604e (scsi: aacraid: Reorder Adpater status check)
Reported-by: Dan Carpenter 
Signed-off-by: Raghava Aditya Renukunta 
---
 drivers/scsi/aacraid/src.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/aacraid/src.c b/drivers/scsi/aacraid/src.c
index 2e5338d..7b0410e 100644
--- a/drivers/scsi/aacraid/src.c
+++ b/drivers/scsi/aacraid/src.c
@@ -468,7 +468,7 @@ static int aac_src_check_health(struct aac_dev *dev)
return -1;
 
 err_blink:
-   return (status > 16) & 0xFF;
+   return (status >> 16) & 0xFF;
 }
 
 static inline u32 aac_get_vector(struct aac_dev *dev)
-- 
1.8.3.1



Re: SCSI regression in 4.11

2017-03-02 Thread James Bottomley
On March 2, 2017 10:23:24 AM PST, Stephen Hemminger 
 wrote:
>On Thu, 2 Mar 2017 14:25:14 +0100
>Hannes Reinecke  wrote:
>
>> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
>> > On Thu, 2 Mar 2017 01:56:15 +0100
>> > Christoph Hellwig  wrote:
>> >  
>> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:
> 
>> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger
>wrote:  
>> >
>   
> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> 
>> 
>>  It appears that is already in the code I am testing in
>linux-next...  
>> >>>
>> >>> It's in -next now, but it wasn't at the time you reported the
>bug.
>> >>>
>> >>> And it would sortof explain the bug if the INQUIRY data is
>correct
>> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
>> >>> ignores the result based on sense data.
>> >>>
>> >>> Can you check what happens with the horrible hack below:  
>> >>
>> >> Strike that - we're checking result later, so this can't be the
>case.
>> >>
>> >> Now the other interesting thing is the memset in __scsi_exectute,
>> >> which looks very suspicious.  Try the following please:
>> >>
>> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> >> index 3e32dc954c3c..22f4fb550561 100644
>> >> --- a/drivers/scsi/scsi_lib.c
>> >> +++ b/drivers/scsi/scsi_lib.c
>> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device
>*sdev, const unsigned char *cmd,
>> >>* and prevent security leaks by zeroing out the excess data.
>> >>*/
>> >>   if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
>> >> - memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
>> >> +//   memset(buffer + (bufflen - rq->resid_len), 0, 
>> >> rq->resid_len);
>> >> + printk_ratelimited("%s: got resid %d\n", __func__,
>rq->resid_len);
>> >>
>> >>   if (resid)
>> >>   *resid = rq->resid_len;  
>> >
>> >
>> > Still fails but does print resid on some of the later INQUIRY
>commands (not the initial one).
>> >  
>> Can you test what happens if you blank out the storvsc_drv
>workaround:
>> 
>> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
>> index 585e54f..c36f42d 100644
>> --- a/drivers/scsi/storvsc_drv.c
>> +++ b/drivers/scsi/storvsc_drv.c
>> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
>> storvsc_device *stor_device,
>>   * We do this so we can distinguish truly fatal failues
>>   * (srb status == 0x4) and off-line the device in that case.
>>   */
>> -
>> +#if 0
>>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
>> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>>  vstor_packet->vm_srb.scsi_status = 0;
>>  vstor_packet->vm_srb.srb_status =
>SRB_STATUS_SUCCESS;
>>  }
>> -
>> +#endif
>> 
>>  /* Copy over the status...etc */
>>  stor_pkt->vm_srb.scsi_status =
>vstor_packet->vm_srb.scsi_status;
>> 
>> It might thappen that we're fail to interpret the 'Device not
>present' 
>> status correctly (which will happen for non-connected DVDs) causing
>the 
>> SCSI stack to make incorrect decisions later on.
>> 
>> Cheers,
>> 
>> Hannes
>
>There are several oddities about the host SCSI interface that I see:
> 1. The host bus seems to report up to 6 devices even though only 2 are
> present (Disk and CDROM).
>2. The CDROM emulation doesn't report the same status as a real device.
> 3. The host emulation of SCSI doesn't support all the page codes which
> is why there is the hack.
>
>But as James said, these don't appear to be related to the failure
>because
>the code worked before and only in post 4.11 merege is there a problem.

Your wait for the hang trace is the most suggestive.   It says we're waiting 
for a partition read to the spurious device.  Previously this would have failed 
or timed out, so this seems to be the root cause.

James


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [bug report] scsi: aacraid: Reorder Adapter status check

2017-03-02 Thread Dave Carroll
> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Dan Carpenter
> Sent: Monday, February 27, 2017 8:32 AM
> To: Raghava Aditya Renukunta
> Cc: linux-scsi@vger.kernel.org
> Subject: [bug report] scsi: aacraid: Reorder Adapter status check
> 
> 
> The patch c421530bf848: "scsi: aacraid: Reorder Adapter status check"
> from Feb 16, 2017, leads to the following static checker warning:
> 
> drivers/scsi/aacraid/src.c:471 aac_src_check_health()
> warn: was shift intended here '(status > 16)'
> 
> drivers/scsi/aacraid/src.c
>464   */
>465  return 0;
>466
>467  err_out:
>468  return -1;
>469
>470  err_blink:
>471  return (status > 16) & 0xFF;
> ^^^ Issue #1:  This is obviously a 
> typo.

Agreed, will submit a correction ... should be >>
> 
>472  }
> 
> Issue #2:  The caller checks for if the return is greater than 2.  It
>never is.  We can remove this dead code.

That would be the blink led returned from the controller which would be greater 
than 2

> 
> Issue #3:  The caller passes "bled" to aac_send_iop_reset() which
>ignores it.  What's up with that?  Either it's a bug or we
>should delete that dead code.

When we have gotten all caught up, we will be printing the blink led as part of 
the iop_reset

Thanks, -Dave
> 
> regards,
> dan carpenter


[GIT PULL] target updates for v4.11-rc1

2017-03-02 Thread Nicholas A. Bellinger
Hello Linus,

Here are the outstanding target pending changes for v4.11-rc1.  Please
go ahead and pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git for-next

The highlights this round include:

- Enable dual mode (initiator + target) qla2xxx operation. (Quinn + Himanshu)

- Add a framework for qla2xxx async fabric discovery. (Quinn + Himanshu)

- Enable iscsi PDU DDP completion offload in cxgbit/T6 NICs. (Varun)

- Fix target-core handling of aborted failed commands. (Bart)

- Fix a long standing target-core issue NULL pointer dereference with
  active I/O LUN shutdown. (Rob Millner + Bryant + nab)

Things are shaping up to be a busy cycle for v4.12 with a new fabric
driver (efct) in flight, and a number of other patches on the list being
discussed.

For post v4.11-rc1 fixes, there is a qla2xxx series that missed -rc1
that will be sent your way soon, along with a issue reported by Bart
related to ib_srpt + LUN_RESET that we are still going back and forth on
how the bug-fix should work.

Beyond that, the number of folks with dedicated Q/A resources for
upstream target code is steadily increasing, along with the number of
production customers using the mainstream in-tree fabric drivers.

Thank you,

--nab

Bart Van Assche (19):
  qla2xxx: Avoid using variable-length arrays
  target/cxgbit: Fix endianness annotations
  target/tcm_fc: Remove a set-but-not-used variable
  target/iscsi: Fix indentation in iscsi_target_start_negotiation()
  target/iscsi: Fix spelling of "perform"
  target/iscsi: Fix spelling of "reallegiance"
  target/iscsi: Introduce a helper function for TMF translation
  target/iscsi: Fix iSCSI task reassignment handling
  target: Remove se_tmr_req.tmr_lun
  target: Make core_tmr_abort_task() consider all commands
  target: Correct transport_wait_for_tasks() documentation
  target: Stop execution if CMD_T_STOP has been set
  target: Remove an overly chatty debug message
  target: Inline transport_cmd_check_stop()
  target: Move session check from target_put_sess_cmd() into
target_release_cmd_kref()
  target: Remove command flag CMD_T_BUSY
  target: Remove command flag CMD_T_DEV_ACTIVE
  target: Fix handling of aborted failed commands
  target: Delete tmr from list before processing

Dmitry V. Levin (1):
  uapi: fix linux/target_core_user.h userspace compilation errors

Himanshu Madhani (2):
  qla2xxx: Remove SRR code
  qla2xxx: Remove unused reverse_ini_mode

Joe Carnuccio (1):
  qla2xxx: Simplify usage of SRB structure in driver

Mike Christie (1):
  target: export protocol identifier

Nicholas Bellinger (3):
  target: Fix NULL dereference during LUN lookup + active I/O shutdown
  iscsi-target: Fix early login failure statistics misses
  target: Add counters for ABORT_TASK success + failure

Quinn Tran (10):
  qla2xxx: Remove direct access of scsi_status field in se_cmd
  qla2xxx: Cleanup TMF code translation from qla_target
  qla2xxx: Make trace flags more readable
  qla2xxx: Fix wrong argument in sp done callback
  qla2xxx: Use d_id instead of s_id for more clarity
  qla2xxx: Track I-T nexus as single fc_port struct
  qla2xxx: Add framework for async fabric discovery
  qla2xxx: Add Dual mode support in the driver
  qla2xxx: Improve RSCN handling in driver
  qla2xxx: Fix a warning reported by the "smatch" static checker

Varun Prakash (7):
  target/cxgbit: Use T6 specific macro to set the force bit
  target/iscsi: split iscsit_check_dataout_hdr()
  target/cxgbit: use cxgb4_tp_smt_idx() to get smt idx
  target/cxgbit: Use T6 specific macros to get ETH/IP hdr len
  target/cxgbit: Enable DDP for T6 only if data sequence and pdu are in
order
  target/cxgbit: add T6 iSCSI DDP completion feature
  target/iscsi: Fix unsolicited data seq_end_offset calculation

 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h|4 +
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.h |2 +-
 drivers/scsi/qla2xxx/qla_attr.c|3 +
 drivers/scsi/qla2xxx/qla_bsg.c |   23 +-
 drivers/scsi/qla2xxx/qla_def.h |  306 ++-
 drivers/scsi/qla2xxx/qla_dfs.c |   11 +-
 drivers/scsi/qla2xxx/qla_fw.h  |  106 +-
 drivers/scsi/qla2xxx/qla_gbl.h |   72 +-
 drivers/scsi/qla2xxx/qla_gs.c  |  726 +-
 drivers/scsi/qla2xxx/qla_init.c| 1612 +
 drivers/scsi/qla2xxx/qla_inline.h  |   18 +-
 drivers/scsi/qla2xxx/qla_iocb.c|  167 +-
 drivers/scsi/qla2xxx/qla_isr.c |  317 ++-
 drivers/scsi/qla2xxx/qla_mbx.c |  232 +-
 drivers/scsi/qla2xxx/qla_mr.c  |   48 +-
 drivers/scsi/qla2xxx/qla_os.c  |  330 ++-
 drivers/scsi/qla2xxx/qla_target.c  | 2392 +---
 drivers/scsi/qla2xxx/qla_target.h  |  252 +--
 drivers/scsi/qla2xxx/tcm_qla2xxx.c |  256 ++-
 

[bug report] scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.

2017-03-02 Thread Dan Carpenter
Hello Dupuis, Chad,

The patch 61d8658b4a43: "scsi: qedf: Add QLogic FastLinQ offload FCoE
driver framework." from Feb 15, 2017, leads to the following static
checker warning:

drivers/scsi/qedf/qedf_main.c:1168 qedf_rport_event_handler()
warn: 'rval' can be either negative or positive

drivers/scsi/qedf/qedf_main.c
  1005  static int qedf_offload_connection(struct qedf_ctx *qedf,
  1006  struct qedf_rport *fcport)
  1007  {
  1008  struct qed_fcoe_params_offload conn_info;
  1009  u32 port_id;
  1010  u8 lport_src_id[3];
  1011  int rval;
  1012  uint16_t total_sqe = (fcport->sq_mem_size / sizeof(struct 
fcoe_wqe));
  1013  
  1014  QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "Offloading 
connection "
  1015 "portid=%06x.\n", fcport->rdata->ids.port_id);
  1016  rval = qed_ops->acquire_conn(qedf->cdev, >handle,
  1017  >fw_cid, >p_doorbell);
  1018  if (rval) {
  1019  QEDF_WARN(&(qedf->dbg_ctx), "Could not acquire 
connection "
  1020 "for portid=%06x.\n", 
fcport->rdata->ids.port_id);
  1021  rval = 1; /* For some reason qed returns 0 on failure 
here */

What kind of an error is 1 supposed to indicate?  This needs to be
not a magic number and commented.

  1022  goto out;
  1023  }
  1024  
  1025  QEDF_INFO(&(qedf->dbg_ctx), QEDF_LOG_CONN, "portid=%06x "
  1026 "fw_cid=%08x handle=%d.\n", 
fcport->rdata->ids.port_id,
  1027 fcport->fw_cid, fcport->handle);
  1028  
  1029  memset(_info, 0, sizeof(struct qed_fcoe_params_offload));
  1030  

regards,
dan carpenter


Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run

2017-03-02 Thread Benjamin Block
Hej Hannes,

On Wed, Mar 01, 2017 at 10:15:15AM +0100, Hannes Reinecke wrote:
> The current medium access timeout counter will be increased for
> each command, so if there are enough failed commands we'll hit
> the medium access timeout for even a single failure.
> Fix this by making the timeout per EH run, ie the counter will
> only be increased once per device and EH run.
>
> Cc: Ewan Milne 
> Cc: Lawrence Obermann 
> Cc: Benjamin Block 
> Cc: Steffen Maier 
> Signed-off-by: Hannes Reinecke 

Steffen already suggested it, It would be nice to have a stable-tag
here. This already caused multiple real-world false-positive outages, I
think that qualifies for stable :)

> ---
>  drivers/scsi/scsi_error.c  | 16 +++-
>  drivers/scsi/sd.c  | 21 +
>  drivers/scsi/sd.h  |  1 +
>  include/scsi/scsi_driver.h |  2 +-
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>   if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>   eh_flag &= ~SCSI_EH_CANCEL_CMD;
>   scmd->eh_eflags |= eh_flag;
> + scsi_eh_reset(scmd);
>   list_add_tail(>eh_entry, >eh_cmd_q);
>   shost->host_failed++;
>   scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int 
> rtn)
>   if (!blk_rq_is_passthrough(scmd->request)) {
>   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>   if (sdrv->eh_action)
> - rtn = sdrv->eh_action(scmd, rtn);
> + rtn = sdrv->eh_action(scmd, rtn, false);
> + }
> + return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> + int rtn = SUCCESS;
> +
> + if (!blk_rq_is_passthrough(scmd->request)) {
> + struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> + if (sdrv->eh_action)
> + rtn = sdrv->eh_action(scmd, rtn, true);
>   }
>   return rtn;
>  }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c7839f6..c794686 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -115,7 +115,7 @@
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>  static int sd_done(struct scsi_cmnd *);
> -static int sd_eh_action(struct scsi_cmnd *, int);
> +static int sd_eh_action(struct scsi_cmnd *, int, bool);
>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>  static void scsi_disk_release(struct device *cdev);
>  static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 
> key)
>   *   sd_eh_action - error handling callback
>   *   @scmd:  sd-issued command that has failed
>   *   @eh_disp:   The recovery disposition suggested by the midlayer
> + *   @reset: Reset medium access counter
>   *
>   *   This function is called by the SCSI midlayer upon completion of an
>   *   error test command (currently TEST UNIT READY). The result of sending
>   *   the eh command is passed in eh_disp.  We're looking for devices that
>   *   fail medium access commands but are OK with non access commands like
>   *   test unit ready (so wrongly see the device as having a successful
> - *   recovery)
> + *   recovery).
> + *   We have to be careful to count a medium access failure only once
> + *   per SCSI EH run; there might be several timed out commands which
> + *   will cause the 'max_medium_access_timeouts' counter to trigger
> + *   after the first SCSI EH run already and set the device to offline.
>   **/
> -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
> +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
>  {
>   struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>
> + if (reset) {
> + /* New SCSI EH run, reset gate variable */
> + sdkp->medium_access_reset = 0;
> + return eh_disp;
> + }
>   if (!scsi_device_online(scmd->device) ||
>   !scsi_medium_access_command(scmd) ||
>   host_byte(scmd->result) != DID_TIME_OUT ||
> @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int 
> eh_disp)
>* process of recovering or has it 

Re: [ANNOUNCE]: Broadcom (Emulex) FC Target driver - efct

2017-03-02 Thread Nicholas A. Bellinger
Hi James & Co,

Adding target-devel and Sebastian CC'

On Mon, 2017-02-27 at 15:28 -0800, James Smart wrote:
> I'd like to announce the availability of the Broadcom (Emulex) FC Target 
> driver - efct.
> This driver has been part of the Emulex OneCore Storage SDK tool kit for 
> Emulex
> SLI-4 adapters. The SLI-4 adapters support 16Gb/s and higher adapters. 
> Although this
> kit has supported FCoE in the past, it is currently limited to FC support.
> 
> This driver provides the following:
> - Target mode operation:
>- Functional with LIO-based interfaces

Glad to see the upstream push for this after all these years.  :)

>- Extensive use of hw offloads such as auto-xfer_rdy, auto-rsp, cmd 
> cpu spreading
>- High login mode - thousands of logins
>- T-10 DIF/PI support  (inline and separate)
>- NPIV support
> - Concurrent Initiator support if needed
> - Discovery engine has F_Port and fabric services emulation.
> - Extended mgmt interfaces:
>- firmware dump api, including dump to host memory for faster dumps
>- Healthcheck operations and watchdogs
> - Extended driver behaviors such as:
>- polled mode operation
>- multi-queue: cpu, roundrobin, or priority  (but not tied  to scsi-mq)
>- long chained sgl's
>- extensive internal logging and statistics
>- Tuning parameters on modes and resource allocation to different 
> features
> 
> Broadcom is looking to upstream this driver and would like review and 
> feedback.
> The driver may be found at the following git repository:
>  g...@gitlab.com:jsmart/efct-Emulex_FC_Target.git
> 

Can we get the patch series posted to linux-scsi and target-devel to
start giving some initial feedback..?

> 
> Some of the key questions we have are with lpfc :
> 1) Coexistence vs integration
> Currently, the efct driver maps to a different set of PCI ids than lpfc.
> It's very clear there's an overlap with lpfc, both on SLI-4 hw as well 
> as initiator support.
> Although target mode support can be simplistically added to lpfc, what 
> we've found is
> that doing so means a lot of tradeoffs. Some of the target mode 
> features, when enabled,
> impact the initiator support and how it would operate.
> 

I don't really have much preference either way.  

> 2) SLI-3 support
> lpfc provides SLI-3 support so that all FC adapters are supported, 
> including the older ones.
> The form of the driver, based on its history, is SLI-3 with SLI-3 
> adapted to SLI-4 at the point
> it hits the hardware. efct does not support SLI-3.

AFAIK I think Sebastian was using SLI-3, so he might have some comments
here.

Since he's been the main person using the original tcm_lpfc code from
way back when, maybe it would be a good idea to send him a couple of
SLI-4 HBAs to help with the upstreaming of efct..?

> 
> 3) complexity of configuration knobs caused by the kitchen-sink of 
> features in lpfc ?
> we are pushing the limit on needing per-instance attributes rather than 
> global module
> parameters.

This is exactly what
/sys/kernel/config/target/efct/$WWPN/tpgt_1/attribute/ is intended for.



Re: SCSI regression in 4.11

2017-03-02 Thread James Bottomley
On March 2, 2017 11:05:05 AM PST, Stephen Hemminger 
 wrote:
>On Thu, 02 Mar 2017 10:36:17 -0800
>James Bottomley  wrote:
>
>> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger
> wrote:
>> >On Thu, 2 Mar 2017 14:25:14 +0100
>> >Hannes Reinecke  wrote:
>> >  
>> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:  
>> >> > On Thu, 2 Mar 2017 01:56:15 +0100
>> >> > Christoph Hellwig  wrote:
>> >> >
>> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig
>wrote:  
>> >   
>> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger  
>> >wrote:
>> >> >  
>>
>>  
>> http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
>> >   
>> >> 
>> >>  It appears that is already in the code I am testing in  
>> >linux-next...
>> >> >>>
>> >> >>> It's in -next now, but it wasn't at the time you reported the 
>
>> >bug.  
>> >> >>>
>> >> >>> And it would sortof explain the bug if the INQUIRY data is  
>> >correct  
>> >> >>> in the scatterlist, but we ignore it, given that
>scsi_probe_lun
>> >> >>> ignores the result based on sense data.
>> >> >>>
>> >> >>> Can you check what happens with the horrible hack below:
>> >> >>
>> >> >> Strike that - we're checking result later, so this can't be the
> 
>> >case.  
>> >> >>
>> >> >> Now the other interesting thing is the memset in
>__scsi_exectute,
>> >> >> which looks very suspicious.  Try the following please:
>> >> >>
>> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> >> >> index 3e32dc954c3c..22f4fb550561 100644
>> >> >> --- a/drivers/scsi/scsi_lib.c
>> >> >> +++ b/drivers/scsi/scsi_lib.c
>> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct
>scsi_device  
>> >*sdev, const unsigned char *cmd,  
>> >> >> * and prevent security leaks by zeroing out the excess data.
>> >> >> */
>> >> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
>> >> >> -  memset(buffer + (bufflen - rq->resid_len), 0,
>rq->resid_len);
>> >> >> +//memset(buffer + (bufflen - rq->resid_len), 0,
>rq->resid_len);
>> >> >> +  printk_ratelimited("%s: got resid %d\n", __func__,  
>> >rq->resid_len);  
>> >> >>
>> >> >>if (resid)
>> >> >>*resid = rq->resid_len;
>> >> >
>> >> >
>> >> > Still fails but does print resid on some of the later INQUIRY  
>> >commands (not the initial one).  
>> >> >
>> >> Can you test what happens if you blank out the storvsc_drv  
>> >workaround:  
>> >> 
>> >> diff --git a/drivers/scsi/storvsc_drv.c
>b/drivers/scsi/storvsc_drv.c
>> >> index 585e54f..c36f42d 100644
>> >> --- a/drivers/scsi/storvsc_drv.c
>> >> +++ b/drivers/scsi/storvsc_drv.c
>> >> @@ -1060,13 +1060,13 @@ static void
>storvsc_on_io_completion(struct 
>> >> storvsc_device *stor_device,
>> >>   * We do this so we can distinguish truly fatal failues
>> >>   * (srb status == 0x4) and off-line the device in that
>case.
>> >>   */
>> >> -
>> >> +#if 0
>> >>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
>> >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>> >>  vstor_packet->vm_srb.scsi_status = 0;
>> >>  vstor_packet->vm_srb.srb_status =  
>> >SRB_STATUS_SUCCESS;  
>> >>  }
>> >> -
>> >> +#endif
>> >> 
>> >>  /* Copy over the status...etc */
>> >>  stor_pkt->vm_srb.scsi_status =  
>> >vstor_packet->vm_srb.scsi_status;  
>> >> 
>> >> It might thappen that we're fail to interpret the 'Device not  
>> >present'   
>> >> status correctly (which will happen for non-connected DVDs)
>causing  
>> >the   
>> >> SCSI stack to make incorrect decisions later on.
>> >> 
>> >> Cheers,
>> >> 
>> >> Hannes  
>> >
>> >There are several oddities about the host SCSI interface that I see:
>> > 1. The host bus seems to report up to 6 devices even though only 2
>are
>> > present (Disk and CDROM).
>> >2. The CDROM emulation doesn't report the same status as a real
>device.
>> > 3. The host emulation of SCSI doesn't support all the page codes
>which
>> > is why there is the hack.
>> >
>> >But as James said, these don't appear to be related to the failure
>> >because
>> >the code worked before and only in post 4.11 merege is there a
>problem.  
>> 
>> Your wait for the hang trace is the most suggestive.   It says we're
>waiting for a partition read to the spurious device.  Previously this
>would have failed or timed out, so this seems to be the root cause.
>> 
>> James
>> 
>> 
>
>Where is the number of valid LUN's determined during the scan process?

Depends.  If you can do a report lun scan then that's definitive.  You seem to 
be probing (SCSI_probe_and_add_lun)  and you make us think there's something 
there by responding wrongly to the initial inquiry.

James




-- 
Sent from my Android device 

Re: [regression] vmw_pvscsi probe fails on 4.11

2017-03-02 Thread Loïc Yhuel

Le 02/03/2017 à 17:05, Christoph Hellwig a écrit :

Please try this patch:

diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c
index ef474a748744..c374e3b5c678 100644
--- a/drivers/scsi/vmw_pvscsi.c
+++ b/drivers/scsi/vmw_pvscsi.c
@@ -1487,7 +1487,7 @@ static int pvscsi_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
irq_flag &= ~PCI_IRQ_MSI;
  
  	error = pci_alloc_irq_vectors(adapter->dev, 1, 1, irq_flag);

-   if (error)
+   if (error < 0)
goto out_reset_adapter;
  
  	adapter->use_req_threshold = pvscsi_setup_req_threshold(adapter, true);

It works, thanks.

Btw, vmw_vmci has the same issue.

diff --git a/drivers/misc/vmw_vmci/vmci_guest.c 
b/drivers/misc/vmw_vmci/vmci_guest.c

index 9d659542a335..dad5abee656e 100644
--- a/drivers/misc/vmw_vmci/vmci_guest.c
+++ b/drivers/misc/vmw_vmci/vmci_guest.c
@@ -566,10 +566,10 @@ static int vmci_guest_probe_device(struct pci_dev 
*pdev,

 */
error = pci_alloc_irq_vectors(pdev, VMCI_MAX_INTRS, VMCI_MAX_INTRS,
PCI_IRQ_MSIX);
-   if (error) {
+   if (error < 0) {
error = pci_alloc_irq_vectors(pdev, 1, 1,
PCI_IRQ_MSIX | PCI_IRQ_MSI | 
PCI_IRQ_LEGACY);

-   if (error)
+   if (error < 0)
goto err_remove_bitmap;
} else {
vmci_dev->exclusive_vectors = true;


Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 02 Mar 2017 10:36:17 -0800
James Bottomley  wrote:

> On March 2, 2017 10:23:24 AM PST, Stephen Hemminger 
>  wrote:
> >On Thu, 2 Mar 2017 14:25:14 +0100
> >Hannes Reinecke  wrote:
> >  
> >> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:  
> >> > On Thu, 2 Mar 2017 01:56:15 +0100
> >> > Christoph Hellwig  wrote:
> >> >
> >> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >   
> >> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger  
> >wrote:
> >> >  
> > 
> > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >   
> >> 
> >>  It appears that is already in the code I am testing in  
> >linux-next...
> >> >>>
> >> >>> It's in -next now, but it wasn't at the time you reported the  
> >bug.  
> >> >>>
> >> >>> And it would sortof explain the bug if the INQUIRY data is  
> >correct  
> >> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >> >>> ignores the result based on sense data.
> >> >>>
> >> >>> Can you check what happens with the horrible hack below:
> >> >>
> >> >> Strike that - we're checking result later, so this can't be the  
> >case.  
> >> >>
> >> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> >> which looks very suspicious.  Try the following please:
> >> >>
> >> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> >> index 3e32dc954c3c..22f4fb550561 100644
> >> >> --- a/drivers/scsi/scsi_lib.c
> >> >> +++ b/drivers/scsi/scsi_lib.c
> >> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device  
> >*sdev, const unsigned char *cmd,  
> >> >>  * and prevent security leaks by zeroing out the excess data.
> >> >>  */
> >> >> if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> >> -   memset(buffer + (bufflen - rq->resid_len), 0, 
> >> >> rq->resid_len);
> >> >> +// memset(buffer + (bufflen - rq->resid_len), 0, 
> >> >> rq->resid_len);
> >> >> +   printk_ratelimited("%s: got resid %d\n", __func__,  
> >rq->resid_len);  
> >> >>
> >> >> if (resid)
> >> >> *resid = rq->resid_len;
> >> >
> >> >
> >> > Still fails but does print resid on some of the later INQUIRY  
> >commands (not the initial one).  
> >> >
> >> Can you test what happens if you blank out the storvsc_drv  
> >workaround:  
> >> 
> >> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> >> index 585e54f..c36f42d 100644
> >> --- a/drivers/scsi/storvsc_drv.c
> >> +++ b/drivers/scsi/storvsc_drv.c
> >> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> >> storvsc_device *stor_device,
> >>   * We do this so we can distinguish truly fatal failues
> >>   * (srb status == 0x4) and off-line the device in that case.
> >>   */
> >> -
> >> +#if 0
> >>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> >> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
> >>  vstor_packet->vm_srb.scsi_status = 0;
> >>  vstor_packet->vm_srb.srb_status =  
> >SRB_STATUS_SUCCESS;  
> >>  }
> >> -
> >> +#endif
> >> 
> >>  /* Copy over the status...etc */
> >>  stor_pkt->vm_srb.scsi_status =  
> >vstor_packet->vm_srb.scsi_status;  
> >> 
> >> It might thappen that we're fail to interpret the 'Device not  
> >present'   
> >> status correctly (which will happen for non-connected DVDs) causing  
> >the   
> >> SCSI stack to make incorrect decisions later on.
> >> 
> >> Cheers,
> >> 
> >> Hannes  
> >
> >There are several oddities about the host SCSI interface that I see:
> > 1. The host bus seems to report up to 6 devices even though only 2 are
> > present (Disk and CDROM).
> >2. The CDROM emulation doesn't report the same status as a real device.
> > 3. The host emulation of SCSI doesn't support all the page codes which
> > is why there is the hack.
> >
> >But as James said, these don't appear to be related to the failure
> >because
> >the code worked before and only in post 4.11 merege is there a problem.  
> 
> Your wait for the hang trace is the most suggestive.   It says we're waiting 
> for a partition read to the spurious device.  Previously this would have 
> failed or timed out, so this seems to be the root cause.
> 
> James
> 
> 

Where is the number of valid LUN's determined during the scan process?


Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 2 Mar 2017 14:25:14 +0100
Hannes Reinecke  wrote:

> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
> > On Thu, 2 Mar 2017 01:56:15 +0100
> > Christoph Hellwig  wrote:
> >  
> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:  
> > 
> > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >   
> 
>  It appears that is already in the code I am testing in linux-next...  
> >>>
> >>> It's in -next now, but it wasn't at the time you reported the bug.
> >>>
> >>> And it would sortof explain the bug if the INQUIRY data is correct
> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >>> ignores the result based on sense data.
> >>>
> >>> Can you check what happens with the horrible hack below:  
> >>
> >> Strike that - we're checking result later, so this can't be the case.
> >>
> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> which looks very suspicious.  Try the following please:
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 3e32dc954c3c..22f4fb550561 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, 
> >> const unsigned char *cmd,
> >> * and prevent security leaks by zeroing out the excess data.
> >> */
> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> -  memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> +//memset(buffer + (bufflen - rq->resid_len), 0, 
> >> rq->resid_len);
> >> +  printk_ratelimited("%s: got resid %d\n", __func__, 
> >> rq->resid_len);
> >>
> >>if (resid)
> >>*resid = rq->resid_len;  
> >
> >
> > Still fails but does print resid on some of the later INQUIRY commands (not 
> > the initial one).
> >  
> Can you test what happens if you blank out the storvsc_drv workaround:
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 585e54f..c36f42d 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> storvsc_device *stor_device,
>   * We do this so we can distinguish truly fatal failues
>   * (srb status == 0x4) and off-line the device in that case.
>   */
> -
> +#if 0
>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>  vstor_packet->vm_srb.scsi_status = 0;
>  vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
>  }
> -
> +#endif
> 
>  /* Copy over the status...etc */
>  stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> 
> It might thappen that we're fail to interpret the 'Device not present' 
> status correctly (which will happen for non-connected DVDs) causing the 
> SCSI stack to make incorrect decisions later on.
> 
> Cheers,
> 
> Hannes

There are several oddities about the host SCSI interface that I see:
  1. The host bus seems to report up to 6 devices even though only 2 are
 present (Disk and CDROM).
  2. The CDROM emulation doesn't report the same status as a real device.
  3. The host emulation of SCSI doesn't support all the page codes which
 is why there is the hack.

But as James said, these don't appear to be related to the failure because
the code worked before and only in post 4.11 merege is there a problem.


Re: SCSI regression in 4.11

2017-03-02 Thread Stephen Hemminger
On Thu, 2 Mar 2017 14:25:14 +0100
Hannes Reinecke  wrote:

> On 03/02/2017 02:40 AM, Stephen Hemminger wrote:
> > On Thu, 2 Mar 2017 01:56:15 +0100
> > Christoph Hellwig  wrote:
> >  
> >> On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:  
> >>> On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:  
> > 
> > http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4
> >   
> 
>  It appears that is already in the code I am testing in linux-next...  
> >>>
> >>> It's in -next now, but it wasn't at the time you reported the bug.
> >>>
> >>> And it would sortof explain the bug if the INQUIRY data is correct
> >>> in the scatterlist, but we ignore it, given that scsi_probe_lun
> >>> ignores the result based on sense data.
> >>>
> >>> Can you check what happens with the horrible hack below:  
> >>
> >> Strike that - we're checking result later, so this can't be the case.
> >>
> >> Now the other interesting thing is the memset in __scsi_exectute,
> >> which looks very suspicious.  Try the following please:
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 3e32dc954c3c..22f4fb550561 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, 
> >> const unsigned char *cmd,
> >> * and prevent security leaks by zeroing out the excess data.
> >> */
> >>if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
> >> -  memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
> >> +//memset(buffer + (bufflen - rq->resid_len), 0, 
> >> rq->resid_len);
> >> +  printk_ratelimited("%s: got resid %d\n", __func__, 
> >> rq->resid_len);
> >>
> >>if (resid)
> >>*resid = rq->resid_len;  
> >
> >
> > Still fails but does print resid on some of the later INQUIRY commands (not 
> > the initial one).
> >  
> Can you test what happens if you blank out the storvsc_drv workaround:
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 585e54f..c36f42d 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
> storvsc_device *stor_device,
>   * We do this so we can distinguish truly fatal failues
>   * (srb status == 0x4) and off-line the device in that case.
>   */
> -
> +#if 0
>  if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
> (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>  vstor_packet->vm_srb.scsi_status = 0;
>  vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
>  }
> -
> +#endif
> 
>  /* Copy over the status...etc */
>  stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;
> 
> It might thappen that we're fail to interpret the 'Device not present' 
> status correctly (which will happen for non-connected DVDs) causing the 
> SCSI stack to make incorrect decisions later on.
> 
> Cheers,
> 
> Hannes

Makes no difference, that was one of the first things I tried (and just tried 
again).


Re: [regression] vmw_pvscsi probe fails on 4.11

2017-03-02 Thread Christoph Hellwig
Please try this patch:

diff --git a/drivers/scsi/vmw_pvscsi.c b/drivers/scsi/vmw_pvscsi.c
index ef474a748744..c374e3b5c678 100644
--- a/drivers/scsi/vmw_pvscsi.c
+++ b/drivers/scsi/vmw_pvscsi.c
@@ -1487,7 +1487,7 @@ static int pvscsi_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
irq_flag &= ~PCI_IRQ_MSI;
 
error = pci_alloc_irq_vectors(adapter->dev, 1, 1, irq_flag);
-   if (error)
+   if (error < 0)
goto out_reset_adapter;
 
adapter->use_req_threshold = pvscsi_setup_req_threshold(adapter, true);


Re: [PATCH] scsi_sysfs: fix hang when removing scsi device

2017-03-02 Thread Bart Van Assche
On Thu, 2017-03-02 at 16:45 +0200, Israel Rukshin wrote:
> The bug reproduce when unloading srp module with one port down.
> device_del() hangs when __scsi_remove_device() get scsi_device with
> state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
> It hangs because device_del() is trying to send sync cache command
> when the device is offline but with SDEV_CANCEL status.
> The status was changed to SDEV_CANCEL by __scsi_remove_device()
> before it calls to device_del().

It is not device_del() but sd_shutdown() that submits the SYNCHRONIZE
CACHE command. It should also be explained in the commit message why
the block layer timeout mechanism did not cause the SYNCHRONIZE CACHE
commands to fail after the timeout expired.

> Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race")

This does not make sense. A revert is never the original introduction
of a bug but at most a reintroduction. Please refer to the commit that
originally introduced this hang.

> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   return;
>  
>   if (sdev->is_visible) {
> + enum scsi_device_state oldstate = sdev->sdev_state;
> +
>   if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>   return;
>  
> @@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev)
>   device_unregister(>sdev_dev);
>   transport_remove_device(dev);
>   scsi_dh_remove_device(sdev);
> +
> + /*
> +  * Fail new requests if the old state was offline.
> +  * This avoids from device_del() to hang.
> +  */
> + if (oldstate == SDEV_TRANSPORT_OFFLINE ||
> + oldstate == SDEV_OFFLINE)
> + blk_set_queue_dying(sdev->request_queue);
> +
>   device_del(dev);
>   } else
>   put_device(>sdev_dev);

Please refer to sd_shutdown() instead of device_del() in the above comment,
and also explain why the block layer timeout mechanism did not cause the
SYNCHRONIZE CACHE commands to fail.

Thanks,

Bart.

Re: [PATCH] scsi: qedi: fix build error without DEBUG_FS

2017-03-02 Thread Arnd Bergmann
On Thu, Mar 2, 2017 at 1:10 PM, Arnd Bergmann  wrote:

>
> -   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
> - do_not_recover);
> +   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n",
> + qedi_do_not_recover);

> -   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
> - do_not_recover);
> +   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n",
> + qedi_do_not_recover);
> return 0;
>  }

> -   { "do_not_recover", qedi_dbg_do_not_recover_ops},
> +   { "qedi_do_not_recover", qedi_dbg_do_not_recover_ops},
> { "io_trace", NULL },
> { NULL, NULL }

>
> -   cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover);
> +   cnt = sprintf(buffer, "qedi_do_not_recover=%d\n", 
> qedi_do_not_recover);
> cnt = min_t(int, count, cnt - *ppos);

I just noticed that my search went a little too far, I should not have
touched the instances within strings. Will resend.

 Arnd


[PATCH] scsi_sysfs: fix hang when removing scsi device

2017-03-02 Thread Israel Rukshin
The bug reproduce when unloading srp module with one port down.
device_del() hangs when __scsi_remove_device() get scsi_device with
state SDEV_OFFLINE or SDEV_TRANSPORT_OFFLINE.
It hangs because device_del() is trying to send sync cache command
when the device is offline but with SDEV_CANCEL status.
The status was changed to SDEV_CANCEL by __scsi_remove_device()
before it calls to device_del().

This commit doesn't accept new commands if the original state was offline.

sysrq: SysRq : sysrq: Show Blocked State
task PC stack pid father
kworker/2:0 D 88046fa95c00 0 21178 2 0x
Workqueue: srp_remove srp_remove_work [ib_srp]
Call Trace:
[] schedule+0x35/0x80
[] schedule_timeout+0x237/0x2d0
[] io_schedule_timeout+0xa6/0x110
[] wait_for_completion_io+0xa3/0x110
[] blk_execute_rq+0xdf/0x120
[] scsi_execute+0xce/0x150 [scsi_mod]
[] scsi_execute_req_flags+0x8f/0xf0 [scsi_mod]
[] sd_sync_cache+0xa9/0x190 [sd_mod]
[] sd_shutdown+0x6a/0x100 [sd_mod]
[] sd_remove+0x64/0xc0 [sd_mod]
[] __device_release_driver+0x8d/0x120
[] device_release_driver+0x1e/0x30
[] bus_remove_device+0xf9/0x170
[] device_del+0x127/0x240
[] __scsi_remove_device+0xc1/0xd0 [scsi_mod]
[] scsi_forget_host+0x57/0x60 [scsi_mod]
[] scsi_remove_host+0x72/0x110 [scsi_mod]
[] srp_remove_work+0x8b/0x200 [ib_srp]
...

Fixes: e619e6cbe (Revert "scsi: Fix a bdi reregistration race")
Signed-off-by: Israel Rukshin 
Reviewed-by: Max Gurtovoy 
---
 drivers/scsi/scsi_sysfs.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..ed55aac 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1282,6 +1282,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
return;
 
if (sdev->is_visible) {
+   enum scsi_device_state oldstate = sdev->sdev_state;
+
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
return;
 
@@ -1289,6 +1291,15 @@ void __scsi_remove_device(struct scsi_device *sdev)
device_unregister(>sdev_dev);
transport_remove_device(dev);
scsi_dh_remove_device(sdev);
+
+   /*
+* Fail new requests if the old state was offline.
+* This avoids from device_del() to hang.
+*/
+   if (oldstate == SDEV_TRANSPORT_OFFLINE ||
+   oldstate == SDEV_OFFLINE)
+   blk_set_queue_dying(sdev->request_queue);
+
device_del(dev);
} else
put_device(>sdev_dev);
-- 
2.4.3



Re: [PATCH] scsi: qedi: fix build error without DEBUG_FS

2017-03-02 Thread Johannes Thumshirn
On 03/02/2017 01:10 PM, Arnd Bergmann wrote:
> Without CONFIG_DEBUG_FS, we run into a link error:
> 
> drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
> qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to 
> `do_not_recover'
> drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
> qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to 
> `do_not_recover'
> drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
> qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to 
> `do_not_recover'
> drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
> qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'
> 
> This defines the symbol as a constant in this case, as there is no way to
> set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
> it to qedi_do_not_recover in order to put it into a driver specific namespace,
> as "do_not_recover" is a really bad name for a kernel-wide global identifier
> when it is used only in one driver.
> 
> Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver 
> framework.")
> Signed-off-by: Arnd Bergmann 
> ---

Looks good,
Reviewed-by: Johannes Thumshirn 

-- 
Johannes Thumshirn  Storage
jthumsh...@suse.de+49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


[PATCH] [v2] scsi: qedi: fix build error without DEBUG_FS

2017-03-02 Thread Arnd Bergmann
Without CONFIG_DEBUG_FS, we run into a link error:

drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to 
`do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to 
`do_not_recover'
drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'

This defines the symbol as a constant in this case, as there is no way to
set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
it to qedi_do_not_recover in order to put it into a driver specific namespace,
as "do_not_recover" is a really bad name for a kernel-wide global identifier
when it is used only in one driver.

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver 
framework.")
Reviewed-by: Johannes Thumshirn 
Signed-off-by: Arnd Bergmann 
---
v2: don't rename references to do_not_recover in string literals
---
 drivers/scsi/qedi/qedi_debugfs.c | 16 
 drivers/scsi/qedi/qedi_fw.c  |  4 ++--
 drivers/scsi/qedi/qedi_gbl.h |  8 +++-
 drivers/scsi/qedi/qedi_iscsi.c   |  8 
 4 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_debugfs.c b/drivers/scsi/qedi/qedi_debugfs.c
index 955936274241..59417199bf36 100644
--- a/drivers/scsi/qedi/qedi_debugfs.c
+++ b/drivers/scsi/qedi/qedi_debugfs.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 
-int do_not_recover;
+int qedi_do_not_recover;
 static struct dentry *qedi_dbg_root;
 
 void
@@ -74,22 +74,22 @@ qedi_dbg_exit(void)
 static ssize_t
 qedi_dbg_do_not_recover_enable(struct qedi_dbg_ctx *qedi_dbg)
 {
-   if (!do_not_recover)
-   do_not_recover = 1;
+   if (!qedi_do_not_recover)
+   qedi_do_not_recover = 1;
 
QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
- do_not_recover);
+ qedi_do_not_recover);
return 0;
 }
 
 static ssize_t
 qedi_dbg_do_not_recover_disable(struct qedi_dbg_ctx *qedi_dbg)
 {
-   if (do_not_recover)
-   do_not_recover = 0;
+   if (qedi_do_not_recover)
+   qedi_do_not_recover = 0;
 
QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
- do_not_recover);
+ qedi_do_not_recover);
return 0;
 }
 
@@ -141,7 +141,7 @@ qedi_dbg_do_not_recover_cmd_read(struct file *filp, char 
__user *buffer,
if (*ppos)
return 0;
 
-   cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover);
+   cnt = sprintf(buffer, "do_not_recover=%d\n", qedi_do_not_recover);
cnt = min_t(int, count, cnt - *ppos);
*ppos += cnt;
return cnt;
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index c9f0ef4e11b3..2bce3efc66a4 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1461,9 +1461,9 @@ static void qedi_tmf_work(struct work_struct *work)
  get_itt(tmf_hdr->rtt), get_itt(ctask->itt), cmd->task_id,
  qedi_conn->iscsi_conn_id);
 
-   if (do_not_recover) {
+   if (qedi_do_not_recover) {
QEDI_ERR(>dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n",
-do_not_recover);
+qedi_do_not_recover);
goto abort_ret;
}
 
diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h
index 8e488de88ece..63d793f46064 100644
--- a/drivers/scsi/qedi/qedi_gbl.h
+++ b/drivers/scsi/qedi/qedi_gbl.h
@@ -12,8 +12,14 @@
 
 #include "qedi_iscsi.h"
 
+#ifdef CONFIG_DEBUG_FS
+extern int qedi_do_not_recover;
+#else
+#define qedi_do_not_recover (0)
+#endif
+
 extern uint qedi_io_tracing;
-extern int do_not_recover;
+
 extern struct scsi_host_template qedi_host_template;
 extern struct iscsi_transport qedi_iscsi_transport;
 extern const struct qed_iscsi_ops *qedi_ops;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index b9f79d36142d..4cc474364c50 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -833,7 +833,7 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr 
*dst_addr,
return ERR_PTR(ret);
}
 
-   if (do_not_recover) {
+   if (qedi_do_not_recover) {
ret = -ENOMEM;
return ERR_PTR(ret);
}
@@ -957,7 +957,7 @@ static int qedi_ep_poll(struct iscsi_endpoint *ep, int 
timeout_ms)
struct qedi_endpoint *qedi_ep;
int ret = 0;
 
-   if (do_not_recover)
+   if (qedi_do_not_recover)
return 1;
 
qedi_ep = ep->dd_data;
@@ -1025,7 +1025,7 @@ static void 

Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run

2017-03-02 Thread Hannes Reinecke

On 03/02/2017 12:24 AM, Bart Van Assche wrote:

On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote:

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single failure.


This sentence describes multiple failed commands as a single failure.
That's confusing to me. Did you perhaps intend "for a single device
failure"?


diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..cec439c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -58,6 +58,7 @@
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 struct scsi_cmnd *);
+static int scsi_eh_reset(struct scsi_cmnd *scmd);

 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
eh_flag &= ~SCSI_EH_CANCEL_CMD;
scmd->eh_eflags |= eh_flag;
+   scsi_eh_reset(scmd);
list_add_tail(>eh_entry, >eh_cmd_q);
shost->host_failed++;
scsi_eh_wakeup(shost);
@@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int 
rtn)
if (!blk_rq_is_passthrough(scmd->request)) {
struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
if (sdrv->eh_action)
-   rtn = sdrv->eh_action(scmd, rtn);
+   rtn = sdrv->eh_action(scmd, rtn, false);
+   }
+   return rtn;
+}
+
+static int scsi_eh_reset(struct scsi_cmnd *scmd)
+{
+   int rtn = SUCCESS;
+
+   if (!blk_rq_is_passthrough(scmd->request)) {
+   struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+   if (sdrv->eh_action)
+   rtn = sdrv->eh_action(scmd, rtn, true);
}
return rtn;
 }


Can this function be moved up such that we don't need a new forward declaration?


Why, but of course.
I just put is here to be next to the original scsi_eh_action() function.


@@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 
key)
  * sd_eh_action - error handling callback
  * @scmd:  sd-issued command that has failed
  * @eh_disp:   The recovery disposition suggested by the midlayer
+ * @reset: Reset medium access counter


It seems to me that @reset does not trigger a reset of the medium access counter
but rather of the variable that prevents the medium access error counter to be
incremented?


Correct. Will be fixing up the description.


+ * recovery).
+ * We have to be careful to count a medium access failure only once
+ * per SCSI EH run; there might be several timed out commands which


Did you perhaps intend "once per device per SCSI EH run"?


Yes.


--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
unsignedrc_basis: 2;
unsignedzoned: 2;
unsignedurswrz : 1;
+   unsignedmedium_access_reset : 1;


The name of this new member is confusing to me. How about using the name
"ignore_medium_access_errors" instead? And since this is a boolean, how
about using true and false in assignments to this variable?


Ok, will be doing so.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: SCSI regression in 4.11

2017-03-02 Thread Hannes Reinecke

On 03/02/2017 02:40 AM, Stephen Hemminger wrote:

On Thu, 2 Mar 2017 01:56:15 +0100
Christoph Hellwig  wrote:


On Thu, Mar 02, 2017 at 01:01:35AM +0100, Christoph Hellwig wrote:

On Wed, Mar 01, 2017 at 07:54:12AM -0800, Stephen Hemminger wrote:


http://git.infradead.org/users/hch/block.git/commitdiff/148cff67b401e2229c076c0ea418712654be77e4


It appears that is already in the code I am testing in linux-next...


It's in -next now, but it wasn't at the time you reported the bug.

And it would sortof explain the bug if the INQUIRY data is correct
in the scatterlist, but we ignore it, given that scsi_probe_lun
ignores the result based on sense data.

Can you check what happens with the horrible hack below:


Strike that - we're checking result later, so this can't be the case.

Now the other interesting thing is the memset in __scsi_exectute,
which looks very suspicious.  Try the following please:

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3e32dc954c3c..22f4fb550561 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -253,7 +253,8 @@ static int __scsi_execute(struct scsi_device *sdev, const 
unsigned char *cmd,
 * and prevent security leaks by zeroing out the excess data.
 */
if (unlikely(rq->resid_len > 0 && rq->resid_len <= bufflen))
-   memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
+// memset(buffer + (bufflen - rq->resid_len), 0, rq->resid_len);
+   printk_ratelimited("%s: got resid %d\n", __func__, 
rq->resid_len);

if (resid)
*resid = rq->resid_len;



Still fails but does print resid on some of the later INQUIRY commands (not the 
initial one).


Can you test what happens if you blank out the storvsc_drv workaround:

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 585e54f..c36f42d 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1060,13 +1060,13 @@ static void storvsc_on_io_completion(struct 
storvsc_device *stor_device,

 * We do this so we can distinguish truly fatal failues
 * (srb status == 0x4) and off-line the device in that case.
 */
-
+#if 0
if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
vstor_packet->vm_srb.scsi_status = 0;
vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
}
-
+#endif

/* Copy over the status...etc */
stor_pkt->vm_srb.scsi_status = vstor_packet->vm_srb.scsi_status;

It might thappen that we're fail to interpret the 'Device not present' 
status correctly (which will happen for non-connected DVDs) causing the 
SCSI stack to make incorrect decisions later on.


Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


[PATCH] scsi: qedi: fix build error without DEBUG_FS

2017-03-02 Thread Arnd Bergmann
Without CONFIG_DEBUG_FS, we run into a link error:

drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_poll':
qedi_iscsi.c:(.text.qedi_ep_poll+0x134): undefined reference to `do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_disconnect':
qedi_iscsi.c:(.text.qedi_ep_disconnect+0x36c): undefined reference to 
`do_not_recover'
drivers/scsi/qedi/qedi_iscsi.o: In function `qedi_ep_connect':
qedi_iscsi.c:(.text.qedi_ep_connect+0x350): undefined reference to 
`do_not_recover'
drivers/scsi/qedi/qedi_fw.o: In function `qedi_tmf_work':
qedi_fw.c:(.text.qedi_tmf_work+0x3b4): undefined reference to `do_not_recover'

This defines the symbol as a constant in this case, as there is no way to
set it to anything other than zero without DEBUG_FS. In addition, I'm renaming
it to qedi_do_not_recover in order to put it into a driver specific namespace,
as "do_not_recover" is a really bad name for a kernel-wide global identifier
when it is used only in one driver.

Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver 
framework.")
Signed-off-by: Arnd Bergmann 
---
 drivers/scsi/qedi/qedi_debugfs.c | 22 +++---
 drivers/scsi/qedi/qedi_fw.c  |  4 ++--
 drivers/scsi/qedi/qedi_gbl.h |  8 +++-
 drivers/scsi/qedi/qedi_iscsi.c   |  8 
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/scsi/qedi/qedi_debugfs.c b/drivers/scsi/qedi/qedi_debugfs.c
index 955936274241..6baf35eefc1c 100644
--- a/drivers/scsi/qedi/qedi_debugfs.c
+++ b/drivers/scsi/qedi/qedi_debugfs.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 
-int do_not_recover;
+int qedi_do_not_recover;
 static struct dentry *qedi_dbg_root;
 
 void
@@ -74,22 +74,22 @@ qedi_dbg_exit(void)
 static ssize_t
 qedi_dbg_do_not_recover_enable(struct qedi_dbg_ctx *qedi_dbg)
 {
-   if (!do_not_recover)
-   do_not_recover = 1;
+   if (!qedi_do_not_recover)
+   qedi_do_not_recover = 1;
 
-   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
- do_not_recover);
+   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n",
+ qedi_do_not_recover);
return 0;
 }
 
 static ssize_t
 qedi_dbg_do_not_recover_disable(struct qedi_dbg_ctx *qedi_dbg)
 {
-   if (do_not_recover)
-   do_not_recover = 0;
+   if (qedi_do_not_recover)
+   qedi_do_not_recover = 0;
 
-   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "do_not_recover=%d\n",
- do_not_recover);
+   QEDI_INFO(qedi_dbg, QEDI_LOG_DEBUGFS, "qedi_do_not_recover=%d\n",
+ qedi_do_not_recover);
return 0;
 }
 
@@ -101,7 +101,7 @@ static struct qedi_list_of_funcs 
qedi_dbg_do_not_recover_ops[] = {
 
 struct qedi_debugfs_ops qedi_debugfs_ops[] = {
{ "gbl_ctx", NULL },
-   { "do_not_recover", qedi_dbg_do_not_recover_ops},
+   { "qedi_do_not_recover", qedi_dbg_do_not_recover_ops},
{ "io_trace", NULL },
{ NULL, NULL }
 };
@@ -141,7 +141,7 @@ qedi_dbg_do_not_recover_cmd_read(struct file *filp, char 
__user *buffer,
if (*ppos)
return 0;
 
-   cnt = sprintf(buffer, "do_not_recover=%d\n", do_not_recover);
+   cnt = sprintf(buffer, "qedi_do_not_recover=%d\n", qedi_do_not_recover);
cnt = min_t(int, count, cnt - *ppos);
*ppos += cnt;
return cnt;
diff --git a/drivers/scsi/qedi/qedi_fw.c b/drivers/scsi/qedi/qedi_fw.c
index c9f0ef4e11b3..2bce3efc66a4 100644
--- a/drivers/scsi/qedi/qedi_fw.c
+++ b/drivers/scsi/qedi/qedi_fw.c
@@ -1461,9 +1461,9 @@ static void qedi_tmf_work(struct work_struct *work)
  get_itt(tmf_hdr->rtt), get_itt(ctask->itt), cmd->task_id,
  qedi_conn->iscsi_conn_id);
 
-   if (do_not_recover) {
+   if (qedi_do_not_recover) {
QEDI_ERR(>dbg_ctx, "DONT SEND CLEANUP/ABORT %d\n",
-do_not_recover);
+qedi_do_not_recover);
goto abort_ret;
}
 
diff --git a/drivers/scsi/qedi/qedi_gbl.h b/drivers/scsi/qedi/qedi_gbl.h
index 8e488de88ece..63d793f46064 100644
--- a/drivers/scsi/qedi/qedi_gbl.h
+++ b/drivers/scsi/qedi/qedi_gbl.h
@@ -12,8 +12,14 @@
 
 #include "qedi_iscsi.h"
 
+#ifdef CONFIG_DEBUG_FS
+extern int qedi_do_not_recover;
+#else
+#define qedi_do_not_recover (0)
+#endif
+
 extern uint qedi_io_tracing;
-extern int do_not_recover;
+
 extern struct scsi_host_template qedi_host_template;
 extern struct iscsi_transport qedi_iscsi_transport;
 extern const struct qed_iscsi_ops *qedi_ops;
diff --git a/drivers/scsi/qedi/qedi_iscsi.c b/drivers/scsi/qedi/qedi_iscsi.c
index b9f79d36142d..4cc474364c50 100644
--- a/drivers/scsi/qedi/qedi_iscsi.c
+++ b/drivers/scsi/qedi/qedi_iscsi.c
@@ -833,7 +833,7 @@ qedi_ep_connect(struct Scsi_Host *shost, struct sockaddr 
*dst_addr,
return ERR_PTR(ret);
}
 
-   if (do_not_recover) {
+   if (qedi_do_not_recover) {
  

[PATCH] megaraid_sas: enable intx only if msix request fails

2017-03-02 Thread Kashyap Desai
Without this fix, driver will enable INTx Interrupt pin even
 though MSI-x vectors are enabled. See below lspci output. DisINTx is unset
 for MSIx setup.

lspci -s 85:00.0 -vvv |grep INT |grep Control
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx-

After applying this fix, driver will enable INTx Interrupt pin only if Legacy 
interrupt method is required.
See below lspci output. DisINTx is unset for MSIx setup.
lspci -s 85:00.0 -vvv |grep INT |grep Control
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR+ FastB2B- DisINTx+

Signed-off-by: Kashyap Desai 
---
 drivers/scsi/megaraid/megaraid_sas_base.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c 
b/drivers/scsi/megaraid/megaraid_sas_base.c
index 7ac9a9e..82a8ec8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -4990,6 +4990,7 @@ int megasas_set_crash_dump_params(struct megasas_instance 
*instance,
struct pci_dev *pdev;
 
pdev = instance->pdev;
+   pci_intx(pdev, 1);
instance->irq_context[0].instance = instance;
instance->irq_context[0].MSIxIndex = 0;
if (request_irq(pci_irq_vector(pdev, 0),
@@ -5277,10 +5278,6 @@ static int megasas_init_fw(struct megasas_instance 
*instance)
MPI2_REPLY_POST_HOST_INDEX_OFFSET);
}
 
-   i = pci_alloc_irq_vectors(instance->pdev, 1, 1, PCI_IRQ_LEGACY);
-   if (i < 0)
-   goto fail_setup_irqs;
-
dev_info(>pdev->dev,
"firmware supports msix\t: (%d)", fw_msix_count);
dev_info(>pdev->dev,
@@ -5494,7 +5491,6 @@ static int megasas_init_fw(struct megasas_instance 
*instance)
instance->instancet->disable_intr(instance);
 fail_init_adapter:
megasas_destroy_irqs(instance);
-fail_setup_irqs:
if (instance->msix_vectors)
pci_free_irq_vectors(instance->pdev);
instance->msix_vectors = 0;
-- 
1.8.3.1



[PATCH] enclosure: fix sysfs symlinks creation when using multipath

2017-03-02 Thread Maurizio Lombardi
With multipath, it may happen that the same device is passed
to enclosure_add_device() multiple times and that the enclosure_add_links()
function fails to create the symlinks because the device's sysfs
directory entry is still NULL.
In this case, the links will never be created because all the subsequent
calls to enclosure_add_device() will immediately fail with EEXIST.

This is an example of what happens:

[   19.251902] scsi 0:0:27:0: Direct-Access SEAGATE  ST8000NM0075 E002 
PQ: 0 ANSI: 6
[   19.261874] scsi 0:0:27:0: SSP: handle(0x0027), 
sas_addr(0x5000c50085826dd6), phy(34), device_name(0x5000c50085826dd4)
[   19.274656] scsi 0:0:27:0: SSP: enclosure_logical_id(0x500093d001be4000), 
slot(57)
[   19.283944] scsi 0:0:27:0: SSP: enclosure level(0x), connector name( 
)
[   19.292933] scsi 0:0:27:0: qdepth(254), tagged(1), simple(0), ordered(0), 
scsi_l
[...]
[   60.066524] enclosure_add_device(0:0:27:0) called, enclosure_add_links() 
failed with error -2
[...]
[   75.119146] sd 0:0:27:0: Attached scsi generic sg27 type 0
[   75.126722] sd 0:0:27:0: [sdaa] 15628053168 512-byte logical blocks: (8.00 
TB/7.27 TiB)
[   75.126723] sd 0:0:27:0: [sdaa] 4096-byte physical blocks
[   75.129059] sd 0:0:27:0: [sdaa] Write Protect is off
[   75.129061] sd 0:0:27:0: [sdaa] Mode Sense: db 00 10 08
[   75.130417] sd 0:0:27:0: [sdaa] Write cache: enabled, read cache: enabled, 
supports DPO and FUA
[   75.158088] sd 0:0:27:0: [sdaa] Attached SCSI disk
[...]
[   75.192722] enclosure_add_device(0:0:27:0) called, device already exists

This patch modifies the code so the driver will detect this condition
and will retry to create the symlinks when enclosure_add_device() is called.

Signed-off-by: Maurizio Lombardi 
---
 drivers/misc/enclosure.c  | 16 ++--
 include/linux/enclosure.h |  1 +
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/enclosure.c b/drivers/misc/enclosure.c
index 65fed71..a856c98 100644
--- a/drivers/misc/enclosure.c
+++ b/drivers/misc/enclosure.c
@@ -375,21 +375,33 @@ int enclosure_add_device(struct enclosure_device *edev, 
int component,
 struct device *dev)
 {
struct enclosure_component *cdev;
+   int error;
 
if (!edev || component >= edev->components)
return -EINVAL;
 
cdev = >component[component];
 
-   if (cdev->dev == dev)
+   if (cdev->dev == dev) {
+   if (!cdev->links_created) {
+   error = enclosure_add_links(cdev);
+   if (!error)
+   cdev->links_created = 1;
+   }
return -EEXIST;
+   }
 
if (cdev->dev)
enclosure_remove_links(cdev);
 
put_device(cdev->dev);
cdev->dev = get_device(dev);
-   return enclosure_add_links(cdev);
+   error = enclosure_add_links(cdev);
+   if (!error)
+   cdev->links_created = 1;
+   else
+   cdev->links_created = 0;
+   return error;
 }
 EXPORT_SYMBOL_GPL(enclosure_add_device);
 
diff --git a/include/linux/enclosure.h b/include/linux/enclosure.h
index a4cf57c..c3bdc4c 100644
--- a/include/linux/enclosure.h
+++ b/include/linux/enclosure.h
@@ -97,6 +97,7 @@ struct enclosure_component {
struct device cdev;
struct device *dev;
enum enclosure_component_type type;
+   int links_created;
int number;
int fault;
int active;
-- 
Maurizio Lombardi



Re: sg driver, sg_io and sg tablesize

2017-03-02 Thread Fam Zheng
On Wed, 03/01 17:11, Laurence Oberman wrote:
> This was suggested by Ewan and is the best way to know what the actual max I/O
> size for sg_io would be for the LPFC driver.

This sounds silly, why is there a "lame" max I/O size as in max_sectors_kb,
that is unusable because of a more limiting max_segments, and an "actual max
I/O size" as derived from the latter? What is the reason of the contradiction
here?

Fam


[regression] vmw_pvscsi probe fails on 4.11

2017-03-02 Thread Loïc Yhuel

Hi,

On Fedora Rawhide (reproduced on git master), vmw_pvscsi probe fails.
It also calls free_irq during cleanup, which produces a warning since it 
didn't

call request_irq in this case.

Reverting commit 2e48e34 (scsi: vmw_pvscsi: switch to pci_alloc_irq_vectors)
fixes the issue (I get "vmw_pvscsi: using MSI-X", and it uses the IRQ 78).

[2.671383] vmw_pvscsi: using 64bit dma
[2.671750] vmw_pvscsi: max_id: 16
[2.671752] vmw_pvscsi: setting ring_pages to 8
[2.673209] [ cut here ]
[2.673219] WARNING: CPU: 2 PID: 402 at kernel/irq/manage.c:1478 
__free_irq+0xa4/0x2e0

[2.673220] Trying to free already-free IRQ 78
[2.673221] Modules linked in: serio_raw vmw_pvscsi(+) vmxnet3 
ata_generic pata_acpi fjes vmwgfx drm_kms_helper ttm drm
[2.673236] CPU: 2 PID: 402 Comm: systemd-udevd Not tainted 
4.11.0r0-test1+ #2
[2.673238] Hardware name: VMware, Inc. VMware Virtual Platform/440BX 
Desktop Reference Platform, BIOS 6.00 07/02/2015

[2.673239] Call Trace:
[2.673244]  dump_stack+0x8e/0xd1
[2.673248]  __warn+0xcb/0xf0
[2.673252]  warn_slowpath_fmt+0x5f/0x80
[2.673257]  ? _raw_spin_lock_irqsave+0x82/0x90
[2.673259]  ? __free_irq+0x5f/0x2e0
[2.673262]  __free_irq+0xa4/0x2e0
[2.673265]  ? __pci_enable_msix+0x234/0x520
[2.673269]  free_irq+0x39/0x90
[2.673274]  pvscsi_shutdown_intr+0x25/0x40 [vmw_pvscsi]
[2.673277]  pvscsi_release_resources+0x1e/0x370 [vmw_pvscsi]
[2.673281]  pvscsi_probe+0xd2e/0x10a4 [vmw_pvscsi]
[2.673306]  local_pci_probe+0x42/0xa0
[2.673308]  ? pci_match_device+0xe0/0x110
[2.673311]  pci_device_probe+0xbe/0x150
[2.673313]  ? _raw_spin_unlock+0x27/0x40
[2.673318]  driver_probe_device+0x106/0x450
[2.673322]  __driver_attach+0xa8/0xf0
[2.673325]  ? driver_probe_device+0x450/0x450
[2.673327]  bus_for_each_dev+0x75/0xc0
[2.673331]  driver_attach+0x1e/0x20
[2.67]  bus_add_driver+0x1d3/0x270
[2.673335]  ? 0xc032d000
[2.673338]  driver_register+0x60/0xe0
[2.673340]  ? 0xc032d000
[2.673342]  __pci_register_driver+0x60/0x70
[2.673345]  pvscsi_init+0x38/0x1000 [vmw_pvscsi]
[2.673348]  do_one_initcall+0x50/0x1a0
[2.673351]  ? rcu_read_lock_sched_held+0x79/0x80
[2.673354]  ? kmem_cache_alloc_trace+0x273/0x2e0
[2.673356]  ? do_init_module+0x27/0x1e8
[2.673361]  do_init_module+0x5f/0x1e8
[2.673365]  load_module+0x2386/0x29d0
[2.673367]  ? __symbol_put+0x70/0x70
[2.673371]  ? show_coresize+0x30/0x30
[2.673375]  ? vfs_read+0x131/0x180
[2.673386]  SYSC_finit_module+0xf6/0x110
[2.673395]  SyS_finit_module+0xe/0x10
[2.673397]  do_syscall_64+0x6c/0x1f0
[2.673400]  entry_SYSCALL64_slow_path+0x25/0x25
[2.673402] RIP: 0033:0x7f770375e5d9
[2.673403] RSP: 002b:7fff59d224b8 EFLAGS: 0246 ORIG_RAX: 
0139
[2.673406] RAX: ffda RBX: 56462cb91f20 RCX: 
7f770375e5d9
[2.673407] RDX:  RSI: 7f77042999c5 RDI: 
0006
[2.673408] RBP: 7f77042999c5 R08:  R09: 
7fff59d225d0
[2.673409] R10: 0006 R11: 0246 R12: 

[2.673410] R13: 56462cb8fcd0 R14: 0002 R15: 
56462ae4cd81

[2.673419] ---[ end trace 19fc54ed551e946e ]---
[2.677816] vmw_pvscsi :13:00.0: Driver probe function 
unexpectedly returned 1





Re: [PATCH] mpt3sas: Avoid sleeping in interrupt context

2017-03-02 Thread Martin K. Petersen
> "Bart" == Bart Van Assche  writes:

Bart> Commit 669f044170d8 ("scsi: srp_transport: Move queuecommand()
Bart> wait code to SCSI core") can make scsi_internal_device_block()
Bart> sleep.  However, the mpt3sas driver can call this function from an
Bart> interrupt handler. Hence add a second argument to
Bart> scsi_internal_device_block() that restores the old behavior of
Bart> this function for the mpt3sas handler.

Applied to 4.11/scsi-fixes.

Thanks, Bart!

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.

2017-03-02 Thread Nicholas A. Bellinger
Hi Himanshu & Co,

On Fri, 2017-02-24 at 13:37 -0800, Himanshu Madhani wrote:
> Hi Nic,
> 
> Please consider this series for inclusion in target-pending.
> 

This is a little late to include for v4.11-rc1, but considering the
number of bug-fixes I'm OK to merge it post -rc1.

> This series contains following changes.
> 
> o Fix for the deadlock because of inconsistent lock usage reported by you.
> o Added patch to submit non-critical MBX command via IOCB path.
> o Improved T10-DIF/PI handling with target stack.
> o Changed scsi host lookup method.
> o Some minor bug fixes.
> 
> Changes from v2 --> v3
> 
> o Updated patch to use waitq instead of while loop for waiting.
> o Removed duplication of Target stack definitations for T10-DIF/PI.
> o In addition, addressed review comments from Bart & Christoph.
> 
> Changes from v1 -> v2
> 
> o Rebased series based on scsi-target-for-v4.11 branch.
> 
> Please apply to target-pending.
> 
> Thanks,
> Himanshu 
> 
> Anil Gurumurthy (1):
>   qla2xxx: Export DIF stats via debugfs
> 
> Himanshu Madhani (2):
>   qla2xxx: Add DebugFS node to display Port Database
>   qla2xxx: Update driver version to 9.00.00.00-k
> 
> Joe Carnuccio (1):
>   qla2xxx: Allow vref count to timeout on vport delete.
> 
> Quinn Tran (10):
>   qla2xxx: Fix delayed response to command for loop mode/direct connect.
>   qla2xxx: Allow relogin to proceed if remote login did not finish
>   qla2xxx: Use IOCB interface to submit non-critical MBX.
>   qla2xxx: Improve T10-DIF/PI handling in driver.
>   qla2xxx: Change scsi host lookup method.
>   qla2xxx: Fix memory leak for abts processing
>   qla2xxx: Fix request queue corruption.
>   qla2xxx: Fix inadequate lock protection for ABTS.
>   qla2xxx: Add async new target notification
>   qla2xxx: Fix sess_lock & hardware_lock lock order problem.
> 
>  drivers/scsi/qla2xxx/Kconfig   |   1 +
>  drivers/scsi/qla2xxx/qla_attr.c|   4 +-
>  drivers/scsi/qla2xxx/qla_dbg.h |   1 +
>  drivers/scsi/qla2xxx/qla_def.h |  46 ++-
>  drivers/scsi/qla2xxx/qla_dfs.c | 115 +-
>  drivers/scsi/qla2xxx/qla_gbl.h |  12 +-
>  drivers/scsi/qla2xxx/qla_init.c|  85 ++---
>  drivers/scsi/qla2xxx/qla_isr.c |  41 ++-
>  drivers/scsi/qla2xxx/qla_mbx.c | 304 ++--
>  drivers/scsi/qla2xxx/qla_mid.c |  14 +-
>  drivers/scsi/qla2xxx/qla_os.c  |  24 +-
>  drivers/scsi/qla2xxx/qla_target.c  | 729 
> +++--
>  drivers/scsi/qla2xxx/qla_target.h  |  36 +-
>  drivers/scsi/qla2xxx/qla_version.h |   6 +-
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c |  49 ++-
>  15 files changed, 1049 insertions(+), 418 deletions(-)
>