Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 22:20:58 -0800
James Bottomley  wrote:

> On Tue, 2017-02-28 at 17:25 -0800, Stephen Hemminger wrote:
> > [1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x20 length 36
> > [1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.359996] inquiry data: 0020: 00 00 00 00
> > [1.361835] scsi host1: scsi scan: INQUIRY result too short (5),
> > using 36
> > [1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 16
> > [1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> > status 0x1 length 36
> > [1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64
> > 02 89 ff ff ff ff
> > [1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00
> > [1.362309] inquiry data: 0020: 00 00 00 00
> > [1.377423] scsi 1:0:0:1: Direct-Access   
> >  PQ: 0 ANSI: 0  
> 
> Well, this pinpoints the fault to the block uncopy, I think.  The
> Inquiry data is clearly correct in the page frame, so it's not getting
> copied to the scsi_execute() buffer for some reason.
> 
> James
> 

Let me know, I can run another test and dump more data.


Re: mpt3sas sleep from atomic context on v4.10

2017-02-28 Thread Omar Sandoval
On Wed, Mar 01, 2017 at 01:07:12AM +, Bart Van Assche wrote:
> On Tue, 2017-02-28 at 16:25 -0800, Omar Sandoval wrote:
> > I'm seeing this while testing on Linus' current master:
> > 
> > [  427.814466] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:149 
> > __handle_irq_event_percpu+0x187/0x190
> > [  427.832552] irq 116 handler _base_interrupt+0x0/0x9e0 [mpt3sas] enabled 
> > interrupts
> > 
> > I tracked it down to commit 669f044170d8 ("scsi: srp_transport: Move
> > queuecommand() wait code to SCSI core"). That commit made it so
> > scsi_internal_device_block() can sleep, but mpt3sas calls this from an
> > interrupt handler:
> > 
> > _base_interrupt
> > -> _base_async_event
> >-> mpt3sas_scsih_event_callback
> >   -> _scsih_check_topo_delete_events
> >  -> _scsih_block_io_to_children_attached_directly
> > -> _scsih_block_io_device
> >-> _scsih_internal_device_block
> >   -> scsi_internal_device_block
> > 
> > This change was made in 4.10. Bart, can you take a look?
> 
> How about the (entirely untested) patch below?
> 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 ---
>  drivers/scsi/mpt3sas/mpt3sas_scsih.c |  4 ++--
>  drivers/scsi/scsi_lib.c  | 32 ++--
>  drivers/scsi/scsi_priv.h |  3 ---
>  include/scsi/scsi_device.h   |  4 
>  5 files changed, 24 insertions(+), 22 deletions(-)

[snip]

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 3e32dc954c3c..77851697f130 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2945,6 +2945,8 @@ EXPORT_SYMBOL(scsi_target_resume);
>  /**
>   * scsi_internal_device_block - internal function to put a device 
> temporarily into the SDEV_BLOCK state
>   * @sdev:device to block
> + * @wait:   Whether or not to wait until ongoing .queuecommand() /
> + *   .queue_rq() calls have finished.
>   *
>   * Block request made by scsi lld's to temporarily stop all
>   * scsi commands on the specified device. May sleep.
> @@ -2962,7 +2964,7 @@ EXPORT_SYMBOL(scsi_target_resume);
>   * remove the rport mutex lock and unlock calls from srp_queuecommand().
>   */
>  int
> -scsi_internal_device_block(struct scsi_device *sdev)
> +scsi_internal_device_block(struct scsi_device *sdev, bool wait)
>  {
>   struct request_queue *q = sdev->request_queue;
>   unsigned long flags;
> @@ -2976,18 +2978,20 @@ scsi_internal_device_block(struct scsi_device *sdev)
>   return err;
>   }
>  
> - /* 
> -  * The device has transitioned to SDEV_BLOCK.  Stop the
> -  * block layer from calling the midlayer with this device's
> -  * request queue. 
> -  */
> - if (q->mq_ops) {
> - blk_mq_quiesce_queue(q);
> - } else {
> - spin_lock_irqsave(q->queue_lock, flags);
> - blk_stop_queue(q);
> - spin_unlock_irqrestore(q->queue_lock, flags);
> - scsi_wait_for_queuecommand(sdev);
> + if (wait) {
> + /*
> +  * The device has transitioned to SDEV_BLOCK.  Stop the
> +  * block layer from calling the midlayer with this device's
> +  * request queue.
> +  */
> + if (q->mq_ops) {
> + blk_mq_quiesce_queue(q);
> + } else {
> + spin_lock_irqsave(q->queue_lock, flags);
> + blk_stop_queue(q);
> + spin_unlock_irqrestore(q->queue_lock, flags);
> + scsi_wait_for_queuecommand(sdev);
> + }
>   }

I think here, we want this instead:

@@ -2987,7 +2989,8 @@ scsi_internal_device_block(struct scsi_device *sdev)
spin_lock_irqsave(q->queue_lock, flags);
blk_stop_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
-   scsi_wait_for_queuecommand(sdev);
+   if (wait)
+   scsi_wait_for_queuecommand(sdev);
}

return 0;

That fixes the warnings for me.


Re: SCSI regression in 4.11

2017-02-28 Thread James Bottomley
On Tue, 2017-02-28 at 17:25 -0800, Stephen Hemminger wrote:
> [1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> status 0x20 length 36
> [1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64
> 02 89 ff ff ff ff
> [1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00
> [1.359996] inquiry data: 0020: 00 00 00 00
> [1.361835] scsi host1: scsi scan: INQUIRY result too short (5),
> using 36
> [1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb
> status 0x1 length 16
> [1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> status 0x1 length 36
> [1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64
> 02 89 ff ff ff ff
> [1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00
> [1.362309] inquiry data: 0020: 00 00 00 00
> [1.377423] scsi 1:0:0:1: Direct-Access   
>  PQ: 0 ANSI: 0

Well, this pinpoints the fault to the block uncopy, I think.  The
Inquiry data is clearly correct in the page frame, so it's not getting
copied to the scsi_execute() buffer for some reason.

James



Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Martin K. Petersen
> "Damien" == Damien Le Moal  writes:

Damien,

Damien> The problem remains that the mpt3sas driver needs fixing. As you
Damien> suggest, we can do that in sd, or directly in mpt3sas.  I tried
Damien> to do a clean fix in sd, but always end up consuming a lot of
Damien> aspirin because of all the potential corner cases to deal
Damien> with.

What? You expected this to be easy? :)

FWIW, I'm perfectly happy with the desire to shuffle the ZBC-specifics
over to sd_zbc.c.

Damien> The file sd.h has the inline function
Damien> scsi_medium_access_command() defined. We could move that to
Damien> include/scsi/scsi.h (or scsi_proto.h) and use it in place of
Damien> blk_rq_accesses_medium() in the mpt3sas driver to not force
Damien> unaligned resid corrections for zone commands. Would that be
Damien> acceptable ?

I'd still rather keep it in sd. If you don't have sufficient supplies of
aspirin for the sd_completed_bytes() approach then I'm OK with a simple
tweak to sd_done(). We need something we can reasonably Cc: to stable
for 4.10.

You are welcome to key off of scsi_medium_access_command() if you like
but I don't think it's necessary. Just having a default for the req_op
switch should suffice.

I believe that was your original sd approach. If you post it again I'll
have another look.

And then I'll put the bigger completion rework back on my todo list.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH 1/1] fnic: minor cleanup in fnic_fcpio_itmf_cmpl_handler, removing else case

2017-02-28 Thread Satish Kharat
Getting rid of else case to make the flow look bit more simpler
logical

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic_scsi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 2544a37..df443a9 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1128,12 +1128,11 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic 
*fnic,
}
 
CMD_FLAGS(sc) |= FNIC_IO_ABT_TERM_DONE;
+   CMD_ABTS_STATUS(sc) = hdr_status;
 
/* If the status is IO not found consider it as success */
if (hdr_status == FCPIO_IO_NOT_FOUND)
CMD_ABTS_STATUS(sc) = FCPIO_SUCCESS;
-   else
-   CMD_ABTS_STATUS(sc) = hdr_status;
 
atomic64_dec(_stats->io_stats.active_ios);
if (atomic64_read(>io_cmpl_skip))
-- 
2.5.5



Re: [PATCH] scsi: qedi: fix missing return error code check on call to qedi_setup_int

2017-02-28 Thread Martin K. Petersen
> "Colin" == Colin King  writes:

Colin> The call to qedi_setup_int is not updating the return code rc yet
Colin> rc is being checked for an error. Fix this by assigning rc to the
Colin> return code from the call to qedi_setup_int.

Applied to 4.11/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH 1/1] fnic: Ratelimit printks to avoid flooding when vlan is not set by the switch.i

2017-02-28 Thread Satish Kharat
From: Satish Kharat 

This is to avoid the log from being filled with vlan discovery
messages when there is no vlan configured on the switch.

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic_fcs.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_fcs.c b/drivers/scsi/fnic/fnic_fcs.c
index 3b7da66..8d2d86e 100644
--- a/drivers/scsi/fnic/fnic_fcs.c
+++ b/drivers/scsi/fnic/fnic_fcs.c
@@ -342,8 +342,11 @@ static void fnic_fcoe_send_vlan_req(struct fnic *fnic)
 
fnic_fcoe_reset_vlans(fnic);
fnic->set_vlan(fnic, 0);
-   FNIC_FCS_DBG(KERN_INFO, fnic->lport->host,
- "Sending VLAN request...\n");
+
+   if (printk_ratelimit())
+   FNIC_FCS_DBG(KERN_INFO, fnic->lport->host,
+ "Sending VLAN request...\n");
+
skb = dev_alloc_skb(sizeof(struct fip_vlan));
if (!skb)
return;
@@ -1313,10 +1316,11 @@ void fnic_handle_fip_timer(struct fnic *fnic)
 
spin_lock_irqsave(>vlans_lock, flags);
if (list_empty(>vlans)) {
-   /* no vlans available, try again */
-   FNIC_FCS_DBG(KERN_DEBUG, fnic->lport->host,
- "Start VLAN Discovery\n");
spin_unlock_irqrestore(>vlans_lock, flags);
+   /* no vlans available, try again */
+   if (printk_ratelimit())
+   FNIC_FCS_DBG(KERN_DEBUG, fnic->lport->host,
+ "Start VLAN Discovery\n");
fnic_event_enq(fnic, FNIC_EVT_START_VLAN_DISC);
return;
}
@@ -1332,10 +1336,11 @@ void fnic_handle_fip_timer(struct fnic *fnic)
spin_unlock_irqrestore(>vlans_lock, flags);
break;
case FIP_VLAN_FAILED:
-   /* if all vlans are in failed state, restart vlan disc */
-   FNIC_FCS_DBG(KERN_DEBUG, fnic->lport->host,
- "Start VLAN Discovery\n");
spin_unlock_irqrestore(>vlans_lock, flags);
+   /* if all vlans are in failed state, restart vlan disc */
+   if (printk_ratelimit())
+   FNIC_FCS_DBG(KERN_DEBUG, fnic->lport->host,
+ "Start VLAN Discovery\n");
fnic_event_enq(fnic, FNIC_EVT_START_VLAN_DISC);
break;
case FIP_VLAN_SENT:
-- 
2.5.5



[PATCH 1/1] fnic: Fix for "Number of Active IOs" in fnicstats becoming negative

2017-02-28 Thread Satish Kharat
Fixing the IO stats updation (Active IOs and IO completion) to
fix "Number of Active IOs" becoming negative in the fnistats
output.

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic_scsi.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 2544a37..ec820cb 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1135,11 +1135,6 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic 
*fnic,
else
CMD_ABTS_STATUS(sc) = hdr_status;
 
-   atomic64_dec(_stats->io_stats.active_ios);
-   if (atomic64_read(>io_cmpl_skip))
-   atomic64_dec(>io_cmpl_skip);
-   else
-   atomic64_inc(_stats->io_stats.io_completions);
 
if (!(CMD_FLAGS(sc) & (FNIC_IO_ABORTED | FNIC_IO_DONE)))
atomic64_inc(_stats->no_icmnd_itmf_cmpls);
@@ -1181,6 +1176,11 @@ static void fnic_fcpio_itmf_cmpl_handler(struct fnic 
*fnic,
(((u64)CMD_FLAGS(sc) << 32) |
CMD_STATE(sc)));
sc->scsi_done(sc);
+   atomic64_dec(_stats->io_stats.active_ios);
+   if (atomic64_read(>io_cmpl_skip))
+   atomic64_dec(>io_cmpl_skip);
+   else
+   
atomic64_inc(_stats->io_stats.io_completions);
}
}
 
@@ -1970,6 +1970,11 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
/* Call SCSI completion function to complete the IO */
sc->result = (DID_ABORT << 16);
sc->scsi_done(sc);
+   atomic64_dec(_stats->io_stats.active_ios);
+   if (atomic64_read(>io_cmpl_skip))
+   atomic64_dec(>io_cmpl_skip);
+   else
+   atomic64_inc(_stats->io_stats.io_completions);
}
 
 fnic_abort_cmd_end:
-- 
2.5.5



Re: [PATCH v2] libiscsi: add lock around task lists to fix list corruption regression

2017-02-28 Thread Martin K. Petersen
> "Chris" == Chris Leech  writes:

Chris> There's a rather long standing regression from the commit
Chris> "libiscsi: Reduce locking contention in fast path"

Applied to 4.11/scsi-fixes.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
Martin,

On 3/1/17 11:52, Martin K. Petersen wrote:
>> "Christoph" == Christoph Hellwig  writes:
> 
> Christoph> I don't really like this too much - this is too many SCSI
> Christoph> specifics for the block layer to care.  Maybe using bios for
> Christoph> the zone ops was a mistake after all, and we should just have
> Christoph> operations in struct block_device instead..
> 
> Yeah, I'm afraid I don't like this either.
> 
> Short term I still think we should adjust in sd. And then maybe longer
> term revisit whether these hybrid commands should be something other
> than bios.

OK. I will think about it.

The problem remains that the mpt3sas driver needs fixing. As you
suggest, we can do that in sd, or directly in mpt3sas.
I tried to do a clean fix in sd, but always end up consuming a lot of
aspirin because of all the potential corner cases to deal with. So I
would prefer really just a fix in mpt3sas for now and think of the sd
changes (if any) together with a redesign of the zone commands execution
(not as BIOs).

The file sd.h has the inline function scsi_medium_access_command()
defined. We could move that to include/scsi/scsi.h (or scsi_proto.h) and
use it in place of blk_rq_accesses_medium() in the mpt3sas driver to not
force unaligned resid corrections for zone commands. Would that be
acceptable ?

Best regards.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Research Group,
Western Digital Corporation
damien.lem...@wdc.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.wdc.com, www.hgst.com


Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Martin K. Petersen
> "Christoph" == Christoph Hellwig  writes:

Christoph> I don't really like this too much - this is too many SCSI
Christoph> specifics for the block layer to care.  Maybe using bios for
Christoph> the zone ops was a mistake after all, and we should just have
Christoph> operations in struct block_device instead..

Yeah, I'm afraid I don't like this either.

Short term I still think we should adjust in sd. And then maybe longer
term revisit whether these hybrid commands should be something other
than bios.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: mpt3sas sleep from atomic context on v4.10

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 16:25 -0800, Omar Sandoval wrote:
> I'm seeing this while testing on Linus' current master:
> 
> [  427.814466] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:149 
> __handle_irq_event_percpu+0x187/0x190
> [  427.832552] irq 116 handler _base_interrupt+0x0/0x9e0 [mpt3sas] enabled 
> interrupts
> 
> I tracked it down to commit 669f044170d8 ("scsi: srp_transport: Move
> queuecommand() wait code to SCSI core"). That commit made it so
> scsi_internal_device_block() can sleep, but mpt3sas calls this from an
> interrupt handler:
> 
> _base_interrupt
> -> _base_async_event
>-> mpt3sas_scsih_event_callback
>   -> _scsih_check_topo_delete_events
>  -> _scsih_block_io_to_children_attached_directly
>   -> _scsih_block_io_device
>  -> _scsih_internal_device_block
> -> scsi_internal_device_block
> 
> This change was made in 4.10. Bart, can you take a look?

How about the (entirely untested) patch below?

---
 drivers/scsi/mpt3sas/mpt3sas_base.h  |  3 ---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |  4 ++--
 drivers/scsi/scsi_lib.c  | 32 ++--
 drivers/scsi/scsi_priv.h |  3 ---
 include/scsi/scsi_device.h   |  4 
 5 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h 
b/drivers/scsi/mpt3sas/mpt3sas_base.h
index 4ab634fc27df..1aa7f97613ab 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.h
@@ -1444,9 +1444,6 @@ void mpt3sas_transport_update_links(struct 
MPT3SAS_ADAPTER *ioc,
u64 sas_address, u16 handle, u8 phy_number, u8 link_rate);
 extern struct sas_function_template mpt3sas_transport_functions;
 extern struct scsi_transport_template *mpt3sas_transport_template;
-extern int scsi_internal_device_block(struct scsi_device *sdev);
-extern int scsi_internal_device_unblock(struct scsi_device *sdev,
-   enum scsi_device_state new_state);
 /* trigger data externs */
 void mpt3sas_send_trigger_data_event(struct MPT3SAS_ADAPTER *ioc,
struct SL_WH_TRIGGERS_EVENT_DATA_T *event_data);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 46e866c36c8a..16d34a4bdc2e 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -2859,7 +2859,7 @@ _scsih_internal_device_block(struct scsi_device *sdev,
sas_device_priv_data->sas_target->handle);
sas_device_priv_data->block = 1;
 
-   r = scsi_internal_device_block(sdev);
+   r = scsi_internal_device_block(sdev, false);
if (r == -EINVAL)
sdev_printk(KERN_WARNING, sdev,
"device_block failed with return(%d) for handle(0x%04x)\n",
@@ -2895,7 +2895,7 @@ _scsih_internal_device_unblock(struct scsi_device *sdev,
"performing a block followed by an unblock\n",
r, sas_device_priv_data->sas_target->handle);
sas_device_priv_data->block = 1;
-   r = scsi_internal_device_block(sdev);
+   r = scsi_internal_device_block(sdev, false);
if (r)
sdev_printk(KERN_WARNING, sdev, "retried device_block "
"failed with return(%d) for handle(0x%04x)\n",
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 3e32dc954c3c..77851697f130 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2945,6 +2945,8 @@ EXPORT_SYMBOL(scsi_target_resume);
 /**
  * scsi_internal_device_block - internal function to put a device temporarily 
into the SDEV_BLOCK state
  * @sdev:  device to block
+ * @wait:   Whether or not to wait until ongoing .queuecommand() /
+ * .queue_rq() calls have finished.
  *
  * Block request made by scsi lld's to temporarily stop all
  * scsi commands on the specified device. May sleep.
@@ -2962,7 +2964,7 @@ EXPORT_SYMBOL(scsi_target_resume);
  * remove the rport mutex lock and unlock calls from srp_queuecommand().
  */
 int
-scsi_internal_device_block(struct scsi_device *sdev)
+scsi_internal_device_block(struct scsi_device *sdev, bool wait)
 {
struct request_queue *q = sdev->request_queue;
unsigned long flags;
@@ -2976,18 +2978,20 @@ scsi_internal_device_block(struct scsi_device *sdev)
return err;
}
 
-   /* 
-* The device has transitioned to SDEV_BLOCK.  Stop the
-* block layer from calling the midlayer with this device's
-* request queue. 
-*/
-   if (q->mq_ops) {
-   blk_mq_quiesce_queue(q);
-   } else {
-   spin_lock_irqsave(q->queue_lock, flags);
-   blk_stop_queue(q);
-   spin_unlock_irqrestore(q->queue_lock, flags);
-   scsi_wait_for_queuecommand(sdev);
+   if (wait) {
+   /*
+* The device has transitioned to SDEV_BLOCK.  

Re: SCSI regression in 4.11

2017-02-28 Thread James Bottomley
On Tue, 2017-02-28 at 10:41 -0800, Stephen Hemminger wrote:
> [1.652279] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb
> status 0x20 length 36
> [1.652297] scsi host1: scsi scan: INQUIRY result too short (5),
> using 36

This is definitive.  We sent the Inquiry command, we got 36 bytes of
payload from hyperv (correct size), but when we tried to get the length
from the data buffer of the command, we found zeros where it should
have been (the length is the byte at offset 4 plus 5).  Wherever the
inquiry data went it wasn't into the command buffer ... it's like we
have some type of sg list setup error, but I can't find it.

James



Re: SCSI regression in 4.11

2017-02-28 Thread James Bottomley
On Tue, 2017-02-28 at 10:57 -0800, Stephen Hemminger wrote:
> On Tue, 28 Feb 2017 09:06:13 -0800
> James Bottomley  wrote:
> 
> > On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger
> > > > wrote:  
> > > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before
> > > > > each
> > > > > I/O")
> > > > > 
> > > > > but that is already in linux-next.
> > > > > 
> > > > > Noticed another place where memset(of the data was being done
> > > > > not
> > > > > the extra bits.
> > > > > Tried this, but didn't fix it either...  
> > > > 
> > > > Are you using blk-mq or the legacy request code?  
> > > 
> > > Stephen doesn't have MQ set in the config he posted, I'm assuming
> > > he 
> > > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I
> > > asked if turning on MQ makes a difference.  
> > 
> > OK, since we're not making much progress, Stephen, could you insert
> > some debugging into the storvsc driver?  The trace clearly shows
> > we're
> > getting zeros back in the buffer when we should have data from the
> > initial scan.  Firstly, does the vmbus think it's transferring any
> > data
> > for the INQUIRY and READ_CAPACITY commands (looks like
> > storvsc_command_completion() data_transfer_length)?  If it does,
> > there's probably an issue initialising the sg list.  If it doesn't,
> > we're probably sending bogus commands.
> > 
> > James
> > 
> 
> The following code in storvsc looks suspicious
> 
> static void storvsc_on_io_completion(struct storvsc_device
> *stor_device,
> struct vstor_packet *vstor_packet,
> struct storvsc_cmd_request *request)
> {
>   struct vstor_packet *stor_pkt;
>   struct hv_device *device = stor_device->device;
> 
>   stor_pkt = >vstor_packet;
> 
>   /*
>* The current SCSI handling on the host side does
>* not correctly handle:
>* INQUIRY command with page code parameter set to 0x80
>* MODE_SENSE command with cmd[2] == 0x1c
>*
>* Setup srb and scsi status so this won't be fatal.
>* We do this so we can distinguish truly fatal failues
>* (srb status == 0x4) and off-line the device in that case.
>*/
> 
>   if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
>  (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
>   vstor_packet->vm_srb.scsi_status = 0;
>   vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
>   }
> 
> If SCSI layer is sending inquiry about devices to do scanning then 
> wouldn't this workaround break things?  Maybe a better to fully test 
> for the broken command.

Let's concentrate on INQUIRY since that's the first command in the
probe sequence.  I think it's completing successfully because your
hyperv layer says it has 36 bytes of transfer and that's the size of a
successful initial INQUIRY, so the fact that the code above would break
stuff if the INQUIRY failed is orthogonal to the the current problem.

can you print out some of the DMA buffer in storvsc_on_io_completion()?

I think just the stor_pkt->vm_srb.cdb[0] (to identify the command
completing) and byte 5 of the buffer will tell us what we need to know.
 It's going to be complex to get byte 5, you'll need to do a
kmap_atomic_pfn on request->payload->range.pfn_array[0] and then look
at byte 5.  If that's zero it means there's some problem with hyperv
writing to the pfn if it's 0x24 (expected value for an initial inquiry)
we've got a problem somewhere in bio completion not copying the value
back.

James



[PATCH 1/1] fnic: Adding debug IO and Abort latency counter to fnic stats

2017-02-28 Thread Satish Kharat
The IO and Abort latency counter count the time taken to complete
the IO and abort command into broad buckets. It is not intend for
performance measurement, just a debug stat.
current_max_io_time tries to keep tarck of the maximun time an IO
has taken to complete if it > 30sec, it just another debug stat.

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic.h   |  2 +-
 drivers/scsi/fnic/fnic_scsi.c  | 43 ++
 drivers/scsi/fnic/fnic_stats.h | 15 ++
 drivers/scsi/fnic/fnic_trace.c | 47 ++
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/fnic/fnic.h b/drivers/scsi/fnic/fnic.h
index 9ddc920..99d00c9 100644
--- a/drivers/scsi/fnic/fnic.h
+++ b/drivers/scsi/fnic/fnic.h
@@ -39,7 +39,7 @@
 
 #define DRV_NAME   "fnic"
 #define DRV_DESCRIPTION"Cisco FCoE HBA Driver"
-#define DRV_VERSION"1.6.0.21"
+#define DRV_VERSION"1.6.0.34"
 #define PFXDRV_NAME ": "
 #define DFX DRV_NAME "%d: "
 
diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 2544a37..1f116b4 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -823,6 +823,7 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
spinlock_t *io_lock;
u64 cmd_trace;
unsigned long start_time;
+   unsigned long io_duration_time;
 
/* Decode the cmpl description to get the io_req id */
fcpio_header_dec(>hdr, , _status, );
@@ -1017,6 +1018,28 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic 
*fnic,
else
atomic64_inc(_stats->io_stats.io_completions);
 
+
+   io_duration_time = jiffies_to_msecs(jiffies) - 
jiffies_to_msecs(io_req->start_time);
+
+   if(io_duration_time <= 10)
+   atomic64_inc(_stats->io_stats.io_btw_0_to_10_msec);
+   else if(io_duration_time <= 100)
+   atomic64_inc(_stats->io_stats.io_btw_10_to_100_msec);
+   else if(io_duration_time <= 500)
+   atomic64_inc(_stats->io_stats.io_btw_100_to_500_msec);
+   else if(io_duration_time <= 5000)
+   atomic64_inc(_stats->io_stats.io_btw_500_to_5000_msec);
+   else if(io_duration_time <= 1)
+   atomic64_inc(_stats->io_stats.io_btw_5000_to_1_msec);
+   else if(io_duration_time <= 3)
+   atomic64_inc(_stats->io_stats.io_btw_1_to_3_msec);
+   else {
+   atomic64_inc(_stats->io_stats.io_greater_than_3_msec);
+
+   if(io_duration_time > 
atomic64_read(_stats->io_stats.current_max_io_time))
+   atomic64_set(_stats->io_stats.current_max_io_time, 
io_duration_time);
+   }
+
/* Call SCSI completion function to complete the IO */
if (sc->scsi_done)
sc->scsi_done(sc);
@@ -1793,6 +1816,7 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
struct terminate_stats *term_stats;
enum fnic_ioreq_state old_ioreq_state;
int tag;
+   unsigned long abt_issued_time;
DECLARE_COMPLETION_ONSTACK(tm_done);
 
/* Wait for rport to unblock */
@@ -1846,6 +1870,25 @@ int fnic_abort_cmd(struct scsi_cmnd *sc)
spin_unlock_irqrestore(io_lock, flags);
goto wait_pending;
}
+
+   abt_issued_time = jiffies_to_msecs(jiffies) - 
jiffies_to_msecs(io_req->start_time);
+   if (abt_issued_time <= 6000)
+   atomic64_inc(_stats->abort_issued_btw_0_to_6_sec);
+   else if (abt_issued_time > 6000 && abt_issued_time <= 2)
+   atomic64_inc(_stats->abort_issued_btw_6_to_20_sec);
+   else if (abt_issued_time > 2 && abt_issued_time <= 3)
+   atomic64_inc(_stats->abort_issued_btw_20_to_30_sec);
+   else if (abt_issued_time > 3 && abt_issued_time <= 4)
+   atomic64_inc(_stats->abort_issued_btw_30_to_40_sec);
+   else if (abt_issued_time > 4 && abt_issued_time <= 5)
+   atomic64_inc(_stats->abort_issued_btw_40_to_50_sec);
+   else if (abt_issued_time > 5 && abt_issued_time <= 6)
+   atomic64_inc(_stats->abort_issued_btw_50_to_60_sec);
+   else
+   atomic64_inc(_stats->abort_issued_greater_than_60_sec);
+
+   FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
+   "CBD Opcode: %02x Abort issued time: %lu msec\n", sc->cmnd[0], 
abt_issued_time);
/*
 * Command is still pending, need to abort it
 * If the firmware completes the command after this point,
diff --git a/drivers/scsi/fnic/fnic_stats.h b/drivers/scsi/fnic/fnic_stats.h
index 540cceb8..fc47955 100644
--- a/drivers/scsi/fnic/fnic_stats.h
+++ b/drivers/scsi/fnic/fnic_stats.h
@@ -26,6 +26,14 @@ struct io_path_stats {
atomic64_t 

Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger

> Let's concentrate on INQUIRY since that's the first command in the
> probe sequence.  I think it's completing successfully because your
> hyperv layer says it has 36 bytes of transfer and that's the size of a
> successful initial INQUIRY, so the fact that the code above would break
> stuff if the INQUIRY failed is orthogonal to the the current problem.

Right, that bodge only breaks some minor things likes scsiinfo commands.

> can you print out some of the DMA buffer in storvsc_on_io_completion()?
> 
> I think just the stor_pkt->vm_srb.cdb[0] (to identify the command
> completing) and byte 5 of the buffer will tell us what we need to know.
>  It's going to be complex to get byte 5, you'll need to do a
> kmap_atomic_pfn on request->payload->range.pfn_array[0] and then look
> at byte 5.  If that's zero it means there's some problem with hyperv
> writing to the pfn if it's 0x24 (expected value for an initial inquiry)
> we've got a problem somewhere in bio completion not copying the value
> back.

Here is another boot, this time went dumpster diving as you suggested
to get the request data.



diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 2af63a80c7fa..a51d8eba6e04 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1059,6 +1059,21 @@ static void storvsc_on_io_completion(struct 
storvsc_device *stor_device,
vstor_packet->vm_srb.srb_status,
vstor_packet->vm_srb.data_transfer_length);
 
+   if (stor_pkt->vm_srb.cdb[0] == INQUIRY) {
+   struct scsi_cmnd *cmd = request->cmd;
+   struct scatterlist *sg = scsi_sglist(cmd);
+   struct page *page = sg_page(sg);
+   void *vaddr = kmap_atomic(page);
+   
+   print_hex_dump(KERN_INFO,
+  "inquiry data: ", DUMP_PREFIX_OFFSET,
+  16, 1,
+  vaddr, vstor_packet->vm_srb.data_transfer_length,
+  false);
+
+   kunmap_atomic(page);
+   }
+
/*
 * The current SCSI handling on the host side does
 * not correctly handle:

Which results in:
...

[1.225501] scsi host0: storvsc_host_t
[1.234707] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.235517] scsi host1: storvsc_host_t
[1.238037] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.256800] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.259430] inquiry data: 0020: 00 00 00 00
[1.261431] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.264080] hv_storvsc:  IO cmd 0x12 0x1 0x0 scsi status 0x0 srb status 0x1 
length 12
[1.267420] inquiry data: : 00 00 00 08 00 83 8f b0 b1 b2 ce cf
[1.270759] hv_storvsc:  IO cmd 0x12 0x1 0x83 scsi status 0x0 srb status 0x1 
length 52
[1.275007] inquiry data: : 00 83 00 30 01 01 00 18 4d 53 46 54 20 
20 20 20
[1.277988] inquiry data: 0010: 43 77 cc 85 5f 19 c2 46 ac 48 c7 33 b9 
dd 2d 2a
[1.281096] inquiry data: 0020: 01 03 00 10 60 02 24 80 43 77 cc 85 5f 
19 c7 33
[1.284246] inquiry data: 0030: b9 dd 2d 2a
[1.306538] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.310621] inquiry data: : 00 92 be f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.316244] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.328151] random: fast init done
[1.340184] inquiry data: 0020: 00 00 00 00
[1.342710] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.346023] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.352913] inquiry data: : 00 aa be f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.356543] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.359996] inquiry data: 0020: 00 00 00 00
[1.361835] scsi host1: scsi scan: INQUIRY result too short (5), using 36
[1.361888] hv_storvsc:  IO cmd 0xa0 0x0 0x0 scsi status 0x0 srb status 0x1 
length 16
[1.362307] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.362308] inquiry data: : 00 23 34 f1 5c 98 ff ff f0 64 02 89 ff 
ff ff ff
[1.362309] inquiry data: 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00
[1.362309] inquiry data: 0020: 00 00 00 00
[1.377423] scsi 1:0:0:1: Direct-Access
PQ: 0 ANSI: 0
[1.399208] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.402836] inquiry data: : 00 a3 be f1 5c 98 ff ff 00 00 00 00 00 
00 00 00
[1.406466] inquiry data: 0010: 00 00 00 00 00 00 00 00 c0 30 66 f9 5c 
98 ff ff
[1.409766] inquiry data: 0020: 00 00 00 00
[1.412366] scsi 1:0:0:2: Direct-Access

[PATCH] libsas: add sas_unregister_devs_sas_addr in flutter case

2017-02-28 Thread Jason Yan
In sas_rediscover_dev when we call sas_get_phy_attached_dev to find the
device is ok and when in the flutter case when we call
sas_ex_phy_discover the device is gone, the sas_addr was changed to
zero.
[300247.584696] sas: ex 500e004aaa1f phy0 originated
BROADCAST(CHANGE)
[300247.663516] sas: ex 500e004aaa1f phy00:U:0 attached:
 (no device)
[300247.663518] sas: ex 500e004aaa1f phy 0x0 broadcast flutter

When the device is up again, the libsas checked that the old sas_addr
zero so just add a new device.
[300247.846326] sas: Expander phy change count has changed
[300247.846418] sas: ex 500e004aaa1f phy0 originated
BROADCAST(CHANGE)
[300247.846420] sas: ex 500e004aaa1f phy0 new device attached
[300247.846519] sas: ex 500e004aaa1f phy00:U:A attached:
500e004aaa00 (stp)
[300247.875873] sas: done REVALIDATING DOMAIN on port 0, pid:12565, res
0x0

This will cause a panic when these two device were destroyed. Fix this
by call sas_unregister_devs_sas_addr in the flutter case if the sas_addr
is zero.

Signed-off-by: Jason Yan 
---
 drivers/scsi/libsas/sas_expander.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/libsas/sas_expander.c 
b/drivers/scsi/libsas/sas_expander.c
index 570b2cb..1a784d7 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -2032,6 +2032,11 @@ static int sas_rediscover_dev(struct domain_device *dev, 
int phy_id, bool last)
 
sas_ex_phy_discover(dev, phy_id);
 
+   if (!SAS_ADDR(phy->attached_sas_addr)) {
+   sas_unregister_devs_sas_addr(dev, phy_id, last);
+   return res;
+   }
+
if (ata_dev && phy->attached_dev_type == SAS_SATA_PENDING)
action = ", needs recovery";
SAS_DPRINTK("ex %016llx phy 0x%x broadcast flutter%s\n",
-- 
2.5.0



mpt3sas sleep from atomic context on v4.10

2017-02-28 Thread Omar Sandoval
I'm seeing this while testing on Linus' current master:

[  427.814466] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:149 
__handle_irq_event_percpu+0x187/0x190
[  427.832552] irq 116 handler _base_interrupt+0x0/0x9e0 [mpt3sas] enabled 
interrupts

I tracked it down to commit 669f044170d8 ("scsi: srp_transport: Move
queuecommand() wait code to SCSI core"). That commit made it so
scsi_internal_device_block() can sleep, but mpt3sas calls this from an
interrupt handler:

_base_interrupt
-> _base_async_event
   -> mpt3sas_scsih_event_callback
  -> _scsih_check_topo_delete_events
 -> _scsih_block_io_to_children_attached_directly
-> _scsih_block_io_device
   -> _scsih_internal_device_block
  -> scsi_internal_device_block

This change was made in 4.10. Bart, can you take a look?

Thanks.


Re: [PATCH] target/user: Add daynmic growing data area featuresupport

2017-02-28 Thread Xiubo Li



Write throughput is pretty
low at around 150 MB/s.

What's the original write throughput without this patch? Is it also
around 80 MB/s ?

It is around 20-30 MB/s. Same fio args except using --rw=write.

Got it.

Thanks.

BRs
Xiubo




Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley  wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James

I tried booting with scsi_mod.use_blk_mq=true and that did nothing.
Rebuilding now with 
CONFIG_SCSI_MQ_DEFAULT=y

Sure, can instrument. after that test.


[PATCH 1/1] fnic: Avoid false out-of-order detection for aborted command

2017-02-28 Thread Satish Kharat
If SCSI-ML has already issued abort on a command i.e
FNIC_IOREQ_ABTS_PENDING is set and we get a IO completion avoid
this being flagged as out-of-order completion by setting the
FNIC_IO_DONE flag in fnic_fcpio_icmnd_cmpl_handler

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic_scsi.c | 40 ++--
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 2544a37..d851d89 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -876,32 +876,28 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic 
*fnic,
 
/*
 *  if SCSI-ML has already issued abort on this command,
-* ignore completion of the IO. The abts path will clean it up
+*  set completion of the IO. The abts path will clean it up
 */
if (CMD_STATE(sc) == FNIC_IOREQ_ABTS_PENDING) {
-   spin_unlock_irqrestore(io_lock, flags);
+
+   /*
+* set the FNIC_IO_DONE so that this doesn't get
+* flagged as 'out of order' if it was not aborted
+*/
+   CMD_FLAGS(sc) |= FNIC_IO_DONE;
CMD_FLAGS(sc) |= FNIC_IO_ABTS_PENDING;
-   switch (hdr_status) {
-   case FCPIO_SUCCESS:
-   CMD_FLAGS(sc) |= FNIC_IO_DONE;
-   FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
- "icmnd_cmpl ABTS pending hdr status = %s "
- "sc  0x%p scsi_status %x  residual %d\n",
- fnic_fcpio_status_to_str(hdr_status), sc,
- icmnd_cmpl->scsi_status,
- icmnd_cmpl->residual);
-   break;
-   case FCPIO_ABORTED:
+   spin_unlock_irqrestore(io_lock, flags);
+   if(FCPIO_ABORTED == hdr_status)
CMD_FLAGS(sc) |= FNIC_IO_ABORTED;
-   break;
-   default:
-   FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
- "icmnd_cmpl abts pending "
- "hdr status = %s tag = 0x%x sc = 
0x%p\n",
- fnic_fcpio_status_to_str(hdr_status),
- id, sc);
-   break;
-   }
+
+   FNIC_SCSI_DBG(KERN_INFO, fnic->lport->host,
+   "icmnd_cmpl abts pending "
+ "hdr status = %s tag = 0x%x sc = 0x%p"
+ "scsi_status = %x residual = %d\n",
+ fnic_fcpio_status_to_str(hdr_status),
+ id, sc,
+ icmnd_cmpl->scsi_status,
+ icmnd_cmpl->residual);
return;
}
 
-- 
2.5.5



[PATCH 1/1] fnic: Adding Check Condition counter to misc fnicstats

2017-02-28 Thread Satish Kharat
Just a simple counter of number of check conditions encountered on
that host.

Signed-off-by: Satish Kharat 
Signed-off-by: Sesidhar Baddela 
---
 drivers/scsi/fnic/fnic_scsi.c  | 3 +++
 drivers/scsi/fnic/fnic_stats.h | 1 +
 drivers/scsi/fnic/fnic_trace.c | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 2544a37..d3b32da 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -919,6 +919,9 @@ static void fnic_fcpio_icmnd_cmpl_handler(struct fnic *fnic,
if (icmnd_cmpl->flags & FCPIO_ICMND_CMPL_RESID_UNDER)
xfer_len -= icmnd_cmpl->residual;
 
+   if (icmnd_cmpl->scsi_status == SAM_STAT_CHECK_CONDITION)
+   atomic64_inc(_stats->misc_stats.check_condition);
+
if (icmnd_cmpl->scsi_status == SAM_STAT_TASK_SET_FULL)
atomic64_inc(_stats->misc_stats.queue_fulls);
break;
diff --git a/drivers/scsi/fnic/fnic_stats.h b/drivers/scsi/fnic/fnic_stats.h
index 540cceb8..69acdac 100644
--- a/drivers/scsi/fnic/fnic_stats.h
+++ b/drivers/scsi/fnic/fnic_stats.h
@@ -88,6 +88,7 @@ struct misc_stats {
atomic64_t devrst_cpwq_alloc_failures;
atomic64_t io_cpwq_alloc_failures;
atomic64_t no_icmnd_itmf_cmpls;
+   atomic64_t check_condition;
atomic64_t queue_fulls;
atomic64_t rport_not_ready;
atomic64_t frame_errors;
diff --git a/drivers/scsi/fnic/fnic_trace.c b/drivers/scsi/fnic/fnic_trace.c
index 5a5fa01..ec20b3e 100644
--- a/drivers/scsi/fnic/fnic_trace.c
+++ b/drivers/scsi/fnic/fnic_trace.c
@@ -357,6 +357,7 @@ int fnic_get_stats_data(struct stats_debug_info *debug,
  "Number of Copy WQ Alloc Failures for Device Reset: %lld\n"
  "Number of Copy WQ Alloc Failures for IOs: %lld\n"
  "Number of no icmnd itmf Completions: %lld\n"
+ "Number of Check Conditions encountered: %lld\n"
  "Number of QUEUE Fulls: %lld\n"
  "Number of rport not ready: %lld\n"
  "Number of receive frame errors: %lld\n",
@@ -377,6 +378,7 @@ int fnic_get_stats_data(struct stats_debug_info *debug,
  >misc_stats.devrst_cpwq_alloc_failures),
  (u64)atomic64_read(>misc_stats.io_cpwq_alloc_failures),
  (u64)atomic64_read(>misc_stats.no_icmnd_itmf_cmpls),
+ (u64)atomic64_read(>misc_stats.check_condition),
  (u64)atomic64_read(>misc_stats.queue_fulls),
  (u64)atomic64_read(>misc_stats.rport_not_ready),
  (u64)atomic64_read(>misc_stats.frame_errors));
-- 
2.5.5



[PATCH 3/3] uapi: fix scsi/scsi_bsg_fc.h userspace compilation errors

2017-02-28 Thread Dmitry V. Levin
Include  and consistently use types it provides
to fix the following scsi/scsi_bsg_fc.h userspace compilation errors:

/usr/include/scsi/scsi_bsg_fc.h:83:2: error: unknown type name 'uint8_t'
  uint8_t  reserved;
/usr/include/scsi/scsi_bsg_fc.h:86:2: error: unknown type name 'uint8_t'
  uint8_t  port_id[3];
/usr/include/scsi/scsi_bsg_fc.h:104:2: error: unknown type name 'uint8_t'
  uint8_t  reserved;
/usr/include/scsi/scsi_bsg_fc.h:107:2: error: unknown type name 'uint8_t'
  uint8_t  port_id[3];
/usr/include/scsi/scsi_bsg_fc.h:128:2: error: unknown type name 'uint8_t'
  uint8_t  command_code;
/usr/include/scsi/scsi_bsg_fc.h:131:2: error: unknown type name 'uint8_t'
  uint8_t  port_id[3];
/usr/include/scsi/scsi_bsg_fc.h:168:2: error: unknown type name 'uint32_t'
  uint32_t status;  /* See FC_CTELS_STATUS_xxx */
/usr/include/scsi/scsi_bsg_fc.h:172:3: error: unknown type name 'uint8_t'
   uint8_t action;  /* fragment_id for CT REJECT */
/usr/include/scsi/scsi_bsg_fc.h:173:3: error: unknown type name 'uint8_t'
   uint8_t reason_code;
/usr/include/scsi/scsi_bsg_fc.h:174:3: error: unknown type name 'uint8_t'
   uint8_t reason_explanation;
/usr/include/scsi/scsi_bsg_fc.h:175:3: error: unknown type name 'uint8_t'
   uint8_t vendor_unique;
/usr/include/scsi/scsi_bsg_fc.h:191:2: error: unknown type name 'uint8_t'
  uint8_t  reserved;
/usr/include/scsi/scsi_bsg_fc.h:194:2: error: unknown type name 'uint8_t'
  uint8_t  port_id[3];
/usr/include/scsi/scsi_bsg_fc.h:199:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word0; /* revision & IN_ID */
/usr/include/scsi/scsi_bsg_fc.h:200:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word1; /* GS_Type, GS_SubType, Options, Rsvd */
/usr/include/scsi/scsi_bsg_fc.h:201:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word2; /* Cmd Code, Max Size */
/usr/include/scsi/scsi_bsg_fc.h:221:2: error: unknown type name 'uint64_t'
  uint64_t vendor_id;
/usr/include/scsi/scsi_bsg_fc.h:224:2: error: unknown type name 'uint32_t'
  uint32_t vendor_cmd[0];
/usr/include/scsi/scsi_bsg_fc.h:231:2: error: unknown type name 'uint32_t'
  uint32_t vendor_rsp[0];
/usr/include/scsi/scsi_bsg_fc.h:250:2: error: unknown type name 'uint8_t'
  uint8_t els_code;
/usr/include/scsi/scsi_bsg_fc.h:268:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word0; /* revision & IN_ID */
/usr/include/scsi/scsi_bsg_fc.h:269:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word1; /* GS_Type, GS_SubType, Options, Rsvd */
/usr/include/scsi/scsi_bsg_fc.h:270:2: error: unknown type name 'uint32_t'
  uint32_t preamble_word2; /* Cmd Code, Max Size */
/usr/include/scsi/scsi_bsg_fc.h:282:2: error: unknown type name 'uint32_t'
  uint32_t msgcode;
/usr/include/scsi/scsi_bsg_fc.h:306:2: error: unknown type name 'uint32_t'
  uint32_t result;
/usr/include/scsi/scsi_bsg_fc.h:309:2: error: unknown type name 'uint32_t'
  uint32_t reply_payload_rcv_len;

Signed-off-by: Dmitry V. Levin 
---
 include/uapi/scsi/scsi_bsg_fc.h | 54 +
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/include/uapi/scsi/scsi_bsg_fc.h b/include/uapi/scsi/scsi_bsg_fc.h
index 3031b90..2b0b851 100644
--- a/include/uapi/scsi/scsi_bsg_fc.h
+++ b/include/uapi/scsi/scsi_bsg_fc.h
@@ -22,6 +22,8 @@
 #ifndef SCSI_BSG_FC_H
 #define SCSI_BSG_FC_H
 
+#include 
+
 /*
  * This file intended to be included by both kernel and user space
  */
@@ -80,10 +82,10 @@
  * with the transport upon completion of the login.
  */
 struct fc_bsg_host_add_rport {
-   uint8_t reserved;
+   __u8reserved;
 
/* FC Address Identier of the remote port to login to */
-   uint8_t port_id[3];
+   __u8port_id[3];
 };
 
 /* Response:
@@ -101,10 +103,10 @@ struct fc_bsg_host_add_rport {
  * remain logged in with the remote port.
  */
 struct fc_bsg_host_del_rport {
-   uint8_t reserved;
+   __u8reserved;
 
/* FC Address Identier of the remote port to logout of */
-   uint8_t port_id[3];
+   __u8port_id[3];
 };
 
 /* Response:
@@ -125,10 +127,10 @@ struct fc_bsg_host_els {
 * ELS Command Code being sent (must be the same as byte 0
 * of the payload)
 */
-   uint8_t command_code;
+   __u8command_code;
 
/* FC Address Identier of the remote port to send the ELS to */
-   uint8_t port_id[3];
+   __u8port_id[3];
 };
 
 /* Response:
@@ -165,14 +167,14 @@ struct fc_bsg_ctels_reply {
 * Note: x_RJT/BSY status will indicae that the rjt_data field
 *   is valid and contains the reason/explanation values.
 */
-   uint32_tstatus; /* See FC_CTELS_STATUS_xxx */
+   __u32   status; /* See FC_CTELS_STATUS_xxx */
 
/* valid if status is not FC_CTELS_STATUS_OK */
struct  {
-   uint8_t action; /* fragment_id for CT REJECT */
- 

[PATCH 2/3] uapi: fix scsi/scsi_netlink_fc.h userspace compilation errors

2017-02-28 Thread Dmitry V. Levin
Consistently use types from linux/types.h to fix the following
scsi/scsi_netlink_fc.h userspace compilation errors:

/usr/include/scsi/scsi_netlink_fc.h:60:2: error: unknown type name 'uint64_t'
  uint64_t seconds;
/usr/include/scsi/scsi_netlink_fc.h:61:2: error: unknown type name 'uint64_t'
  uint64_t vendor_id;
/usr/include/scsi/scsi_netlink_fc.h:62:2: error: unknown type name 'uint16_t'
  uint16_t host_no;
/usr/include/scsi/scsi_netlink_fc.h:63:2: error: unknown type name 'uint16_t'
  uint16_t event_datalen;
/usr/include/scsi/scsi_netlink_fc.h:64:2: error: unknown type name 'uint32_t'
  uint32_t event_num;
/usr/include/scsi/scsi_netlink_fc.h:65:2: error: unknown type name 'uint32_t'
  uint32_t event_code;
/usr/include/scsi/scsi_netlink_fc.h:66:2: error: unknown type name 'uint32_t'
  uint32_t event_data;
/usr/include/scsi/scsi_netlink_fc.h:67:33: error: 'uint64_t' undeclared here 
(not in a function)
 } __attribute__((aligned(sizeof(uint64_t;

Signed-off-by: Dmitry V. Levin 
---
 include/uapi/scsi/scsi_netlink_fc.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/uapi/scsi/scsi_netlink_fc.h 
b/include/uapi/scsi/scsi_netlink_fc.h
index cbf76e4..2493a0f 100644
--- a/include/uapi/scsi/scsi_netlink_fc.h
+++ b/include/uapi/scsi/scsi_netlink_fc.h
@@ -57,14 +57,14 @@
  */
 struct fc_nl_event {
struct scsi_nl_hdr snlh;/* must be 1st element ! */
-   uint64_t seconds;
-   uint64_t vendor_id;
-   uint16_t host_no;
-   uint16_t event_datalen;
-   uint32_t event_num;
-   uint32_t event_code;
-   uint32_t event_data;
-} __attribute__((aligned(sizeof(uint64_t;
+   __u64 seconds;
+   __u64 vendor_id;
+   __u16 host_no;
+   __u16 event_datalen;
+   __u32 event_num;
+   __u32 event_code;
+   __u32 event_data;
+} __attribute__((aligned(sizeof(__u64;
 
 
 #endif /* SCSI_NETLINK_FC_H */
-- 
ldv


[PATCH 1/3] uapi: fix scsi/scsi_netlink.h userspace compilation errors

2017-02-28 Thread Dmitry V. Levin
Consistently use types from linux/types.h to fix the following
scsi/scsi_netlink.h userspace compilation errors:

/usr/include/scsi/scsi_netlink.h:43:2: error: unknown type name 'uint8_t'
  uint8_t version;
/usr/include/scsi/scsi_netlink.h:44:2: error: unknown type name 'uint8_t'
  uint8_t transport;
/usr/include/scsi/scsi_netlink.h:45:2: error: unknown type name 'uint16_t'
  uint16_t magic;
/usr/include/scsi/scsi_netlink.h:46:2: error: unknown type name 'uint16_t'
  uint16_t msgtype;
/usr/include/scsi/scsi_netlink.h:47:2: error: unknown type name 'uint16_t'
  uint16_t msglen;
/usr/include/scsi/scsi_netlink.h:48:33: error: 'uint64_t' undeclared here (not 
in a function)
 } __attribute__((aligned(sizeof(uint64_t;
/usr/include/scsi/scsi_netlink.h:92:2: error: expected specifier-qualifier-list 
before 'uint64_t'
  uint64_t vendor_id;

Signed-off-by: Dmitry V. Levin 
---
 include/uapi/scsi/scsi_netlink.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/uapi/scsi/scsi_netlink.h b/include/uapi/scsi/scsi_netlink.h
index 62b4eda..20360e1 100644
--- a/include/uapi/scsi/scsi_netlink.h
+++ b/include/uapi/scsi/scsi_netlink.h
@@ -40,12 +40,12 @@
 
 /* SCSI_TRANSPORT_MSG event message header */
 struct scsi_nl_hdr {
-   uint8_t version;
-   uint8_t transport;
-   uint16_t magic;
-   uint16_t msgtype;
-   uint16_t msglen;
-} __attribute__((aligned(sizeof(uint64_t;
+   __u8 version;
+   __u8 transport;
+   __u16 magic;
+   __u16 msgtype;
+   __u16 msglen;
+} __attribute__((aligned(sizeof(__u64;
 
 /* scsi_nl_hdr->version value */
 #define SCSI_NL_VERSION1
@@ -89,10 +89,10 @@ struct scsi_nl_hdr {
  */
 struct scsi_nl_host_vendor_msg {
struct scsi_nl_hdr snlh;/* must be 1st element ! */
-   uint64_t vendor_id;
-   uint16_t host_no;
-   uint16_t vmsg_datalen;
-} __attribute__((aligned(sizeof(uint64_t;
+   __u64 vendor_id;
+   __u16 host_no;
+   __u16 vmsg_datalen;
+} __attribute__((aligned(sizeof(__u64;
 
 
 /*
-- 
ldv


Re: [PATCH v2] libiscsi: add lock around task lists to fix list corruption regression

2017-02-28 Thread Guilherme G. Piccoli
Thanks very much Chris, really good patch. Hopefully it can reach 4.11!

Cheers,


Guilherme



RE: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.

2017-02-28 Thread Madhani, Himanshu


> -Original Message-
> From: Bart Van Assche [mailto:bart.vanass...@sandisk.com]
> Sent: Tuesday, February 28, 2017 2:10 PM
> To: Madhani, Himanshu ; target-
> de...@vger.kernel.org; n...@linux-iscsi.org
> Cc: linux-scsi@vger.kernel.org; Malavali, Giridhar
> 
> Subject: Re: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.
> 
> On Tue, 2017-02-28 at 22:00 +, Madhani, Himanshu wrote:
> > Can we get some review comments for this series.
> 
> Hello Himanshu,
> 
> The merge window opened one week ago and is still open. Anyone who
> contributes to the Linux kernel should know that patches should not be
> posted during the merge window because it's a busy time for most kernel
> contributors. Additionally, no feedback should be expected during the merge
> window for patches that are not bug fixes for changes that went in during
> the merge window. Anyway, I'll have a look at these patches as soon as I
> have the time.
> 
> Bart.

Sorry for the ignorance. I did not realize merge window was open. 

Thanks
Himanshu


RE: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.

2017-02-28 Thread Madhani, Himanshu
Hi Nic/Bart, 

Can we get some review comments for this series. 

Thanks, 
Himanshu

> -Original Message-
> From: Himanshu Madhani [mailto:himanshu.madh...@cavium.com]
> Sent: Friday, February 24, 2017 1:37 PM
> To: target-de...@vger.kernel.org; bart.vanass...@sandisk.com;
> n...@linux-iscsi.org
> Cc: Malavali, Giridhar ; linux-
> s...@vger.kernel.org; Madhani, Himanshu
> 
> Subject: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.
> 
> Hi Nic,
> 
> Please consider this series for inclusion in target-pending.
> 
> This series contains following changes.
> 
> o Fix for the deadlock because of inconsistent lock usage reported by you.
> o Added patch to submit non-critical MBX command via IOCB path.
> o Improved T10-DIF/PI handling with target stack.
> o Changed scsi host lookup method.
> o Some minor bug fixes.
> 
> Changes from v2 --> v3
> 
> o Updated patch to use waitq instead of while loop for waiting.
> o Removed duplication of Target stack definitations for T10-DIF/PI.
> o In addition, addressed review comments from Bart & Christoph.
> 
> Changes from v1 -> v2
> 
> o Rebased series based on scsi-target-for-v4.11 branch.
> 
> Please apply to target-pending.
> 
> Thanks,
> Himanshu
> 
> Anil Gurumurthy (1):
>   qla2xxx: Export DIF stats via debugfs
> 
> Himanshu Madhani (2):
>   qla2xxx: Add DebugFS node to display Port Database
>   qla2xxx: Update driver version to 9.00.00.00-k
> 
> Joe Carnuccio (1):
>   qla2xxx: Allow vref count to timeout on vport delete.
> 
> Quinn Tran (10):
>   qla2xxx: Fix delayed response to command for loop mode/direct connect.
>   qla2xxx: Allow relogin to proceed if remote login did not finish
>   qla2xxx: Use IOCB interface to submit non-critical MBX.
>   qla2xxx: Improve T10-DIF/PI handling in driver.
>   qla2xxx: Change scsi host lookup method.
>   qla2xxx: Fix memory leak for abts processing
>   qla2xxx: Fix request queue corruption.
>   qla2xxx: Fix inadequate lock protection for ABTS.
>   qla2xxx: Add async new target notification
>   qla2xxx: Fix sess_lock & hardware_lock lock order problem.
> 
>  drivers/scsi/qla2xxx/Kconfig   |   1 +
>  drivers/scsi/qla2xxx/qla_attr.c|   4 +-
>  drivers/scsi/qla2xxx/qla_dbg.h |   1 +
>  drivers/scsi/qla2xxx/qla_def.h |  46 ++-
>  drivers/scsi/qla2xxx/qla_dfs.c | 115 +-
>  drivers/scsi/qla2xxx/qla_gbl.h |  12 +-
>  drivers/scsi/qla2xxx/qla_init.c|  85 ++---
>  drivers/scsi/qla2xxx/qla_isr.c |  41 ++-
>  drivers/scsi/qla2xxx/qla_mbx.c | 304 ++--
>  drivers/scsi/qla2xxx/qla_mid.c |  14 +-
>  drivers/scsi/qla2xxx/qla_os.c  |  24 +-
>  drivers/scsi/qla2xxx/qla_target.c  | 729 +++
> --
>  drivers/scsi/qla2xxx/qla_target.h  |  36 +-
>  drivers/scsi/qla2xxx/qla_version.h |   6 +-
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c |  49 ++-
>  15 files changed, 1049 insertions(+), 418 deletions(-)
> 
> --
> 1.8.3.1



Re: [PATCH v3 00/14] qla2xxx: Bug Fixes and updates for target.

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 22:00 +, Madhani, Himanshu wrote:
> Can we get some review comments for this series. 

Hello Himanshu,

The merge window opened one week ago and is still open. Anyone who contributes
to the Linux kernel should know that patches should not be posted during the
merge window because it's a busy time for most kernel contributors. 
Additionally,
no feedback should be expected during the merge window for patches that are not
bug fixes for changes that went in during the merge window. Anyway, I'll have a
look at these patches as soon as I have the time.

Bart.

Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread NeilBrown
On Mon, Feb 27 2017, Jeff Layton wrote:

> On Tue, 2017-02-28 at 10:32 +1100, NeilBrown wrote:
>> On Mon, Feb 27 2017, Andreas Dilger wrote:
>> 
>> > 
>> > My thought is that PG_error is definitely useful for applications to get
>> > correct errors back when doing write()/sync_file_range() so that they know
>> > there is an error in the data that _they_ wrote, rather than receiving an
>> > error for data that may have been written by another thread, and in turn
>> > clearing the error from another thread so it *doesn't* know it had a write
>> > error.
>> 
>> It might be useful in that way, but it is not currently used that way.
>> Such usage would be a change in visible behaviour.
>> 
>> sync_file_range() calls filemap_fdatawait_range(), which calls
>> filemap_check_errors().
>> If there have been any errors in the file recently, inside or outside
>> the range, the latter will return an error which will propagate up.
>> 
>> > 
>> > As for stray sync() clearing PG_error from underneath an application, that
>> > shouldn't happen since filemap_fdatawait_keep_errors() doesn't clear errors
>> > and is used by device flushing code (fdatawait_one_bdev(), 
>> > wait_sb_inodes()).
>> 
>> filemap_fdatawait_keep_errors() calls __filemap_fdatawait_range() which
>> clears PG_error on every page.
>> What it doesn't do is call filemap_check_errors(), and so doesn't clear
>> AS_ENOSPC or AS_EIO.
>> 
>> 
>
> I think it's helpful to get a clear idea of what happens now in the face
> of errors and what we expect to happen, and I don't quite have that yet:
>
> --8<-
> void page_endio(struct page *page, bool is_write, int err)
> {
> if (!is_write) {
> if (!err) {
> SetPageUptodate(page);
> } else {
> ClearPageUptodate(page);
> SetPageError(page);
> }
> unlock_page(page);
> } else {
> if (err) {
> SetPageError(page);
> if (page->mapping)
> mapping_set_error(page->mapping, err);
> }
> end_page_writeback(page);
> }
> }
> --8<-
>
> ...not everything uses page_endio, but it's a good place to look since
> we have both flavors of error handling in one place.
>
> On a write error, we SetPageError and set the error in the mapping.
>
> What I'm not clear on is:
>
> 1) what happens to the page at that point when we get a writeback error?
> Does it just remain in-core and is allowed to service reads (assuming
> that it was uptodate before)?

Yes, it remains in core and can service reads.  It is no different from
a page on which a write recent succeeded, except that the write didn't
succeed so the contents of backing store might be different from the
contents of the page.

>
> Can I redirty it and have it retry the write? Is there standard behavior
> for this or is it just up to the whim of the filesystem?

Everything is at the whim of the filesystem, but I doubt if many differ
from the above.

NeilBrown

>
> I'll probably have questions about the read side as well, but for now it
> looks like it's mostly used in an ad-hoc way to communicate errors
> across subsystems (block to fs layer, for instance).
> --
> Jeff Layton 


signature.asc
Description: PGP signature


Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley  wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James
> 

The following code in storvsc looks suspicious

static void storvsc_on_io_completion(struct storvsc_device *stor_device,
  struct vstor_packet *vstor_packet,
  struct storvsc_cmd_request *request)
{
struct vstor_packet *stor_pkt;
struct hv_device *device = stor_device->device;

stor_pkt = >vstor_packet;

/*
 * The current SCSI handling on the host side does
 * not correctly handle:
 * INQUIRY command with page code parameter set to 0x80
 * MODE_SENSE command with cmd[2] == 0x1c
 *
 * Setup srb and scsi status so this won't be fatal.
 * We do this so we can distinguish truly fatal failues
 * (srb status == 0x4) and off-line the device in that case.
 */

if ((stor_pkt->vm_srb.cdb[0] == INQUIRY) ||
   (stor_pkt->vm_srb.cdb[0] == MODE_SENSE)) {
vstor_packet->vm_srb.scsi_status = 0;
vstor_packet->vm_srb.srb_status = SRB_STATUS_SUCCESS;
}

If SCSI layer is sending inquiry about devices to do scanning then wouldn't this
workaround break things?  Maybe a better to fully test for the broken command.

Original commit was:

commit 4ed51a21c0f69e1379cf858fc21a9d9022bfe0e7
Author: K. Y. Srinivasan 
Date:   Sat Aug 27 11:31:26 2011 -0700

Staging: hv: storvsc: Fixup srb and scsi status for INQUIRY and MODE_SENSE

The current VHD handler on the Windows Host does not correctly handle
INQUIRY and MODE_SENSE commands with some options. Fixup srb_status
in these cases since the failure is not fatal.

Signed-off-by: K. Y. Srinivasan 
Signed-off-by: Haiyang Zhang 
Signed-off-by: Greg Kroah-Hartman 




Re: [PATCH 1/2] sd: Rework ZBC integration

2017-02-28 Thread kbuild test robot
Hi Damien,

[auto build test WARNING on scsi/for-next]
[also build test WARNING on v4.10 next-20170228]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Damien-Le-Moal/Improve-ZBC-integration/20170301-020730
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git for-next
config: parisc-c3000_defconfig (attached as .config)
compiler: hppa-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All warnings (new ones prefixed by >>):

   In file included from drivers/usb/storage/transport.c:65:0:
   drivers/usb/storage/../../scsi/sd.h: In function 'sd_zbc_complete':
>> drivers/usb/storage/../../scsi/sd.h:309:16: warning: no return statement in 
>> function returning non-void [-Wreturn-type]
struct scsi_sense_hdr *sshdr) {}
   ^~

vim +309 drivers/usb/storage/../../scsi/sd.h

89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  293  }
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  294  
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  295  static inline 
void sd_zbc_cancel_write_cmnd(struct scsi_cmnd *cmd) {}
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  296  
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  297  static inline 
int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd)
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  298  {
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  299   return 
BLKPREP_INVALID;
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  300  }
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  301  
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  302  static inline 
int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd)
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  303  {
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  304   return 
BLKPREP_INVALID;
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  305  }
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  306  
5e9f074f drivers/scsi/sd.h Damien Le Moal 2017-02-28  307  static inline 
unsigned int sd_zbc_complete(struct scsi_cmnd *cmd,
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  308   
   unsigned int good_bytes,
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18 @309   
   struct scsi_sense_hdr *sshdr) {}
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  310  
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  311  #endif /* 
CONFIG_BLK_DEV_ZONED */
89d94756 drivers/scsi/sd.h Hannes Reinecke2016-10-18  312  
e73aec82 include/scsi/sd.h Martin K. Petersen 2007-02-27  313  #endif /* 
_SCSI_DISK_H */

:: The code at line 309 was first introduced by commit
:: 89d9475610771b5e5fe1879075f0fc9ba6e3755f sd: Implement support for ZBC 
devices

:: TO: Hannes Reinecke <h...@suse.de>
:: CC: Jens Axboe <ax...@fb.com>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 09:06:13 -0800
James Bottomley  wrote:

> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> > On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
> > > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> > > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > > I/O")
> > > > 
> > > > but that is already in linux-next.
> > > > 
> > > > Noticed another place where memset(of the data was being done not
> > > > the extra bits.
> > > > Tried this, but didn't fix it either...  
> > > 
> > > Are you using blk-mq or the legacy request code?  
> > 
> > Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> > didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> > asked if turning on MQ makes a difference.  
> 
> OK, since we're not making much progress, Stephen, could you insert
> some debugging into the storvsc driver?  The trace clearly shows we're
> getting zeros back in the buffer when we should have data from the
> initial scan.  Firstly, does the vmbus think it's transferring any data
> for the INQUIRY and READ_CAPACITY commands (looks like
> storvsc_command_completion() data_transfer_length)?  If it does,
> there's probably an issue initialising the sg list.  If it doesn't,
> we're probably sending bogus commands.
> 
> James
> 

This is log of failed boot of linux-next (with blk_mq enabled).
See attached if you want to see exactly what got added.

Sorry don't speak SCSI yet.


[1.496999] hv_utils: Registering HyperV Utility Driver
[1.505415] hv_vmbus: registering driver hv_util
[1.508074] input: AT Translated Set 2 keyboard as 
/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:00/VMBUS:00/d34b2567-b9b6-42b9-8778-0a4ec0b955bf/serio0/input/input0
[1.524314] hv_vmbus: registering driver hid_hyperv
[1.543577] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.547120] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.549952] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.552876] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.555931] hv_storvsc f71e9b3b-aed9-4dd2-a1ff-c00218a39853: complete init 
request
[1.559151] scsi host0: storvsc_host_t
[1.590197] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.590765] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590796] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590820] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590839] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.590942] hv_storvsc e7d171c1-a71d-400e-854e-7e0cc000ce3d: complete init 
request
[1.591506] scsi host1: storvsc_host_t
[1.600925] hv_utils: Heartbeat IC version 3.0
[1.619450] input: Microsoft Vmbus HID-compliant Mouse as 
/devices/0006:045E:0621.0001/input/input1
[1.620658] hid 0006:045E:0621.0001: input:  HID v0.01 Mouse 
[Microsoft Vmbus HID-compliant Mouse] on 
[1.627185] hv_utils: Shutdown IC version 3.0
[1.627280] hv_utils: cannot register PTP clock: 0
[1.628081] random: fast init done
[1.628890] hv_utils: TimeSync IC version 4.0
[1.629407] hv_utils: VSS IC version 5.0
[1.651620] scsi 0:0:0:0: Direct-Access Msft Virtual Disk 1.0  
PQ: 0 ANSI: 5
[1.651956] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x1 
length 36
[1.651977] scsi 1:0:0:0: CD-ROMMsft Virtual DVD-ROM  1.0  
PQ: 0 ANSI: 0
[1.652279] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652297] scsi host1: scsi scan: INQUIRY result too short (5), using 36
[1.652299] scsi 1:0:0:1: Direct-Access
PQ: 0 ANSI: 0
[1.652595] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652611] scsi 1:0:0:2: Direct-Access
PQ: 0 ANSI: 0
[1.652892] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.652910] scsi 1:0:0:3: Direct-Access
PQ: 0 ANSI: 0
[1.653178] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653195] scsi 1:0:0:4: Direct-Access
PQ: 0 ANSI: 0
[1.653445] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653464] scsi 1:0:0:5: Direct-Access
PQ: 0 ANSI: 0
[1.653729] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.653746] scsi 1:0:0:6: Direct-Access
PQ: 0 ANSI: 0
[1.669879] hv_storvsc:  IO cmd 0x12 0x0 0x0 scsi status 0x0 srb status 0x20 
length 36
[1.670143] scsi 1:0:0:7: Direct-Access   

[PATCH] scsi: storvsc: Add support for FC rport.

2017-02-28 Thread Cathy Avery
Included in the current storvsc driver for Hyper-V is the ability
to access luns on an FC fabric via a virtualized fiber channel
adapter exposed by the Hyper-V host. The driver also attaches to the FC
transport to allow host and port names to be published under
/sys/class/fc_host/hostX. Current customer tools running on the VM
require that these names be available in the well known standard
location under fc_host/hostX.

A problem arose when attaching to the FC transport. The scsi_scan
code attempts to call fc_user_scan which has basically become a no-op
due to the fact that the virtualized FC device does not expose rports.
At this point you cannot refresh the scsi bus after mapping or unmapping
luns on the SAN without a reboot of the VM.

This patch stubs in an rport per fc_host in storvsc so that the
requirement of a defined rport is now met within the fc_transport and
echo "- - -" > /sys/class/scsi_host/hostX/scan now works.

Signed-off-by: Cathy Avery 
---
 drivers/scsi/storvsc_drv.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 585e54f..6d7b932 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -478,6 +478,9 @@ struct storvsc_device {
 */
u64 node_name;
u64 port_name;
+#if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
+   struct fc_rport *rport;
+#endif
 };
 
 struct hv_host_device {
@@ -1823,8 +1826,16 @@ static int storvsc_probe(struct hv_device *device,
}
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
if (host->transportt == fc_transport_template) {
+   struct fc_rport_identifiers ids;
+
+   ids.node_name = 0;
+   ids.port_name = 0;
+   ids.port_id = 0;
+   ids.roles |= FC_PORT_ROLE_FCP_TARGET;
+
fc_host_node_name(host) = stor_device->node_name;
fc_host_port_name(host) = stor_device->port_name;
+   stor_device->rport = fc_remote_port_add(host, 0, );
}
 #endif
return 0;
@@ -1854,8 +1865,10 @@ static int storvsc_remove(struct hv_device *dev)
struct Scsi_Host *host = stor_device->host;
 
 #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS)
-   if (host->transportt == fc_transport_template)
+   if (host->transportt == fc_transport_template) {
+   fc_remote_port_delete(stor_device->rport);
fc_remove_host(host);
+   }
 #endif
scsi_remove_host(host);
storvsc_dev_remove(dev);
-- 
2.5.0



Re: [ANNOUNCE]: Broadcom (Emulex) FC Target driver - efct

2017-02-28 Thread James Smart


On 2/28/2017 8:34 AM, Hannes Reinecke wrote:

Can you clarify these?
Are these 'just' resource allocation problems or something else, too?


Most are resource allocation - buffer pools, dma pools, pages for 
resources, and hw resource allocation splits. However, async receive RQ 
policies are a case where initiator and target have to share the policy 
and perhaps a buffer pool, so they have to be careful. Another area is 
in ABTS handling. Initiators typically leave things up to the hardware, 
and do little if any ABTS handling. Most targets though, as CMD IU may 
be received without an assign exchange context and be buffered until the 
target is ready to do something, require that they handle ABTS's.  Some 
of the target features, when enabled, dictate host ownership of ABTS 
policy. So, if running I+T, it gets rather tricky.


-- james




Re: [PATCH] scsi: lpfc: use proper format string for dma_addr_t

2017-02-28 Thread James Smart

Arnd,

Thank you. Looks good.

-- james

Signed-off-by: James Smart 


On 2/27/2017 12:37 PM, Arnd Bergmann wrote:

dma_addr_t may be either u32 or u64, depending on the kernel configuration,
and we get a warning for the 32-bit case:

drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_nvme_ls_req':
drivers/scsi/lpfc/lpfc_logmsg.h:52:52: error: format '%llu' expects argument of 
type 'long long unsigned int', but argument 11 has type 'dma_addr_t {aka 
unsigned int}' [-Werror=format=]
drivers/scsi/lpfc/lpfc_logmsg.h:52:52: error: format '%llu' expects argument of 
type 'long long unsigned int', but argument 12 has type 'dma_addr_t {aka 
unsigned int}' [-Werror=format=]
drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_nvme_ls_abort':
drivers/scsi/lpfc/lpfc_logmsg.h:52:52: error: format '%llu' expects argument of 
type 'long long unsigned int', but argument 11 has type 'dma_addr_t {aka 
unsigned int}' [-Werror=format=]
drivers/scsi/lpfc/lpfc_logmsg.h:52:52: error: format '%llu' expects argument of 
type 'long long unsigned int', but argument 12 has type 'dma_addr_t {aka 
unsigned int}' [-Werror=format=]

printk has a special "%pad" format string that passes the dma address by
reference to solve this problem.

Fixes: 01649561a8b4 ("scsi: lpfc: NVME Initiator: bind to nvme_fc api")
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/lpfc/lpfc_nvme.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_nvme.c b/drivers/scsi/lpfc/lpfc_nvme.c
index 625b6589a34d..609a908ea9db 100644
--- a/drivers/scsi/lpfc/lpfc_nvme.c
+++ b/drivers/scsi/lpfc/lpfc_nvme.c
@@ -457,11 +457,11 @@ lpfc_nvme_ls_req(struct nvme_fc_local_port *pnvme_lport,
/* Expand print to include key fields. */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_DISC,
 "6051 ENTER.  lport %p, rport %p lsreq%p rqstlen:%d "
-"rsplen:%d %llux %llux\n",
+"rsplen:%d %pad %pad\n",
 pnvme_lport, pnvme_rport,
 pnvme_lsreq, pnvme_lsreq->rqstlen,
-pnvme_lsreq->rsplen, pnvme_lsreq->rqstdma,
-pnvme_lsreq->rspdma);
+pnvme_lsreq->rsplen, _lsreq->rqstdma,
+_lsreq->rspdma);
  
  	vport->phba->fc4NvmeLsRequests++;
  
@@ -527,11 +527,11 @@ lpfc_nvme_ls_abort(struct nvme_fc_local_port *pnvme_lport,

/* Expand print to include key fields. */
lpfc_printf_vlog(vport, KERN_INFO, LOG_NVME_ABTS,
 "6040 ENTER.  lport %p, rport %p lsreq %p rqstlen:%d "
-"rsplen:%d %llux %llux\n",
+"rsplen:%d %pad %pad\n",
 pnvme_lport, pnvme_rport,
 pnvme_lsreq, pnvme_lsreq->rqstlen,
-pnvme_lsreq->rsplen, pnvme_lsreq->rqstdma,
-pnvme_lsreq->rspdma);
+pnvme_lsreq->rsplen, _lsreq->rqstdma,
+_lsreq->rspdma);
  
  	/*

 * Lock the ELS ring txcmplq and build a local list of all ELS IOs




Re: [PATCH] target/user: Add daynmic growing data area featuresupport

2017-02-28 Thread Mike Christie
On 02/27/2017 07:22 PM, Xiubo Li wrote:
> Hi Mike
> 
> Thanks verrry much for your work and test cases.
> 
> 
> From: Xiubo Li 
>
> Currently for the TCMU, the ring buffer size is fixed to 64K cmd
> area + 1M data area, and this will be bottlenecks for high iops.
>>> Hi Xiubo, thanks for your work.
>>>
>>> daynmic -> dynamic
>>>
>>> Have you benchmarked this patch and determined what kind of iops
>>> improvement it allows? Do you see the data area reaching its
>>> fully-allocated size?
>>>
>> I tested this patch with Venky's tcmu-runner rbd aio patches, with one
>> 10 gig iscsi session, and for pretty basic fio direct io (64 -256K
>> read/writes with a queue depth of 64 numjobs between 1 and 4) tests read
>> throughput goes from about 80 to 500 MB/s.
> Looks nice.
> 
>> Write throughput is pretty
>> low at around 150 MB/s.
> What's the original write throughput without this patch? Is it also
> around 80 MB/s ?

It is around 20-30 MB/s. Same fio args except using --rw=write.


Re: [PATCH] scsi: lpfc: use div_u64 for 64-bit division

2017-02-28 Thread James Smart

Arnd,

Thank you. Looks good.

-- james

Signed-off-by: James Smart 

On 2/27/2017 12:31 PM, Arnd Bergmann wrote:

The new debugfs output causes a link error on 32-bit architectures:

ERROR: "__aeabi_uldivmod" [drivers/scsi/lpfc/lpfc.ko] undefined!

This code is not performance critical, so we can simply use div_u64().

Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support")
Fixes: 2b65e18202fd ("scsi: lpfc: NVME Target: Add debugfs support")
Signed-off-by: Arnd Bergmann 
---
  drivers/scsi/lpfc/lpfc_debugfs.c | 64 
  1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_debugfs.c b/drivers/scsi/lpfc/lpfc_debugfs.c
index 599fde4ea8b1..47c67bf0514e 100644
--- a/drivers/scsi/lpfc/lpfc_debugfs.c
+++ b/drivers/scsi/lpfc/lpfc_debugfs.c
@@ -873,8 +873,8 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, char 
*buf, int size)
len += snprintf(
buf + len, PAGE_SIZE - len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg1_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg1_total,
+   phba->ktime_data_samples),
phba->ktime_seg1_min,
phba->ktime_seg1_max);
len += snprintf(
@@ -884,8 +884,8 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, char 
*buf, int size)
len += snprintf(
buf + len, PAGE_SIZE - len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg2_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg2_total,
+   phba->ktime_data_samples),
phba->ktime_seg2_min,
phba->ktime_seg2_max);
len += snprintf(
@@ -895,8 +895,8 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, char 
*buf, int size)
len += snprintf(
buf + len, PAGE_SIZE - len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg3_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg3_total,
+   phba->ktime_data_samples),
phba->ktime_seg3_min,
phba->ktime_seg3_max);
len += snprintf(
@@ -906,17 +906,17 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, 
char *buf, int size)
len += snprintf(
buf + len, PAGE_SIZE - len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg4_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg4_total,
+   phba->ktime_data_samples),
phba->ktime_seg4_min,
phba->ktime_seg4_max);
len += snprintf(
buf + len, PAGE_SIZE - len,
"Total IO avg time: %08lld\n",
-   ((phba->ktime_seg1_total +
+   div_u64(phba->ktime_seg1_total +
phba->ktime_seg2_total  +
phba->ktime_seg3_total +
-   phba->ktime_seg4_total) /
+   phba->ktime_seg4_total,
phba->ktime_data_samples));
return len;
}
@@ -935,8 +935,8 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, char 
*buf, int size)
"cmd pass to NVME Layer\n");
len += snprintf(buf + len, PAGE_SIZE-len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg1_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg1_total,
+   phba->ktime_data_samples),
phba->ktime_seg1_min,
phba->ktime_seg1_max);
len += snprintf(buf + len, PAGE_SIZE-len,
@@ -944,8 +944,8 @@ lpfc_debugfs_nvmektime_data(struct lpfc_vport *vport, char 
*buf, int size)
"-to- Driver rcv cmd OP (action)\n");
len += snprintf(buf + len, PAGE_SIZE-len,
"avg:%08lld min:%08lld max %08lld\n",
-   phba->ktime_seg2_total /
-   phba->ktime_data_samples,
+   div_u64(phba->ktime_seg2_total,
+   phba->ktime_data_samples),
phba->ktime_seg2_min,
phba->ktime_seg2_max);
len += snprintf(buf + len, PAGE_SIZE-len,
@@ -953,8 +953,8 @@ 

Re: SCSI regression in 4.11

2017-02-28 Thread Jens Axboe
On 02/28/2017 10:16 AM, Stephen Hemminger wrote:
> On Tue, 28 Feb 2017 09:06:13 -0800
> James Bottomley  wrote:
> 
>> On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
>>> On 02/28/2017 07:08 AM, Christoph Hellwig wrote:  
 On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:  
> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> I/O")
>
> but that is already in linux-next.
>
> Noticed another place where memset(of the data was being done not
> the extra bits.
> Tried this, but didn't fix it either...  

 Are you using blk-mq or the legacy request code?  
>>>
>>> Stephen doesn't have MQ set in the config he posted, I'm assuming he 
>>> didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
>>> asked if turning on MQ makes a difference.  
>>
>> OK, since we're not making much progress, Stephen, could you insert
>> some debugging into the storvsc driver?  The trace clearly shows we're
>> getting zeros back in the buffer when we should have data from the
>> initial scan.  Firstly, does the vmbus think it's transferring any data
>> for the INQUIRY and READ_CAPACITY commands (looks like
>> storvsc_command_completion() data_transfer_length)?  If it does,
>> there's probably an issue initialising the sg list.  If it doesn't,
>> we're probably sending bogus commands.
>>
>> James
> 
> I tried booting with scsi_mod.use_blk_mq=true and that did nothing.
> Rebuilding now with 
> CONFIG_SCSI_MQ_DEFAULT=y

If you already booted with scsi_mod.use_blk_mq=true and tested that,
then don't bother with the config option. They basically toggle
the same switch.

> Sure, can instrument. after that test.

That'd be great!

-- 
Jens Axboe



Re: SCSI regression in 4.11

2017-02-28 Thread Stephen Hemminger
On Tue, 28 Feb 2017 15:08:12 +0100
Christoph Hellwig  wrote:

> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> > 
> > but that is already in linux-next.
> > 
> > Noticed another place where memset(of the data was being done not the extra 
> > bits.
> > Tried this, but didn't fix it either...  
> 
> Are you using blk-mq or the legacy request code?

I was using legacy, but even with CONFIG_SCSI_MQ_DEFAULT=y
the same failure occurs.


Re: SCSI regression in 4.11

2017-02-28 Thread James Bottomley
On Tue, 2017-02-28 at 08:32 -0700, Jens Axboe wrote:
> On 02/28/2017 07:08 AM, Christoph Hellwig wrote:
> > On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> > > Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each
> > > I/O")
> > > 
> > > but that is already in linux-next.
> > > 
> > > Noticed another place where memset(of the data was being done not
> > > the extra bits.
> > > Tried this, but didn't fix it either...
> > 
> > Are you using blk-mq or the legacy request code?
> 
> Stephen doesn't have MQ set in the config he posted, I'm assuming he 
> didn't boot with scsi_mod.use_blk_mq=true. In a previous email, I 
> asked if turning on MQ makes a difference.

OK, since we're not making much progress, Stephen, could you insert
some debugging into the storvsc driver?  The trace clearly shows we're
getting zeros back in the buffer when we should have data from the
initial scan.  Firstly, does the vmbus think it's transferring any data
for the INQUIRY and READ_CAPACITY commands (looks like
storvsc_command_completion() data_transfer_length)?  If it does,
there's probably an issue initialising the sg list.  If it doesn't,
we're probably sending bogus commands.

James



Re: [PATCH 2/3] block: Separate zone requests from medium access requests

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 19:25 +0900, Damien Le Moal wrote:
> From: Bart Van Assche 
> 
> Use blk_rq_accesses_medium() instead of !blk_rq_is_passthrough() to
> ensure that code that is intended for normal medium access requests,
> e.g. DISCARD, READ and WRITE requests, is not applied to
> REQ_OP_ZONE_REPORT requests nor to REQ_OP_ZONE_RESET requests.
> This allows excluding these zone requests from request accounting
> and from request scheduling.
> 
> Signed-off-by: Bart Van Assche 

Hello Damien,

Since you posted this patch you should have added your Signed-off-by. And
since you have edited this patch, you should have documented what you have
changed.

>  static inline void req_set_nomerge(struct request_queue *q, struct request 
> *req)
> diff --git a/block/elevator.c b/block/elevator.c
> index 699d10f..cbf81c6 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -635,16 +635,20 @@ void __elv_add_request(struct request_queue *q, struct 
> request *rq, int where)
>  
>   rq->q = q;
>  
> - if (rq->rq_flags & RQF_SOFTBARRIER) {
> + if (!blk_rq_accesses_medium(rq)) {
> + /* Do not schedule zone requests */
> + where = ELEVATOR_INSERT_FRONT;
> + } if (rq->rq_flags & RQF_SOFTBARRIER) {

This change was not in the patch I sent to you. Additionally, this change
doesn't look properly formatted. Please make sure that the second "if" starts
on a new line.

>   }
>   } else if (!(rq->rq_flags & RQF_ELVPRIV) &&
>   (where == ELEVATOR_INSERT_SORT ||
> -  where == ELEVATOR_INSERT_SORT_MERGE))
> +  where == ELEVATOR_INSERT_SORT_MERGE)) {
>   where = ELEVATOR_INSERT_BACK;
> + }

This change wasn't in my patch either. Since this change only adds a pair of
braces, can it be left out?

Thanks,

Bart.

Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Christoph Hellwig
I don't really like this too much - this is too many SCSI specifics
for the block layer to care.  Maybe using bios for the zone ops was a
mistake after all, and we should just have operations in struct block_device
instead..


Re: [PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Bart Van Assche
On Tue, 2017-02-28 at 17:02 +0100, Christoph Hellwig wrote:
> I don't really like this too much - this is too many SCSI specifics
> for the block layer to care.  Maybe using bios for the zone ops was a
> mistake after all, and we should just have operations in struct block_device
> instead..

blk_rq_accesses_medium() has nothing to do with SCSI. It is a reintroduction
of what REQ_TYPE_FS stood for before ZBC was introduced but with another name.
BTW, I think that all blk_rq_is_passthrough() callers have to be reviewed to
see whether or not !blk_rq_accesses_medium() is perhaps what was intended.

Bart.

Re: [ANNOUNCE]: Broadcom (Emulex) FC Target driver - efct

2017-02-28 Thread Hannes Reinecke
Hi James,

On 02/28/2017 12:28 AM, James Smart wrote:
> I'd like to announce the availability of the Broadcom (Emulex) FC Target
> driver - efct.
> This driver has been part of the Emulex OneCore Storage SDK tool kit for
> Emulex
> SLI-4 adapters. The SLI-4 adapters support 16Gb/s and higher adapters.
> Although this
> kit has supported FCoE in the past, it is currently limited to FC support.
> 
> This driver provides the following:
> - Target mode operation:
>   - Functional with LIO-based interfaces
>   - Extensive use of hw offloads such as auto-xfer_rdy, auto-rsp, cmd
> cpu spreading
>   - High login mode - thousands of logins
>   - T-10 DIF/PI support  (inline and separate)
>   - NPIV support
> - Concurrent Initiator support if needed
> - Discovery engine has F_Port and fabric services emulation.
> - Extended mgmt interfaces:
>   - firmware dump api, including dump to host memory for faster dumps
>   - Healthcheck operations and watchdogs
> - Extended driver behaviors such as:
>   - polled mode operation
>   - multi-queue: cpu, roundrobin, or priority  (but not tied  to scsi-mq)
>   - long chained sgl's
>   - extensive internal logging and statistics
>   - Tuning parameters on modes and resource allocation to different
> features
> 
Yay! Finally!

> Broadcom is looking to upstream this driver and would like review and
> feedback.
> The driver may be found at the following git repository:
> g...@gitlab.com:jsmart/efct-Emulex_FC_Target.git
> 
> 
> Some of the key questions we have are with lpfc :
> 1) Coexistence vs integration
> Currently, the efct driver maps to a different set of PCI ids than lpfc.
> It's very clear there's an overlap with lpfc, both on SLI-4 hw as well
> as initiator support.
> Although target mode support can be simplistically added to lpfc, what
> we've found is that doing so means a lot of tradeoffs. Some of the
> target mode features, when enabled,
> impact the initiator support and how it would operate.
> 
Can you clarify these?
Are these 'just' resource allocation problems or something else, too?

> 2) SLI-3 support
> lpfc provides SLI-3 support so that all FC adapters are supported,
> including the older ones.
> The form of the driver, based on its history, is SLI-3 with SLI-3
> adapted to SLI-4 at the point
> it hits the hardware. efct does not support SLI-3.
> 
I personally wouldn't have a problem with _not_ enabling SLI-3; after
all, this is (conceptually :-) new code, so we do not _need_ to support
older HW here.

> 3) complexity of configuration knobs caused by the kitchen-sink of
> features in lpfc ?
> we are pushing the limit on needing per-instance attributes rather than
> global module
> parameters.
> 
Yes, this is always a good idea. Also one might want to look at which of
those knobs are there for pure historical reasons, maybe we can cut down
on the number of knobs already :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: SCSI regression in 4.11

2017-02-28 Thread Jens Axboe
On 02/28/2017 07:08 AM, Christoph Hellwig wrote:
> On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
>> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
>>
>> but that is already in linux-next.
>>
>> Noticed another place where memset(of the data was being done not the extra 
>> bits.
>> Tried this, but didn't fix it either...
> 
> Are you using blk-mq or the legacy request code?

Stephen doesn't have MQ set in the config he posted, I'm assuming he didn't
boot with scsi_mod.use_blk_mq=true. In a previous email, I asked if turning
on MQ makes a difference.

-- 
Jens Axboe



Re: SCSI regression in 4.11

2017-02-28 Thread Christoph Hellwig
On Mon, Feb 27, 2017 at 05:19:31PM -0800, Stephen Hemminger wrote:
> Fixes: ee5242360424 ("scsi: zero per-cmd driver data before each I/O")
> 
> but that is already in linux-next.
> 
> Noticed another place where memset(of the data was being done not the extra 
> bits.
> Tried this, but didn't fix it either...

Are you using blk-mq or the legacy request code?


Re: [PATCH] sd: close hole in > 2T device rejection when !CONFIG_LBDAF

2017-02-28 Thread Steve Magnani


On 02/27/2017 12:57 PM, Bart Van Assche wrote:

...
How about the (untested) patch below? The approach below avoids that the check 
is
duplicated and - at least in my opinion - results in code that is easier to 
read.
I find lba_too_large() a little dense, but functionally OK. The "shift 
>= 0" clause could be dropped.

I tested this on my "problem" system (READ CAPACITY(10)) without incident.


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cb6e68dd6df0..3533d1e46bde 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2082,6 +2082,16 @@ static void read_capacity_error(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
sdkp->capacity = 0; /* unknown mapped to zero - as usual */
  }
  
+/*

+ * Check whether or not logical_to_sectors(sdp, lba) will overflow.
+ */
+static bool lba_too_large(u64 lba, u32 logical_block_size)
+{
+   int shift = sizeof(sector_t) * 8 + 9 - ilog2(logical_block_size);
+
+   return shift >= 0 && shift < 64 && lba >= (1ULL << shift);
+}
+
  #define RC16_LEN 32
  #if RC16_LEN > SD_BUF_SIZE
  #error RC16_LEN must not be more than SD_BUF_SIZE
@@ -2154,7 +2164,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
return -ENODEV;
}
  
-	if ((sizeof(sdkp->capacity) == 4) && (lba >= 0xULL)) {

+   if (lba_too_large(lba + 1, sector_size)) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");
@@ -2243,7 +2253,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, 
struct scsi_device *sdp,
return sector_size;
}
  
-	if ((sizeof(sdkp->capacity) == 4) && (lba == 0x)) {

+   if (lba_too_large(lba + 1ULL, sector_size)) {
sd_printk(KERN_ERR, sdkp, "Too big for this kernel. Use a "
"kernel compiled with support for large block "
"devices.\n");




[PATCH 0/3] Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
This series introduces blk_rq_accesses_medium(), which is equivalent to
!blk_rq_is_passthrough() minus the zone request operations REQ_OP_ZONE_REPORT
and REQ_OP_ZONE_RESET. This new helper allows avoiding problems due to the
non-standard nature of these commands (report zones does no operate on logical
block size units while reset zone operates on entire zones only).

Using blk_rq_accesses_medium(), patch 2 excludes the zone commands from
request accounting (these commands are not accessing the device medium).
Exclusion from request scheduling is also added.

Finally, patch 3 uses the blk_rq_accesses_medium() helper to fix improperly
unaligned resid values only and only for medium access commands. This
correctly excludes from the resid correction zone requests as well as
passthrough requests.

ALl 3 patches are originally from Bart.

Bart Van Assche (3):
  block: Introduce blk_rq_accesses_medium()
  block: Separate zone requests from medium access requests
  mpt3sas: Do not check resid for non medium access commands

 block/blk-core.c |  2 +-
 block/blk.h  |  2 +-
 block/elevator.c | 12 
 block/mq-deadline.c  |  2 +-
 drivers/scsi/mpt3sas/mpt3sas_scsih.c |  4 ++--
 include/linux/blk_types.h| 17 ++---
 include/linux/blkdev.h   | 18 +++---
 7 files changed, 38 insertions(+), 19 deletions(-)

-- 
2.9.3



Re: [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed

2017-02-28 Thread Hannes Reinecke
On 02/27/2017 11:27 PM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>>  spin_lock_irqsave(shost->host_lock, flags);
>> +WARN_ON(shost->shost_state != SHOST_RUNNING &&
>> +shost->shost_state != SHOST_CANCEL &&
>> +shost->shost_state != SHOST_RECOVERY &&
>> +shost->shost_state != SHOST_CANCEL_RECOVERY);
>>  if (scsi_host_set_state(shost, SHOST_RECOVERY))
>> -if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
>> -goto out_unlock;
>> +scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
> 
> Please issue a warning if the second scsi_host_set_state() fails. And once
> that failure triggers a warning, I don't think we need the newly added
> WARN_ON() statement anymore. Something else that surprised me is that you
> consistently use WARN_ON() instead of WARN_ON_ONCE() in this patch?
> 
Okay, will be fixing it up.

> Otherwise this patch looks fine to me.
> 
Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


[PATCH 1/3] block: Introduce blk_rq_accesses_medium()

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche 

A medium access request is defined as an internal regular request that
operates on a whole number of logical blocks of the storage medium.
These include REQ_OP_READ, REQ_OP_WRITE, REQ_OP_FLUSH, REQ_OP_DISCARD,
REQ_OP_SECURE_ERASE, REQ_OP_WRITE_SAME and REQ_OP_WRITE_ZEROES.

Zoned block device requests (REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET)
as well as SCSI passthrough and driver private commands are not
considered medium access request.

Reshuffle enum req_opf definitions grouping medium access request up
to REQ_OP_MEDIUM_LAST and introduce the helper function
blk_rq_accesses_medium() to test a request.

Signed-off-by: Bart Van Assche 
---
 include/linux/blk_types.h | 17 ++---
 include/linux/blkdev.h| 12 
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d703acb..6420057 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -151,16 +151,19 @@ enum req_opf {
REQ_OP_FLUSH= 2,
/* discard sectors */
REQ_OP_DISCARD  = 3,
-   /* get zone information */
-   REQ_OP_ZONE_REPORT  = 4,
/* securely erase sectors */
-   REQ_OP_SECURE_ERASE = 5,
-   /* seset a zone write pointer */
-   REQ_OP_ZONE_RESET   = 6,
+   REQ_OP_SECURE_ERASE = 4,
/* write the same sector many times */
-   REQ_OP_WRITE_SAME   = 7,
+   REQ_OP_WRITE_SAME   = 5,
/* write the zero filled sector many times */
-   REQ_OP_WRITE_ZEROES = 8,
+   REQ_OP_WRITE_ZEROES = 6,
+
+   REQ_OP_MEDIUM_LAST = REQ_OP_WRITE_ZEROES,
+
+   /* get zone information */
+   REQ_OP_ZONE_REPORT  = 16,
+   /* reset a zone write pointer */
+   REQ_OP_ZONE_RESET   = 17,
 
/* SCSI passthrough using struct scsi_request */
REQ_OP_SCSI_IN  = 32,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aecca0e..7d1ce2d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -248,6 +248,18 @@ static inline bool blk_rq_is_passthrough(struct request 
*rq)
return blk_rq_is_scsi(rq) || blk_rq_is_private(rq);
 }
 
+/**
+ * blk_rq_accesses_medium - test if a request is a medium access request
+ * @rq: A block layer request.
+ *
+ * A medium access request is a regular internal request that operates on
+ * a whole number of logical blocks of the storage medium.
+ */
+static inline bool blk_rq_accesses_medium(const struct request *rq)
+{
+   return req_op(rq) <= REQ_OP_MEDIUM_LAST;
+}
+
 static inline unsigned short req_get_ioprio(struct request *req)
 {
return req->ioprio;
-- 
2.9.3



[PATCH 2/3] block: Separate zone requests from medium access requests

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche 

Use blk_rq_accesses_medium() instead of !blk_rq_is_passthrough() to
ensure that code that is intended for normal medium access requests,
e.g. DISCARD, READ and WRITE requests, is not applied to
REQ_OP_ZONE_REPORT requests nor to REQ_OP_ZONE_RESET requests.
This allows excluding these zone requests from request accounting
and from request scheduling.

Signed-off-by: Bart Van Assche 
---
 block/blk-core.c   |  2 +-
 block/blk.h|  2 +-
 block/elevator.c   | 12 
 block/mq-deadline.c|  2 +-
 include/linux/blkdev.h |  6 +++---
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index b9e857f..addd8e1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2608,7 +2608,7 @@ bool blk_update_request(struct request *req, int error, 
unsigned int nr_bytes)
req->__data_len -= total_bytes;
 
/* update sector only for requests with clear definition of sector */
-   if (!blk_rq_is_passthrough(req))
+   if (blk_rq_accesses_medium(req))
req->__sector += total_bytes >> 9;
 
/* mixed attributes always follow the first bio */
diff --git a/block/blk.h b/block/blk.h
index d1ea4bd9..9b63db7 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -255,7 +255,7 @@ static inline int blk_do_io_stat(struct request *rq)
 {
return rq->rq_disk &&
   (rq->rq_flags & RQF_IO_STAT) &&
-   !blk_rq_is_passthrough(rq);
+   blk_rq_accesses_medium(rq);
 }
 
 static inline void req_set_nomerge(struct request_queue *q, struct request 
*req)
diff --git a/block/elevator.c b/block/elevator.c
index 699d10f..cbf81c6 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -635,16 +635,20 @@ void __elv_add_request(struct request_queue *q, struct 
request *rq, int where)
 
rq->q = q;
 
-   if (rq->rq_flags & RQF_SOFTBARRIER) {
+   if (!blk_rq_accesses_medium(rq)) {
+   /* Do not schedule zone requests */
+   where = ELEVATOR_INSERT_FRONT;
+   } if (rq->rq_flags & RQF_SOFTBARRIER) {
/* barriers are scheduling boundary, update end_sector */
-   if (!blk_rq_is_passthrough(rq)) {
+   if (blk_rq_accesses_medium(rq)) {
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
} else if (!(rq->rq_flags & RQF_ELVPRIV) &&
(where == ELEVATOR_INSERT_SORT ||
-where == ELEVATOR_INSERT_SORT_MERGE))
+where == ELEVATOR_INSERT_SORT_MERGE)) {
where = ELEVATOR_INSERT_BACK;
+   }
 
switch (where) {
case ELEVATOR_INSERT_REQUEUE:
@@ -679,7 +683,7 @@ void __elv_add_request(struct request_queue *q, struct 
request *rq, int where)
if (elv_attempt_insert_merge(q, rq))
break;
case ELEVATOR_INSERT_SORT:
-   BUG_ON(blk_rq_is_passthrough(rq));
+   BUG_ON(!blk_rq_accesses_medium(rq));
rq->rq_flags |= RQF_SORTED;
q->nr_sorted++;
if (rq_mergeable(rq)) {
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 23612163..389c1af 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -399,7 +399,7 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, 
struct request *rq,
 
blk_mq_sched_request_inserted(rq);
 
-   if (at_head || blk_rq_is_passthrough(rq)) {
+   if (at_head || !blk_rq_accesses_medium(rq)) {
if (at_head)
list_add(>queuelist, >dispatch);
else
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7d1ce2d..dcf926d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -720,7 +720,7 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
 
 static inline bool blk_account_rq(struct request *rq)
 {
-   return (rq->rq_flags & RQF_STARTED) && !blk_rq_is_passthrough(rq);
+   return (rq->rq_flags & RQF_STARTED) && blk_rq_accesses_medium(rq);
 }
 
 #define blk_rq_cpu_valid(rq)   ((rq)->cpu != -1)
@@ -796,7 +796,7 @@ static inline void blk_clear_rl_full(struct request_list 
*rl, bool sync)
 
 static inline bool rq_mergeable(struct request *rq)
 {
-   if (blk_rq_is_passthrough(rq))
+   if (!blk_rq_accesses_medium(rq))
return false;
 
if (req_op(rq) == REQ_OP_FLUSH)
@@ -1070,7 +1070,7 @@ static inline unsigned int blk_rq_get_max_sectors(struct 
request *rq,
 {
struct request_queue *q = rq->q;
 
-   if (blk_rq_is_passthrough(rq))
+   if (!blk_rq_accesses_medium(rq))
return q->limits.max_hw_sectors;
 
if (!q->limits.chunk_sectors ||
-- 
2.9.3



Re: [PATCHv2 2/5] scsi: make eh_eflags persistent

2017-02-28 Thread Hannes Reinecke
On 02/27/2017 11:15 PM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>> To detect if a failed command has been retried we must not
>> clear scmd->eh_eflags when EH finishes.
>> The flag should be persistent throughout the lifetime
>> of the command.
> 
> Hello Hannes,
> 
> Is it on purpose that this patch does not remove the following statement from
> drivers/scsi/libsas/sas_scsi_host.c?
> 
>   cmd->eh_eflags = 0;
> 
Oh, yes, you are right.

Will be fixing it up.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread Jeff Layton
On Tue, 2017-02-28 at 12:12 +0200, Boaz Harrosh wrote:
> On 02/28/2017 03:11 AM, Jeff Layton wrote:
> <>
> > 
> > I'll probably have questions about the read side as well, but for now it
> > looks like it's mostly used in an ad-hoc way to communicate errors
> > across subsystems (block to fs layer, for instance).
> 
> If memory does not fail me it used to be checked long time ago in the
> read-ahead case. On the buffered read case, the first page is read synchronous
> and any error is returned to the caller, but then a read-ahead chunk is
> read async all the while the original thread returned to the application.
> So any errors are only recorded on the page-bit, since otherwise the uptodate
> is off and the IO will be retransmitted. Then the move to read_iter changed
> all that I think.
> But again this is like 5-6 years ago, and maybe I didn't even understand
> very well.
> 

Yep, that's what I meant about using it to communicate errors between
layers. e.g. end_buffer_async_read will check PageError and only
SetPageUptodate if it's not set. That has morphed a lot in the last few
years though and it looks like it may rely on PG_error less than it used
to.

> 
> I would like a Documentation of all this as well please. Where are the
> tests for this?
> 

Documentation is certainly doable (and I'd like to write some once we
have this all straightened out). In particular, I think we need clear
guidelines for fs authors on how to handle pagecache read and write
errors. Tests are a little tougher -- this is all kernel-internal stuff
and not easily visible to userland.

The one thing I have noticed is that even if you set AS_ENOSPC in the
mapping, you'll still get back -EIO on the first fsync if any PG_error
bits are set. I think we ought to fix that by not doing the
TestClearPageError call in __filemap_fdatawait_range, and just rely on
the mapping error there.

We could maybe roll a test for that, but it's rather hard to test ENOSPC
conditions in a fs-agnostic way. I'm open to suggestions here though.

-- 
Jeff Layton 


Re: [PATCH] scsi: qedi: fix missing return error code check on call to qedi_setup_int

2017-02-28 Thread Rangankar, Manish

On 28/02/17 4:32 PM, "Colin King"  wrote:

>From: Colin Ian King 
>
>The call to qedi_setup_int is not updating the return code rc yet rc
>is being checked for an error. Fix this by assigning rc to the return
>code from the call to qedi_setup_int.
>
>Signed-off-by: Colin Ian King 
>---
> drivers/scsi/qedi/qedi_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
>index 5eda21d..8e3d928 100644
>--- a/drivers/scsi/qedi/qedi_main.c
>+++ b/drivers/scsi/qedi/qedi_main.c
>@@ -1805,7 +1805,7 @@ static int __qedi_probe(struct pci_dev *pdev, int
>mode)
>*/
>   qedi_ops->common->update_pf_params(qedi->cdev, >pf_params);
> 
>-  qedi_setup_int(qedi);
>+  rc = qedi_setup_int(qedi);
>   if (rc)
>   goto stop_iscsi_func;
> 
>-- 
>2.10.2
>

Acked-by: Manish Rangankar 



[PATCH] scsi: qedi: fix missing return error code check on call to qedi_setup_int

2017-02-28 Thread Colin King
From: Colin Ian King 

The call to qedi_setup_int is not updating the return code rc yet rc
is being checked for an error. Fix this by assigning rc to the return
code from the call to qedi_setup_int.

Signed-off-by: Colin Ian King 
---
 drivers/scsi/qedi/qedi_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/qedi/qedi_main.c b/drivers/scsi/qedi/qedi_main.c
index 5eda21d..8e3d928 100644
--- a/drivers/scsi/qedi/qedi_main.c
+++ b/drivers/scsi/qedi/qedi_main.c
@@ -1805,7 +1805,7 @@ static int __qedi_probe(struct pci_dev *pdev, int mode)
 */
qedi_ops->common->update_pf_params(qedi->cdev, >pf_params);
 
-   qedi_setup_int(qedi);
+   rc = qedi_setup_int(qedi);
if (rc)
goto stop_iscsi_func;
 
-- 
2.10.2



[PATCH 2/2] sd_zbc: Remove superfluous assignments

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche 

A value is assigned to the variable 'capacity' in sd_zbc_read_zones()
but that value is never used. Hence remove the variable 'capacity'.

Signed-off-by: Bart Van Assche 

The variable 'ret' is initialized in sd_zbc_read_zones() but that
initialization is not necessary.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd_zbc.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 789c970..cf2c6a7 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -595,8 +595,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
 int sd_zbc_read_zones(struct scsi_disk *sdkp,
  unsigned char *buf)
 {
-   sector_t capacity;
-   int ret = 0;
+   int ret;
 
if (!sd_is_zoned(sdkp))
/*
@@ -628,7 +627,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp,
ret = sd_zbc_check_capacity(sdkp, buf);
if (ret)
goto err;
-   capacity = logical_to_sectors(sdkp->device, sdkp->capacity);
 
/*
 * Check zone size: only devices with a constant zone size (except
-- 
2.9.3



[PATCH 0/2] Improve ZBC integration

2017-02-28 Thread Damien Le Moal
2 patches in this series.

The first one from myself, cleans up and simplify sd_done() by moving all
ZBC specific processing to sd_zbc_complete().
sd_zbc_complete() as well as sd_zbc_report_zones_complete() are improved to
be more resilient against eventual weird values of good_bytes and resid
returned for the report zone command. These changes alone do not however fix
the mpt3sas problem which incorrectly align resid to the device logical block
size for the report zones command. The fix for this is in the series sent to
Jens ("mpt3sas: Do not check resid for non medium access commands").

The second patch, from Bart, cleans up unused variable assignments in
sd_zbc_read_zones().

Bart Van Assche (1):
  sd_zbc: Remove superfluous assignments

Damien Le Moal (1):
  sd: Rework ZBC integration

 drivers/scsi/sd.c |  36 ++---
 drivers/scsi/sd.h |  11 +++---
 drivers/scsi/sd_zbc.c | 105 --
 3 files changed, 85 insertions(+), 67 deletions(-)

-- 
2.9.3



[PATCH 1/2] sd: Rework ZBC integration

2017-02-28 Thread Damien Le Moal
Remove most of the zone request specific code from sd_done() and
move it to sd_zbc_complete(), which is called at the end of sd_done()
so that good_bytes is set for REQ_OP_ZONE_REPORT and REQ_OP_ZONE_RESET.
In sd_zbc_complete(), enhance checks on the value of good_bytes for
report zones to avoid eventual problems with bogus hardware.

Signed-off-by: Damien Le Moal 
---
 drivers/scsi/sd.c |  36 ++
 drivers/scsi/sd.h |  11 +++---
 drivers/scsi/sd_zbc.c | 101 +-
 3 files changed, 84 insertions(+), 64 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c7839f6..1bcd80a 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1786,15 +1786,11 @@ static int sd_done(struct scsi_cmnd *SCpnt)
struct scsi_sense_hdr sshdr;
struct scsi_disk *sdkp = scsi_disk(SCpnt->request->rq_disk);
struct request *req = SCpnt->request;
-   int sense_valid = 0;
-   int sense_deferred = 0;
-   unsigned char op = SCpnt->cmnd[0];
-   unsigned char unmap = SCpnt->cmnd[1] & 8;
+   bool sense_valid = false;
 
switch (req_op(req)) {
case REQ_OP_DISCARD:
case REQ_OP_WRITE_SAME:
-   case REQ_OP_ZONE_RESET:
if (!result) {
good_bytes = blk_rq_bytes(req);
scsi_set_resid(SCpnt, 0);
@@ -1803,27 +1799,15 @@ static int sd_done(struct scsi_cmnd *SCpnt)
scsi_set_resid(SCpnt, blk_rq_bytes(req));
}
break;
-   case REQ_OP_ZONE_REPORT:
-   if (!result) {
-   good_bytes = scsi_bufflen(SCpnt)
-   - scsi_get_resid(SCpnt);
-   scsi_set_resid(SCpnt, 0);
-   } else {
-   good_bytes = 0;
-   scsi_set_resid(SCpnt, blk_rq_bytes(req));
-   }
-   break;
}
 
if (result) {
-   sense_valid = scsi_command_normalize_sense(SCpnt, );
-   if (sense_valid)
-   sense_deferred = scsi_sense_is_deferred();
+   sense_valid = scsi_command_normalize_sense(SCpnt, ) &&
+   !scsi_sense_is_deferred();
}
sdkp->medium_access_timed_out = 0;
 
-   if (driver_byte(result) != DRIVER_SENSE &&
-   (!sense_valid || sense_deferred))
+   if (driver_byte(result) != DRIVER_SENSE && !sense_valid)
goto out;
 
switch (sshdr.sense_key) {
@@ -1847,10 +1831,14 @@ static int sd_done(struct scsi_cmnd *SCpnt)
good_bytes = sd_completed_bytes(SCpnt);
break;
case ILLEGAL_REQUEST:
-   if (sshdr.asc == 0x10)  /* DIX: Host detected corruption */
+   if (sshdr.asc == 0x10) {
+   /* DIX: Host detected corruption */
good_bytes = sd_completed_bytes(SCpnt);
-   /* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
-   if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+   } else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+   unsigned char op = SCpnt->cmnd[0];
+   unsigned char unmap = SCpnt->cmnd[1] & 8;
+
+   /* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
switch (op) {
case UNMAP:
sd_config_discard(sdkp, SD_LBP_DISABLE);
@@ -1876,7 +1864,7 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 
  out:
if (sd_is_zoned(sdkp))
-   sd_zbc_complete(SCpnt, good_bytes, );
+   good_bytes = sd_zbc_complete(SCpnt, good_bytes, );
 
SCSI_LOG_HLCOMPLETE(1, scmd_printk(KERN_INFO, SCpnt,
   "sd_done: completed %d of %d 
bytes\n",
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..0fd4886 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -270,8 +270,9 @@ extern int sd_zbc_setup_write_cmnd(struct scsi_cmnd *cmd);
 extern void sd_zbc_cancel_write_cmnd(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_report_cmnd(struct scsi_cmnd *cmd);
 extern int sd_zbc_setup_reset_cmnd(struct scsi_cmnd *cmd);
-extern void sd_zbc_complete(struct scsi_cmnd *cmd, unsigned int good_bytes,
-   struct scsi_sense_hdr *sshdr);
+extern unsigned int sd_zbc_complete(struct scsi_cmnd *cmd,
+   unsigned int good_bytes,
+   struct scsi_sense_hdr *sshdr);
 
 #else /* CONFIG_BLK_DEV_ZONED */
 
@@ -303,9 +304,9 @@ static inline int sd_zbc_setup_reset_cmnd(struct scsi_cmnd 
*cmd)
return BLKPREP_INVALID;
 }
 
-static inline void sd_zbc_complete(struct scsi_cmnd *cmd,
-  unsigned int good_bytes,
-  struct scsi_sense_hdr 

[PATCH 3/3] mpt3sas: Do not check resid for non medium access commands

2017-02-28 Thread Damien Le Moal
From: Bart Van Assche 

Commit f2e767bb5d6e ("mpt3sas: Force request partial completion
alignment") introduced a forced alignment of resid to the device
logical block size to fix bogus HBA firmware sometimes returning an
unaligned value. This fix however did not consider the case of
commands not operating on logical block size units
(e.g. REQ_OP_ZONE_REPORT and its 64B aligned partial replies). This
could result is incorrectly aligning resid for these commands, which
for REQ_OP_REPORT_ZONES result in the inability to determine the
number of zone descriptors returned.

Fix the resid alignment check to exclude all requests that are not
medium access requests using blk_rq_access_medium(). This will exclude
from the resid forced fix all passthrough requests as well as zone
command requests.

Fixes: f2e767bb5d6e ("mpt3sas: Force request partial completion alignment")
Signed-off-by: Bart Van Assche 
---
 drivers/scsi/mpt3sas/mpt3sas_scsih.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c 
b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 46e866c..405dc84 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4748,8 +4748,8 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 
msix_index, u32 reply)
 * then scsi-ml does not need to handle this misbehavior.
 */
sector_sz = scmd->device->sector_size;
-   if (unlikely(!blk_rq_is_passthrough(scmd->request) && sector_sz &&
-xfer_cnt % sector_sz)) {
+   if (unlikely(sector_sz && (xfer_cnt & (sector_sz - 1)) &&
+blk_rq_accesses_medium(scmd->request))) {
sdev_printk(KERN_INFO, scmd->device,
"unaligned partial completion avoided (xfer_cnt=%u, 
sector_sz=%u)\n",
xfer_cnt, sector_sz);
-- 
2.9.3



Re: [PATCH] scsi_error: count medium access timeout only once per EH run

2017-02-28 Thread Hannes Reinecke
On 02/27/2017 08:33 PM, Ewan D. Milne wrote:
> On Thu, 2017-02-23 at 11:27 +0100, Hannes Reinecke wrote:
>> The current medium access timeout counter will be increased for
>> each command, so if there are enough failed commands we'll hit
>> the medium access timeout for even a single failure.
>> Fix this by making the timeout per EH run, ie the counter will
>> only be increased once per device and EH run.
> 
> So, this is good, the current implementation has a flaw in that
> under certain conditions, a device will get offlined immediately,
> (i.e. if there are a few medium access commands pending, and
> they all timeout), which isn't what was intended.
> 
> It means, of course, that we will no longer detect cases like:
> 
> , , SUCCESS, SUCCESS, SUCCESS, 
> 
> as separate medium access timeouts, but I think the original
> intent of Martin's change wasn't to operate on such a short
> time-scale, am I right, Martin?
> 
> I made a few notes on the coding/implementation (below), but that
> doesn't affect the functional change.  We should definitely change
> what we have now, it is causing people problems.
> 
>>
>> Cc: Ewan Milne 
>> Cc: Lawrence Oberman 
>> Cc: Benjamin Block 
>> Cc: Steffen Maier 
>> Signed-off-by: Hannes Reinecke 
>> ---
>>  drivers/scsi/scsi_error.c |  2 ++
>>  drivers/scsi/sd.c | 16 ++--
>>  drivers/scsi/sd.h |  1 +
>>  include/scsi/scsi.h   |  1 +
>>  4 files changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f2cafae..481ea1b 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -58,6 +58,7 @@
>>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>>   struct scsi_cmnd *);
>> +static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn);
>>  
>>  /* called with shost->host_lock held */
>>  void scsi_eh_wakeup(struct Scsi_Host *shost)
>> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>  if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>  eh_flag &= ~SCSI_EH_CANCEL_CMD;
>>  scmd->eh_eflags |= eh_flag;
>> +scsi_eh_action(scmd, NEEDS_RESET);
> 
> So here we are overloading the eh_disp argument with a flag to reset the
> medium_access_reset variable.  James changed the calling sequence of
> this function already to remove arguments, we could just add another
> boolean parameter "reset".  scsi_driver.eh_action() would need it too.
> 
Sure, I could be doing it.
Using a separate 'eh_disp' variable has the added benefit that it
doesn't break the kABI :-)
But yeah, I modify that.

>>  list_add_tail(>eh_entry, >eh_cmd_q);
>>  shost->host_failed++;
>>  scsi_eh_wakeup(shost);
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index be535d4..cd9f290 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1696,12 +1696,21 @@ static int sd_pr_clear(struct block_device *bdev, 
>> u64 key)
>>   *  the eh command is passed in eh_disp.  We're looking for devices that
>>   *  fail medium access commands but are OK with non access commands like
>>   *  test unit ready (so wrongly see the device as having a successful
>> - *  recovery)
>> + *  recovery).
>> + *  We have to be careful to count a medium access failure only once
>> + *  per SCSI EH run; there might be several timed out commands which
>> + *  will cause the 'max_medium_access_timeouts' counter to trigger
>> + *  after the first SCSI EH run already and set the device to offline.
>>   **/
>>  static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>>  {
>>  struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>>  
>> +if (eh_disp == NEEDS_RESET) {
>> +/* New SCSI EH run, reset gate variable */
>> +sdkp->medium_access_reset = 0;
>> +return eh_disp;
>> +}
>>  if (!scsi_device_online(scmd->device) ||
>>  !scsi_medium_access_command(scmd) ||
>>  host_byte(scmd->result) != DID_TIME_OUT ||
>> @@ -1715,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int 
>> eh_disp)
>>   * process of recovering or has it suffered an internal failure
>>   * that prevents access to the storage medium.
>>   */
>> -sdkp->medium_access_timed_out++;
>> +if (!sdkp->medium_access_reset) {
>> +sdkp->medium_access_timed_out++;
>> +sdkp->medium_access_reset++;
>> +}
> 
> If we only increment sdkp->medium_access_reset when it was 0, then it
> will only have the values 0 and 1, and does not need to have the full
> unsigned int precision.  A single bit field is sufficient, in which
> case the code would be:  sdkp->medium_access_reset = 1;
> 
Okay, can do that.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries & Storage

Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread Boaz Harrosh
On 02/28/2017 03:11 AM, Jeff Layton wrote:
<>
> 
> I'll probably have questions about the read side as well, but for now it
> looks like it's mostly used in an ad-hoc way to communicate errors
> across subsystems (block to fs layer, for instance).

If memory does not fail me it used to be checked long time ago in the
read-ahead case. On the buffered read case, the first page is read synchronous
and any error is returned to the caller, but then a read-ahead chunk is
read async all the while the original thread returned to the application.
So any errors are only recorded on the page-bit, since otherwise the uptodate
is off and the IO will be retransmitted. Then the move to read_iter changed
all that I think.
But again this is like 5-6 years ago, and maybe I didn't even understand
very well.

> --
> Jeff Layton 
> 

I would like a Documentation of all this as well please. Where are the
tests for this?

Thanks
Boaz



Re: [PATCH] target/user: Add daynmic growing data area featuresupport

2017-02-28 Thread Xiubo Li

On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

Currently for the TCMU, the ring buffer size is fixed to 64K cmd
area + 1M data area, and this will be bottlenecks for high iops.

Hi Xiubo, thanks for your work.

daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops
improvement it allows? Do you see the data area reaching its
fully-allocated size?


I tested this patch with Venky's tcmu-runner rbd aio patches, with one
10 gig iscsi session, and for pretty basic fio direct io (64 -256K
read/writes with a queue depth of 64 numjobs between 1 and 4) tests read
throughput goes from about 80 to 500 MB/s. Write throughput is pretty
low at around 150 MB/s.

I did not hit the fully allocated size. I did not drive a lot of IO though.


How about dealing with memories shrinking in patch series followed?

As the initial patch, we could set the cmd area size to 8MB and the
data area size to 512MB. And this could work fine for most cases
without using too much memories.

On my similar test case by using VMs(low iops case) using fio, -bs=[64K,
128K, 512K, 1M] -size=20G, -iodepth 1 -numjobs=10,  the bw of read
increases from about 5200KB/s to about 6100KB/s, and the bw of write
increases from about 3000KB/s to about 3300KB/s.

While bs < 64K(from the log, the maximum of the data length is 64K),
the smaller of it the two bws will be closer.

But for all my test cases, the allocated size is far away from the full size
too.

Thanks,

BRs
Xiubo






Re: [PATCH v5 02/11] phy: exynos-ufs: add UFS PHY driver for EXYNOS SoC

2017-02-28 Thread Alim Akhtar
Hi Kishon,

On 02/28/2017 09:04 AM, Kishon Vijay Abraham I wrote:
> Hi,
>
> On Monday 27 February 2017 07:40 PM, Alim Akhtar wrote:
>> Hi Kishon,
>>
>> On 02/27/2017 10:56 AM, Kishon Vijay Abraham I wrote:
>>> Hi,
>>>
>>> On Thursday 23 February 2017 12:20 AM, Alim Akhtar wrote:
 On Fri, Feb 3, 2017 at 2:49 PM, Alim Akhtar  
 wrote:
> Hi Kishon,
>
>
> On 11/19/2015 07:09 PM, Kishon Vijay Abraham I wrote:
>>
>> Hi,
>>
>> On Tuesday 17 November 2015 01:41 PM, Alim Akhtar wrote:
>>>
>>> Hi
>>> Thanks again for looking into this.
>>>
>>> On 11/17/2015 11:46 AM, Kishon Vijay Abraham I wrote:

 Hi,

 On Monday 09 November 2015 10:56 AM, Alim Akhtar wrote:
>
> From: Seungwon Jeon 
>
> This patch introduces Exynos UFS PHY driver. This driver
> supports to deal with phy calibration and power control
> according to UFS host driver's behavior.
>
> Signed-off-by: Seungwon Jeon 
> Signed-off-by: Alim Akhtar 
> Cc: Kishon Vijay Abraham I 
> ---
>   drivers/phy/Kconfig|7 ++
>   drivers/phy/Makefile   |1 +
>   drivers/phy/phy-exynos-ufs.c   |  241
> 
>   drivers/phy/phy-exynos-ufs.h   |   85 +
>   drivers/phy/phy-exynos7-ufs.h  |   89 +
>   include/linux/phy/phy-exynos-ufs.h |   85 +
>   6 files changed, 508 insertions(+)
>   create mode 100644 drivers/phy/phy-exynos-ufs.c
>   create mode 100644 drivers/phy/phy-exynos-ufs.h
>   create mode 100644 drivers/phy/phy-exynos7-ufs.h
>   create mode 100644 include/linux/phy/phy-exynos-ufs.h
>
> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
> index 7eb5859dd035..7d38a92e0297 100644
> --- a/drivers/phy/Kconfig
> +++ b/drivers/phy/Kconfig
> @@ -389,4 +389,11 @@ config PHY_CYGNUS_PCIE
> Enable this to support the Broadcom Cygnus PCIe PHY.
> If unsure, say N.
>
> +config PHY_EXYNOS_UFS
> +tristate "EXYNOS SoC series UFS PHY driver"
> +depends on OF && ARCH_EXYNOS || COMPILE_TEST
> +select GENERIC_PHY
> +help
> +  Support for UFS PHY on Samsung EXYNOS chipsets.
> +
>   endmenu
> diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
> index 075db1a81aa5..9bec4d1a89e1 100644
> --- a/drivers/phy/Makefile
> +++ b/drivers/phy/Makefile
> @@ -10,6 +10,7 @@ obj-$(CONFIG_ARMADA375_USBCLUSTER_PHY)+=
> phy-armada375-usb2.o
>   obj-$(CONFIG_BCM_KONA_USB2_PHY)+= phy-bcm-kona-usb2.o
>   obj-$(CONFIG_PHY_EXYNOS_DP_VIDEO)+= phy-exynos-dp-video.o
>   obj-$(CONFIG_PHY_EXYNOS_MIPI_VIDEO)+= phy-exynos-mipi-video.o
> +obj-$(CONFIG_PHY_EXYNOS_UFS)+= phy-exynos-ufs.o
>   obj-$(CONFIG_PHY_LPC18XX_USB_OTG)+= phy-lpc18xx-usb-otg.o
>   obj-$(CONFIG_PHY_PXA_28NM_USB2)+= phy-pxa-28nm-usb2.o
>   obj-$(CONFIG_PHY_PXA_28NM_HSIC)+= phy-pxa-28nm-hsic.o
> diff --git a/drivers/phy/phy-exynos-ufs.c
> b/drivers/phy/phy-exynos-ufs.c
> new file mode 100644
> index ..cb1aeaa3d4eb
> --- /dev/null
> +++ b/drivers/phy/phy-exynos-ufs.c
> @@ -0,0 +1,241 @@
> +/*
> + * UFS PHY driver for Samsung EXYNOS SoC
> + *
> + * Copyright (C) 2015 Samsung Electronics Co., Ltd.
> + * Author: Seungwon Jeon 
> + *
> + * This program is free software; you can redistribute it and/or
> modify
> + * it under the terms of the GNU General Public License as published
> by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "phy-exynos-ufs.h"
> +
> +#define for_each_phy_lane(phy, i) \
> +for (i = 0; i < (phy)->lane_cnt; i++)
> +#define for_each_phy_cfg(cfg) \
> +for (; (cfg)->id; (cfg)++)
> +
> +#define PHY_DEF_LANE_CNT1
> +
> +static void exynos_ufs_phy_config(struct exynos_ufs_phy *phy,
> +const struct exynos_ufs_phy_cfg *cfg, u8 lane)
> +{
> +enum {LANE_0, LANE_1}; /* lane index */