Re: [PATCH v9 09/10] block, scsi: Make SCSI quiesce and resume work reliably
On 10/17/2017 10:18 PM, Bart Van Assche wrote:
> On Tue, 2017-10-17 at 08:33 +0200, Hannes Reinecke wrote:
>> How do you ensure that PREEMPT requests are not stuck in the queue
>> _behind_ non-PREEMPT requests? Once they are in the queue the requests
>> are already allocated, so your deferred allocation won't work.
>> _And_ deferred requests will be re-inserted at the head of the queue.
>> Consequently the PREEMPT request will never be scheduled during quiesce.
>> How do you avoid such a scenario?
>
> Hello Hannes,
>
> The blk_mq_freeze_queue() / blk_mq_unfreeze_queue() calls that have been
> added through patch 9/10 to scsi_device_quiesce() will only succeed after
> all outstanding requests have finished. That includes non-preempt requests
> and requests that are about to be requeued. The changes in
> scsi_device_quiesce() are such that blk_queue_enter() will fail for
> non-preempt requests after the queue has been unfrozen. In other words, if
> a scsi_device_quiesce() call succeeds, it is guaranteed that there are no
> non-preempt requests in the queue and also that no new non-preempt
> requests will enter the queue until scsi_device_resume() is called.

Ah. Right. That was the explanation I was after.

Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
h...@suse.de                       +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
On 10/17/2017 03:57 PM, James Bottomley wrote:
> On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote:
>> During ALUA state transitions the device might return a sense code
>> 02/04/0a (Logical unit not accessible, asymmetric access state
>> transition). As this is a transient error we should just retry the
>> READ CAPACITY call until the state transition finishes and the
>> correct capacity can be returned.
>
> This will lock up the system if some ALUA target gets into a state
> where it always returns transitioning and never completes, which
> doesn't look like the best way to handle problem devices.
>
> I thought after the ALUA transition the LUN gives a unit attention ...
> can't you use that in some way to trigger the capacity re-read, i.e. do
> asynchronous event notification instead of polling forever?

Hmm. Will give it a try.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Teamlead Storage & Networking
h...@suse.de                       +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
I've indeed tested some previous versions of this patchset, but still use
Ming's solution. Can it be clarified which one (Bart's or Ming's) is the
correct approach?

On Wednesday, 18 October 2017 1:28:14 CEST Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > Hello Jens,
> >
> > It is known that during the resume following a hibernate, especially
> > when using an md RAID1 array created on top of SCSI devices, sometimes
> > the system hangs instead of coming up properly. This patch series fixes
> > that problem. These patches have been tested on top of the block layer
> > for-next branch. Please consider these changes for kernel v4.15.
>
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.
Re: [PATCH 1/1] qla2xxx: Initialize Work element before requesting IRQs
On Mon, 2017-10-16 at 11:26 -0700, Madhani, Himanshu wrote:
> diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
> index 937209805baf..3bd956d3bc5d 100644
> --- a/drivers/scsi/qla2xxx/qla_os.c
> +++ b/drivers/scsi/qla2xxx/qla_os.c
> @@ -3061,6 +3061,8 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> 	    host->max_cmd_len, host->max_channel, host->max_lun,
> 	    host->transportt, sht->vendor_id);
>
> +	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
> +
> 	/* Set up the irqs */
> 	ret = qla2x00_request_irqs(ha, rsp);
> 	if (ret)
> @@ -3175,8 +3177,6 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id)
> 	    host->can_queue, base_vha->req,
> 	    base_vha->mgmt_svr_loop_id, host->sg_tablesize);
>
> -	INIT_WORK(&base_vha->iocb_work, qla2x00_iocb_work_fn);
> -
> 	if (ha->mqenable) {
> 		bool mq = false;
> 		bool startit = false;

Hello Himanshu,

That patch indeed fixes the bug described in the patch description when
applied on top of kernel v4.13.7. However, with that patch applied I ran
into another bug. Can you have a look?

BUG: unable to handle kernel NULL pointer dereference at 0190
IP: qlt_free_session_done+0x172/0x570 [qla2xxx]
PGD 0 P4D 0
Oops: [#1] SMP
CPU: 0 PID: 47 Comm: kworker/0:1 Not tainted 4.13.7+ #1
Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F10 08/03/2015
Workqueue: events qlt_free_session_done [qla2xxx]
task: 9c4bcee94300 task.stack: ba99c01d4000
RIP: 0010:qlt_free_session_done+0x172/0x570 [qla2xxx]
RSP: 0018:ba99c01d7dc8 EFLAGS: 00010046
RAX: RBX: 9c4bb5c95720 RCX: c09174c8
RDX: 0001 RSI: 9c4bb5c95720 RDI: 9c4baf7e5ce4
RBP: ba99c01d7e50 R08: c0903620 R09: 9c4bc840e400
R10: ba99c01d7db0 R11: R12: 9c4bc840e400
R13: R14: 9c4baf7e5000 R15: 9c4bc840e4c0
FS: () GS:9c4bdfa0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 0190 CR3: 00040c624000 CR4: 001426f0
Call Trace:
 ? qlt_unreg_sess+0xfe/0x110 [qla2xxx]
 ? qla24xx_delete_sess_fn+0x69/0x80 [qla2xxx]
 process_one_work+0x1d6/0x3d0
 worker_thread+0x42/0x3e0
 kthread+0x11f/0x140
 ? trace_event_raw_event_workqueue_execute_start+0x90/0x90
 ? kthread_create_on_node+0x40/0x40
 ret_from_fork+0x22/0x30
Code: 00 00 00 00 41 c7 87 ac 00 00 00 07 00 00 00 83 e0 f9 83 c8 04 41 f6 87 71 ff ff ff 02 41 88 87 70 ff ff ff 48 8b 83 40 04 00 00 <8b> 80 90 01 00 00 41 89 47 4c 74 22 41 8b 87 68 ff ff ff 25 00
RIP: qlt_free_session_done+0x172/0x570 [qla2xxx] RSP: ba99c01d7dc8
CR2: 0190
---[ end trace 89dee74f51a05258 ]---

(gdb) list *(qlt_free_session_done+0x172)
0x661c2 is in qlt_free_session_done (drivers/scsi/qla2xxx/qla_target.c:1027).
1022		}
1023
1024		sess->disc_state = DSC_DELETED;
1025		sess->fw_login_state = DSC_LS_PORT_UNAVAIL;
1026		sess->deleted = QLA_SESS_DELETED;
1027		sess->login_retry = vha->hw->login_retry_count;
1028
1029		if (sess->login_succ && !IS_SW_RESV_ADDR(sess->d_id)) {
1030			vha->fcport_count--;
1031			sess->login_succ = 0;
[Bug 195285] qla2xxx FW immediately crashing after target start
https://bugzilla.kernel.org/show_bug.cgi?id=195285

--- Comment #14 from himanshu.madh...@cavium.com ---
I am OOO. I will respond to your message when I am back at work.

Thanks,
Himanshu

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
[Bug 195285] qla2xxx FW immediately crashing after target start
https://bugzilla.kernel.org/show_bug.cgi?id=195285

Anthony (anthony.blood...@gmail.com) changed:

           What       |Removed      |Added
----------------------------------------------
         Status       |NEW          |RESOLVED
     Resolution       |---          |CODE_FIX

--- Comment #13 from Anthony (anthony.blood...@gmail.com) ---
Patch in upstream now. Please close the bug.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [PATCH] scsi: gdth: Convert timers to use timer_setup()
On Tue, Oct 17, 2017 at 8:06 PM, Martin K. Petersen wrote:
>
> Kees,
>
>> Thanks for the reviews! Do you want the timer tree to carry these
>> patches, or can you pick them up in the scsi tree?
>
> Up to you. I'm not going to rebase my 4.15 queue this late in the cycle
> so the patches would end up going in at the end of the merge window. If
> you prefer to have them hit the first pull you should queue them up in
> the timer tree.

Okay, thanks, I'll take them via the timer tree.

-Kees

-- 
Kees Cook
Pixel Security
Re: [PATCH] sd: Micro-optimize READ / WRITE CDB encoding
Bart,

> Only compute 'threshold' if .last_sector_bug has been set. Reduce the
> number of branches that have to be taken to check the starting LBA and
> transfer length alignment. Optimize the encoding of the READ(10),
> WRITE(10), READ(16), WRITE(16), READ(32) and WRITE(32) CDBs.

Please take a look at this. It missed 4.15 because I've been fighting a
hardware bug the last several weeks. However, I'd still like to get it
in shape for 4.16:

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git/log/?h=scsi-work

Feel free to beat me to it.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: gdth: Convert timers to use timer_setup()
Kees,

> Thanks for the reviews! Do you want the timer tree to carry these
> patches, or can you pick them up in the scsi tree?

Up to you. I'm not going to rebase my 4.15 queue this late in the cycle
so the patches would end up going in at the end of the merge window. If
you prefer to have them hit the first pull you should queue them up in
the timer tree.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
Hannes,

> All EMC SYMMETRIX support REPORT_LUNS, even if configured to report
> SCSI-2 for whatever reason.

Applied to 4.15/scsi-queue.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: Always retry internal target error
Hannes,

> +	if (!strncmp(scmd->device->vendor, "EMC", 3) &&
> +	    !strncmp(scmd->device->model, "SYMMETRIX", 9) &&
> +	    (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) {
> +		/*
> +		 * EMC Symmetrix returns 'Internal target failure'
> +		 * for a variety of internal issues, all of which
> +		 * can be recovered by retry.
> +		 */
> +		return ADD_TO_MLQUEUE;
> +	}

It's decidedly awful to have vendor/model-specific triggers in
scsi_error. What are the drawbacks of just always retrying on an
internal target failure (ASC 0x44/ASCQ 0x00)?

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: Do not retry invalid function error
Hannes,

> Hitachi USP-V returns 'Invalid function' when the internal staging
> mechanism encountered an error. These errors should not be retried on
> another path.

s/invalid/illegal/

Applied to 4.15/scsi-queue.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: Add TRY_VPD_PAGES to HITACHI OPEN-V blacklist entry
Hannes,

> HITACHI arrays always support VPD pages, even though they claim to
> support SCSI Revision 3 only.

Applied to 4.15/scsi-queue.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: Add 'AIX VDASD' to blacklist
Hannes,

> The AIX VDASD devices do support VPD pages, but implement only SPC. So
> set BLIST_TRY_VPD_PAGES to correctly display the VPD information in
> sysfs.

Applied to 4.15/scsi-queue.

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH] scsi: sg: Re-fix off by one in sg_fill_request_table()
Ben,

> Commit 109bade9c625 ("scsi: sg: use standard lists for sg_requests")
> introduced an off-by-one error in sg_ioctl(), which was fixed by
> commit bd46fc406b30 ("scsi: sg: off by one in sg_ioctl()").
>
> Unfortunately commit 4759df905a47 ("scsi: sg: factor out
> sg_fill_request_table()") moved that code, and reintroduced the
> bug (perhaps due to a botched rebase). Fix it again.

Applied to 4.14/scsi-fixes. Thank you!

-- 
Martin K. Petersen	Oracle Linux Engineering
Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
On 10/17/2017 05:40 PM, Bart Van Assche wrote:
> On Tue, 2017-10-17 at 17:28 -0600, Jens Axboe wrote:
>> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
>>> It is known that during the resume following a hibernate, especially
>>> when using an md RAID1 array created on top of SCSI devices, sometimes
>>> the system hangs instead of coming up properly. This patch series
>>> fixes that problem. These patches have been tested on top of the block
>>> layer for-next branch. Please consider these changes for kernel v4.15.
>>
>> The folks that had trouble with this issue, have they tested the
>> patchset? I'd like to get a solution queued up for this sooner rather
>> than later, but I'd greatly prefer if we had some folks that have
>> tested and verified it.
>
> Hello Jens,
>
> Previous versions of this series have been tested by the two reporters
> of this issue (Oleksandr Natalenko and Martin Steigerwald). I think this
> version is close enough to what has been tested by Oleksandr and Martin
> for their test results still to be valid. See also:
> * https://www.mail-archive.com/linux-block@vger.kernel.org/msg13930.html
> * https://www.mail-archive.com/linux-block@vger.kernel.org/msg14389.html

OK, good. Can we add their Tested-by tags then?

-- 
Jens Axboe
Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
On Tue, Oct 17, 2017 at 04:47:23PM +0100, John Garry wrote:
> On 17/10/2017 06:12, Ming Lei wrote:
> > On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote:
> > > Hi Jens,
> > >
> > > The 1st patch runs the idle hctx after a delay in
> > > scsi_mq_get_budget(), so that we keep the same behaviour as before,
> > > and it can be thought of as a fix.
> > >
> > > The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED
> > > from the current blk-mq RESTART mechanism because SCSI_MQ can cover
> > > its restart by itself, so there is no need to handle TAG_SHARED in
> > > the blk-mq RESTART path. A >20% IOPS boost is observed in my random
> > > read test over scsi_debug.
> > >
> > > John, please test these two patches and see if they improve your SAS
> > > IO performance. You can find the two patches in the following branch:
> > >
> > > https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1
>
> Hi Ming,
>
> As requested, here are my figures for blk_mq_improve_restart_V1:
>
> without default SCSI_MQ, deadline scheduler, CONFIG_PREEMPT off
> read, rw, write IOPS
> 989K, 113K/113K, 835K
>
> with default SCSI_MQ, mq-deadline scheduler, CONFIG_PREEMPT off
> read, rw, write IOPS
> 738K, 130K/130K, 686K
>
> For the axboe for-next tip (21ed538):
>
> without default SCSI_MQ, deadline scheduler, CONFIG_PREEMPT off
> read, rw, write IOPS
> 977K, 117K/117K, 821K
>
> with default SCSI_MQ, mq-deadline scheduler, CONFIG_PREEMPT off
> read, rw, write IOPS
> 733K, 128K/128K, 676K
>
> All cases do not have LLDD mq exposed/enabled. So unfortunately not much
> difference with your branch.

That looks a bit strange: this patchset should cut out half of the
blk_mq_run_hw_queues() calls, since SCSI-MQ runs the queues itself in
scsi_end_request(). So it looks like your issue is related to neither IO
merging nor RESTART; I'm just wondering what the possible reason is.

But this patchset is still the correct thing to do, even though it can't
address your issue.

Thanks,
Ming
Re: [PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()
Hi, Marek,

Yes, I know that in include/linux/slab.h there is

#define KMALLOC_MIN_SIZE ARCH_DMA_MINALIGN

But I observed that the req/resp isn't aligned to ARCH_DMA_MINALIGN and
I don't know why. The problem I experienced is: in non-coherent mode,
mvsas with an expander cannot detect any disks.

Huacai

-- Original --
From: "Marek Szyprowski"
Date: Tue, Oct 17, 2017 07:55 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang"; "linux-kernel";
"Ralf Baechle"; "James Hogan"; "linux-mips"; "James E . J . Bottomley";
"Martin K . Petersen"; "linux-scsi"; "Tejun Heo"; "linux-ide"; "stable";
Subject: Re: [PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()

Hi Huacai,

On 2017-10-17 10:05, Huacai Chen wrote:
> In non-coherent DMA mode, the kernel uses cache flushing operations to
> maintain I/O coherency, so libsas's SMP request/response should be
> aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel
> structure share the same cache line, and if the kernel structure has
> dirty data, cache_invalidate (no writeback) will cause data corruption.
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
>  drivers/scsi/libsas/sas_expander.c | 93 +++---
>  1 file changed, 57 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 6b4fd23..124a44b 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -164,17 +164,17 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>
>  /* -- Allocations -- */
>
> -static inline void *alloc_smp_req(int size)
> +static inline void *alloc_smp_req(int size, int align)
>  {
> -	u8 *p = kzalloc(size, GFP_KERNEL);
> +	u8 *p = kzalloc(ALIGN(size, align), GFP_KERNEL);

If I remember correctly, the kernel guarantees that each kmalloced
buffer is always at least aligned to the CPU cache line, so the CPU
cache can be invalidated on the allocated buffer without corrupting
anything else. Taking this into account, I wonder if the above change
makes sense. Have you experienced any problems without this change?

>  	if (p)
>  		p[0] = SMP_REQUEST;
>  	return p;
>  }
>
> -static inline void *alloc_smp_resp(int size)
> +static inline void *alloc_smp_resp(int size, int align)
>  {
> -	return kzalloc(size, GFP_KERNEL);
> +	return kzalloc(ALIGN(size, align), GFP_KERNEL);

Same as above.

>  }
>
>  static char sas_route_char(struct domain_device *dev, struct ex_phy *phy)
> @@ -403,15 +403,17 @@ static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req,
>  int sas_ex_phy_discover(struct domain_device *dev, int single)
>  {
>  	struct expander_device *ex = &dev->ex_dev;
> -	int res = 0;
> +	int res = 0, align;
>  	u8 *disc_req;
>  	u8 *disc_resp;
>
> -	disc_req = alloc_smp_req(DISCOVER_REQ_SIZE);
> +	align = dma_get_cache_alignment(&dev->phy->dev);
> +
> +	disc_req = alloc_smp_req(DISCOVER_REQ_SIZE, align);
>  	if (!disc_req)
>  		return -ENOMEM;
>
> -	disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE);
> +	disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE, align);
>  	if (!disc_resp) {
>  		kfree(disc_req);
>  		return -ENOMEM;
> @@ -480,14 +482,15 @@ static int sas_ex_general(struct domain_device *dev)
>  {
>  	u8 *rg_req;
>  	struct smp_resp *rg_resp;
> -	int res;
> -	int i;
> +	int i, res, align;
>
> -	rg_req = alloc_smp_req(RG_REQ_SIZE);
> +	align = dma_get_cache_alignment(&dev->phy->dev);
> +
> +	rg_req = alloc_smp_req(RG_REQ_SIZE, align);
>  	if (!rg_req)
>  		return -ENOMEM;
>
> -	rg_resp = alloc_smp_resp(RG_RESP_SIZE);
> +	rg_resp = alloc_smp_resp(RG_RESP_SIZE, align);
>  	if (!rg_resp) {
>  		kfree(rg_req);
>  		return -ENOMEM;
>  	}
> @@ -552,13 +555,15 @@ static int sas_ex_manuf_info(struct domain_device *dev)
>  {
>  	u8 *mi_req;
>  	u8 *mi_resp;
> -	int res;
> +	int res, align;
>
> -	mi_req = alloc_smp_req(MI_REQ_SIZE);
> +	align = dma_get_cache_alignment(&dev->phy->dev);
> +
> +	mi_req = alloc_smp_req(MI_REQ_SIZE, align);
>  	if (!mi_req)
>  		return -ENOMEM;
>
> -	mi_resp = alloc_smp_resp(MI_RESP_SIZE);
> +	mi_resp = alloc_smp_resp(MI_RESP_SIZE, align);
>  	if (!mi_resp) {
>  		kfree(mi_req);
Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()
Hi, Sergei,

Without this patch, in non-coherent mode, ata_do_set_mode() returns with
"no PIO support" and disk detection fails.

Huacai

-- Original --
From: "Sergei Shtylyov"
Date: Tue, Oct 17, 2017 05:43 PM
To: "Huacai Chen"; "Christoph Hellwig";
Cc: "Marek Szyprowski"; "Robin Murphy"; "Andrew Morton"; "Fuxin Zhang";
"linux-kernel"; "Ralf Baechle"; "James Hogan"; "linux-mips";
"James E . J . Bottomley"; "Martin K . Petersen"; "linux-scsi";
"Tejun Heo"; "linux-ide"; "stable";
Subject: Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()

On 10/17/2017 11:05 AM, Huacai Chen wrote:
> In non-coherent DMA mode, the kernel uses cache flushing operations to
> maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer
> should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer
> and a kernel structure share the same cache line, and if the kernel
> structure has dirty data, cache_invalidate (no writeback) will cause
> data corruption.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
>  drivers/ata/libata-core.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ee4c1ec..e134955 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct ata_device *adev)
>  unsigned int ata_do_dev_read_id(struct ata_device *dev,
>  			struct ata_taskfile *tf, u16 *id)
>  {
> -	return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
> -			id, sizeof(id[0]) * ATA_ID_WORDS, 0);
> +	u16 *devid;
> +	int res, size = sizeof(u16) * ATA_ID_WORDS;
> +
> +	if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev)))
> +		res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, size, 0);
> +	else {
> +		devid = kmalloc(size, GFP_KERNEL);
> +		res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, size, 0);
> +		memcpy(id, devid, size);
> +		kfree(devid);
> +	}
> +
> +	return res;

This function is called only for the PIO mode commands, so I doubt this
is necessary...

MBR, Sergei
Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
On Tue, 2017-10-17 at 17:28 -0600, Jens Axboe wrote:
> On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> > It is known that during the resume following a hibernate, especially
> > when using an md RAID1 array created on top of SCSI devices, sometimes
> > the system hangs instead of coming up properly. This patch series
> > fixes that problem. These patches have been tested on top of the block
> > layer for-next branch. Please consider these changes for kernel v4.15.
>
> The folks that had trouble with this issue, have they tested the
> patchset? I'd like to get a solution queued up for this sooner rather
> than later, but I'd greatly prefer if we had some folks that have
> tested and verified it.

Hello Jens,

Previous versions of this series have been tested by the two reporters of
this issue (Oleksandr Natalenko and Martin Steigerwald). I think this
version is close enough to what has been tested by Oleksandr and Martin
for their test results still to be valid. See also:
* https://www.mail-archive.com/linux-block@vger.kernel.org/msg13930.html
* https://www.mail-archive.com/linux-block@vger.kernel.org/msg14389.html

Thanks,

Bart.
Re: [PATCH v10 00/10] block, scsi, md: Improve suspend and resume
On 10/17/2017 05:26 PM, Bart Van Assche wrote:
> Hello Jens,
>
> It is known that during the resume following a hibernate, especially
> when using an md RAID1 array created on top of SCSI devices, sometimes
> the system hangs instead of coming up properly. This patch series fixes
> that problem. These patches have been tested on top of the block layer
> for-next branch. Please consider these changes for kernel v4.15.

The folks that had trouble with this issue, have they tested the
patchset? I'd like to get a solution queued up for this sooner rather
than later, but I'd greatly prefer if we had some folks that have
tested and verified it.

-- 
Jens Axboe
[PATCH v10 02/10] md: Introduce md_stop_all_writes()
Introduce md_stop_all_writes() because the next patch will add a second
caller for this function. This patch does not change any functionality.

Signed-off-by: Bart Van Assche
Reviewed-by: Johannes Thumshirn
Reviewed-by: Shaohua Li
Reviewed-by: Hannes Reinecke
Tested-by: Martin Steigerwald
Cc: linux-r...@vger.kernel.org
Cc: Ming Lei
Cc: Christoph Hellwig
---
 drivers/md/md.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8933cafc212d..b99584e5d6b1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8937,8 +8937,7 @@ int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
 }
 EXPORT_SYMBOL_GPL(rdev_clear_badblocks);
 
-static int md_notify_reboot(struct notifier_block *this,
-			    unsigned long code, void *x)
+static void md_stop_all_writes(void)
 {
 	struct list_head *tmp;
 	struct mddev *mddev;
@@ -8962,6 +8961,12 @@ static int md_notify_reboot(struct notifier_block *this,
 	 */
 	if (need_delay)
 		mdelay(1000*1);
+}
+
+static int md_notify_reboot(struct notifier_block *this,
+			    unsigned long code, void *x)
+{
+	md_stop_all_writes();
 
 	return NOTIFY_DONE;
 }
-- 
2.14.2
[PATCH v10 10/10] block, nvme: Introduce blk_mq_req_flags_t
Several block layer and NVMe core functions accept a combination of BLK_MQ_REQ_* flags through the 'flags' argument but there is no verification at compile time whether the right type of block layer flags is passed. Make it possible for sparse to verify this. This patch does not change any functionality. Signed-off-by: Bart Van AsscheReviewed-by: Hannes Reinecke Cc: linux-n...@lists.infradead.org Cc: Christoph Hellwig Cc: Johannes Thumshirn Cc: Ming Lei --- block/blk-core.c | 12 ++-- block/blk-mq.c| 4 ++-- block/blk-mq.h| 2 +- drivers/nvme/host/core.c | 5 +++-- drivers/nvme/host/nvme.h | 5 +++-- include/linux/blk-mq.h| 17 +++-- include/linux/blk_types.h | 2 ++ include/linux/blkdev.h| 4 ++-- 8 files changed, 30 insertions(+), 21 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 63e88e78c1fe..4fc9d6d44edd 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -799,7 +799,7 @@ EXPORT_SYMBOL(blk_alloc_queue); * @q: request queue pointer * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT */ -int blk_queue_enter(struct request_queue *q, unsigned int flags) +int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags) { const bool preempt = flags & BLK_MQ_REQ_PREEMPT; @@ -1225,7 +1225,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr) * Returns request pointer on success, with @q->queue_lock *not held*. */ static struct request *__get_request(struct request_list *rl, unsigned int op, -struct bio *bio, unsigned int flags) +struct bio *bio, blk_mq_req_flags_t flags) { struct request_queue *q = rl->q; struct request *rq; @@ -1408,7 +1408,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op, * Returns request pointer on success, with @q->queue_lock *not held*. 
*/ static struct request *get_request(struct request_queue *q, unsigned int op, - struct bio *bio, unsigned int flags) + struct bio *bio, blk_mq_req_flags_t flags) { const bool is_sync = op_is_sync(op); DEFINE_WAIT(wait); @@ -1458,7 +1458,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op, /* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */ static struct request *blk_old_get_request(struct request_queue *q, - unsigned int op, unsigned int flags) + unsigned int op, blk_mq_req_flags_t flags) { struct request *rq; gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC : @@ -1495,7 +1495,7 @@ static struct request *blk_old_get_request(struct request_queue *q, * @flags: BLK_MQ_REQ_* flags, e.g. BLK_MQ_REQ_NOWAIT. */ struct request *blk_get_request_flags(struct request_queue *q, unsigned int op, - unsigned int flags) + blk_mq_req_flags_t flags) { struct request *req; @@ -2291,7 +2291,7 @@ blk_qc_t generic_make_request(struct bio *bio) current->bio_list = bio_list_on_stack; do { struct request_queue *q = bio->bi_disk->queue; - unsigned int flags = bio->bi_opf & REQ_NOWAIT ? + blk_mq_req_flags_t flags = bio->bi_opf & REQ_NOWAIT ? 
BLK_MQ_REQ_NOWAIT : 0; if (likely(blk_queue_enter(q, flags) == 0)) { diff --git a/block/blk-mq.c b/block/blk-mq.c index c6bff60e6b8b..c037b1ad64a7 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -380,7 +380,7 @@ static struct request *blk_mq_get_request(struct request_queue *q, } struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, - unsigned int flags) + blk_mq_req_flags_t flags) { struct blk_mq_alloc_data alloc_data = { .flags = flags }; struct request *rq; @@ -406,7 +406,7 @@ struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, EXPORT_SYMBOL(blk_mq_alloc_request); struct request *blk_mq_alloc_request_hctx(struct request_queue *q, - unsigned int op, unsigned int flags, unsigned int hctx_idx) + unsigned int op, blk_mq_req_flags_t flags, unsigned int hctx_idx) { struct blk_mq_alloc_data alloc_data = { .flags = flags }; struct request *rq; diff --git a/block/blk-mq.h b/block/blk-mq.h index 522b420dedc0..5dcfe4fa5e0d 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -110,7 +110,7 @@ static inline void blk_mq_put_ctx(struct blk_mq_ctx *ctx) struct blk_mq_alloc_data { /* input parameter */ struct request_queue *q; - unsigned int flags; + blk_mq_req_flags_t flags; unsigned int shallow_depth;
[PATCH v10 05/10] block: Introduce blk_get_request_flags()
A side effect of this patch is that the GFP mask that is passed to several allocation functions in the legacy block layer is changed from GFP_KERNEL into __GFP_DIRECT_RECLAIM. Signed-off-by: Bart Van AsscheReviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: Christoph Hellwig Cc: Ming Lei Cc: Johannes Thumshirn --- block/blk-core.c | 50 +++--- include/linux/blkdev.h | 3 +++ 2 files changed, 38 insertions(+), 15 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 23a0110fc62b..559818be8e02 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1157,7 +1157,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr) * @rl: request list to allocate from * @op: operation and flags * @bio: bio to allocate request for (can be %NULL) - * @gfp_mask: allocation mask + * @flags: BLQ_MQ_REQ_* flags * * Get a free request from @q. This function may fail under memory * pressure or if @q is dead. @@ -1167,7 +1167,7 @@ int blk_update_nr_requests(struct request_queue *q, unsigned int nr) * Returns request pointer on success, with @q->queue_lock *not held*. */ static struct request *__get_request(struct request_list *rl, unsigned int op, - struct bio *bio, gfp_t gfp_mask) +struct bio *bio, unsigned int flags) { struct request_queue *q = rl->q; struct request *rq; @@ -1176,6 +1176,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op, struct io_cq *icq = NULL; const bool is_sync = op_is_sync(op); int may_queue; + gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? GFP_ATOMIC : +__GFP_DIRECT_RECLAIM; req_flags_t rq_flags = RQF_ALLOCED; lockdep_assert_held(q->queue_lock); @@ -1336,7 +1338,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op, * @q: request_queue to allocate request from * @op: operation and flags * @bio: bio to allocate request for (can be %NULL) - * @gfp_mask: allocation mask + * @flags: BLK_MQ_REQ_* flags. * * Get a free request from @q. 
If %__GFP_DIRECT_RECLAIM is set in @gfp_mask, * this function keeps retrying under memory pressure and fails iff @q is dead. @@ -1346,7 +1348,7 @@ static struct request *__get_request(struct request_list *rl, unsigned int op, * Returns request pointer on success, with @q->queue_lock *not held*. */ static struct request *get_request(struct request_queue *q, unsigned int op, - struct bio *bio, gfp_t gfp_mask) + struct bio *bio, unsigned int flags) { const bool is_sync = op_is_sync(op); DEFINE_WAIT(wait); @@ -1358,7 +1360,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op, rl = blk_get_rl(q, bio);/* transferred to @rq on success */ retry: - rq = __get_request(rl, op, bio, gfp_mask); + rq = __get_request(rl, op, bio, flags); if (!IS_ERR(rq)) return rq; @@ -1367,7 +1369,7 @@ static struct request *get_request(struct request_queue *q, unsigned int op, return ERR_PTR(-EAGAIN); } - if (!gfpflags_allow_blocking(gfp_mask) || unlikely(blk_queue_dying(q))) { + if ((flags & BLK_MQ_REQ_NOWAIT) || unlikely(blk_queue_dying(q))) { blk_put_rl(rl); return rq; } @@ -1394,10 +1396,13 @@ static struct request *get_request(struct request_queue *q, unsigned int op, goto retry; } +/* flags: BLK_MQ_REQ_PREEMPT and/or BLK_MQ_REQ_NOWAIT. */ static struct request *blk_old_get_request(struct request_queue *q, - unsigned int op, gfp_t gfp_mask) + unsigned int op, unsigned int flags) { struct request *rq; + gfp_t gfp_mask = flags & BLK_MQ_REQ_NOWAIT ? 
GFP_ATOMIC : +__GFP_DIRECT_RECLAIM; int ret = 0; WARN_ON_ONCE(q->mq_ops); @@ -1410,7 +1415,7 @@ static struct request *blk_old_get_request(struct request_queue *q, if (ret) return ERR_PTR(ret); spin_lock_irq(q->queue_lock); - rq = get_request(q, op, NULL, gfp_mask); + rq = get_request(q, op, NULL, flags); if (IS_ERR(rq)) { spin_unlock_irq(q->queue_lock); blk_queue_exit(q); @@ -1424,25 +1429,40 @@ static struct request *blk_old_get_request(struct request_queue *q, return rq; } -struct request *blk_get_request(struct request_queue *q, unsigned int op, - gfp_t gfp_mask) +/** + * blk_get_request_flags - allocate a request + * @q: request queue to allocate a request for + * @op: operation (REQ_OP_*) and REQ_* flags,
[PATCH v10 06/10] block: Introduce BLK_MQ_REQ_PREEMPT
Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to blk_get_request_flags(). Signed-off-by: Bart Van AsscheReviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: Christoph Hellwig Cc: Ming Lei Cc: Johannes Thumshirn --- block/blk-core.c | 4 +++- block/blk-mq.c | 2 ++ include/linux/blk-mq.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index 559818be8e02..ab835d3c2f39 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -1260,6 +1260,8 @@ static struct request *__get_request(struct request_list *rl, unsigned int op, blk_rq_set_rl(rq, rl); rq->cmd_flags = op; rq->rq_flags = rq_flags; + if (flags & BLK_MQ_REQ_PREEMPT) + rq->rq_flags |= RQF_PREEMPT; /* init elvpriv */ if (rq_flags & RQF_ELVPRIV) { @@ -1441,7 +1443,7 @@ struct request *blk_get_request_flags(struct request_queue *q, unsigned int op, struct request *req; WARN_ON_ONCE(op & REQ_NOWAIT); - WARN_ON_ONCE(flags & ~BLK_MQ_REQ_NOWAIT); + WARN_ON_ONCE(flags & ~(BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_PREEMPT)); if (q->mq_ops) { req = blk_mq_alloc_request(q, op, flags); diff --git a/block/blk-mq.c b/block/blk-mq.c index 59b7de6b616b..6a025b17caac 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -290,6 +290,8 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data, rq->q = data->q; rq->mq_ctx = data->ctx; rq->cmd_flags = op; + if (data->flags & BLK_MQ_REQ_PREEMPT) + rq->rq_flags |= RQF_PREEMPT; if (blk_queue_io_stat(data->q)) rq->rq_flags |= RQF_IO_STAT; /* do not touch atomic flags, it needs atomic ops against the timer */ diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e5e6becd57d3..22c7f36745fc 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -213,6 +213,7 @@ enum { BLK_MQ_REQ_NOWAIT = (1 << 0), /* return when out of requests */ BLK_MQ_REQ_RESERVED = (1 << 1), /* allocate from reserved pool */ BLK_MQ_REQ_INTERNAL = (1 << 2), /* allocate internal/sched tag */ + BLK_MQ_REQ_PREEMPT = (1 << 3), /* 
set RQF_PREEMPT */ }; struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, -- 2.14.2

[PATCH v10 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests
Convert blk_get_request(q, op, __GFP_RECLAIM) into blk_get_request_flags(q, op, BLK_MQ_REQ_PREEMPT). This patch does not change any functionality. Signed-off-by: Bart Van Assche Tested-by: Martin Steigerwald Acked-by: David S. Miller [ for IDE ] Acked-by: Martin K. Petersen Reviewed-by: Hannes Reinecke Cc: Ming Lei Cc: Christoph Hellwig Cc: Johannes Thumshirn --- drivers/ide/ide-pm.c | 4 ++-- drivers/scsi/scsi_lib.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/ide/ide-pm.c b/drivers/ide/ide-pm.c index 544f02d673ca..f56d742908df 100644 --- a/drivers/ide/ide-pm.c +++ b/drivers/ide/ide-pm.c @@ -89,9 +89,9 @@ int generic_ide_resume(struct device *dev) } memset(&rqpm, 0, sizeof(rqpm)); - rq = blk_get_request(drive->queue, REQ_OP_DRV_IN, __GFP_RECLAIM); + rq = blk_get_request_flags(drive->queue, REQ_OP_DRV_IN, + BLK_MQ_REQ_PREEMPT); ide_req(rq)->type = ATA_PRIV_PM_RESUME; - rq->rq_flags |= RQF_PREEMPT; rq->special = &rqpm; rqpm.pm_step = IDE_PM_START_RESUME; rqpm.pm_state = PM_EVENT_ON; diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 6f10afaca25b..7c119696402c 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -252,9 +252,9 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, struct scsi_request *rq; int ret = DRIVER_ERROR << 24; - req = blk_get_request(sdev->request_queue, + req = blk_get_request_flags(sdev->request_queue, data_direction == DMA_TO_DEVICE ? - REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, __GFP_RECLAIM); + REQ_OP_SCSI_OUT : REQ_OP_SCSI_IN, BLK_MQ_REQ_PREEMPT); if (IS_ERR(req)) return ret; rq = scsi_req(req); @@ -268,7 +268,7 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd, rq->retries = retries; req->timeout = timeout; req->cmd_flags |= flags; - req->rq_flags |= rq_flags | RQF_QUIET | RQF_PREEMPT; + req->rq_flags |= rq_flags | RQF_QUIET; /* * head injection *required* here otherwise quiesce won't work -- 2.14.2
[PATCH v10 01/10] md: Rename md_notifier into md_reboot_notifier
This avoids confusion with the pm notifier that will be added through a later patch. Signed-off-by: Bart Van AsscheReviewed-by: Johannes Thumshirn Reviewed-by: Shaohua Li Reviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: linux-r...@vger.kernel.org Cc: Ming Lei Cc: Christoph Hellwig --- drivers/md/md.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 5d61049e7417..8933cafc212d 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -8966,7 +8966,7 @@ static int md_notify_reboot(struct notifier_block *this, return NOTIFY_DONE; } -static struct notifier_block md_notifier = { +static struct notifier_block md_reboot_notifier = { .notifier_call = md_notify_reboot, .next = NULL, .priority = INT_MAX, /* before any real devices */ @@ -9003,7 +9003,7 @@ static int __init md_init(void) blk_register_region(MKDEV(mdp_major, 0), 1UL<
[PATCH v10 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
This flag will be used in the next patch to let the block layer core know whether or not a SCSI request queue has been quiesced. A quiesced SCSI queue namely only processes RQF_PREEMPT requests. Signed-off-by: Bart Van AsscheReviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: Ming Lei Cc: Christoph Hellwig Cc: Johannes Thumshirn --- block/blk-core.c | 30 ++ block/blk-mq-debugfs.c | 1 + include/linux/blkdev.h | 6 ++ 3 files changed, 37 insertions(+) diff --git a/block/blk-core.c b/block/blk-core.c index ab835d3c2f39..b73203286bf5 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -346,6 +346,36 @@ void blk_sync_queue(struct request_queue *q) } EXPORT_SYMBOL(blk_sync_queue); +/** + * blk_set_preempt_only - set QUEUE_FLAG_PREEMPT_ONLY + * @q: request queue pointer + * + * Returns the previous value of the PREEMPT_ONLY flag - 0 if the flag was not + * set and 1 if the flag was already set. + */ +int blk_set_preempt_only(struct request_queue *q) +{ + unsigned long flags; + int res; + + spin_lock_irqsave(q->queue_lock, flags); + res = queue_flag_test_and_set(QUEUE_FLAG_PREEMPT_ONLY, q); + spin_unlock_irqrestore(q->queue_lock, flags); + + return res; +} +EXPORT_SYMBOL_GPL(blk_set_preempt_only); + +void blk_clear_preempt_only(struct request_queue *q) +{ + unsigned long flags; + + spin_lock_irqsave(q->queue_lock, flags); + queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q); + spin_unlock_irqrestore(q->queue_lock, flags); +} +EXPORT_SYMBOL_GPL(blk_clear_preempt_only); + /** * __blk_run_queue_uncond - run a queue whether or not it has been stopped * @q: The queue to run diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c index 7f4a1ba532af..75f31535f280 100644 --- a/block/blk-mq-debugfs.c +++ b/block/blk-mq-debugfs.c @@ -74,6 +74,7 @@ static const char *const blk_queue_flag_name[] = { QUEUE_FLAG_NAME(REGISTERED), QUEUE_FLAG_NAME(SCSI_PASSTHROUGH), QUEUE_FLAG_NAME(QUIESCED), + QUEUE_FLAG_NAME(PREEMPT_ONLY), }; #undef QUEUE_FLAG_NAME diff --git 
a/include/linux/blkdev.h b/include/linux/blkdev.h index 05203175eb9c..864ad2e4a58c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -630,6 +630,7 @@ struct request_queue { #define QUEUE_FLAG_REGISTERED 26 /* queue has been registered to a disk */ #define QUEUE_FLAG_SCSI_PASSTHROUGH 27 /* queue supports SCSI commands */ #define QUEUE_FLAG_QUIESCED28 /* queue has been quiesced */ +#define QUEUE_FLAG_PREEMPT_ONLY29 /* only process REQ_PREEMPT requests */ #define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) |\ (1 << QUEUE_FLAG_SAME_COMP)| \ @@ -730,6 +731,11 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) ((rq)->cmd_flags & (REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT| \ REQ_FAILFAST_DRIVER)) #define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags) +#define blk_queue_preempt_only(q) \ + test_bit(QUEUE_FLAG_PREEMPT_ONLY, &(q)->queue_flags) + +extern int blk_set_preempt_only(struct request_queue *q); +extern void blk_clear_preempt_only(struct request_queue *q); static inline bool blk_account_rq(struct request *rq) { -- 2.14.2
[PATCH v10 04/10] block: Make q_usage_counter also track legacy requests
From: Ming Lei This patch makes it possible to pause request allocation for the legacy block layer by calling blk_mq_freeze_queue() and blk_mq_unfreeze_queue(). Signed-off-by: Ming Lei [ bvanassche: Combined two patches into one, edited a comment and made sure REQ_NOWAIT is handled properly in blk_old_get_request() ] Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: Ming Lei --- block/blk-core.c | 12 block/blk-mq.c | 10 ++ 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index e8e149ccc86b..23a0110fc62b 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -610,6 +610,9 @@ void blk_set_queue_dying(struct request_queue *q) } spin_unlock_irq(q->queue_lock); } + + /* Make blk_queue_enter() reexamine the DYING flag. */ + wake_up_all(&q->mq_freeze_wq); } EXPORT_SYMBOL_GPL(blk_set_queue_dying); @@ -1395,16 +1398,22 @@ static struct request *blk_old_get_request(struct request_queue *q, unsigned int op, gfp_t gfp_mask) { struct request *rq; + int ret = 0; WARN_ON_ONCE(q->mq_ops); /* create ioc upfront */ create_io_context(gfp_mask, q->node); + ret = blk_queue_enter(q, !(gfp_mask & __GFP_DIRECT_RECLAIM) || + (op & REQ_NOWAIT)); + if (ret) + return ERR_PTR(ret); spin_lock_irq(q->queue_lock); rq = get_request(q, op, NULL, gfp_mask); if (IS_ERR(rq)) { spin_unlock_irq(q->queue_lock); + blk_queue_exit(q); return rq; } @@ -1576,6 +1585,7 @@ void __blk_put_request(struct request_queue *q, struct request *req) blk_free_request(rl, req); freed_request(rl, sync, rq_flags); blk_put_rl(rl); + blk_queue_exit(q); } } EXPORT_SYMBOL_GPL(__blk_put_request); @@ -1857,8 +1867,10 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio) * Grab a free request. This is might sleep but can not fail. * Returns with the queue unlocked.
*/ + blk_queue_enter_live(q); req = get_request(q, bio->bi_opf, bio, GFP_NOIO); if (IS_ERR(req)) { + blk_queue_exit(q); __wbt_done(q->rq_wb, wb_acct); if (PTR_ERR(req) == -ENOMEM) bio->bi_status = BLK_STS_RESOURCE; diff --git a/block/blk-mq.c b/block/blk-mq.c index 097ca3ece716..59b7de6b616b 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -125,7 +125,8 @@ void blk_freeze_queue_start(struct request_queue *q) freeze_depth = atomic_inc_return(&q->mq_freeze_depth); if (freeze_depth == 1) { percpu_ref_kill(&q->q_usage_counter); - blk_mq_run_hw_queues(q, false); + if (q->mq_ops) + blk_mq_run_hw_queues(q, false); } } EXPORT_SYMBOL_GPL(blk_freeze_queue_start); @@ -255,13 +256,6 @@ void blk_mq_wake_waiters(struct request_queue *q) queue_for_each_hw_ctx(q, hctx, i) if (blk_mq_hw_queue_mapped(hctx)) blk_mq_tag_wakeup_all(hctx->tags, true); - - /* -* If we are called because the queue has now been marked as -* dying, we need to ensure that processes currently waiting on -* the queue are notified as well. -*/ - wake_up_all(&q->mq_freeze_wq); } bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) -- 2.14.2
[PATCH v10 09/10] block, scsi: Make SCSI quiesce and resume work reliably
The contexts from which a SCSI device can be quiesced or resumed are: * Writing into /sys/class/scsi_device/*/device/state. * SCSI parallel (SPI) domain validation. * The SCSI device power management methods. See also scsi_bus_pm_ops. It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why SCSI devices are quiesced while the hibernation image is being written or restored. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests from being submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after resume the fio job is still running: for d in /sys/class/block/sd*[a-z]; do hcil=$(readlink "$d/device") hcil=${hcil#../../../} echo 4 > "$d/queue/nr_requests" echo 1 > "/sys/class/scsi_device/$hcil/device/queue_depth" done bdev=$(readlink /dev/disk/by-uuid/5217d83f-213e-4b42-b86e-20013325ba6c) bdev=${bdev#../../} hcil=$(readlink "/sys/block/$bdev/device") hcil=${hcil#../../../} fio --name="$bdev" --filename="/dev/$bdev" --buffered=0 --bs=512 --rw=randread \ --ioengine=libaio --numjobs=4 --iodepth=16 --iodepth_batch=1 --thread \ --loops=$((2**31)) & pid=$! sleep 1 systemctl hibernate sleep 10 kill $pid Reported-by: Oleksandr Natalenko References: "I/O hangs after resuming from suspend-to-ram" (https://marc.info/?l=linux-block&m=150340235201348). Signed-off-by: Bart Van Assche Tested-by: Martin Steigerwald Cc: Martin K.
Petersen Cc: Ming Lei Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- block/blk-core.c | 43 --- block/blk-mq.c | 4 ++-- block/blk-timeout.c | 2 +- drivers/scsi/scsi_lib.c | 42 ++ fs/block_dev.c | 4 ++-- include/linux/blkdev.h | 2 +- include/scsi/scsi_device.h | 1 + 7 files changed, 73 insertions(+), 25 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index b73203286bf5..63e88e78c1fe 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -372,6 +372,7 @@ void blk_clear_preempt_only(struct request_queue *q) spin_lock_irqsave(q->queue_lock, flags); queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q); + wake_up_all(&q->mq_freeze_wq); spin_unlock_irqrestore(q->queue_lock, flags); } EXPORT_SYMBOL_GPL(blk_clear_preempt_only); @@ -793,15 +794,41 @@ struct request_queue *blk_alloc_queue(gfp_t gfp_mask) } EXPORT_SYMBOL(blk_alloc_queue); -int blk_queue_enter(struct request_queue *q, bool nowait) +/** + * blk_queue_enter() - try to increase q->q_usage_counter + * @q: request queue pointer + * @flags: BLK_MQ_REQ_NOWAIT and/or BLK_MQ_REQ_PREEMPT + */ +int blk_queue_enter(struct request_queue *q, unsigned int flags) { + const bool preempt = flags & BLK_MQ_REQ_PREEMPT; + while (true) { + bool success = false; int ret; - if (percpu_ref_tryget_live(&q->q_usage_counter)) + rcu_read_lock_sched(); + if (percpu_ref_tryget_live(&q->q_usage_counter)) { + /* +* The code that sets the PREEMPT_ONLY flag is +* responsible for ensuring that that flag is globally +* visible before the queue is unfrozen. 
+*/ + if (preempt || !blk_queue_preempt_only(q)) { + success = true; + } else { + percpu_ref_put(&q->q_usage_counter); + WARN_ONCE(true, + "%s: Attempt to allocate non-preempt request in preempt-only mode.\n", + kobject_name(q->kobj.parent)); + } + } + rcu_read_unlock_sched(); + + if (success) return 0; - if (nowait) + if (flags & BLK_MQ_REQ_NOWAIT) return -EBUSY; /* @@ -814,7 +841,8 @@ int blk_queue_enter(struct request_queue *q, bool nowait) smp_rmb(); ret = wait_event_interruptible(q->mq_freeze_wq, - !atomic_read(&q->mq_freeze_depth) || + (atomic_read(&q->mq_freeze_depth) == 0 && +(preempt ||
[PATCH v10 03/10] md: Neither resync nor reshape while the system is frozen
Some people use the md driver on laptops and use the suspend and resume functionality. Since it is essential that submitting of new I/O requests stops before a hibernation image is created, interrupt the md resync and reshape actions if the system is being frozen. Note: the resync and reshape will restart after the system is resumed and a message similar to the following will appear in the system log: md: md0: data-check interrupted. Signed-off-by: Bart Van AsscheReviewed-by: Shaohua Li Reviewed-by: Hannes Reinecke Tested-by: Martin Steigerwald Cc: linux-r...@vger.kernel.org Cc: Ming Lei Cc: Christoph Hellwig Cc: Johannes Thumshirn --- drivers/md/md.c | 50 +- drivers/md/md.h | 8 2 files changed, 57 insertions(+), 1 deletion(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index b99584e5d6b1..8b2fc750f97f 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -66,6 +66,8 @@ #include #include #include +#include +#include #include #include "md.h" @@ -7439,6 +7441,7 @@ static int md_thread(void *arg) */ allow_signal(SIGKILL); + set_freezable(); while (!kthread_should_stop()) { /* We need to wait INTERRUPTIBLE so that @@ -7449,7 +7452,7 @@ static int md_thread(void *arg) if (signal_pending(current)) flush_signals(current); - wait_event_interruptible_timeout + wait_event_freezable_timeout (thread->wqueue, test_bit(THREAD_WAKEUP, >flags) || kthread_should_stop() || kthread_should_park(), @@ -8944,6 +8947,8 @@ static void md_stop_all_writes(void) int need_delay = 0; for_each_mddev(mddev, tmp) { + mddev->frozen_before_suspend = + test_bit(MD_RECOVERY_FROZEN, >recovery); if (mddev_trylock(mddev)) { if (mddev->pers) __md_stop_writes(mddev); @@ -8963,6 +8968,47 @@ static void md_stop_all_writes(void) mdelay(1000*1); } +static void md_restore_frozen_flag(void) +{ + struct list_head *tmp; + struct mddev *mddev; + + for_each_mddev(mddev, tmp) { + if (mddev->frozen_before_suspend) + set_bit(MD_RECOVERY_FROZEN, >recovery); + else + clear_bit(MD_RECOVERY_FROZEN, >recovery); + } +} + 
+/* + * Ensure that neither resyncing nor reshaping occurs while the system is + * frozen. + */ +static int md_notify_pm(struct notifier_block *bl, unsigned long state, + void *unused) +{ + pr_debug("%s: state = %ld\n", __func__, state); + + switch (state) { + case PM_HIBERNATION_PREPARE: + case PM_SUSPEND_PREPARE: + case PM_RESTORE_PREPARE: + md_stop_all_writes(); + break; + case PM_POST_HIBERNATION: + case PM_POST_SUSPEND: + case PM_POST_RESTORE: + md_restore_frozen_flag(); + break; + } + return NOTIFY_DONE; +} + +static struct notifier_block md_pm_notifier = { + .notifier_call = md_notify_pm, +}; + static int md_notify_reboot(struct notifier_block *this, unsigned long code, void *x) { @@ -9009,6 +9055,7 @@ static int __init md_init(void) md_probe, NULL, NULL); register_reboot_notifier(&md_reboot_notifier); + register_pm_notifier(&md_pm_notifier); raid_table_header = register_sysctl_table(raid_root_table); md_geninit(); @@ -9248,6 +9295,7 @@ static __exit void md_exit(void) unregister_blkdev(MD_MAJOR, "md"); unregister_blkdev(mdp_major, "mdp"); + unregister_pm_notifier(&md_pm_notifier); unregister_reboot_notifier(&md_reboot_notifier); unregister_sysctl_table(raid_table_header); diff --git a/drivers/md/md.h b/drivers/md/md.h index d8287d3cd1bf..727b3aef4d26 100644 --- a/drivers/md/md.h +++ b/drivers/md/md.h @@ -356,6 +356,14 @@ struct mddev { */ int recovery_disabled; + /* Writes are stopped before hibernation, suspend and restore by +* calling md_stop_all_writes(). That function sets the +* MD_RECOVERY_FROZEN flag. The value of that flag is saved before +* writes are stopped such that it can be restored after hibernation, +* suspend or restore have finished. +*/ + bool frozen_before_suspend; + int in_sync; /* know to not need resync */ /* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so * that we are never stopping an array while it is open.
[PATCH v10 00/10] block, scsi, md: Improve suspend and resume
Hello Jens, It is known that during the resume following a hibernate, especially when using an md RAID1 array created on top of SCSI devices, sometimes the system hangs instead of coming up properly. This patch series fixes that problem. These patches have been tested on top of the block layer for-next branch. Please consider these changes for kernel v4.15. Thanks, Bart. Changes between v9 and v10: - Made sure that scsi_device_resume() handles SCSI devices that are in an error state correctly. Unlike other SCSI transport protocols, during an ATA device resume the ATA controller is reset. This should fix the blk_queue_enter WARNING reported by the kernel test robot. Changes between v8 and v9: - Modified the third patch such that the MD_RECOVERY_FROZEN flag is restored properly after a resume. - Modified the ninth patch such that a kernel warning is reported if it is attempted to call scsi_device_quiesce() from multiple contexts concurrently. - Added Reviewed-by / Tested-by tags as appropriate. Changes between v7 and v8: - Fixed a (theoretical?) race identified by Ming Lei. - Added a tenth patch that checks whether the proper type of flags has been passed to a range of block layer functions. Changes between v6 and v7: - Added support for the PREEMPT_ONLY flag in blk-mq-debugfs.c. - Fixed kerneldoc header of blk_queue_enter(). - Added a rcu_read_lock_sched() / rcu_read_unlock_sched() pair in blk_set_preempt_only(). - Removed a synchronize_rcu() call from scsi_device_quiesce(). - Modified the description of patch 9/9 in this series. - Removed scsi_run_queue() call from scsi_device_resume(). Changes between v5 and v6: - Split an md patch into two patches to make it easier to review the changes. - For the md patch that suspends I/O while the system is frozen, switched back to the freezer mechanism because this makes the code shorter and easier to review. - Changed blk_set/clear_preempt_only() from EXPORT_SYMBOL() into EXPORT_SYMBOL_GPL(). 
- Made blk_set_preempt_only() behave as a test-and-set operation. - Introduced blk_get_request_flags() and BLK_MQ_REQ_PREEMPT as requested by Christoph and reduced the number of arguments of blk_queue_enter() back from three to two. - In scsi_device_quiesce(), moved the blk_mq_freeze_queue() call out of a critical section. Made the explanation of why the synchronize_rcu() call is necessary more detailed. Changes between v4 and v5: - Split blk_set_preempt_only() into two functions as requested by Christoph. - Made blk_get_request() trigger WARN_ONCE() if it is attempted to allocate a request while the system is frozen. This is a big help to identify drivers that submit I/O while the system is frozen. - Since kernel thread freezing is on its way out, reworked the approach for avoiding that the md driver submits I/O while the system is frozen such that the approach no longer depends on the kernel thread freeze mechanism. - Fixed the (theoretical) deadlock in scsi_device_quiesce() that was identified by Ming. Changes between v3 and v4: - Made sure that this patch series not only works for scsi-mq but also for the legacy SCSI stack. - Removed an smp_rmb()/smp_wmb() pair from the hot path and added a comment that explains why that is safe. - Reordered the patches in this patch series. Changes between v2 and v3: - Made md kernel threads freezable. - Changed the approach for quiescing SCSI devices again. - Addressed Ming's review comments. Changes compared to v1 of this patch series: - Changed the approach and rewrote the patch series. 
Bart Van Assche (9): md: Rename md_notifier into md_reboot_notifier md: Introduce md_stop_all_writes() md: Neither resync nor reshape while the system is frozen block: Introduce blk_get_request_flags() block: Introduce BLK_MQ_REQ_PREEMPT ide, scsi: Tell the block layer at request allocation time about preempt requests block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag block, scsi: Make SCSI quiesce and resume work reliably block, nvme: Introduce blk_mq_req_flags_t Ming Lei (1): block: Make q_usage_counter also track legacy requests block/blk-core.c | 133 ++--- block/blk-mq-debugfs.c | 1 + block/blk-mq.c | 20 +++ block/blk-mq.h | 2 +- block/blk-timeout.c| 2 +- drivers/ide/ide-pm.c | 4 +- drivers/md/md.c| 65 -- drivers/md/md.h| 8 +++ drivers/nvme/host/core.c | 5 +- drivers/nvme/host/nvme.h | 5 +- drivers/scsi/scsi_lib.c| 48 +++- fs/block_dev.c | 4 +- include/linux/blk-mq.h | 16 -- include/linux/blk_types.h | 2 + include/linux/blkdev.h | 11 +++- include/scsi/scsi_device.h | 1 + 16 files changed, 258 insertions(+), 69 deletions(-) -- 2.14.2
Re: [PATCH] sd: Micro-optimize READ / WRITE CDB encoding
On Tue, 2017-10-17 at 18:17 -0400, Douglas Gilbert wrote: > On 2017-10-17 05:03 PM, Bart Van Assche wrote: > > @@ -1025,7 +1025,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd > > *SCpnt) > > struct gendisk *disk = rq->rq_disk; > > struct scsi_disk *sdkp = scsi_disk(disk); > > sector_t block = blk_rq_pos(rq); > > s/block/lba/ # use the well understood SCSI abbreviation Since blk_rq_pos() returns the offset in units of 512 bytes, how about renaming 'block' into 'sector' and using the name 'lba' for the number obtained after the shift operation? > > > - sector_t threshold; > > unsigned int this_count = blk_rq_sectors(rq); > > unsigned int dif, dix; > > bool zoned_write = sd_is_zoned(sdkp) && rq_data_dir(rq) == WRITE; > > @@ -1071,20 +1070,22 @@ static int sd_setup_read_write_cmnd(struct > > scsi_cmnd *SCpnt) > > goto out; > > } > > > > - /* > > -* Some SD card readers can't handle multi-sector accesses which touch > > -* the last one or two hardware sectors. Split accesses as needed. > > -*/ > > - threshold = get_capacity(disk) - SD_LAST_BUGGY_SECTORS * > > - (sdp->sector_size / 512); > > + if (unlikely(sdp->last_sector_bug)) { > > + sector_t threshold; > > s/threshold/threshold_lba/# a bit long but more precise A similar comment applies here - shouldn't this be called 'threshold_sector'? > > } > > } > > if (rq_data_dir(rq) == WRITE) { > > @@ -1173,56 +1157,26 @@ static int sd_setup_read_write_cmnd(struct > > scsi_cmnd *SCpnt) > > SCpnt->cmnd[7] = 0x18; > > SCpnt->cmnd[9] = (rq_data_dir(rq) == READ) ? READ_32 : WRITE_32; > > Perhaps rq_data_dir(rq) could be placed in a local variable I will keep that for a separate patch. Bart.
Re: [PATCH] sd: Micro-optimize READ / WRITE CDB encoding
On 2017-10-17 05:03 PM, Bart Van Assche wrote: Only compute 'threshold' if .last_sector_bug has been set. Reduce the number of branches that has to be taken to check the starting LBA and transfer length alignment. Optimize the encoding of the READ(10), WRITE(10), READ(16), WRITE(16), READ(32) and WRITE(32) CDB encoding. Great! Some suggestions, see inline below. And showing us the improved version of sd_setup_read_write_cmnd() as well as the diff could help us (or at least me) see the bigger picture. Doug Gilbert Signed-off-by: Bart Van AsscheCc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- drivers/scsi/sd.c | 102 +++--- 1 file changed, 28 insertions(+), 74 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 67b50baa943c..83284a8fa2cd 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1025,7 +1025,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) struct gendisk *disk = rq->rq_disk; struct scsi_disk *sdkp = scsi_disk(disk); sector_t block = blk_rq_pos(rq); s/block/lba/# use the well understood SCSI abbreviation - sector_t threshold; unsigned int this_count = blk_rq_sectors(rq); unsigned int dif, dix; bool zoned_write = sd_is_zoned(sdkp) && rq_data_dir(rq) == WRITE; @@ -1071,20 +1070,22 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) goto out; } - /* -* Some SD card readers can't handle multi-sector accesses which touch -* the last one or two hardware sectors. Split accesses as needed. 
-*/ - threshold = get_capacity(disk) - SD_LAST_BUGGY_SECTORS * - (sdp->sector_size / 512); + if (unlikely(sdp->last_sector_bug)) { + sector_t threshold; s/threshold/threshold_lba/ # a bit long but more precise - if (unlikely(sdp->last_sector_bug && block + this_count > threshold)) { - if (block < threshold) { - /* Access up to the threshold but not beyond */ - this_count = threshold - block; - } else { + /* +* Some SD card readers can't handle multi-sector accesses +* which touch the last one or two hardware sectors. Split +* accesses as needed. +*/ + threshold = get_capacity(disk) - SD_LAST_BUGGY_SECTORS * + (sdp->sector_size / 512); + if (block >= threshold) { /* Access only a single hardware sector */ this_count = sdp->sector_size / 512; + } else if (block + this_count > threshold) { + /* Access up to the threshold but not beyond */ + this_count = threshold - block; } } @@ -1102,34 +1103,17 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) * and not force the scsi disk driver to use bounce buffers * for this. 
*/ - if (sdp->sector_size == 1024) { - if ((block & 1) || (blk_rq_sectors(rq) & 1)) { + if (sdp->sector_size != 512) { + if ((block | this_count) & (sdp->sector_size / 512 - 1)) { scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); + "Bad block number requested: block number = %llu, sector count = %u, sector size = %u bytes\n", + block + 0ULL, this_count, sdp->sector_size); goto out; } else { - block = block >> 1; - this_count = this_count >> 1; - } - } - if (sdp->sector_size == 2048) { - if ((block & 3) || (blk_rq_sectors(rq) & 3)) { - scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); - goto out; - } else { - block = block >> 2; - this_count = this_count >> 2; - } - } - if (sdp->sector_size == 4096) { - if ((block & 7) || (blk_rq_sectors(rq) & 7)) { - scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); - goto out; - } else { - block = block >> 3; - this_count = this_count >> 3; + int shift = ilog2(sdp->sector_size / 512); + + block = block >> shift; lba >>= shift;/* scale down LBA */ + this_count = this_count >> shift; likewise
Re: integer overflow in aic7xxx (was [bug report] Linux-2.6.12-rc2)
Argh... My script chose a stupid subject. Sorry for not catching that. regards, dan carpenter On Wed, Oct 18, 2017 at 12:52:49AM +0300, Dan Carpenter wrote: > Hey, > > This code is older than git is so it probably doesn't matter. But just > for laughs does anyone know what this should be? > > drivers/scsi/aic7xxx/aic7xxx_core.c:4807 ahc_init_scbdata() > warn: integer overflow (literal): u32max + 1 > > drivers/scsi/aic7xxx/aic7xxx_core.c > 4794 > 4795 /* > 4796 * Create our DMA tags. These tags define the kinds of device > 4797 * accessible memory allocations and memory mappings we will > 4798 * need to perform during normal operation. > 4799 * > 4800 * Unless we need to further restrict the allocation, we rely > 4801 * on the restrictions of the parent dmat, hence the common > 4802 * use of MAXADDR and MAXSIZE. > 4803 */ > 4804 > 4805 /* DMA tag for our hardware scb structures */ > 4806 if (ahc_dma_tag_create(ahc, ahc->parent_dmat, /*alignment*/1, > 4807 /*boundary*/BUS_SPACE_MAXADDR_32BIT + > 1, >^^^ > This is "0xffffffff + 1" which has an integer overflow so it's a > complicated way to say zero. > > 4808 /*lowaddr*/BUS_SPACE_MAXADDR_32BIT, > 4809 /*highaddr*/BUS_SPACE_MAXADDR, > 4810 /*filter*/NULL, /*filterarg*/NULL, > 4811 AHC_SCB_MAX_ALLOC * sizeof(struct > hardware_scb), > 4812 /*nsegments*/1, > 4813 /*maxsegsz*/BUS_SPACE_MAXSIZE_32BIT, > 4814 /*flags*/0, &scb_data->hscb_dmat) != > 0) { > 4815 goto error_exit; > 4816 } > 4817 > 4818 scb_data->init_level++; > 4819 > > regards, > dan carpenter
[bug report] Linux-2.6.12-rc2
Hey, This code is older than git is so it probably doesn't matter. But just for laughs does anyone know what this should be? drivers/scsi/aic7xxx/aic7xxx_core.c:4807 ahc_init_scbdata() warn: integer overflow (literal): u32max + 1 drivers/scsi/aic7xxx/aic7xxx_core.c 4794 4795 /* 4796 * Create our DMA tags. These tags define the kinds of device 4797 * accessible memory allocations and memory mappings we will 4798 * need to perform during normal operation. 4799 * 4800 * Unless we need to further restrict the allocation, we rely 4801 * on the restrictions of the parent dmat, hence the common 4802 * use of MAXADDR and MAXSIZE. 4803 */ 4804 4805 /* DMA tag for our hardware scb structures */ 4806 if (ahc_dma_tag_create(ahc, ahc->parent_dmat, /*alignment*/1, 4807 /*boundary*/BUS_SPACE_MAXADDR_32BIT + 1, ^^^ This is "0xffffffff + 1" which has an integer overflow so it's a complicated way to say zero. 4808 /*lowaddr*/BUS_SPACE_MAXADDR_32BIT, 4809 /*highaddr*/BUS_SPACE_MAXADDR, 4810 /*filter*/NULL, /*filterarg*/NULL, 4811 AHC_SCB_MAX_ALLOC * sizeof(struct hardware_scb), 4812 /*nsegments*/1, 4813 /*maxsegsz*/BUS_SPACE_MAXSIZE_32BIT, 4814 /*flags*/0, &scb_data->hscb_dmat) != 0) { 4815 goto error_exit; 4816 } 4817 4818 scb_data->init_level++; 4819 regards, dan carpenter
Re: [PATCH v4 2/5] dt-bindings: scsi: ufs: add document for hisi-ufs
On Fri, Oct 13, 2017 at 03:49:33PM +0800, Li Wei wrote: > add ufs node document for Hisilicon. > > Signed-off-by: Li Wei> --- > Documentation/devicetree/bindings/ufs/ufs-hisi.txt | 47 > ++ > 1 file changed, 47 insertions(+) > create mode 100644 Documentation/devicetree/bindings/ufs/ufs-hisi.txt > > diff --git a/Documentation/devicetree/bindings/ufs/ufs-hisi.txt > b/Documentation/devicetree/bindings/ufs/ufs-hisi.txt > new file mode 100644 > index ..ee114a65143d > --- /dev/null > +++ b/Documentation/devicetree/bindings/ufs/ufs-hisi.txt > @@ -0,0 +1,47 @@ > +* Hisilicon Universal Flash Storage (UFS) Host Controller > + > +UFS nodes are defined to describe on-chip UFS hardware macro. > +Each UFS Host Controller should have its own node. > + > +Required properties: > +- compatible: compatible list, contains one of the following - > + "hisilicon,hi3660-ufs" for hisi ufs host controller > + present on Hi3660 chipset. > +- reg : should contain UFS register address space & UFS SYS > CTRL register address, > +- interrupt-parent : interrupt device > +- interrupts: interrupt number > +- clocks : List of phandle and clock specifier pairs > +- clock-names : List of clock input name strings sorted in the same > + order as the clocks property. "clk_ref", "clk_phy" is > optional > +- resets: reset node register, one reset the clk and the other > reset the controller > +- reset-names : describe reset node register > + > +Optional properties for board device: > +- reset-gpio : specifies to reset devices Should be "reset-gpios". 
> + > +Example: > + > + ufs: ufs@ff3b { > + compatible = "hisilicon,hi3660-ufs", "jedec,ufs-1.1"; > + /* 0: HCI standard */ > + /* 1: UFS SYS CTRL */ > + reg = <0x0 0xff3b 0x0 0x1000>, > + <0x0 0xff3b1000 0x0 0x1000>; > + interrupt-parent = <>; > + interrupts = ; > + clocks = <_ctrl HI3660_CLK_GATE_UFSIO_REF>, > + <_ctrl HI3660_CLK_GATE_UFSPHY_CFG>; > + clock-names = "clk_ref", "clk_phy"; > + freq-table-hz = <0 0>, <0 0>; > + /* offset: 0x84; bit: 12 */ > + /* offset: 0x84; bit: 7 */ > + resets = <_rst 0x84 12>, > + <_rst 0x84 7>; > + reset-names = "rst", "assert"; > + }; > + > + { > + reset-gpio = < 1 0>; > + status = "okay"; Don't show status or the chip/board split in examples. > + }; > + > -- > 2.11.0 >
[PATCH] sd: Micro-optimize READ / WRITE CDB encoding
Only compute 'threshold' if .last_sector_bug has been set. Reduce the number of branches that has to be taken to check the starting LBA and transfer length alignment. Optimize the encoding of the READ(10), WRITE(10), READ(16), WRITE(16), READ(32) and WRITE(32) CDB encoding. Signed-off-by: Bart Van AsscheCc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn --- drivers/scsi/sd.c | 102 +++--- 1 file changed, 28 insertions(+), 74 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 67b50baa943c..83284a8fa2cd 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1025,7 +1025,6 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) struct gendisk *disk = rq->rq_disk; struct scsi_disk *sdkp = scsi_disk(disk); sector_t block = blk_rq_pos(rq); - sector_t threshold; unsigned int this_count = blk_rq_sectors(rq); unsigned int dif, dix; bool zoned_write = sd_is_zoned(sdkp) && rq_data_dir(rq) == WRITE; @@ -1071,20 +1070,22 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) goto out; } - /* -* Some SD card readers can't handle multi-sector accesses which touch -* the last one or two hardware sectors. Split accesses as needed. -*/ - threshold = get_capacity(disk) - SD_LAST_BUGGY_SECTORS * - (sdp->sector_size / 512); + if (unlikely(sdp->last_sector_bug)) { + sector_t threshold; - if (unlikely(sdp->last_sector_bug && block + this_count > threshold)) { - if (block < threshold) { - /* Access up to the threshold but not beyond */ - this_count = threshold - block; - } else { + /* +* Some SD card readers can't handle multi-sector accesses +* which touch the last one or two hardware sectors. Split +* accesses as needed. 
+*/ + threshold = get_capacity(disk) - SD_LAST_BUGGY_SECTORS * + (sdp->sector_size / 512); + if (block >= threshold) { /* Access only a single hardware sector */ this_count = sdp->sector_size / 512; + } else if (block + this_count > threshold) { + /* Access up to the threshold but not beyond */ + this_count = threshold - block; } } @@ -1102,34 +1103,17 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) * and not force the scsi disk driver to use bounce buffers * for this. */ - if (sdp->sector_size == 1024) { - if ((block & 1) || (blk_rq_sectors(rq) & 1)) { + if (sdp->sector_size != 512) { + if ((block | this_count) & (sdp->sector_size / 512 - 1)) { scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); + "Bad block number requested: block number = %llu, sector count = %u, sector size = %u bytes\n", + block + 0ULL, this_count, sdp->sector_size); goto out; } else { - block = block >> 1; - this_count = this_count >> 1; - } - } - if (sdp->sector_size == 2048) { - if ((block & 3) || (blk_rq_sectors(rq) & 3)) { - scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); - goto out; - } else { - block = block >> 2; - this_count = this_count >> 2; - } - } - if (sdp->sector_size == 4096) { - if ((block & 7) || (blk_rq_sectors(rq) & 7)) { - scmd_printk(KERN_ERR, SCpnt, - "Bad block number requested\n"); - goto out; - } else { - block = block >> 3; - this_count = this_count >> 3; + int shift = ilog2(sdp->sector_size / 512); + + block = block >> shift; + this_count = this_count >> shift; } } if (rq_data_dir(rq) == WRITE) { @@ -1173,56 +1157,26 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt) SCpnt->cmnd[7] = 0x18; SCpnt->cmnd[9] = (rq_data_dir(rq) == READ) ? READ_32 : WRITE_32; SCpnt->cmnd[10] = protect | ((rq->cmd_flags & REQ_FUA) ? 0x8 : 0); - /* LBA */ - SCpnt->cmnd[12] = sizeof(block) > 4 ? (unsigned char) (block >>
Re: [PATCH v9 09/10] block, scsi: Make SCSI quiesce and resume work reliably
On Tue, 2017-10-17 at 08:33 +0200, Hannes Reinecke wrote: > How do you ensure that PREEMPT requests are not stuck in the queue > _behind_ non-PREEMPT requests? > Once they are in the queue the request are already allocated, so your > deferred allocation won't work. > _And_ deferred requests will be re-inserted at the head of the queue. > Consequently the PREEMPT request will never ever scheduled during quiesce. > How do you avoid such a scenario? Hello Hannes, The blk_mq_freeze_queue() / blk_mq_unfreeze_queue() calls that have been added through patch 9/10 to scsi_device_quiesce() will only succeed after all outstanding requests have finished. That includes non-preempt requests and requests that are about to be requeued. The changes in scsi_device_quiesce() are such that blk_queue_enter() will fail for non-preempt requests after the queue has been unfrozen. In other words, if a scsi_device_quiesce() call succeeds, it is guaranteed that there are no non-preempt requests in the queue and also that no new non-preempt requests will enter the queue until scsi_device_resume() is called. Bart.
Re: [PATCH v9 09/10] block, scsi: Make SCSI quiesce and resume work reliably
On Tue, 2017-10-17 at 08:43 +0800, Ming Lei wrote: > On Mon, Oct 16, 2017 at 04:29:04PM -0700, Bart Van Assche wrote: > > [ ... ] > > int > > scsi_device_quiesce(struct scsi_device *sdev) > > { > > + struct request_queue *q = sdev->request_queue; > > int err; > > > > + /* > > +* It is allowed to call scsi_device_quiesce() multiple times from > > +* the same context but concurrent scsi_device_quiesce() calls are > > +* not allowed. > > +*/ > > + WARN_ON_ONCE(sdev->quiesced_by && sdev->quiesced_by != current); > > If the above line and comment is removed, everything is still fine, > either nested calling or concurrent calling, isn't it? If scsi_device_quiesce() is called concurrently from two different threads then the first scsi_device_resume() call will resume I/O for *both* contexts. That's not what the callers expect. If scsi_device_quiesce() and scsi_device_resume() are called concurrently that would be even worse. I think we *really* should know whether callers serialize scsi_device_quiesce() and scsi_device_resume() calls properly. Hence the WARN_ON_ONCE() statement. Bart.
Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
On 10/16/17 23:38, Hannes Reinecke wrote:
> Again, these are just some pre-defined values to avoid I/O starvation
> when having several LUNs. _if_ we can guarantee I/O fairness between
> several (hundreds!) devices all sharing the same tagspace we wouldn't
> need these variables.

I think we already have two mechanisms for guaranteeing I/O fairness:
* Scsi_Host.starved_list for both the legacy SCSI core and scsi-mq.
* For scsi-mq, the active_queues counter in the blk-mq core. See also
  hctx_may_queue().

Bart.
[PATCH V2] scsi: storvsc: Allow only one remove lun work item to be issued per lun
When running multipath on a VM if all available paths go down the driver can schedule large amounts of storvsc_remove_lun work items to the same lun. In response to the failing paths typically storvsc responds by taking host->scan_mutex and issuing a TUR per lun. If there has been heavy IO to the failed device all the failed IOs are returned from the host. A remove lun work item is issued per failed IO. If the outstanding TURs have not been completed in a timely manner the scan_mutex is never released or released too late. Consequently the many remove lun work items are not completed as scsi_remove_device also tries to take host->scan_mutex. This results in dragging the VM down and sometimes completely. This patch only allows one remove lun to be issued to a particular lun while it is an instantiated member of the scsi stack. Changes since v1: Use single threaded workqueue to serialize work in storvsc_handle_error [Christoph Hellwig] Signed-off-by: Cathy Avery--- drivers/scsi/storvsc_drv.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index 5e7200f..6febcdb 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -486,6 +486,8 @@ struct hv_host_device { unsigned int port; unsigned char path; unsigned char target; + struct workqueue_struct *handle_error_wq; + char work_q_name[20]; }; struct storvsc_scan_work { @@ -922,6 +924,7 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb, { struct storvsc_scan_work *wrk; void (*process_err_fn)(struct work_struct *work); + struct hv_host_device *host_dev = shost_priv(host); bool do_work = false; switch (SRB_STATUS(vm_srb->srb_status)) { @@ -988,7 +991,7 @@ static void storvsc_handle_error(struct vmscsi_request *vm_srb, wrk->lun = vm_srb->lun; wrk->tgt_id = vm_srb->target_id; INIT_WORK(>work, process_err_fn); - schedule_work(>work); + queue_work(host_dev->handle_error_wq, >work); } @@ -1803,10 +1806,19 @@ static int 
storvsc_probe(struct hv_device *device, if (stor_device->num_sc != 0) host->nr_hw_queues = stor_device->num_sc + 1; + /* +* Set the error handler work queue. +*/ + snprintf(host_dev->work_q_name, sizeof(host_dev->work_q_name), +"storvsc_error_wq_%d", host->host_no); + host_dev->handle_error_wq = + create_singlethread_workqueue(host_dev->work_q_name); + if (!host_dev->handle_error_wq) + goto err_out2; /* Register the HBA and start the scsi bus scan */ ret = scsi_add_host(host, >device); if (ret != 0) - goto err_out2; + goto err_out3; if (!dev_is_ide) { scsi_scan_host(host); @@ -1815,7 +1827,7 @@ static int storvsc_probe(struct hv_device *device, device->dev_instance.b[4]); ret = scsi_add_device(host, 0, target, 0); if (ret) - goto err_out3; + goto err_out4; } #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS) if (host->transportt == fc_transport_template) { @@ -1827,14 +1839,17 @@ static int storvsc_probe(struct hv_device *device, fc_host_port_name(host) = stor_device->port_name; stor_device->rport = fc_remote_port_add(host, 0, ); if (!stor_device->rport) - goto err_out3; + goto err_out4; } #endif return 0; -err_out3: +err_out4: scsi_remove_host(host); +err_out3: + destroy_workqueue(host_dev->handle_error_wq); + err_out2: /* * Once we have connected with the host, we would need to @@ -1858,6 +1873,7 @@ static int storvsc_remove(struct hv_device *dev) { struct storvsc_device *stor_device = hv_get_drvdata(dev); struct Scsi_Host *host = stor_device->host; + struct hv_host_device *host_dev = shost_priv(host); #if IS_ENABLED(CONFIG_SCSI_FC_ATTRS) if (host->transportt == fc_transport_template) { @@ -1865,6 +1881,7 @@ static int storvsc_remove(struct hv_device *dev) fc_remove_host(host); } #endif + destroy_workqueue(host_dev->handle_error_wq); scsi_remove_host(host); storvsc_dev_remove(dev); scsi_host_put(host); -- 2.5.0
Re: [PATCH] scsi: libiscsi: Convert timers to use timer_setup()
Reviewed-by: Chris Leech- Original Message - > In preparation for unconditionally passing the struct timer_list pointer to > all timer callbacks, switch to using the new timer_setup() and from_timer() > to pass the timer pointer explicitly. > > Cc: Lee Duncan > Cc: Chris Leech > Cc: "James E.J. Bottomley" > Cc: "Martin K. Petersen" > Cc: open-is...@googlegroups.com > Cc: linux-scsi@vger.kernel.org > Signed-off-by: Kees Cook > --- > drivers/scsi/libiscsi.c | 16 ++-- > 1 file changed, 6 insertions(+), 10 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index f8dc1601efd5..9c50d2d9f27c 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -1805,9 +1805,9 @@ int iscsi_target_alloc(struct scsi_target *starget) > } > EXPORT_SYMBOL_GPL(iscsi_target_alloc); > > -static void iscsi_tmf_timedout(unsigned long data) > +static void iscsi_tmf_timedout(struct timer_list *t) > { > - struct iscsi_conn *conn = (struct iscsi_conn *)data; > + struct iscsi_conn *conn = from_timer(conn, t, tmf_timer); > struct iscsi_session *session = conn->session; > > spin_lock(>frwd_lock); > @@ -1838,8 +1838,6 @@ static int iscsi_exec_task_mgmt_fn(struct iscsi_conn > *conn, > } > conn->tmfcmd_pdus_cnt++; > conn->tmf_timer.expires = timeout * HZ + jiffies; > - conn->tmf_timer.function = iscsi_tmf_timedout; > - conn->tmf_timer.data = (unsigned long)conn; > add_timer(>tmf_timer); > ISCSI_DBG_EH(session, "tmf set timeout\n"); > > @@ -2089,9 +2087,9 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct > scsi_cmnd *sc) > } > EXPORT_SYMBOL_GPL(iscsi_eh_cmd_timed_out); > > -static void iscsi_check_transport_timeouts(unsigned long data) > +static void iscsi_check_transport_timeouts(struct timer_list *t) > { > - struct iscsi_conn *conn = (struct iscsi_conn *)data; > + struct iscsi_conn *conn = from_timer(conn, t, transport_timer); > struct iscsi_session *session = conn->session; > unsigned long recv_timeout, next_timeout = 0, last_recv; > > @@ -2913,9 
+2911,7 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, > int dd_size, > conn->exp_statsn = 0; > conn->tmf_state = TMF_INITIAL; > > - init_timer(>transport_timer); > - conn->transport_timer.data = (unsigned long)conn; > - conn->transport_timer.function = iscsi_check_transport_timeouts; > + timer_setup(>transport_timer, iscsi_check_transport_timeouts, 0); > > INIT_LIST_HEAD(>mgmtqueue); > INIT_LIST_HEAD(>cmdqueue); > @@ -2939,7 +2935,7 @@ iscsi_conn_setup(struct iscsi_cls_session *cls_session, > int dd_size, > goto login_task_data_alloc_fail; > conn->login_task->data = conn->data = data; > > - init_timer(>tmf_timer); > + timer_setup(>tmf_timer, iscsi_tmf_timedout, 0); > init_waitqueue_head(>ehwait); > > return cls_conn; > -- > 2.7.4 > > > -- > Kees Cook > Pixel Security >
Re: [PATCH 0/2] block/SCSI MQ: two RESTART related patches
On 17/10/2017 06:12, Ming Lei wrote: On Tue, Oct 17, 2017 at 01:04:16PM +0800, Ming Lei wrote: Hi Jens, The 1st patch runs idle hctx after dealy in scsi_mq_get_budget(), so that we can keep same behaviour with before, and it can be thought as a fix. The 2nd patch cleans up RESTART, and removes handling for TAG_SHARED from current blk-mq's RESTART mechanism because SCSI_MQ can covers its restart by itself, so that no need to handle TAG_SHARED in blk-mq RESTART. And >20% IOPS boost is observed in my rand read test over scsi_debug. John, please test this two patches and see if it may improve your SAS IO performance, and you can find the two patches in the following branch: https://github.com/ming1/linux/commits/blk_mq_improve_restart_V1 Hi Ming, As requested, here's my figures for blk_mq_improve_restart_V1: without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 989K, 113K/113K, 835K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 738K, 130K/130K, 686K For axboe for-next tip (21ed538): without default SCSI_MQ, deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 977K, 117K/117K, 821K with default SCSI_MQ, mq-deadline scheduler, CONFIG PREEMPT off read, rw, write IOPS 733K, 128K/128K, 676K All cases do not have LLDD mq exposed/enabled. So unfortunately not much difference with your branch. cheers, John Forget to mention, you need to either pull the above branch directly or apply the two patches against for-next branch of Jens' block tree: git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git #for-next
[GIT PULL] SCSI fixes for 4.14-rc5
Four mostly error leg fixes and one more important regression in a prior commit (the qla2xxx one). The patch is available here: git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes The short changelog is: Hannes Reinecke (1): scsi: fixup kernel warning during rmmod() Johannes Thumshirn (2): scsi: fc: check for rport presence in fc_block_scsi_eh scsi: libiscsi: fix shifting of DID_REQUEUE host byte Quinn Tran (1): scsi: qla2xxx: Fix uninitialized work element Satish Kharat (1): scsi: libfc: fix a deadlock in fc_rport_work And the diffstat: drivers/scsi/libfc/fc_rport.c| 2 +- drivers/scsi/libiscsi.c | 2 +- drivers/scsi/qla2xxx/qla_os.c| 3 ++- drivers/scsi/scsi_sysfs.c| 10 -- drivers/scsi/scsi_transport_fc.c | 3 +++ 5 files changed, 15 insertions(+), 5 deletions(-) With full diff below. James --- diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c index 520325867e2b..31d31aad3de1 100644 --- a/drivers/scsi/libfc/fc_rport.c +++ b/drivers/scsi/libfc/fc_rport.c @@ -383,11 +383,11 @@ static void fc_rport_work(struct work_struct *work) fc_rport_enter_flogi(rdata); mutex_unlock(>rp_mutex); } else { + mutex_unlock(>rp_mutex); FC_RPORT_DBG(rdata, "work delete\n"); mutex_lock(>disc.disc_mutex); list_del_rcu(>peers); mutex_unlock(>disc.disc_mutex); - mutex_unlock(>rp_mutex); kref_put(>kref, fc_rport_destroy); } } else { diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index c62e8d111fd9..f8dc1601efd5 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -1728,7 +1728,7 @@ int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc) if (test_bit(ISCSI_SUSPEND_BIT, >suspend_tx)) { reason = FAILURE_SESSION_IN_RECOVERY; - sc->result = DID_REQUEUE; + sc->result = DID_REQUEUE << 16; goto fault; } diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index 5b2437a5ea44..937209805baf 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3175,6 +3175,8 @@ 
qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) host->can_queue, base_vha->req, base_vha->mgmt_svr_loop_id, host->sg_tablesize); + INIT_WORK(_vha->iocb_work, qla2x00_iocb_work_fn); + if (ha->mqenable) { bool mq = false; bool startit = false; @@ -3223,7 +3225,6 @@ qla2x00_probe_one(struct pci_dev *pdev, const struct pci_device_id *id) */ qla2xxx_wake_dpc(base_vha); - INIT_WORK(_vha->iocb_work, qla2x00_iocb_work_fn); INIT_WORK(>board_disable, qla2x00_disable_board_on_pci_error); if (IS_QLA8031(ha) || IS_MCTP_CAPABLE(ha)) { diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index bf53356f41f0..f796bd61f3f0 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -1376,13 +1376,19 @@ static void __scsi_remove_target(struct scsi_target *starget) spin_lock_irqsave(shost->host_lock, flags); restart: list_for_each_entry(sdev, >__devices, siblings) { + /* +* We cannot call scsi_device_get() here, as +* we might've been called from rmmod() causing +* scsi_device_get() to fail the module_is_live() +* check. +*/ if (sdev->channel != starget->channel || sdev->id != starget->id || - scsi_device_get(sdev)) + !get_device(>sdev_gendev)) continue; spin_unlock_irqrestore(shost->host_lock, flags); scsi_remove_device(sdev); - scsi_device_put(sdev); + put_device(>sdev_gendev); spin_lock_irqsave(shost->host_lock, flags); goto restart; } diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c index cbd4495d0ff9..8c46a6d536af 100644 --- a/drivers/scsi/scsi_transport_fc.c +++ b/drivers/scsi/scsi_transport_fc.c @@ -3320,6 +3320,9 @@ int fc_block_scsi_eh(struct scsi_cmnd *cmnd) { struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device)); + if (WARN_ON_ONCE(!rport)) + return FAST_IO_FAIL; + return fc_block_rport(rport); } EXPORT_SYMBOL(fc_block_scsi_eh);
Re: scsi: Add TRY_VPD_PAGES to HITACHI OPEN-V blacklist entry
On 10/17/2017 09:10 AM, Hannes Reinecke wrote:
> HITACHI is always supporting VPD pages, even though it's claiming
> to support SCSI Revision 3 only.

"HP"/"OPEN-" is also affected. HP XP is a rebrand of HITACHI VSP.

	{"HP", "OPEN-", "*", BLIST_REPORTLUN2},		/* HP XP Arrays */

> Signed-off-by: Hannes Reinecke
> ---
>  drivers/scsi/scsi_devinfo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c
> index ae2ad84..a998585 100644
> --- a/drivers/scsi/scsi_devinfo.c
> +++ b/drivers/scsi/scsi_devinfo.c
> @@ -174,7 +174,7 @@ struct scsi_dev_info_list_table {
> 	{"HITACHI", "DF500", "*", BLIST_REPORTLUN2},
> 	{"HITACHI", "DISK-SUBSYSTEM", "*", BLIST_REPORTLUN2},
> 	{"HITACHI", "HUS1530", "*", BLIST_NO_DIF},
> -	{"HITACHI", "OPEN-", "*", BLIST_REPORTLUN2},
> +	{"HITACHI", "OPEN-", "*", BLIST_REPORTLUN2 | BLIST_TRY_VPD_PAGES},
> 	{"HITACHI", "OP-C-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
> 	{"HITACHI", "3380-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
> 	{"HITACHI", "3390-", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
Re: [PATCH] scsi: gdth: Convert timers to use timer_setup()
On Mon, Oct 16, 2017 at 9:09 PM, Martin K. Petersen wrote:
>
> Kees,
>
>> In preparation for unconditionally passing the struct timer_list
>> pointer to all timer callbacks, switch to using the new timer_setup()
>> and from_timer() to pass the timer pointer explicitly.
>
> Reviewed-by: Martin K. Petersen

Thanks for the reviews! Do you want the timer tree to carry these
patches, or can you pick them up in the scsi tree?

-Kees

--
Kees Cook
Pixel Security
Re: [PATCH] scsi: sd: Implement blacklist option for WRITE SAME w/ UNMAP
On Tue, 2017-10-17 at 10:26 -0400, Laurence Oberman wrote: > On Fri, 2017-09-29 at 10:01 -0400, Laurence Oberman wrote: > > On Fri, 2017-09-29 at 09:21 -0400, Martin K. Petersen wrote: > > > Laurence, > > > > > > > I am testing this but its not being picked up so I want to know > > > > if > > > > I > > > > have the kernel command line wrong here. > > > > > > > > scsi_dev_flags=LIO-ORG:thin2:0x8000 > > > > > > > > What am I doing wrong to pass the BLIST flags. > > > > > > This worked for me: > > > > > > [root@kvm ~]# echo "Linux:scsi_debug:0x8000" > > > > /proc/scsi/device_info > > > [root@kvm ~]# grep Linux /proc/scsi/device_info > > > 'Linux ' 'scsi_debug ' 0x8000 > > > [root@kvm ~]# modprobe scsi_debug unmap_max_blocks=10 > > > unmap_max_desc=1 write_same_length=20 lbpws=1 > > > [root@kvm ~]# lsblk -D > > > NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO > > > sda 0 512B 5K 0 > > > > > > (With the caveat that I tweaked scsi_debug to report the UNMAP > > > parameters despite lbpu being 0). > > > > > > > OK, Thanks, that is working now and I pick up the correct size now. > > Its going to be very useful for these corner case array > > inconsistencies. 
> > > > Tested-by: Laurence Oberman> > > > Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: Direct- > > Access LIO- > > ORG thin24.0 PQ: 0 ANSI: 5 > > Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: alua: supports > > implicit and explicit TPGS > > Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: alua: device > > naa.6001405f7aa27ca453f4381a00f22ea6 port group 0 rel port 2 > > Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: Attached scsi > > generic > > sg64 type 0 > > Sep 29 09:56:11 localhost kernel: RHDEBUG: unmap_limit_for_ws set > > by > > kernel flag for case SD_LBP_WS16 > > Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] 8192 512- > > byte > > logical blocks: (41.9 GB/39.1 GiB) > > Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] Write Protect > > is > > off > > Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] Write cache: > > enabled, read cache: enabled, supports DPO and FUA > > Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: alua: transition > > timeout > > set to 60 seconds > > > Hi Martin > > We have the code accepted for the patch above and its working fine > with > echo after boot as you already know. > > However I am still fighting with trying to pass this on the kernel > command line. We need to be able to do this as a dynamic method for > adding devices to the list so that the list is populated prior to the > device scan. > > Using scsi_dev_flags=LIO-ORG:thin2:0x8000 on the kernel line is > ignored and not filled in. > Its apparent to me that we have no longer have code to actually copy > the string from the kernel line after boot. > > I ran some tests and added a couple of printk;s to see if we have any > capture and its NULL. 
> > So when did this stop working, is what I am now chasing > > [1.524595] RHDEBUG:In scsi_init_devinfo scsi_dev_flags= > [1.524705] RHDEBUG: In scsi_init_devinfo error=0 > > We have this in drivers/scsi/scsi_devinfo.c > > module_param_string(dev_flags, scsi_dev_flags, > sizeof(scsi_dev_flags), > 0); > MODULE_PARM_DESC(dev_flags, > > "Given scsi_dev_flags=vendor:model:flags[,v:m:f] add > black/white" > " list entries for vendor and model with an integer value of > flags" > " to the scsi device info list"); > > and we have: > > /** > * scsi_init_devinfo - set up the dynamic device list. > * > * Description: > * Add command line entries from scsi_dev_flags, then add > * scsi_static_device_list entries to the scsi device info list. > */ > int __init scsi_init_devinfo(void) > { > #ifdef CONFIG_SCSI_PROC_FS > struct proc_dir_entry *p; > #endif > int error, i; > > printk("RHDEBUG:In scsi_init_devinfo > scsi_dev_flags=%s\n",scsi_dev_flags); > > error = scsi_dev_info_add_list(SCSI_DEVINFO_GLOBAL, NULL); > printk("RHDEBUG: In scsi_init_devinfo error=%d\n",error); > if (error) { > printk("RHDEBUG: In scsi_init_devinfo, calling > scsi_dev_info_add_list returning with error=%d\n",error); > return error; > } > > error = scsi_dev_info_list_add_str(scsi_dev_flags); > if (error) { > printk("RHDEBUG: In scsi_init_devinfo, calling > scsi_info_list_add returning with error=%d\n",error); > goto out; > } > > for (i = 0; scsi_static_device_list[i].vendor; i++) { > error = scsi_dev_info_list_add(1 /* compatibile */, > scsi_static_device_list[i].vendor, > scsi_static_device_list[i].model, > NULL, > scsi_static_device_list[i].flags); > if (error) >
Re: [PATCH v2 5/8] target: Use sgl_alloc_order() and sgl_free()
On Tue, 2017-10-17 at 08:14 +0200, Hannes Reinecke wrote: > On 10/17/2017 12:49 AM, Bart Van Assche wrote: > > [ ... ] > > void target_free_sgl(struct scatterlist *sgl, int nents) > > { > > - struct scatterlist *sg; > > - int count; > > - > > - for_each_sg(sgl, sg, nents, count) > > - __free_page(sg_page(sg)); > > - > > - kfree(sgl); > > + sgl_free(sgl); > > } > > EXPORT_SYMBOL(target_free_sgl); > > > > @@ -2423,42 +2417,10 @@ int > > target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length, > > bool zero_page, bool chainable) > > { > > - [ ... ] > > + *sgl = sgl_alloc_order(length, 0, chainable, gfp, nents); > > + return *sgl ? 0 : -ENOMEM; > > } > > EXPORT_SYMBOL(target_alloc_sgl); > > > > The calling convention from target_alloc_sgl() is decidedly dodgy. > > Can't we convert it into returning the sgl, and remove the first parameter? Hello Hannes, Another option is to remove the target_alloc_sgl() and target_free_sgl() functions and to make LIO target drivers call sgl_alloc_order() and sgl_free_order() directly. Do you perhaps have a preference for one of these two approaches? Thanks, Bart.
Re: [PATCH v2 7/8] scsi/pmcraid: Remove an unused structure member
On Tue, 2017-10-17 at 08:21 +0200, Hannes Reinecke wrote: > On 10/17/2017 12:49 AM, Bart Van Assche wrote: > > Signed-off-by: Bart Van Assche> > Reviewed-by: Johannes Thumshirn > > Cc: linux-scsi@vger.kernel.org > > Cc: Martin K. Petersen > > Cc: Anil Ravindranath > > --- > > drivers/scsi/pmcraid.h | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h > > index 8bfac72a242b..44da91712115 100644 > > --- a/drivers/scsi/pmcraid.h > > +++ b/drivers/scsi/pmcraid.h > > @@ -542,7 +542,6 @@ struct pmcraid_sglist { > > u32 order; > > u32 num_sg; > > u32 num_dma_sg; > > - u32 buffer_len; > > struct scatterlist scatterlist[1]; > > }; > > > > > > This actually is the same story that we've had with ipr (and, looking at > the code, those two drivers look awfully similar ...). > pmcraid_sglist looks as if it's a hardware-dependent structure, so just > removing one entry from the middle of a structure might not be a good idea. > But this is something for the pmcraid folks to clarify. Hello Hannes, Sorry but I don't see how a structure that contains a struct scatterlist could be hardware-dependent? Thanks, Bart.
Re: [PATCH] scsi: sd: Implement blacklist option for WRITE SAME w/ UNMAP
On Fri, 2017-09-29 at 10:01 -0400, Laurence Oberman wrote: > On Fri, 2017-09-29 at 09:21 -0400, Martin K. Petersen wrote: > > Laurence, > > > > > I am testing this but its not being picked up so I want to know > > > if > > > I > > > have the kernel command line wrong here. > > > > > > scsi_dev_flags=LIO-ORG:thin2:0x8000 > > > > > > What am I doing wrong to pass the BLIST flags. > > > > This worked for me: > > > > [root@kvm ~]# echo "Linux:scsi_debug:0x8000" > > > /proc/scsi/device_info > > [root@kvm ~]# grep Linux /proc/scsi/device_info > > 'Linux ' 'scsi_debug ' 0x8000 > > [root@kvm ~]# modprobe scsi_debug unmap_max_blocks=10 > > unmap_max_desc=1 write_same_length=20 lbpws=1 > > [root@kvm ~]# lsblk -D > > NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO > > sda 0 512B 5K 0 > > > > (With the caveat that I tweaked scsi_debug to report the UNMAP > > parameters despite lbpu being 0). > > > > OK, Thanks, that is working now and I pick up the correct size now. > Its going to be very useful for these corner case array > inconsistencies. 
> > Tested-by: Laurence Oberman
> 
> Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: Direct-Access LIO-ORG thin2 4.0 PQ: 0 ANSI: 5
> Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: alua: supports implicit and explicit TPGS
> Sep 29 09:56:11 localhost kernel: scsi 1:0:0:50: alua: device naa.6001405f7aa27ca453f4381a00f22ea6 port group 0 rel port 2
> Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: Attached scsi generic sg64 type 0
> Sep 29 09:56:11 localhost kernel: RHDEBUG: unmap_limit_for_ws set by kernel flag for case SD_LBP_WS16
> Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] 8192 512-byte logical blocks: (41.9 GB/39.1 GiB)
> Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] Write Protect is off
> Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: [sdbl] Write cache: enabled, read cache: enabled, supports DPO and FUA
> Sep 29 09:56:11 localhost kernel: sd 1:0:0:50: alua: transition timeout set to 60 seconds

Hi Martin,

We have the code accepted for the patch above and it's working fine with echo after boot, as you already know. However, I am still fighting with trying to pass this on the kernel command line. We need to be able to do this as a dynamic method for adding devices to the list so that the list is populated prior to the device scan. Using scsi_dev_flags=LIO-ORG:thin2:0x8000 on the kernel line is ignored and not filled in. It's apparent to me that we no longer have code to actually copy the string from the kernel line after boot. I ran some tests and added a couple of printk's to see if we have any capture, and it's NULL.
So when did this stop working, is what I am now chasing [1.524595] RHDEBUG:In scsi_init_devinfo scsi_dev_flags= [1.524705] RHDEBUG: In scsi_init_devinfo error=0 We have this in drivers/scsi/scsi_devinfo.c module_param_string(dev_flags, scsi_dev_flags, sizeof(scsi_dev_flags), 0); MODULE_PARM_DESC(dev_flags, "Given scsi_dev_flags=vendor:model:flags[,v:m:f] add black/white" " list entries for vendor and model with an integer value of flags" " to the scsi device info list"); and we have: /** * scsi_init_devinfo - set up the dynamic device list. * * Description: * Add command line entries from scsi_dev_flags, then add * scsi_static_device_list entries to the scsi device info list. */ int __init scsi_init_devinfo(void) { #ifdef CONFIG_SCSI_PROC_FS struct proc_dir_entry *p; #endif int error, i; printk("RHDEBUG:In scsi_init_devinfo scsi_dev_flags=%s\n",scsi_dev_flags); error = scsi_dev_info_add_list(SCSI_DEVINFO_GLOBAL, NULL); printk("RHDEBUG: In scsi_init_devinfo error=%d\n",error); if (error) { printk("RHDEBUG: In scsi_init_devinfo, calling scsi_dev_info_add_list returning with error=%d\n",error); return error; } error = scsi_dev_info_list_add_str(scsi_dev_flags); if (error) { printk("RHDEBUG: In scsi_init_devinfo, calling scsi_info_list_add returning with error=%d\n",error); goto out; } for (i = 0; scsi_static_device_list[i].vendor; i++) { error = scsi_dev_info_list_add(1 /* compatibile */, scsi_static_device_list[i].vendor, scsi_static_device_list[i].model, NULL, scsi_static_device_list[i].flags); if (error) goto out; } #ifdef CONFIG_SCSI_PROC_FS p = proc_create("scsi/device_info", 0, NULL, _devinfo_proc_fops); if (!p) { error = -ENOMEM; goto out; } #endif /* CONFIG_SCSI_PROC_FS */ out: if (error) scsi_exit_devinfo(); return
Re: [PATCH] sd: always retry READ CAPACITY for ALUA state transition
On Tue, 2017-10-17 at 09:11 +0200, Hannes Reinecke wrote: > During ALUA state transitions the device might return > a sense code 02/04/0a (Logical unit not accessible, asymmetric > access state transition). As this is a transient error > we should just retry the READ CAPACITY call until > the state transition finishes and the correct > capacity can be returned. This will lock up the system if some ALUA initiator gets into a state where it always returns transitioning and never completes, which doesn't look like the best way to handle problem devices. I thought after the ALUA transition the LUN gives a unit attention ... can't you use that some way to trigger the capacity re-read, so do asynchronous event notification instead of polling forever. James
Re: [PATCH] scsi: sg: Re-fix off by one in sg_fill_request_table()
On 2017-10-15 01:16 PM, Ben Hutchings wrote:
> Commit 109bade9c625 ("scsi: sg: use standard lists for sg_requests") introduced an off-by-one error in sg_ioctl(), which was fixed by commit bd46fc406b30 ("scsi: sg: off by one in sg_ioctl()"). Unfortunately commit 4759df905a47 ("scsi: sg: factor out sg_fill_request_table()") moved that code, and reintroduced the bug (perhaps due to a botched rebase). Fix it again.
>
> Fixes: 4759df905a47 ("scsi: sg: factor out sg_fill_request_table()")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Ben Hutchings

Acked-by: Douglas Gilbert

Thanks.

> ---
>  drivers/scsi/sg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 0419c2298eab..aa28874e8fb9 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -837,7 +837,7 @@ sg_fill_request_table(Sg_fd *sfp, sg_req_info_t *rinfo)
> 	val = 0;
> 	list_for_each_entry(srp, &sfp->rq_list, entry) {
> -		if (val > SG_MAX_QUEUE)
> +		if (val >= SG_MAX_QUEUE)
> 			break;
> 		rinfo[val].req_state = srp->done + 1;
> 		rinfo[val].problem =
Re: [PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()
Hi Huacai,

On 2017-10-17 10:05, Huacai Chen wrote:
> In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so libsas's SMP request/response should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
>  drivers/scsi/libsas/sas_expander.c | 93 +++---
>  1 file changed, 57 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
> index 6b4fd23..124a44b 100644
> --- a/drivers/scsi/libsas/sas_expander.c
> +++ b/drivers/scsi/libsas/sas_expander.c
> @@ -164,17 +164,17 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size,
>  /* -- Allocations -- */
> -static inline void *alloc_smp_req(int size)
> +static inline void *alloc_smp_req(int size, int align)
>  {
> -	u8 *p = kzalloc(size, GFP_KERNEL);
> +	u8 *p = kzalloc(ALIGN(size, align), GFP_KERNEL);

If I remember correctly, the kernel guarantees that each kmalloced buffer is always at least aligned to the CPU cache line, so the CPU cache can be invalidated on the allocated buffer without corrupting anything else. Taking this into account, I wonder if the above change makes sense. Have you experienced any problems without this change?

> 	if (p)
> 		p[0] = SMP_REQUEST;
> 	return p;
>  }
> -static inline void *alloc_smp_resp(int size)
> +static inline void *alloc_smp_resp(int size, int align)
>  {
> -	return kzalloc(size, GFP_KERNEL);
> +	return kzalloc(ALIGN(size, align), GFP_KERNEL);

Same as above.
} static char sas_route_char(struct domain_device *dev, struct ex_phy *phy) @@ -403,15 +403,17 @@ static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req, int sas_ex_phy_discover(struct domain_device *dev, int single) { struct expander_device *ex = >ex_dev; - int res = 0; + int res = 0, align; u8 *disc_req; u8 *disc_resp; - disc_req = alloc_smp_req(DISCOVER_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + disc_req = alloc_smp_req(DISCOVER_REQ_SIZE, align); if (!disc_req) return -ENOMEM; - disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE); + disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE, align); if (!disc_resp) { kfree(disc_req); return -ENOMEM; @@ -480,14 +482,15 @@ static int sas_ex_general(struct domain_device *dev) { u8 *rg_req; struct smp_resp *rg_resp; - int res; - int i; + int i, res, align; - rg_req = alloc_smp_req(RG_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + rg_req = alloc_smp_req(RG_REQ_SIZE, align); if (!rg_req) return -ENOMEM; - rg_resp = alloc_smp_resp(RG_RESP_SIZE); + rg_resp = alloc_smp_resp(RG_RESP_SIZE, align); if (!rg_resp) { kfree(rg_req); return -ENOMEM; @@ -552,13 +555,15 @@ static int sas_ex_manuf_info(struct domain_device *dev) { u8 *mi_req; u8 *mi_resp; - int res; + int res, align; - mi_req = alloc_smp_req(MI_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + mi_req = alloc_smp_req(MI_REQ_SIZE, align); if (!mi_req) return -ENOMEM; - mi_resp = alloc_smp_resp(MI_RESP_SIZE); + mi_resp = alloc_smp_resp(MI_RESP_SIZE, align); if (!mi_resp) { kfree(mi_req); return -ENOMEM; @@ -593,13 +598,15 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id, { u8 *pc_req; u8 *pc_resp; - int res; + int res, align; + + align = dma_get_cache_alignment(>phy->dev); - pc_req = alloc_smp_req(PC_REQ_SIZE); + pc_req = alloc_smp_req(PC_REQ_SIZE, align); if (!pc_req) return -ENOMEM; - pc_resp = alloc_smp_resp(PC_RESP_SIZE); + pc_resp = alloc_smp_resp(PC_RESP_SIZE, align); if (!pc_resp) { kfree(pc_req); 
return -ENOMEM; @@ -664,17 +671,19 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port, #define RPEL_RESP_SIZE32 int sas_smp_get_phy_events(struct sas_phy *phy) { - int res; + int res, align; u8 *req; u8 *resp; struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent); struct domain_device *dev = sas_find_dev_by_rphy(rphy); - req = alloc_smp_req(RPEL_REQ_SIZE); + align = dma_get_cache_alignment(>dev); + + req = alloc_smp_req(RPEL_REQ_SIZE, align); if (!req) return -ENOMEM; - resp = alloc_smp_resp(RPEL_RESP_SIZE); + resp = alloc_smp_resp(RPEL_RESP_SIZE, align);
Re: [PATCH v2 12/15] RDMA/cma: make config_item_type const
On Mon, Oct 16, 2017 at 05:18:51PM +0200, Bhumika Goyal wrote:
> Make these structures const as they are either passed to the functions having the argument as const or stored as a reference in the "ci_type" const field of a config_item structure.
>
> Signed-off-by: Bhumika Goyal
> ---
> * Changes in v2- Combine all the followup patches and the constification patches into a series.
>
> drivers/infiniband/core/cma_configfs.c | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)

Thanks,
Reviewed-by: Leon Romanovsky
Re: [PATCH v9 09/10] block, scsi: Make SCSI quiesce and resume work reliably
On Tue, Oct 17, 2017 at 08:33:36AM +0200, Hannes Reinecke wrote:
> On 10/17/2017 01:29 AM, Bart Van Assche wrote:
> > The contexts from which a SCSI device can be quiesced or resumed are:
> > * Writing into /sys/class/scsi_device/*/device/state.
> > * SCSI parallel (SPI) domain validation.
> > * The SCSI device power management methods. See also scsi_bus_pm_ops.
> >
> > It is essential during suspend and resume that neither the filesystem state nor the filesystem metadata in RAM changes. This is why while the hibernation image is being written or restored that SCSI devices are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() and scsi_device_resume(). In the SDEV_QUIESCE state execution of non-preempt requests is deferred. This is realized by returning BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI devices. Avoid that a full queue prevents power management requests to be submitted by deferring allocation of non-preempt requests for devices in the quiesced state. This patch has been tested by running the following commands and by verifying that after resume the fio job is still running:
>
> (We've discussed this at ALPSS already:)
>
> How do you ensure that PREEMPT requests are not stuck in the queue _behind_ non-PREEMPT requests?

Once the PREEMPT_ONLY flag is set, blk_mq_freeze_queue() is called to drain all requests in the queue first, and new requests are prevented from being allocated in the meantime. So there should be no such issue?

> Once they are in the queue the request are already allocated, so your deferred allocation won't work.
> _And_ deferred requests will be re-inserted at the head of the queue.
> Consequently the PREEMPT request will never ever scheduled during quiesce.
> How do you avoid such a scenario?

From the implementation, no non-PREEMPT request can be allocated once scsi_device_quiesce() returns.
Also, I don't see deferred requests in this patch; could you explain it a bit? -- Ming
Re: [PATCH v2 07/15] usb: gadget: configfs: make config_item_type const
Bhumika Goyal writes:
> Make config_item_type structures const as they are either passed to a function having the argument as const or stored in the const "ci_type" field of a config_item structure.
>
> Done using Coccinelle.
>
> Signed-off-by: Bhumika Goyal

Acked-by: Felipe Balbi

-- 
balbi
Re: [PATCH v2 11/15] stm class: make config_item_type const
On Mon, Oct 16, 2017 at 05:18:50PM +0200, Bhumika Goyal wrote: > Make config_item_type structures const as they are either passed to a > function having the argument as const or used inside a if statement or > stored in the const "ci_type" field of a config_item structure. > > Done using Coccinelle. > > Signed-off-by: Bhumika Goyal> --- > * Changes in v2- Combine all the followup patches and the constification > patches into a series. Acked-by: Greg Kroah-Hartman
Re: [PATCH v2 01/15] configfs: make ci_type field, some pointers and function arguments const
On Mon, Oct 16, 2017 at 05:18:40PM +0200, Bhumika Goyal wrote: > The ci_type field of the config_item structure do not modify the fields > of the config_item_type structure it points to. And the other pointers > initialized with ci_type do not modify the fields as well. > So, make the ci_type field and the pointers initialized with ci_type > as const. > > Make the struct config_item_type *type function argument of functions > config_{item/group}_init_type_name const as the argument in both the > functions is only stored in the ci_type field of a config_item structure > which is now made const. > Make the argument of configfs_register_default_group const as it is > only passed to the argument of the function config_group_init_type_name > which is now const. > > Signed-off-by: Bhumika Goyal> --- > * Changes in v2- Combine all the followup patches and the constification > patches into a series. > > fs/configfs/dir.c| 10 +- > fs/configfs/item.c | 6 +++--- > fs/configfs/symlink.c| 4 ++-- > include/linux/configfs.h | 8 > 4 files changed, 14 insertions(+), 14 deletions(-) Acked-by: Greg Kroah-Hartman
Re: [PATCH v2 00/15] make structure field, function arguments and structures const
On Tue, Oct 17, 2017 at 12:16:18PM +0200, Julia Lawall wrote: > > > On Tue, 17 Oct 2017, Greg KH wrote: > > > On Mon, Oct 16, 2017 at 05:18:39PM +0200, Bhumika Goyal wrote: > > > Make the ci_type field and some function arguments as const. After this > > > change, make config_item_type structures as const. > > > > > > * Changes in v2- Combine all the followup patches and the constification > > > patches into a series. > > > > Who do you want to take these patches? If you want, I can take them > > through my driver-core tree, which has done other configfs stuff like > > this in the past. > > Christoph Hellwig proposed to take care of it. Great! I'll go ack the individual ones that I might need to... thanks, greg k-h
Re: [PATCH v2 08/15] nvmet: make config_item_type const
Acked-by: Sagi Grimberg
Re: [PATCH v2 00/15] make structure field, function arguments and structures const
On Tue, 17 Oct 2017, Greg KH wrote: > On Mon, Oct 16, 2017 at 05:18:39PM +0200, Bhumika Goyal wrote: > > Make the ci_type field and some function arguments as const. After this > > change, make config_item_type structures as const. > > > > * Changes in v2- Combine all the followup patches and the constification > > patches into a series. > > Who do you want to take these patches? If you want, I can take them > through my driver-core tree, which has done other configfs stuff like > this in the past. Christoph Hellwig proposed to take care of it. julia > > thanks, > > greg k-h >
Re: [PATCH v2 00/15] make structure field, function arguments and structures const
On Mon, Oct 16, 2017 at 05:18:39PM +0200, Bhumika Goyal wrote: > Make the ci_type field and some function arguments as const. After this > change, make config_item_type structures as const. > > * Changes in v2- Combine all the followup patches and the constification > patches into a series. Who do you want to take these patches? If you want, I can take them through my driver-core tree, which has done other configfs stuff like this in the past. thanks, greg k-h
Re: [PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()
On 10/17/2017 11:05 AM, Huacai Chen wrote:
> In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Huacai Chen
> ---
>  drivers/ata/libata-core.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index ee4c1ec..e134955 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct ata_device *adev)
>  unsigned int ata_do_dev_read_id(struct ata_device *dev, struct ata_taskfile *tf, u16 *id)
>  {
> -	return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE,
> -				 id, sizeof(id[0]) * ATA_ID_WORDS, 0);
> +	u16 *devid;
> +	int res, size = sizeof(u16) * ATA_ID_WORDS;
> +
> +	if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev)))
> +		res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, size, 0);
> +	else {
> +		devid = kmalloc(size, GFP_KERNEL);
> +		res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, size, 0);
> +		memcpy(id, devid, size);
> +		kfree(devid);
> +	}
> +
> +	return res;

This function is called only for the PIO mode commands, so I doubt this is necessary...

MBR, Sergei
Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
On Tue, Oct 17, 2017 at 08:38:01AM +0200, Hannes Reinecke wrote: > On 10/17/2017 03:29 AM, Ming Lei wrote: > > On Mon, Oct 16, 2017 at 01:30:09PM +0200, Hannes Reinecke wrote: > >> On 10/13/2017 07:29 PM, Ming Lei wrote: > >>> On Fri, Oct 13, 2017 at 05:08:52PM +, Bart Van Assche wrote: > On Sat, 2017-10-14 at 00:45 +0800, Ming Lei wrote: > > On Fri, Oct 13, 2017 at 04:31:04PM +, Bart Van Assche wrote: > >> On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: > >>> Actually it is in hot path, for example, lpfc and qla2xx's queue > >>> depth is 3, > >> > >> Sorry but I doubt whether that is correct. More in general, I don't > >> know any modern > >> storage HBA for which the default queue depth is so low. > > > > You can grep: > > > > [ming@ming linux]$ git grep -n cmd_per_lun ./drivers/scsi/ | grep -E > > "qla2xxx|lpfc" > > Such a low queue depth will result in suboptimal performance for adapters > that communicate over a storage network. I think that's a bug and that > both > adapters support much higher cmd_per_lun values. > > (+James Smart) > > James, can you explain us why commit 445cf4f4d2aa decreased > LPFC_CMD_PER_LUN > from 30 to 3? Was that perhaps a workaround for a bug in a specific > target > implementation? > > (+Himanshu Madhani) > > Himanshu, do you perhaps know whether it is safe to increase cmd_per_lun > for > the qla2xxx initiator driver to the scsi_host->can_queue value? > >>> > >>> ->can_queue is size of the whole tag space shared by all LUNs, looks it > >>> isn't > >>> reasonable to increase cmd_per_lun to .can_queue. > >>> > >> '3' is just a starting point; later on it'll be adjusted via > >> scsi_change_depth(). > >> Looks like it's not working correctly with blk-mq, though. > > > > At default, in scsi_alloc_sdev(), q->queue_depth is set as > > host->cmd_per_lun. You are right, q->queue_depth can be adjusted > > later too. > > > > q->queue_depth is respected in scsi_dev_queue_ready(). 
> > .cmd_per_lun defines the max outstanding cmds for each lun, I > > guess it is respected by some hardware inside. > > > No, this is purely a linux abstraction. Nothing to do with the hardware. That is also my initial understanding. But my test showed that actually the max outstanding cmds per LUN is really 3 even though q->queue_depth is 30, that is why I guess the hardware may put a hard limit inside: https://marc.info/?l=linux-block=150549401611868=2 Also if they were same thing, why does lpfc define different default value for q->queue_depth and .cmd_per_lun? drivers/scsi/lpfc/lpfc_attr.c: 3411 /* # lun_queue_depth: This parameter is used to limit the number of outstanding # commands per FCP LUN. Value range is [1,512]. Default value is 30. # If this parameter value is greater than 1/8th the maximum number of exchanges # supported by the HBA port, then the lun queue depth will be reduced to # 1/8th the maximum number of exchanges. */ LPFC_VPORT_ATTR_R(lun_queue_depth, 30, 1, 512, "Max number of FCP commands we can queue to a specific LUN"); drivers/scsi/lpfc/lpfc.h: 47 #define LPFC_CMD_PER_LUN3 /* max outstanding cmds per lun */ -- Ming
Re: [PATCH v2 1/8] lib/scatterlist: Introduce sgl_alloc() and sgl_free()
Thanks Bart,

Reviewed-by: Johannes Thumshirn

-- 
Johannes Thumshirn    Storage
jthumsh...@suse.de    +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH v2 02/15] usb: gadget: make config_item_type structures const
Hi,

Bhumika Goyal writes:
> Make these structures const as they are only passed to the const argument of the functions config_{group/item}_init_type_name.
>
> Signed-off-by: Bhumika Goyal
> ---
> * Changes in v2- Combine all the followup patches and the constification patches into a series.

I'm assuming this depends on patch 1 of the series. In that case:

Acked-by: Felipe Balbi

-- 
balbi
[PATCH V8 2/5] MIPS: Implement dma_map_ops::get_cache_alignment()
Currently, MIPS is an architecture which support coherent & noncoherent devices co-exist. So implement get_cache_alignment() function pointer in 'struct dma_map_ops' to return different dma alignments. Cc: sta...@vger.kernel.org Signed-off-by: Huacai Chen--- arch/mips/cavium-octeon/dma-octeon.c | 3 ++- arch/mips/include/asm/dma-coherence.h | 2 ++ arch/mips/loongson64/common/dma-swiotlb.c | 1 + arch/mips/mm/dma-default.c| 11 ++- arch/mips/netlogic/common/nlm-dma.c | 3 ++- 5 files changed, 17 insertions(+), 3 deletions(-) diff --git a/arch/mips/cavium-octeon/dma-octeon.c b/arch/mips/cavium-octeon/dma-octeon.c index c64bd87..41c29a85 100644 --- a/arch/mips/cavium-octeon/dma-octeon.c +++ b/arch/mips/cavium-octeon/dma-octeon.c @@ -324,7 +324,8 @@ static struct octeon_dma_map_ops _octeon_pci_dma_map_ops = { .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu, .sync_sg_for_device = octeon_dma_sync_sg_for_device, .mapping_error = swiotlb_dma_mapping_error, - .dma_supported = swiotlb_dma_supported + .dma_supported = swiotlb_dma_supported, + .get_cache_alignment = mips_dma_get_cache_alignment }, }; diff --git a/arch/mips/include/asm/dma-coherence.h b/arch/mips/include/asm/dma-coherence.h index 72d0eab..5f7a9fc 100644 --- a/arch/mips/include/asm/dma-coherence.h +++ b/arch/mips/include/asm/dma-coherence.h @@ -29,4 +29,6 @@ extern int hw_coherentio; #define hw_coherentio 0 #endif /* CONFIG_DMA_MAYBE_COHERENT */ +int mips_dma_get_cache_alignment(struct device *dev); + #endif diff --git a/arch/mips/loongson64/common/dma-swiotlb.c b/arch/mips/loongson64/common/dma-swiotlb.c index 34486c1..17b9897 100644 --- a/arch/mips/loongson64/common/dma-swiotlb.c +++ b/arch/mips/loongson64/common/dma-swiotlb.c @@ -119,6 +119,7 @@ static const struct dma_map_ops loongson_dma_map_ops = { .sync_sg_for_device = loongson_dma_sync_sg_for_device, .mapping_error = swiotlb_dma_mapping_error, .dma_supported = loongson_dma_supported, + .get_cache_alignment = mips_dma_get_cache_alignment }; void __init 
plat_swiotlb_setup(void) diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c index c01bd20..e8f0659 100644 --- a/arch/mips/mm/dma-default.c +++ b/arch/mips/mm/dma-default.c @@ -394,6 +394,14 @@ void dma_cache_sync(struct device *dev, void *vaddr, size_t size, EXPORT_SYMBOL(dma_cache_sync); +int mips_dma_get_cache_alignment(struct device *dev) +{ + if (plat_device_is_coherent(dev)) + return 1; + else + return ARCH_DMA_MINALIGN; +} + static const struct dma_map_ops mips_default_dma_map_ops = { .alloc = mips_dma_alloc_coherent, .free = mips_dma_free_coherent, @@ -407,7 +415,8 @@ static const struct dma_map_ops mips_default_dma_map_ops = { .sync_sg_for_cpu = mips_dma_sync_sg_for_cpu, .sync_sg_for_device = mips_dma_sync_sg_for_device, .mapping_error = mips_dma_mapping_error, - .dma_supported = mips_dma_supported + .dma_supported = mips_dma_supported, + .get_cache_alignment = mips_dma_get_cache_alignment }; const struct dma_map_ops *mips_dma_map_ops = _default_dma_map_ops; diff --git a/arch/mips/netlogic/common/nlm-dma.c b/arch/mips/netlogic/common/nlm-dma.c index 0ec9d9d..e9a9ddc 100644 --- a/arch/mips/netlogic/common/nlm-dma.c +++ b/arch/mips/netlogic/common/nlm-dma.c @@ -79,7 +79,8 @@ const struct dma_map_ops nlm_swiotlb_dma_ops = { .sync_sg_for_cpu = swiotlb_sync_sg_for_cpu, .sync_sg_for_device = swiotlb_sync_sg_for_device, .mapping_error = swiotlb_dma_mapping_error, - .dma_supported = swiotlb_dma_supported + .dma_supported = swiotlb_dma_supported, + .get_cache_alignment = mips_dma_get_cache_alignment }; void __init plat_swiotlb_setup(void) -- 2.7.0
[PATCH V8 4/5] libsas: Align SMP req/resp to dma_get_cache_alignment()
In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so libsas's SMP request/response should be aligned to ARCH_DMA_MINALIGN. Otherwise, If a DMA buffer and a kernel structure share a same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption. Cc: sta...@vger.kernel.org Signed-off-by: Huacai Chen--- drivers/scsi/libsas/sas_expander.c | 93 +++--- 1 file changed, 57 insertions(+), 36 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 6b4fd23..124a44b 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -164,17 +164,17 @@ static int smp_execute_task(struct domain_device *dev, void *req, int req_size, /* -- Allocations -- */ -static inline void *alloc_smp_req(int size) +static inline void *alloc_smp_req(int size, int align) { - u8 *p = kzalloc(size, GFP_KERNEL); + u8 *p = kzalloc(ALIGN(size, align), GFP_KERNEL); if (p) p[0] = SMP_REQUEST; return p; } -static inline void *alloc_smp_resp(int size) +static inline void *alloc_smp_resp(int size, int align) { - return kzalloc(size, GFP_KERNEL); + return kzalloc(ALIGN(size, align), GFP_KERNEL); } static char sas_route_char(struct domain_device *dev, struct ex_phy *phy) @@ -403,15 +403,17 @@ static int sas_ex_phy_discover_helper(struct domain_device *dev, u8 *disc_req, int sas_ex_phy_discover(struct domain_device *dev, int single) { struct expander_device *ex = >ex_dev; - int res = 0; + int res = 0, align; u8 *disc_req; u8 *disc_resp; - disc_req = alloc_smp_req(DISCOVER_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + disc_req = alloc_smp_req(DISCOVER_REQ_SIZE, align); if (!disc_req) return -ENOMEM; - disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE); + disc_resp = alloc_smp_resp(DISCOVER_RESP_SIZE, align); if (!disc_resp) { kfree(disc_req); return -ENOMEM; @@ -480,14 +482,15 @@ static int sas_ex_general(struct domain_device *dev) 
{ u8 *rg_req; struct smp_resp *rg_resp; - int res; - int i; + int i, res, align; - rg_req = alloc_smp_req(RG_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + rg_req = alloc_smp_req(RG_REQ_SIZE, align); if (!rg_req) return -ENOMEM; - rg_resp = alloc_smp_resp(RG_RESP_SIZE); + rg_resp = alloc_smp_resp(RG_RESP_SIZE, align); if (!rg_resp) { kfree(rg_req); return -ENOMEM; @@ -552,13 +555,15 @@ static int sas_ex_manuf_info(struct domain_device *dev) { u8 *mi_req; u8 *mi_resp; - int res; + int res, align; - mi_req = alloc_smp_req(MI_REQ_SIZE); + align = dma_get_cache_alignment(>phy->dev); + + mi_req = alloc_smp_req(MI_REQ_SIZE, align); if (!mi_req) return -ENOMEM; - mi_resp = alloc_smp_resp(MI_RESP_SIZE); + mi_resp = alloc_smp_resp(MI_RESP_SIZE, align); if (!mi_resp) { kfree(mi_req); return -ENOMEM; @@ -593,13 +598,15 @@ int sas_smp_phy_control(struct domain_device *dev, int phy_id, { u8 *pc_req; u8 *pc_resp; - int res; + int res, align; + + align = dma_get_cache_alignment(>phy->dev); - pc_req = alloc_smp_req(PC_REQ_SIZE); + pc_req = alloc_smp_req(PC_REQ_SIZE, align); if (!pc_req) return -ENOMEM; - pc_resp = alloc_smp_resp(PC_RESP_SIZE); + pc_resp = alloc_smp_resp(PC_RESP_SIZE, align); if (!pc_resp) { kfree(pc_req); return -ENOMEM; @@ -664,17 +671,19 @@ static int sas_dev_present_in_domain(struct asd_sas_port *port, #define RPEL_RESP_SIZE 32 int sas_smp_get_phy_events(struct sas_phy *phy) { - int res; + int res, align; u8 *req; u8 *resp; struct sas_rphy *rphy = dev_to_rphy(phy->dev.parent); struct domain_device *dev = sas_find_dev_by_rphy(rphy); - req = alloc_smp_req(RPEL_REQ_SIZE); + align = dma_get_cache_alignment(>dev); + + req = alloc_smp_req(RPEL_REQ_SIZE, align); if (!req) return -ENOMEM; - resp = alloc_smp_resp(RPEL_RESP_SIZE); + resp = alloc_smp_resp(RPEL_RESP_SIZE, align); if (!resp) { kfree(req); return -ENOMEM; @@ -709,7 +718,8 @@ int sas_get_report_phy_sata(struct domain_device *dev, int phy_id, struct smp_resp *rps_resp) { int res; - u8 *rps_req = 
alloc_smp_req(RPS_REQ_SIZE); + int align = dma_get_cache_alignment(&dev->phy->dev); + u8 *rps_req =
[PATCH V8 3/5] scsi: Align block queue to dma_get_cache_alignment()
In non-coherent DMA mode, the kernel uses cache flushing operations to maintain I/O coherency, so SCSI's block queue should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and the kernel structure has dirty data, a cache invalidate (without writeback) will cause data corruption. Cc: sta...@vger.kernel.org Signed-off-by: Huacai Chen --- drivers/scsi/scsi_lib.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 9cf6a80..19abc2e 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2132,11 +2132,11 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) q->limits.cluster = 0; /* -* set a reasonable default alignment on word boundaries: the -* host and device may alter it using +* set a reasonable default alignment on word/cacheline boundaries: +* the host and device may alter it using * blk_queue_update_dma_alignment() later. */ - blk_queue_dma_alignment(q, 0x03); + blk_queue_dma_alignment(q, max(4, dma_get_cache_alignment(dev)) - 1); } EXPORT_SYMBOL_GPL(__scsi_init_queue); -- 2.7.0
[PATCH V8 5/5] libata: Align DMA buffer to dma_get_cache_alignment()
In non-coherent DMA mode, the kernel uses cache flushing operations to maintain I/O coherency, so in ata_do_dev_read_id() the DMA buffer should be aligned to ARCH_DMA_MINALIGN. Otherwise, if a DMA buffer and a kernel structure share the same cache line, and the kernel structure has dirty data, a cache invalidate (without writeback) will cause data corruption. Cc: sta...@vger.kernel.org Signed-off-by: Huacai Chen --- drivers/ata/libata-core.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index ee4c1ec..e134955 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -1833,8 +1833,19 @@ static u32 ata_pio_mask_no_iordy(const struct ata_device *adev) unsigned int ata_do_dev_read_id(struct ata_device *dev, struct ata_taskfile *tf, u16 *id) { - return ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, -id, sizeof(id[0]) * ATA_ID_WORDS, 0); + u16 *devid; + int res, size = sizeof(u16) * ATA_ID_WORDS; + + if (IS_ALIGNED((unsigned long)id, dma_get_cache_alignment(&dev->tdev))) + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, id, size, 0); + else { + devid = kmalloc(size, GFP_KERNEL); + res = ata_exec_internal(dev, tf, NULL, DMA_FROM_DEVICE, devid, size, 0); + memcpy(id, devid, size); + kfree(devid); + } + + return res; } /** -- 2.7.0
[PATCH V8 1/5] dma-mapping: Rework dma_get_cache_alignment()
Make dma_get_cache_alignment() accept a 'dev' argument. As a result, it can return different alignments for devices with different I/O cache coherency. Currently ARM/ARM64 and MIPS allow coherent and noncoherent devices to co-exist. This may be extended in the future, so add a new function pointer (i.e. get_cache_alignment) to 'struct dma_map_ops' as a generic solution. Cc: sta...@vger.kernel.org Cc: Michael S. Tsirkin Cc: Pawel Osciak Cc: Marek Szyprowski Cc: Kyungmin Park Cc: Michael Chan Cc: Benjamin Herrenschmidt Cc: Ivan Mikhaylov Cc: Tariq Toukan Cc: Andy Gross Cc: Mark A. Greer Cc: Robert Baldyga Cc: Marek Szyprowski Signed-off-by: Huacai Chen --- drivers/infiniband/hw/mthca/mthca_main.c | 2 +- drivers/media/v4l2-core/videobuf2-dma-contig.c | 2 +- drivers/net/ethernet/broadcom/b44.c | 8 +++ drivers/net/ethernet/ibm/emac/core.c | 22 ++--- drivers/net/ethernet/ibm/emac/core.h | 6 ++--- drivers/net/ethernet/mellanox/mlx4/main.c | 2 +- drivers/spi/spi-qup.c | 4 ++-- drivers/tty/serial/mpsc.c | 33 ++ drivers/tty/serial/samsung.c | 14 +-- include/linux/dma-mapping.h | 17 + 10 files changed, 68 insertions(+), 42 deletions(-) diff --git a/drivers/infiniband/hw/mthca/mthca_main.c b/drivers/infiniband/hw/mthca/mthca_main.c index e36a9bc..078fe8d 100644 --- a/drivers/infiniband/hw/mthca/mthca_main.c +++ b/drivers/infiniband/hw/mthca/mthca_main.c @@ -416,7 +416,7 @@ static int mthca_init_icm(struct mthca_dev *mdev, /* CPU writes to non-reserved MTTs, while HCA might DMA to reserved mtts */ mdev->limits.reserved_mtts = ALIGN(mdev->limits.reserved_mtts * mdev->limits.mtt_seg_size, - dma_get_cache_alignment()) / mdev->limits.mtt_seg_size; + dma_get_cache_alignment(&mdev->pdev->dev)) / mdev->limits.mtt_seg_size; mdev->mr_table.mtt_table = mthca_alloc_icm_table(mdev, init_hca->mtt_base, mdev->limits.mtt_seg_size, diff --git a/drivers/media/v4l2-core/videobuf2-dma-contig.c b/drivers/media/v4l2-core/videobuf2-dma-contig.c index 9f389f3..1f6a9b7 100644 ---
a/drivers/media/v4l2-core/videobuf2-dma-contig.c +++ b/drivers/media/v4l2-core/videobuf2-dma-contig.c @@ -484,7 +484,7 @@ static void *vb2_dc_get_userptr(struct device *dev, unsigned long vaddr, int ret = 0; struct sg_table *sgt; unsigned long contig_size; - unsigned long dma_align = dma_get_cache_alignment(); + unsigned long dma_align = dma_get_cache_alignment(dev); /* Only cache aligned DMA transfers are reliable */ if (!IS_ALIGNED(vaddr | size, dma_align)) { diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c index a1125d1..2f6ffe5 100644 --- a/drivers/net/ethernet/broadcom/b44.c +++ b/drivers/net/ethernet/broadcom/b44.c @@ -2344,6 +2344,10 @@ static int b44_init_one(struct ssb_device *sdev, struct net_device *dev; struct b44 *bp; int err; + unsigned int dma_desc_align_size = dma_get_cache_alignment(sdev->dma_dev); + + /* Setup paramaters for syncing RX/TX DMA descriptors */ + dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc)); instance++; @@ -2587,12 +2591,8 @@ static inline void b44_pci_exit(void) static int __init b44_init(void) { - unsigned int dma_desc_align_size = dma_get_cache_alignment(); int err; - /* Setup paramaters for syncing RX/TX DMA descriptors */ - dma_desc_sync_size = max_t(unsigned int, dma_desc_align_size, sizeof(struct dma_desc)); - err = b44_pci_init(); if (err) return err; diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c index 7feff24..b0c5319 100644 --- a/drivers/net/ethernet/ibm/emac/core.c +++ b/drivers/net/ethernet/ibm/emac/core.c @@ -1030,8 +1030,9 @@ static int emac_set_mac_address(struct net_device *ndev, void *sa) static int emac_resize_rx_ring(struct emac_instance *dev, int new_mtu) { - int rx_sync_size = emac_rx_sync_size(new_mtu); - int rx_skb_size = emac_rx_skb_size(new_mtu); + struct device *dma_dev = &dev->ofdev->dev; + int rx_skb_size = emac_rx_skb_size(dma_dev, new_mtu); + int rx_sync_size = emac_rx_sync_size(dma_dev,
new_mtu); int i, ret = 0; int mr1_jumbo_bit_change = 0; @@ -1115,20 +1116,21 @@ static int
Re: [PATCH] scsi: libfc: Convert timers to use timer_setup()
Looks good, Acked-by: Johannes Thumshirn -- Johannes Thumshirn Storage jthumsh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[PATCH] scsi: Add 'AIX VDASD' to blacklist
The AIX VDASD devices do support VPD pages, but implement only SPC. So set BLIST_TRY_VPD_PAGES to correctly display the VPD information in sysfs. Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_devinfo.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index d39b27c..ae2ad84 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -134,6 +134,7 @@ struct scsi_dev_info_list_table { {"3PARdata", "VV", NULL, BLIST_REPORTLUN2}, {"ADAPTEC", "AACRAID", NULL, BLIST_FORCELUN}, {"ADAPTEC", "Adaptec 5400S", NULL, BLIST_FORCELUN}, + {"AIX", "VDASD", NULL, BLIST_TRY_VPD_PAGES}, {"AFT PRO", "-IX CF", "0.0>", BLIST_FORCELUN}, {"BELKIN", "USB 2 HS-CF", "1.95", BLIST_FORCELUN | BLIST_INQUIRY_36}, {"BROWNIE", "1200U3P", NULL, BLIST_NOREPORTLUN}, -- 1.8.5.6
[PATCH] scsi: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
From: Kurt Garloff All EMC SYMMETRIX arrays support REPORT_LUNS, even if configured to report SCSI-2 for whatever reason. Signed-off-by: Kurt Garloff Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_devinfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index a998585..555269b 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -161,7 +161,7 @@ struct scsi_dev_info_list_table { {"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, storage on LUN 0 */ {"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, no storage on LUN 0 */ {"EMC", "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, - {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_FORCELUN}, + {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_REPORTLUN2}, {"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN}, {"easyRAID", "16P", NULL, BLIST_NOREPORTLUN}, {"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN}, -- 1.8.5.6
[PATCH] scsi: Add TRY_VPD_PAGES to HITACHI OPEN-V blacklist entry
HITACHI arrays always support VPD pages, even though they claim to support only SCSI revision 3. Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_devinfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index ae2ad84..a998585 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -174,7 +174,7 @@ struct scsi_dev_info_list_table { {"HITACHI", "DF500", "*", BLIST_REPORTLUN2}, {"HITACHI", "DISK-SUBSYSTEM", "*", BLIST_REPORTLUN2}, {"HITACHI", "HUS1530", "*", BLIST_NO_DIF}, - {"HITACHI", "OPEN-", "*", BLIST_REPORTLUN2}, + {"HITACHI", "OPEN-", "*", BLIST_REPORTLUN2 | BLIST_TRY_VPD_PAGES}, {"HITACHI", "OP-C-", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, {"HITACHI", "3380-", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, {"HITACHI", "3390-", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, -- 1.8.5.6
[PATCH] scsi: Always retry internal target error
EMC Symmetrix returns 'internal target error' for a variety of conditions, most of which are transient. So we should always retry it, even with failfast set; otherwise we would get spurious path flaps with multipath. Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_error.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5086489..5722c2e 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -497,6 +497,16 @@ int scsi_check_sense(struct scsi_cmnd *scmd) if (sshdr.asc == 0x10) /* DIF */ return SUCCESS; + if (!strncmp(scmd->device->vendor, "EMC", 3) && + !strncmp(scmd->device->model, "SYMMETRIX", 9) && + (sshdr.asc == 0x44) && (sshdr.ascq == 0x0)) { + /* +* EMC Symmetrix returns 'Internal target failure' +* for a variety of internal issues, all of which +* can be recovered by retry. +*/ + return ADD_TO_MLQUEUE; + } return NEEDS_RETRY; case NOT_READY: case UNIT_ATTENTION: @@ -1846,6 +1856,9 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd) rtn = scsi_check_sense(scmd); if (rtn == NEEDS_RETRY) goto maybe_retry; + else if (rtn == ADD_TO_MLQUEUE) + /* Always enforce a retry for ADD_TO_MLQUEUE */ + rtn = NEEDS_RETRY; /* if rtn == FAILED, we have no sense information; * returning FAILED will wake the error handler thread * to collect the sense and redo the decide -- 1.8.5.6
[PATCH] sd: always retry READ CAPACITY for ALUA state transition
During ALUA state transitions the device might return a sense code 02/04/0a (Logical unit not accessible, asymmetric access state transition). As this is a transient error we should just retry the READ CAPACITY call until the state transition finishes and the correct capacity can be returned. Signed-off-by: Hannes Reinecke --- drivers/scsi/sd.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index 37daf9a..b4647f5 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -2333,6 +2333,11 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp, * give it one more chance */ if (--reset_retries > 0) continue; + if (sense_valid && + sshdr.sense_key == NOT_READY && + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) + /* ALUA state transition; always retry */ + continue; } retries--; @@ -2418,6 +2423,11 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp, * give it one more chance */ if (--reset_retries > 0) continue; + if (sense_valid && + sshdr.sense_key == NOT_READY && + sshdr.asc == 0x04 && sshdr.ascq == 0x0A) + /* ALUA state transition; always retry */ + continue; } retries--; -- 1.8.5.6
[PATCH] scsi: Handle power-on reset unit attention
As per SAM there is a precedence for unit attention conditions, with any sense code 29/XX ranking second, just after an ACA ACTIVE status. Additionally, a target might prefer not to queue any unit attention conditions but to report just one, which due to the above will be the one with the highest precedence. As a result the sense code 29/XX effectively overwrites any other unit attention. Hence we should report the power-on reset to userland so that it can take appropriate action. Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_error.c | 6 ++ drivers/scsi/scsi_lib.c | 4 include/scsi/scsi_device.h | 3 ++- 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 5722c2e..f49710a 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -403,6 +403,12 @@ static void scsi_report_sense(struct scsi_device *sdev, "threshold.\n"); } + if (sshdr->asc == 0x29) { + evt_type = SDEV_EVT_POWER_ON_RESET_OCCURRED; + sdev_printk(KERN_WARNING, sdev, + "Power-on or device reset occurred\n"); + } + if (sshdr->asc == 0x2a && sshdr->ascq == 0x01) { evt_type = SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED; sdev_printk(KERN_WARNING, sdev, diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index c72b97a..e5fcfa8 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2743,6 +2743,9 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt) case SDEV_EVT_ALUA_STATE_CHANGE_REPORTED: envp[idx++] = "SDEV_UA=ASYMMETRIC_ACCESS_STATE_CHANGED"; break; + case SDEV_EVT_POWER_ON_RESET_OCCURRED: + envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED"; + break; default: /* do nothing */ break; @@ -2847,6 +2850,7 @@ struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type, case SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED: case SDEV_EVT_LUN_CHANGE_REPORTED: case SDEV_EVT_ALUA_STATE_CHANGE_REPORTED: + case SDEV_EVT_POWER_ON_RESET_OCCURRED: default: /* do nothing */ break; diff --git
a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h index 82e93ee..d68985c 100644 --- a/include/scsi/scsi_device.h +++ b/include/scsi/scsi_device.h @@ -64,9 +64,10 @@ enum scsi_device_event { SDEV_EVT_MODE_PARAMETER_CHANGE_REPORTED,/* 2A 01 UA reported */ SDEV_EVT_LUN_CHANGE_REPORTED, /* 3F 0E UA reported */ SDEV_EVT_ALUA_STATE_CHANGE_REPORTED,/* 2A 06 UA reported */ + SDEV_EVT_POWER_ON_RESET_OCCURRED, /* 29 00 UA reported */ SDEV_EVT_FIRST = SDEV_EVT_MEDIA_CHANGE, - SDEV_EVT_LAST = SDEV_EVT_ALUA_STATE_CHANGE_REPORTED, + SDEV_EVT_LAST = SDEV_EVT_POWER_ON_RESET_OCCURRED, SDEV_EVT_MAXBITS= SDEV_EVT_LAST + 1 }; -- 1.8.5.6
[PATCH] scsi: Do not retry invalid function error
Hitachi USP-V returns 'Invalid function' when its internal staging mechanism encounters an error. These errors should not be retried on another path. Signed-off-by: Hannes Reinecke --- drivers/scsi/scsi_error.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 3894205..5086489 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -579,6 +579,7 @@ int scsi_check_sense(struct scsi_cmnd *scmd) case ILLEGAL_REQUEST: if (sshdr.asc == 0x20 || /* Invalid command operation code */ sshdr.asc == 0x21 || /* Logical block address out of range */ + sshdr.asc == 0x22 || /* Invalid function */ sshdr.asc == 0x24 || /* Invalid field in cdb */ sshdr.asc == 0x26) { /* Parameter value invalid */ set_host_byte(scmd, DID_TARGET_FAILURE); -- 1.8.5.6
Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
On 10/17/2017 03:29 AM, Ming Lei wrote: > On Mon, Oct 16, 2017 at 01:30:09PM +0200, Hannes Reinecke wrote: >> On 10/13/2017 07:29 PM, Ming Lei wrote: >>> On Fri, Oct 13, 2017 at 05:08:52PM +, Bart Van Assche wrote: On Sat, 2017-10-14 at 00:45 +0800, Ming Lei wrote: > On Fri, Oct 13, 2017 at 04:31:04PM +, Bart Van Assche wrote: >> On Sat, 2017-10-14 at 00:07 +0800, Ming Lei wrote: >>> Actually it is in hot path, for example, lpfc and qla2xx's queue depth >>> is 3, >> >> Sorry but I doubt whether that is correct. More in general, I don't know >> any modern >> storage HBA for which the default queue depth is so low. > > You can grep: > > [ming@ming linux]$ git grep -n cmd_per_lun ./drivers/scsi/ | grep -E > "qla2xxx|lpfc" Such a low queue depth will result in suboptimal performance for adapters that communicate over a storage network. I think that's a bug and that both adapters support much higher cmd_per_lun values. (+James Smart) James, can you explain us why commit 445cf4f4d2aa decreased LPFC_CMD_PER_LUN from 30 to 3? Was that perhaps a workaround for a bug in a specific target implementation? (+Himanshu Madhani) Himanshu, do you perhaps know whether it is safe to increase cmd_per_lun for the qla2xxx initiator driver to the scsi_host->can_queue value? >>> >>> ->can_queue is size of the whole tag space shared by all LUNs, looks it >>> isn't >>> reasonable to increase cmd_per_lun to .can_queue. >>> >> '3' is just a starting point; later on it'll be adjusted via >> scsi_change_depth(). >> Looks like it's not working correctly with blk-mq, though. > > At default, in scsi_alloc_sdev(), q->queue_depth is set as > host->cmd_per_lun. You are right, q->queue_depth can be adjusted > later too. > > q->queue_depth is respected in scsi_dev_queue_ready(). > .cmd_per_lun defines the max outstanding cmds for each lun, I > guess it is respected by some hardware inside. > No, this is purely a linux abstraction. Nothing to do with the hardware. 
> For example, I remembered that on lpfc q->queue_depth is 30 because > the default 'lpfc_lun_queue_depth' is 30. And its .cmd_per_lun is 3. > Per my observation, this .cmd_per_lun limit is still workable. > Again, these are just some pre-defined values to avoid I/O starvation when having several LUNs. _if_ we can guarantee I/O fairness between several (hundreds!) devices all sharing the same tagspace we wouldn't need these variables. Cheers, Hannes
Re: [PATCH v9 10/10] block, nvme: Introduce blk_mq_req_flags_t
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > Several block layer and NVMe core functions accept a combination > of BLK_MQ_REQ_* flags through the 'flags' argument but there is > no verification at compile time whether the right type of block > layer flags is passed. Make it possible for sparse to verify this. > This patch does not change any functionality. > > Signed-off-by: Bart Van Assche > Cc: linux-n...@lists.infradead.org > Cc: Christoph Hellwig > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > Cc: Ming Lei > --- > block/blk-core.c | 12 ++-- > block/blk-mq.c| 4 ++-- > block/blk-mq.h| 2 +- > drivers/nvme/host/core.c | 5 +++-- > drivers/nvme/host/nvme.h | 5 +++-- > include/linux/blk-mq.h| 17 +++-- > include/linux/blk_types.h | 2 ++ > include/linux/blkdev.h| 4 ++-- > 8 files changed, 30 insertions(+), 21 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 09/10] block, scsi: Make SCSI quiesce and resume work reliably
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > The contexts from which a SCSI device can be quiesced or resumed are: > * Writing into /sys/class/scsi_device/*/device/state. > * SCSI parallel (SPI) domain validation. > * The SCSI device power management methods. See also scsi_bus_pm_ops. > > It is essential during suspend and resume that neither the filesystem > state nor the filesystem metadata in RAM changes. This is why while > the hibernation image is being written or restored that SCSI devices > are quiesced. The SCSI core quiesces devices through scsi_device_quiesce() > and scsi_device_resume(). In the SDEV_QUIESCE state execution of > non-preempt requests is deferred. This is realized by returning > BLKPREP_DEFER from inside scsi_prep_state_check() for quiesced SCSI > devices. Avoid that a full queue prevents power management requests > to be submitted by deferring allocation of non-preempt requests for > devices in the quiesced state. This patch has been tested by running > the following commands and by verifying that after resume the fio job > is still running: > (We've discussed this at ALPSS already:) How do you ensure that PREEMPT requests are not stuck in the queue _behind_ non-PREEMPT requests? Once they are in the queue the request are already allocated, so your deferred allocation won't work. _And_ deferred requests will be re-inserted at the head of the queue. Consequently the PREEMPT request will never ever scheduled during quiesce. How do you avoid such a scenario? Cheers, Hannes
Re: [PATCH v9 08/10] block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > This flag will be used in the next patch to let the block layer > core know whether or not a SCSI request queue has been quiesced. > A quiesced SCSI queue namely only processes RQF_PREEMPT requests. > > Signed-off-by: Bart Van Assche > Tested-by: Martin Steigerwald > Cc: Ming Lei > Cc: Christoph Hellwig > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > --- > block/blk-core.c | 30 ++ > block/blk-mq-debugfs.c | 1 + > include/linux/blkdev.h | 6 ++ > 3 files changed, 37 insertions(+) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 07/10] ide, scsi: Tell the block layer at request allocation time about preempt requests
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > Convert blk_get_request(q, op, __GFP_RECLAIM) into > blk_get_request_flags(q, op, BLK_MQ_PREEMPT). This patch does not > change any functionality. > > Signed-off-by: Bart Van Assche > Tested-by: Martin Steigerwald > Acked-by: David S. Miller [ for IDE ] > Cc: Martin K. Petersen > Cc: Ming Lei > Cc: Christoph Hellwig > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > --- > drivers/ide/ide-pm.c| 4 ++-- > drivers/scsi/scsi_lib.c | 6 +++--- > 2 files changed, 5 insertions(+), 5 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 05/10] block: Introduce blk_get_request_flags()
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > A side effect of this patch is that the GFP mask that is passed to > several allocation functions in the legacy block layer is changed > from GFP_KERNEL into __GFP_DIRECT_RECLAIM. > > Signed-off-by: Bart Van Assche > Tested-by: Martin Steigerwald > Cc: Christoph Hellwig > Cc: Ming Lei > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > --- > block/blk-core.c | 50 +++--- > include/linux/blkdev.h | 3 +++ > 2 files changed, 38 insertions(+), 15 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 06/10] block: Introduce BLK_MQ_REQ_PREEMPT
On 10/17/2017 01:29 AM, Bart Van Assche wrote: > Set RQF_PREEMPT if BLK_MQ_REQ_PREEMPT is passed to > blk_get_request_flags(). > > Signed-off-by: Bart Van Assche > Tested-by: Martin Steigerwald > Cc: Christoph Hellwig > Cc: Ming Lei > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > --- > block/blk-core.c | 4 +++- > block/blk-mq.c | 2 ++ > include/linux/blk-mq.h | 1 + > 3 files changed, 6 insertions(+), 1 deletion(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 04/10] block: Make q_usage_counter also track legacy requests
On 10/17/2017 01:28 AM, Bart Van Assche wrote: > From: Ming Lei > > This patch makes it possible to pause request allocation for > the legacy block layer by calling blk_mq_freeze_queue() and > blk_mq_unfreeze_queue(). > > Signed-off-by: Ming Lei > [ bvanassche: Combined two patches into one, edited a comment and made sure > REQ_NOWAIT is handled properly in blk_old_get_request() ] > Signed-off-by: Bart Van Assche > Reviewed-by: Christoph Hellwig > Reviewed-by: Johannes Thumshirn > Tested-by: Martin Steigerwald > Cc: Ming Lei > Cc: Hannes Reinecke > --- > block/blk-core.c | 12 > block/blk-mq.c | 10 ++ > 2 files changed, 14 insertions(+), 8 deletions(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 03/10] md: Neither resync nor reshape while the system is frozen
On 10/17/2017 01:28 AM, Bart Van Assche wrote: > Some people use the md driver on laptops and use the suspend and > resume functionality. Since it is essential that submitting of > new I/O requests stops before a hibernation image is created, > interrupt the md resync and reshape actions if the system is > being frozen. Note: the resync and reshape will restart after > the system is resumed and a message similar to the following > will appear in the system log: > > md: md0: data-check interrupted. > > Signed-off-by: Bart Van Assche > Reviewed-by: Shaohua Li > Tested-by: Martin Steigerwald > Cc: linux-r...@vger.kernel.org > Cc: Ming Lei > Cc: Christoph Hellwig > Cc: Hannes Reinecke > Cc: Johannes Thumshirn > --- > drivers/md/md.c | 50 +- > drivers/md/md.h | 8 > 2 files changed, 57 insertions(+), 1 deletion(-) > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 02/10] md: Introduce md_stop_all_writes()
On 10/17/2017 01:28 AM, Bart Van Assche wrote: > Introduce md_stop_all_writes() because the next patch will add > a second caller for this function. This patch does not change > any functionality. > > Signed-off-by: Bart Van Assche > Reviewed-by: Johannes Thumshirn > Reviewed-by: Shaohua Li > Tested-by: Martin Steigerwald > Cc: linux-r...@vger.kernel.org > Cc: Ming Lei > Cc: Christoph Hellwig > Cc: Hannes Reinecke > --- > drivers/md/md.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 8933cafc212d..b99584e5d6b1 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -8937,8 +8937,7 @@ int rdev_clear_badblocks(struct md_rdev *rdev, sector_t > s, int sectors, > } > EXPORT_SYMBOL_GPL(rdev_clear_badblocks); > > -static int md_notify_reboot(struct notifier_block *this, > - unsigned long code, void *x) > +static void md_stop_all_writes(void) > { > struct list_head *tmp; > struct mddev *mddev; > @@ -8962,6 +8961,12 @@ static int md_notify_reboot(struct notifier_block > *this, >*/ > if (need_delay) > mdelay(1000*1); > +} > + > +static int md_notify_reboot(struct notifier_block *this, > + unsigned long code, void *x) > +{ > + md_stop_all_writes(); > > return NOTIFY_DONE; > } > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v9 01/10] md: Rename md_notifier into md_reboot_notifier
On 10/17/2017 01:28 AM, Bart Van Assche wrote: > This avoids confusion with the pm notifier that will be added > through a later patch. > > Signed-off-by: Bart Van Assche > Reviewed-by: Johannes Thumshirn > Reviewed-by: Shaohua Li > Tested-by: Martin Steigerwald > Cc: linux-r...@vger.kernel.org > Cc: Ming Lei > Cc: Christoph Hellwig > Cc: Hannes Reinecke > --- > drivers/md/md.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/md/md.c b/drivers/md/md.c > index 5d61049e7417..8933cafc212d 100644 > --- a/drivers/md/md.c > +++ b/drivers/md/md.c > @@ -8966,7 +8966,7 @@ static int md_notify_reboot(struct notifier_block *this, > return NOTIFY_DONE; > } > > -static struct notifier_block md_notifier = { > +static struct notifier_block md_reboot_notifier = { > .notifier_call = md_notify_reboot, > .next = NULL, > .priority = INT_MAX, /* before any real devices */ > @@ -9003,7 +9003,7 @@ static int __init md_init(void) > blk_register_region(MKDEV(mdp_major, 0), 1UL<<MINORBITS, THIS_MODULE, > md_probe, NULL, NULL); > > - register_reboot_notifier(&md_notifier); > + register_reboot_notifier(&md_reboot_notifier); > raid_table_header = register_sysctl_table(raid_root_table); > > md_geninit(); > @@ -9243,7 +9243,7 @@ static __exit void md_exit(void) > > unregister_blkdev(MD_MAJOR,"md"); > unregister_blkdev(mdp_major, "mdp"); > - unregister_reboot_notifier(&md_notifier); > + unregister_reboot_notifier(&md_reboot_notifier); > unregister_sysctl_table(raid_table_header); > > /* We cannot unload the modules while some process is > Reviewed-by: Hannes Reinecke Cheers, Hannes
Re: [PATCH v2 8/8] scsi/pmcraid: Use sgl_alloc_order() and sgl_free_order()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc_order() and sgl_free_order() functions instead
> of open coding these functions.
>
> Signed-off-by: Bart Van Assche
> Reviewed-by: Johannes Thumshirn
> Cc: linux-scsi@vger.kernel.org
> Cc: Martin K. Petersen
> Cc: Anil Ravindranath
> ---
>  drivers/scsi/Kconfig   |  1 +
>  drivers/scsi/pmcraid.c | 43 +++----------------------------------------
>  drivers/scsi/pmcraid.h |  2 +-
>  3 files changed, 6 insertions(+), 40 deletions(-)
>
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index d11e75e76195..632200ec36a6 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -1576,6 +1576,7 @@ config ZFCP
>  config SCSI_PMCRAID
>  	tristate "PMC SIERRA Linux MaxRAID adapter support"
>  	depends on PCI && SCSI && NET
> +	select SGL_ALLOC
>  	---help---
>  	  This driver supports the PMC SIERRA MaxRAID adapters.
>
> diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c
> index b4d6cd8cd1ad..95fc0352f9bb 100644
> --- a/drivers/scsi/pmcraid.c
> +++ b/drivers/scsi/pmcraid.c
> @@ -3232,12 +3232,7 @@ static int pmcraid_build_ioadl(
>   */
>  static void pmcraid_free_sglist(struct pmcraid_sglist *sglist)
>  {
> -	int i;
> -
> -	for (i = 0; i < sglist->num_sg; i++)
> -		__free_pages(sg_page(&(sglist->scatterlist[i])),
> -			     sglist->order);
> -
> +	sgl_free_order(sglist->scatterlist, sglist->order);
>  	kfree(sglist);
>  }
>
> @@ -3254,50 +3249,20 @@ static void pmcraid_free_sglist(struct pmcraid_sglist *sglist)
>  static struct pmcraid_sglist *pmcraid_alloc_sglist(int buflen)
>  {
>  	struct pmcraid_sglist *sglist;
> -	struct scatterlist *scatterlist;
> -	struct page *page;
> -	int num_elem, i, j;
>  	int sg_size;
>  	int order;
> -	int bsize_elem;
>
>  	sg_size = buflen / (PMCRAID_MAX_IOADLS - 1);
>  	order = (sg_size > 0) ? get_order(sg_size) : 0;
> -	bsize_elem = PAGE_SIZE * (1 << order);
> -
> -	/* Determine the actual number of sg entries needed */
> -	if (buflen % bsize_elem)
> -		num_elem = (buflen / bsize_elem) + 1;
> -	else
> -		num_elem = buflen / bsize_elem;
>
>  	/* Allocate a scatter/gather list for the DMA */
> -	sglist = kzalloc(sizeof(struct pmcraid_sglist) +
> -			 (sizeof(struct scatterlist) * (num_elem - 1)),
> -			 GFP_KERNEL);
> -
> +	sglist = kzalloc(sizeof(struct pmcraid_sglist), GFP_KERNEL);
>  	if (sglist == NULL)
>  		return NULL;
>
> -	scatterlist = sglist->scatterlist;
> -	sg_init_table(scatterlist, num_elem);
>  	sglist->order = order;
> -	sglist->num_sg = num_elem;
> -	sg_size = buflen;
> -
> -	for (i = 0; i < num_elem; i++) {
> -		page = alloc_pages(GFP_KERNEL|GFP_DMA|__GFP_ZERO, order);
> -		if (!page) {
> -			for (j = i - 1; j >= 0; j--)
> -				__free_pages(sg_page(&scatterlist[j]), order);
> -			kfree(sglist);
> -			return NULL;
> -		}
> -
> -		sg_set_page(&scatterlist[i], page,
> -			    sg_size < bsize_elem ? sg_size : bsize_elem, 0);
> -		sg_size -= bsize_elem;
> -	}
> +	sgl_alloc_order(buflen, order, false,
> +			GFP_KERNEL | GFP_DMA | __GFP_ZERO, &sglist->num_sg);
>
>  	return sglist;
>  }
>
> diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
> index 44da91712115..754ef30927e2 100644
> --- a/drivers/scsi/pmcraid.h
> +++ b/drivers/scsi/pmcraid.h
> @@ -542,7 +542,7 @@ struct pmcraid_sglist {
>  	u32 order;
>  	u32 num_sg;
>  	u32 num_dma_sg;
> -	struct scatterlist scatterlist[1];
> +	struct scatterlist *scatterlist;
>  };
>
>  /* page D0 inquiry data of focal point resource */
>
The comment on the ipr patch applies here, too. We need input from the pmcraid folks on whether this is a valid conversion.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 7/8] scsi/pmcraid: Remove an unused structure member
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Signed-off-by: Bart Van Assche
> Reviewed-by: Johannes Thumshirn
> Cc: linux-scsi@vger.kernel.org
> Cc: Martin K. Petersen
> Cc: Anil Ravindranath
> ---
>  drivers/scsi/pmcraid.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/scsi/pmcraid.h b/drivers/scsi/pmcraid.h
> index 8bfac72a242b..44da91712115 100644
> --- a/drivers/scsi/pmcraid.h
> +++ b/drivers/scsi/pmcraid.h
> @@ -542,7 +542,6 @@ struct pmcraid_sglist {
>  	u32 order;
>  	u32 num_sg;
>  	u32 num_dma_sg;
> -	u32 buffer_len;
>  	struct scatterlist scatterlist[1];
>  };
>
This actually is the same story that we've had with ipr (and, looking at the code, those two drivers look awfully similar ...). pmcraid_sglist looks as if it's a hardware-dependent structure, so just removing one entry from the middle of a structure might not be a good idea. But this is something for the pmcraid folks to clarify.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 6/8] scsi/ipr: Use sgl_alloc_order() and sgl_free_order()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc_order() and sgl_free_order() functions instead
> of open coding these functions.
>
> Signed-off-by: Bart Van Assche
> Reviewed-by: Johannes Thumshirn
> Cc: linux-scsi@vger.kernel.org
> Cc: Martin K. Petersen
> Cc: Brian King
> ---
>  drivers/scsi/Kconfig |  1 +
>  drivers/scsi/ipr.c   | 49 ++++++-------------------------------------------
>  drivers/scsi/ipr.h   |  2 +-
>  3 files changed, 10 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 41366339b950..d11e75e76195 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -1058,6 +1058,7 @@ config SCSI_IPR
>  	depends on PCI && SCSI && ATA
>  	select FW_LOADER
>  	select IRQ_POLL
> +	select SGL_ALLOC
>  	---help---
>  	  This driver supports the IBM Power Linux family RAID adapters.
>  	  This includes IBM pSeries 5712, 5703, 5709, and 570A, as well
> diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
> index f838bd73befa..6fed5db6255e 100644
> --- a/drivers/scsi/ipr.c
> +++ b/drivers/scsi/ipr.c
> @@ -3815,10 +3815,8 @@ static struct device_attribute ipr_iopoll_weight_attr = {
>   **/
>  static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
>  {
> -	int sg_size, order, bsize_elem, num_elem, i, j;
> +	int sg_size, order;
>  	struct ipr_sglist *sglist;
> -	struct scatterlist *scatterlist;
> -	struct page *page;
>
>  	/* Get the minimum size per scatter/gather element */
>  	sg_size = buf_len / (IPR_MAX_SGLIST - 1);
> @@ -3826,45 +3824,18 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
>  	/* Get the actual size per element */
>  	order = get_order(sg_size);
>
> -	/* Determine the actual number of bytes per element */
> -	bsize_elem = PAGE_SIZE * (1 << order);
> -
> -	/* Determine the actual number of sg entries needed */
> -	if (buf_len % bsize_elem)
> -		num_elem = (buf_len / bsize_elem) + 1;
> -	else
> -		num_elem = buf_len / bsize_elem;
> -
>  	/* Allocate a scatter/gather list for the DMA */
> -	sglist = kzalloc(sizeof(struct ipr_sglist) +
> -			 (sizeof(struct scatterlist) * (num_elem - 1)),
> -			 GFP_KERNEL);
> -
> +	sglist = kzalloc(sizeof(struct ipr_sglist), GFP_KERNEL);
>  	if (sglist == NULL) {
>  		ipr_trace;
>  		return NULL;
>  	}
> -
> -	scatterlist = sglist->scatterlist;
> -	sg_init_table(scatterlist, num_elem);
> -
>  	sglist->order = order;
> -	sglist->num_sg = num_elem;
> -
> -	/* Allocate a bunch of sg elements */
> -	for (i = 0; i < num_elem; i++) {
> -		page = alloc_pages(GFP_KERNEL, order);
> -		if (!page) {
> -			ipr_trace;
> -
> -			/* Free up what we already allocated */
> -			for (j = i - 1; j >= 0; j--)
> -				__free_pages(sg_page(&scatterlist[j]), order);
> -			kfree(sglist);
> -			return NULL;
> -		}
> -
> -		sg_set_page(&scatterlist[i], page, 0, 0);
> +	sglist->scatterlist = sgl_alloc_order(buf_len, order, false, GFP_KERNEL,
> +					      &sglist->num_sg);
> +	if (!sglist->scatterlist) {
> +		kfree(sglist);
> +		return NULL;
>  	}
>
>  	return sglist;
> @@ -3882,11 +3853,7 @@ static struct ipr_sglist *ipr_alloc_ucode_buffer(int buf_len)
>   **/
>  static void ipr_free_ucode_buffer(struct ipr_sglist *sglist)
>  {
> -	int i;
> -
> -	for (i = 0; i < sglist->num_sg; i++)
> -		__free_pages(sg_page(&sglist->scatterlist[i]), sglist->order);
> -
> +	sgl_free_order(sglist->scatterlist, sglist->order);
>  	kfree(sglist);
>  }
>
> diff --git a/drivers/scsi/ipr.h b/drivers/scsi/ipr.h
> index c7f0e9e3cd7d..93570734cbfb 100644
> --- a/drivers/scsi/ipr.h
> +++ b/drivers/scsi/ipr.h
> @@ -1454,7 +1454,7 @@ struct ipr_sglist {
>  	u32 num_sg;
>  	u32 num_dma_sg;
>  	u32 buffer_len;
> -	struct scatterlist scatterlist[1];
> +	struct scatterlist *scatterlist;
>  };
>
>  enum ipr_sdt_state {
>
Not sure if this is a valid conversion. Originally the driver would allocate a single buffer; with this patch we end up with two distinct allocations. Given that this is used to download the microcode I'm not sure whether this isn't a hardware-dependent structure which requires a single buffer including the sglist. Brian, can you shed some light here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 5/8] target: Use sgl_alloc_order() and sgl_free()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc_order() and sgl_free() functions instead of open
> coding these functions.
>
> Signed-off-by: Bart Van Assche
> Cc: Nicholas A. Bellinger
> Cc: Christoph Hellwig
> Cc: Hannes Reinecke
> Cc: Sagi Grimberg
> ---
>  drivers/target/Kconfig                 |  1 +
>  drivers/target/target_core_transport.c | 46 +++-----------------------------
>  2 files changed, 5 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/target/Kconfig b/drivers/target/Kconfig
> index e2bc99980f75..4c44d7bed01a 100644
> --- a/drivers/target/Kconfig
> +++ b/drivers/target/Kconfig
> @@ -5,6 +5,7 @@ menuconfig TARGET_CORE
>  	select CONFIGFS_FS
>  	select CRC_T10DIF
>  	select BLK_SCSI_REQUEST # only for scsi_command_size_tbl..
> +	select SGL_ALLOC
>  	default n
>  	help
>  	Say Y or M here to enable the TCM Storage Engine and ConfigFS enabled
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 836d552b0385..9bbd08be9d60 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2293,13 +2293,7 @@ static void target_complete_ok_work(struct work_struct *work)
>
>  void target_free_sgl(struct scatterlist *sgl, int nents)
>  {
> -	struct scatterlist *sg;
> -	int count;
> -
> -	for_each_sg(sgl, sg, nents, count)
> -		__free_page(sg_page(sg));
> -
> -	kfree(sgl);
> +	sgl_free(sgl);
>  }
>  EXPORT_SYMBOL(target_free_sgl);
>
> @@ -2423,42 +2417,10 @@ int
>  target_alloc_sgl(struct scatterlist **sgl, unsigned int *nents, u32 length,
>  		 bool zero_page, bool chainable)
>  {
> -	struct scatterlist *sg;
> -	struct page *page;
> -	gfp_t zero_flag = (zero_page) ? __GFP_ZERO : 0;
> -	unsigned int nalloc, nent;
> -	int i = 0;
> -
> -	nalloc = nent = DIV_ROUND_UP(length, PAGE_SIZE);
> -	if (chainable)
> -		nalloc++;
> -	sg = kmalloc_array(nalloc, sizeof(struct scatterlist), GFP_KERNEL);
> -	if (!sg)
> -		return -ENOMEM;
> +	gfp_t gfp = GFP_KERNEL | (zero_page ? __GFP_ZERO : 0);
>
> -	sg_init_table(sg, nalloc);
> -
> -	while (length) {
> -		u32 page_len = min_t(u32, length, PAGE_SIZE);
> -		page = alloc_page(GFP_KERNEL | zero_flag);
> -		if (!page)
> -			goto out;
> -
> -		sg_set_page(&sg[i], page, page_len, 0);
> -		length -= page_len;
> -		i++;
> -	}
> -	*sgl = sg;
> -	*nents = nent;
> -	return 0;
> -
> -out:
> -	while (i > 0) {
> -		i--;
> -		__free_page(sg_page(&sg[i]));
> -	}
> -	kfree(sg);
> -	return -ENOMEM;
> +	*sgl = sgl_alloc_order(length, 0, chainable, gfp, nents);
> +	return *sgl ? 0 : -ENOMEM;
>  }
>  EXPORT_SYMBOL(target_alloc_sgl);
>
The calling convention of target_alloc_sgl() is decidedly dodgy. Can't we convert it into returning the sgl, and remove the first parameter?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 4/8] nvmet/rdma: Use sgl_alloc() and sgl_free()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc() and sgl_free() functions instead of open coding
> these functions.
>
> Signed-off-by: Bart Van Assche
> Reviewed-by: Johannes Thumshirn
> Cc: Keith Busch
> Cc: Christoph Hellwig
> Cc: Sagi Grimberg
> ---
>  drivers/nvme/target/Kconfig |  1 +
>  drivers/nvme/target/rdma.c  | 63 +++--
>  2 files changed, 5 insertions(+), 59 deletions(-)
>
Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 3/8] nvmet/fc: Use sgl_alloc() and sgl_free()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc() and sgl_free() functions instead of open coding
> these functions.
>
> Signed-off-by: Bart Van Assche
> Reviewed-by: Johannes Thumshirn
> Cc: Keith Busch
> Cc: Christoph Hellwig
> Cc: James Smart
> Cc: Sagi Grimberg
> ---
>  drivers/nvme/target/Kconfig |  1 +
>  drivers/nvme/target/fc.c    | 36 ++--
>  2 files changed, 3 insertions(+), 34 deletions(-)
>
Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 2/8] crypto: scompress - use sgl_alloc() and sgl_free()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Use the sgl_alloc() and sgl_free() functions instead of open coding
> these functions.
>
> Signed-off-by: Bart Van Assche
> Cc: Ard Biesheuvel
> Cc: Herbert Xu
> ---
>  crypto/Kconfig     |  1 +
>  crypto/scompress.c | 51 ++-
>  2 files changed, 3 insertions(+), 49 deletions(-)
>
Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Re: [PATCH v2 1/8] lib/scatterlist: Introduce sgl_alloc() and sgl_free()
On 10/17/2017 12:49 AM, Bart Van Assche wrote:
> Many kernel drivers contain code that allocates and frees both a
> scatterlist and the pages that populate that scatterlist.
> Introduce functions in lib/scatterlist.c that perform these tasks
> instead of duplicating this functionality in multiple drivers.
> Only include these functions in the build if CONFIG_SGL_ALLOC=y
> to avoid that the kernel size increases if this functionality is
> not used.
>
> Signed-off-by: Bart Van Assche
> ---
>  include/linux/scatterlist.h |  10 +
>  lib/Kconfig                 |   4 ++
>  lib/scatterlist.c           | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 119 insertions(+)
>
Reviewed-by: Hannes Reinecke

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
h...@suse.de			   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)