from:"Benjamin Block"

Re: [PATCH 13/20] scsi: zfcp: Manual replacement of the deprecated strlcpy() with return values

2021-02-22 Thread Benjamin Block

On Mon, Feb 22, 2021 at 04:12:24PM +0100, Romain Perier wrote:
> The strlcpy() reads the entire source buffer first, it is dangerous if
> the source buffer lenght is unbounded or possibility non NULL-terminated.
> It can lead to linear read overflows, crashes, etc...
> 
> As recommended in the deprecated interfaces [1], it should be replaced
> by strscpy.
> 
> This commit replaces all calls to strlcpy that handle the return values
> by the corresponding strscpy calls with new handling of the return
> values (as it is quite different between the two functions).
> 
> [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
> 
> Signed-off-by: Romain Perier 
> ---
>  drivers/s390/scsi/zfcp_fc.c |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_fc.c b/drivers/s390/scsi/zfcp_fc.c
> index d24cafe02708..8a65241011b9 100644
> --- a/drivers/s390/scsi/zfcp_fc.c
> +++ b/drivers/s390/scsi/zfcp_fc.c
> @@ -877,14 +877,16 @@ static void zfcp_fc_rspn(struct zfcp_adapter *adapter,
>   struct zfcp_fsf_ct_els *ct_els = _req->ct_els;
>   struct zfcp_fc_rspn_req *rspn_req = _req->u.rspn.req;
>   struct fc_ct_hdr *rspn_rsp = _req->u.rspn.rsp;
> - int ret, len;
> + int ret;
> + ssize_t len;
>  
>   zfcp_fc_ct_ns_init(_req->ct_hdr, FC_NS_RSPN_ID,
>  FC_SYMBOLIC_NAME_SIZE);
>   hton24(rspn_req->rspn.fr_fid.fp_fid, fc_host_port_id(shost));
> - len = strlcpy(rspn_req->rspn.fr_name, fc_host_symbolic_name(shost),
> + len = strscpy(rspn_req->rspn.fr_name, fc_host_symbolic_name(shost),
> FC_SYMBOLIC_NAME_SIZE);
> - rspn_req->rspn.fr_name_len = len;
> + if (len != -E2BIG)
> + rspn_req->rspn.fr_name_len = len;

That is a bug. Leaving `rspn.fr_name_len` uninitialized defeats the
purpose of sending a RSPN.

How about:
if (len == -E2BIG)
rspn_req->rspn.fr_name_len = FC_SYMBOLIC_NAME_SIZE - 1;
else
rspn_req->rspn.fr_name_len = len;

>  
>   sg_init_one(_req->sg_req, rspn_req, sizeof(*rspn_req));
>   sg_init_one(_req->sg_rsp, rspn_rsp, sizeof(*rspn_rsp));
> 

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH v2] scsi: zfcp: move the position of put_device

2020-12-14 Thread Benjamin Block

On Tue, Dec 01, 2020 at 10:47:16AM +0800, Qinglang Miao wrote:
> Have the `put_device()` call after `device_unregister()` in both
> `zfcp_unit_remove()` and `zfcp_sysfs_port_remove_store()` to make
> it more natural, for put_device() ought to be the last time we
> touch the object in both functions.
> 
> And add comments after put_device to make codes clearer.
> 
> Suggested-by: Steffen Maier 
> Suggested-by: Benjamin Block 
> Signed-off-by: Qinglang Miao 
> ---
>  v2: add comments after put_device as Steffen suggested.
> 
>  drivers/s390/scsi/zfcp_sysfs.c | 4 ++--
>  drivers/s390/scsi/zfcp_unit.c  | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 

Got a bit delayed.
Looks good, I queue it, and we send it with the changes for 5.12 once
the merge window for 5.11 is over.

Thanks.

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] scsi: zfcp: fix use-after-free in zfcp_unit_remove

2020-11-26 Thread Benjamin Block

On Thu, Nov 26, 2020 at 08:07:32PM +0800, Qinglang Miao wrote:
> 在 2020/11/26 17:42, Benjamin Block 写道:
> > On Thu, Nov 26, 2020 at 09:13:53AM +0100, Cornelia Huck wrote:
> > > On Thu, 26 Nov 2020 09:27:41 +0800
> > > Qinglang Miao  wrote:
> > > > 在 2020/11/26 1:06, Benjamin Block 写道:
> > > > > On Fri, Nov 20, 2020 at 03:48:54PM +0800, Qinglang Miao wrote:

> > Let's go by example. If we assume the reference count of `unit->dev` is
> > R, and the function starts with R = 1 (otherwise the deivce would've
> > been freed already), we get:
> > 
> >  int zfcp_unit_remove(struct zfcp_port *port, u64 fcp_lun)
> >  {
> > struct zfcp_unit *unit;
> > struct scsi_device *sdev;
> > write_lock_irq(>unit_list_lock);
> > // unit->dev (R = 1)
> > unit = _zfcp_unit_find(port, fcp_lun);
> > // get_device(>dev)
> > // unit->dev (R = 2)
> > if (unit)
> > list_del(>list);
> > write_unlock_irq(>unit_list_lock);
> > if (!unit)
> > return -EINVAL;
> > sdev = zfcp_unit_sdev(unit);
> > if (sdev) {
> > scsi_remove_device(sdev);
> > scsi_device_put(sdev);
> > }
> > // unit->dev (R = 2)
> > put_device(>dev);
> > // unit->dev (R = 1)
> > device_unregister(>dev);
> > // unit->dev (R = 0)
> > return 0;
> >  }
> > 
> > If we now apply this patch, we'd end up with R = 1 after
> > `device_unregister()`, and the device would not be properly removed.
> > 
> > If you still think that's wrong, then you'll need to better explain why.
> > 
> Hi Banjamin and Cornelia,
> 
> Your replies make me reliaze that I've been holding a mistake understanding
> of put_device() as well as reference count.
> 
> Thanks for you two's patient explanation !!
> 
> BTW, should I send a v2 on these two patches to move the position of
> put_device()?

Feel free to do so.

I think having the `put_device()` call after `device_unregister()` in
both `zfcp_unit_remove()` and `zfcp_sysfs_port_remove_store()` is more
natural, because it ought to be the last time we touch the object in
both functions.


-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] scsi: zfcp: fix use-after-free in zfcp_unit_remove

2020-11-26 Thread Benjamin Block

On Thu, Nov 26, 2020 at 09:13:53AM +0100, Cornelia Huck wrote:
> On Thu, 26 Nov 2020 09:27:41 +0800
> Qinglang Miao  wrote:
> 
> > 在 2020/11/26 1:06, Benjamin Block 写道:
> > > On Fri, Nov 20, 2020 at 03:48:54PM +0800, Qinglang Miao wrote:  
> > >> kfree(port) is called in put_device(>dev) so that following
> > >> use would cause use-after-free bug.
> > >>
> > >> The former put_device is redundant for device_unregister contains
> > >> put_device already. So just remove it to fix this.
> > >>
> > >> Fixes: 86bdf218a717 ("[SCSI] zfcp: cleanup unit sysfs attribute usage")
> > >> Reported-by: Hulk Robot 
> > >> Signed-off-by: Qinglang Miao 
> > >> ---
> > >>   drivers/s390/scsi/zfcp_unit.c | 2 --
> > >>   1 file changed, 2 deletions(-)
> > >>
> > >> diff --git a/drivers/s390/scsi/zfcp_unit.c 
> > >> b/drivers/s390/scsi/zfcp_unit.c
> > >> index e67bf7388..664b77853 100644
> > >> --- a/drivers/s390/scsi/zfcp_unit.c
> > >> +++ b/drivers/s390/scsi/zfcp_unit.c
> > >> @@ -255,8 +255,6 @@ int zfcp_unit_remove(struct zfcp_port *port, u64 
> > >> fcp_lun)
> > >>  scsi_device_put(sdev);
> > >>  }
> > >>   
> > >> -put_device(>dev);
> > >> -
> > >>  device_unregister(>dev);  
> > >>  >>  return 0;  
> > > 
> > > Same as in the other mail for `zfcp_sysfs_port_remove_store()`. We
> > > explicitly get a new ref in `_zfcp_unit_find()`, so we also need to put
> > > that away again.
> > >  
> > Sorry, Benjamin, I don't think so, because device_unregister calls 
> > put_device inside.
> > 
> > It seem's that another put_device before or after device_unregister is 
> > useless and even might cause an use-after-free.
> 
> The issue here (and in the other patches that I had commented on) is
> that the references have different origins. device_register() acquires
> a reference, and that reference is given up when you call
> device_unregister(). However, the code here grabs an extra reference,
> and it of course has to give it up again when it no longer needs it.
> 
> This is something that is not that easy to spot by an automated check,
> I guess?
> 

Indeed.

I do think the two patches for zfcp have merit, but not by simply
removing the put_device(), but by moving it.

For this patch in particular, I'd think the "proper logic" would be to
move the `put_device()` to after the `device_unregister()`:

device_unregister(>dev);
put_device(>dev);

return 0;

As Cornelia pointed out, the extra `get_device()` we do in
`_zfcp_unit_find()` needs to be reversed, otherwise we have a dangling
reference and probably some sort of memory-/resource-leak.

Let's go by example. If we assume the reference count of `unit->dev` is
R, and the function starts with R = 1 (otherwise the deivce would've
been freed already), we get:

int zfcp_unit_remove(struct zfcp_port *port, u64 fcp_lun)
{
struct zfcp_unit *unit;
struct scsi_device *sdev;

write_lock_irq(>unit_list_lock);
// unit->dev (R = 1)
unit = _zfcp_unit_find(port, fcp_lun);
// get_device(>dev)
// unit->dev (R = 2)
if (unit)
list_del(>list);
write_unlock_irq(>unit_list_lock);

if (!unit)
return -EINVAL;

sdev = zfcp_unit_sdev(unit);
if (sdev) {
scsi_remove_device(sdev);
scsi_device_put(sdev);
}

// unit->dev (R = 2)
put_device(>dev);
// unit->dev (R = 1)
device_unregister(>dev);
// unit->dev (R = 0)

return 0;
}

If we now apply this patch, we'd end up with R = 1 after
`device_unregister()`, and the device would not be properly removed.

If you still think that's wrong, then you'll need to better explain why.


-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] scsi: zfcp: fix use-after-free in zfcp_unit_remove

2020-11-25 Thread Benjamin Block

On Fri, Nov 20, 2020 at 03:48:54PM +0800, Qinglang Miao wrote:
> kfree(port) is called in put_device(>dev) so that following
> use would cause use-after-free bug.
> 
> The former put_device is redundant for device_unregister contains
> put_device already. So just remove it to fix this.
> 
> Fixes: 86bdf218a717 ("[SCSI] zfcp: cleanup unit sysfs attribute usage")
> Reported-by: Hulk Robot 
> Signed-off-by: Qinglang Miao 
> ---
>  drivers/s390/scsi/zfcp_unit.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_unit.c b/drivers/s390/scsi/zfcp_unit.c
> index e67bf7388..664b77853 100644
> --- a/drivers/s390/scsi/zfcp_unit.c
> +++ b/drivers/s390/scsi/zfcp_unit.c
> @@ -255,8 +255,6 @@ int zfcp_unit_remove(struct zfcp_port *port, u64 fcp_lun)
>   scsi_device_put(sdev);
>   }
>  
> - put_device(>dev);
> -
>   device_unregister(>dev);
>  
>   return 0;

Same as in the other mail for `zfcp_sysfs_port_remove_store()`. We
explicitly get a new ref in `_zfcp_unit_find()`, so we also need to put
that away again.

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] scsi: zfcp: fix use-after-free in zfcp_sysfs_port_remove_store

2020-11-25 Thread Benjamin Block

On Fri, Nov 20, 2020 at 03:48:53PM +0800, Qinglang Miao wrote:
> kfree(port) is called in put_device(>dev) so that following
> use would cause use-after-free bug.
> 
> the former put_device is redundant for device_unregister contains
> put_device already. So just remove it to fix this.
> 
> Fixes: 83d4e1c33d93 ("[SCSI] zfcp: cleanup port sysfs attribute usage")
> Reported-by: Hulk Robot 
> Signed-off-by: Qinglang Miao 
> ---
>  drivers/s390/scsi/zfcp_sysfs.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_sysfs.c b/drivers/s390/scsi/zfcp_sysfs.c
> index 8d9662e8b..06285e452 100644
> --- a/drivers/s390/scsi/zfcp_sysfs.c
> +++ b/drivers/s390/scsi/zfcp_sysfs.c
> @@ -327,8 +327,6 @@ static ssize_t zfcp_sysfs_port_remove_store(struct device 
> *dev,
>   list_del(>list);
>   write_unlock_irq(>port_list_lock);
>  
> - put_device(>dev);
> -
>   zfcp_erp_port_shutdown(port, 0, "syprs_1");
>   device_unregister(>dev);
>   out:

Hmm, the placement of the put_device() is indeed strange, now that I
think about it. But just removing it and then having a dangling
reference also seems wrong - we do get one explicitly in
`zfcp_get_port_by_wwpn()`.

My first thought would be to move it after the `device_unregister()`
call, and before the `out:` label.


-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH 2/2] scsi: register sysfs for scsi/iscsi workqueues

2020-06-23 Thread Benjamin Block

On Mon, Jun 22, 2020 at 10:40:09AM -0500, Mike Christie wrote:
> On 6/11/20 5:07 AM, Bob Liu wrote:
> > This patch enable setting cpu affinity through "cpumask" for below
> > scsi/iscsi workqueues, so as to get better isolation.
> > - scsi_wq_*
> > - scsi_tmf_*
> > - iscsi_q_xx
> > - iscsi_eh
> > 
> > Signed-off-by: Bob Liu 
> > ---
> >   drivers/scsi/hosts.c| 4 ++--
> >   drivers/scsi/libiscsi.c | 2 +-
> >   drivers/scsi/scsi_transport_iscsi.c | 2 +-
> >   3 files changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> > index 1d669e4..4b9f80d 100644
> > --- a/drivers/scsi/hosts.c
> > +++ b/drivers/scsi/hosts.c
> > @@ -272,7 +272,7 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, 
> > struct device *dev,
> > if (shost->transportt->create_work_queue) {
> > snprintf(shost->work_q_name, sizeof(shost->work_q_name),
> >  "scsi_wq_%d", shost->host_no);
> > -   shost->work_q = create_singlethread_workqueue(
> > +   shost->work_q = create_singlethread_workqueue_noorder(
> > shost->work_q_name);
> > if (!shost->work_q) {
> > error = -EINVAL;
> 
> This patch seems ok for the iscsi, fc, tmf, and non transport class scan
> uses. We are either heavy handed with flushes or did not need ordering.
> 
> I don't know about the zfcp use though, so I cc'd  the developers listed as
> maintainers. It looks like for zfcp we can do:

Thx for the notice.

> 
> zfcp_scsi_rport_register->fc_remote_port_add->fc_remote_port_create->scsi_queue_work
> to scan the scsi target on the rport.
> 
> and then zfcp_scsi_rport_register can call zfcp_unit_queue_scsi_scan->
> scsi_queue_work which will scan for a specific lun.
> 
> It looks ok if those are not ordered, but I would get their review to make
> sure.

I am not aware of any temporal requirements of those LUN-scans, so I
think making them not explicitly ordered shouldn't hurt us.

The target scan itself is protected again by `shost->scan_mutex`.. so
all fine I think.

-- 
Best Regards, Benjamin Block  / Linux on IBM Z Kernel Development / IBM Systems
IBM Deutschland Research & Development GmbH/https://www.ibm.com/privacy
Vorsitz. AufsR.: Gregor Pillen /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

[PATCH v2 1/2] scsi: replace GFP_ATOMIC with GFP_KERNEL for allocations in scsi_scan.c

2019-02-21 Thread Benjamin Block

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM  2107900
  .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device 
naa.6005076303ffd32744da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: 
(1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA
[ 5525.857838]  sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some 
SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some 
SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATMOIC here. All the different
call-contexts use a mutex at some point, and nothing in between that
requires no sleeping, as far as I could see. Additionally, the code that
later allocates the block queue for the device (scsi_mq_alloc_queue())
already uses GFP_KERNEL.

There are similar allocations in two other functions:
scsi_probe_and_add_lun(), and scsi_add_lun(),; that can also be done
with GFP_KERNEL.

Here is the contexts for the three functions so far:

scsi_alloc_sdev()
scsi_probe_and_add_lun()
scsi_sequential_lun_scan()
__scsi_scan_target()
scsi_scan_target()
mutex_lock()
scsi_scan_channel()
scsi_scan_host_selected()
mutex_lock()
scsi_report_lun_scan()
__scsi_scan_target()
...
__scsi_add_device()
mutex_lock()
__scsi_scan_target()
...
scsi_report_lun_scan()
...
scsi_get_host_dev()
mutex_lock()

scsi_probe_and_add_lun()
...

scsi_add_lun()
scsi_probe_and_add_lun()
...

So replace all these, and give them a bit of a better chance to succeed,
with more chances of reclaim.

Signed-off-by: Benjamin Block 
---
 drivers/scsi/scsi_scan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index dd0d516f65e2..53380e07b40e 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -220,7 +220,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
scsi_target *starget,
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 
sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
-  GFP_ATOMIC);
+  GFP_KERNEL);
if (!sdev)
goto out;
 
@@ -788,7 +788,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
 */
sdev->inquiry = kmemdup(inq_result,
max_t(size_t, sdev->inquiry_len, 36),
-   GFP_ATOMIC);
+   GFP_KERNEL);
if (sdev->inquiry == NULL)
return SCSI_SCAN_NO_RESPONSE;
 
@@ -1079,7 +1079,7 @@ static int scsi_probe_and_add_lun(struct scsi_target 
*starget,
if (!sdev)
goto out;
 
-   result = kmalloc(result_len, GFP_ATOMIC |
+   result = kmalloc(result_len, GFP_KERNEL |
((shost->unchecked_isa_dma) ? __GFP_DMA : 0));
if (!result)
goto out_free_sdev;
-- 
2.20.1

[PATCH v2 2/2] scsi: whitespace cleanup in scsi_scan.c

2019-02-21 Thread Benjamin Block

Noticed during editing that vim would remove some trailing spaces.

Signed-off-by: Benjamin Block 
---
 drivers/scsi/scsi_scan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 53380e07b40e..7e1a6c3dd42c 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -108,7 +108,7 @@ MODULE_PARM_DESC(scan, "sync, async, manual, or none. "
 static unsigned int scsi_inq_timeout = SCSI_TIMEOUT/HZ + 18;
 
 module_param_named(inq_timeout, scsi_inq_timeout, uint, S_IRUGO|S_IWUSR);
-MODULE_PARM_DESC(inq_timeout, 
+MODULE_PARM_DESC(inq_timeout,
 "Timeout (in seconds) waiting for devices to answer INQUIRY."
 " Default is 20. Some devices may need more; most need less.");
 
@@ -604,7 +604,7 @@ static int scsi_probe_lun(struct scsi_device *sdev, 
unsigned char *inq_result,
 * not-ready to ready transition [asc/ascq=0x28/0x0]
 * or power-on, reset [asc/ascq=0x29/0x0], continue.
 * INQUIRY should not yield UNIT_ATTENTION
-* but many buggy devices do so anyway. 
+* but many buggy devices do so anyway.
 */
if (driver_byte(result) == DRIVER_SENSE &&
scsi_sense_valid()) {
@@ -850,7 +850,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
 * Don't set the device offline here; rather let the upper
 * level drivers eval the PQ to decide whether they should
 * attach. So remove ((inq_result[0] >> 5) & 7) == 1 check.
-*/ 
+*/
 
sdev->inq_periph_qual = (inq_result[0] >> 5) & 7;
sdev->lockable = sdev->removable;
@@ -994,7 +994,7 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned 
char *inq_result,
 }
 
 #ifdef CONFIG_SCSI_LOGGING
-/** 
+/**
  * scsi_inq_str - print INQUIRY data from min to max index, strip trailing 
whitespace
  * @buf:   Output buffer with at least end-first+1 bytes of space
  * @inq:   Inquiry buffer (input)
@@ -1495,7 +1495,7 @@ EXPORT_SYMBOL(__scsi_add_device);
 int scsi_add_device(struct Scsi_Host *host, uint channel,
uint target, u64 lun)
 {
-   struct scsi_device *sdev = 
+   struct scsi_device *sdev =
__scsi_add_device(host, channel, target, lun, NULL);
if (IS_ERR(sdev))
return PTR_ERR(sdev);
-- 
2.20.1

Re: [PATCH] scsi: replace GFP_ATOMIC with GFP_KERNEL for sdev allocation

2019-02-20 Thread Benjamin Block

On Wed, Feb 20, 2019 at 11:11:31AM -0800, Bart Van Assche wrote:
> On Wed, 2019-02-20 at 19:48 +0100, Benjamin Block wrote:
> > We had a test-report where, under memory pressure, adding LUNs to the
> > systems would fail (the tests add LUNs strictly in sequence):
> 
> Hi Benjamin,
> 
> There are two more instances of GFP_ATOMIC in scsi_scan.c. Have you verified
> whether or not it is safe to change these into GFP_KERNEL too?
> 

No, I was lazy, but I can take a look tomorrow and fix them up as well
if they are similar to scsi_alloc_sdev().

-- 
With Best Regards, Benjamin Block  /  Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Matthias Hartmann   /  Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

[PATCH] scsi: replace GFP_ATOMIC with GFP_KERNEL for sdev allocation

2019-02-20 Thread Benjamin Block

We had a test-report where, under memory pressure, adding LUNs to the
systems would fail (the tests add LUNs strictly in sequence):

[ 5525.853432] scsi 0:0:1:1088045124: Direct-Access IBM  2107900
  .148 PQ: 0 ANSI: 5
[ 5525.853826] scsi 0:0:1:1088045124: alua: supports implicit TPGS
[ 5525.853830] scsi 0:0:1:1088045124: alua: device 
naa.6005076303ffd32744da port group 0 rel port 43
[ 5525.853931] sd 0:0:1:1088045124: Attached scsi generic sg10 type 0
[ 5525.854075] sd 0:0:1:1088045124: [sdk] Disabling DIF Type 1 protection
[ 5525.855495] sd 0:0:1:1088045124: [sdk] 2097152 512-byte logical blocks: 
(1.07 GB/1.00 GiB)
[ 5525.855606] sd 0:0:1:1088045124: [sdk] Write Protect is off
[ 5525.855609] sd 0:0:1:1088045124: [sdk] Mode Sense: ed 00 00 08
[ 5525.855795] sd 0:0:1:1088045124: [sdk] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA
[ 5525.857838]  sdk: sdk1
[ 5525.859468] sd 0:0:1:1088045124: [sdk] Attached SCSI disk
[ 5525.865073] sd 0:0:1:1088045124: alua: transition timeout set to 60 seconds
[ 5525.865078] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.015070] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.015213] sd 0:0:1:1088045124: alua: port group 00 state A preferred 
supports tolusnA
[ 5526.587439] scsi_alloc_sdev: Allocation failure during SCSI scanning, some 
SCSI devices might not be configured
[ 5526.588562] scsi_alloc_sdev: Allocation failure during SCSI scanning, some 
SCSI devices might not be configured

Looking at the code of scsi_alloc_sdev(), and all the calling contexts,
there seems to be no reason to use GFP_ATMOIC here. All the different
call-contexts use a mutex at some point, and nothing in between that
requires no sleeping, as far as I could see. Additionally, the code that
allocates the block queue for the device later (scsi_mq_alloc_queue())
already uses GFP_KERNEL.

So replace it, and give the allocation a bit of a better chance to succeed,
with more ways of reclaim.

Signed-off-by: Benjamin Block 
---
 drivers/scsi/scsi_scan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index dd0d516f65e2..e49e6099b852 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -220,7 +220,7 @@ static struct scsi_device *scsi_alloc_sdev(struct 
scsi_target *starget,
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 
sdev = kzalloc(sizeof(*sdev) + shost->transportt->device_size,
-  GFP_ATOMIC);
+  GFP_KERNEL);
if (!sdev)
goto out;
 
-- 
2.20.1

Re: Question on handling managed IRQs when hotplugging CPUs

2019-02-06 Thread Benjamin Block

On Wed, Feb 06, 2019 at 09:21:40AM +, John Garry wrote:
> On 05/02/2019 18:23, Christoph Hellwig wrote:
> > On Tue, Feb 05, 2019 at 03:09:28PM +, John Garry wrote:
> > > For SCSI devices, unfortunately not all IO sent to the HW originates from
> > > blk-mq or any other single entity.
> > 
> > Where else would SCSI I/O originate from?
> 
> Please note that I was referring to other management IO, like SAS SMP, TMFs,
> and other proprietary commands which the driver may generate for the HBA -
> https://marc.info/?l=linux-scsi=154831889001973=2 discusses some of them
> also.
> 

Especially the TMFs send via SCSI EH are a bit of a pain I guess,
because they are entirely managed by the device drivers, but depending
on the device driver they might not even qualify for the problem Hannes
is seeing.

-- 
With Best Regards, Benjamin Block  /  Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Matthias Hartmann   /  Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] s390: zfcp_aux: remove unnecessary null pointer check before mempool_destroy

2018-09-10 Thread Benjamin Block

On Sat, Sep 08, 2018 at 05:41:45PM +0800, zhong jiang wrote:
> mempool_destroy has taken null pointer check into account. so remove the
> redundant check.
> 
> Signed-off-by: zhong jiang 
> ---
>  drivers/s390/scsi/zfcp_aux.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
> index 94f4d8f..e06c3f2 100644
> --- a/drivers/s390/scsi/zfcp_aux.c
> +++ b/drivers/s390/scsi/zfcp_aux.c
> @@ -248,20 +248,13 @@ static int zfcp_allocate_low_mem_buffers(struct 
> zfcp_adapter *adapter)
>  
>  static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
>  {
> - if (adapter->pool.erp_req)
> - mempool_destroy(adapter->pool.erp_req);
> - if (adapter->pool.scsi_req)
> - mempool_destroy(adapter->pool.scsi_req);
> - if (adapter->pool.scsi_abort)
> - mempool_destroy(adapter->pool.scsi_abort);
> - if (adapter->pool.qtcb_pool)
> - mempool_destroy(adapter->pool.qtcb_pool);
> - if (adapter->pool.status_read_req)
> - mempool_destroy(adapter->pool.status_read_req);
> - if (adapter->pool.sr_data)
> - mempool_destroy(adapter->pool.sr_data);
> - if (adapter->pool.gid_pn)
> - mempool_destroy(adapter->pool.gid_pn);
> + mempool_destroy(adapter->pool.erp_req);
> + mempool_destroy(adapter->pool.scsi_req);
> + mempool_destroy(adapter->pool.scsi_abort);
> + mempool_destroy(adapter->pool.qtcb_pool);
> + mempool_destroy(adapter->pool.status_read_req);
> + mempool_destroy(adapter->pool.sr_data);
> + mempool_destroy(adapter->pool.gid_pn);
>  }
>  

Acked-by: Benjamin Block 

Thank you. We'll send it along when we next send patches upstream.

--
With Best Regards, Benjamin Block  /  Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz   /  Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH] s390: zfcp_aux: remove unnecessary null pointer check before mempool_destroy

2018-09-10 Thread Benjamin Block

On Sat, Sep 08, 2018 at 05:41:45PM +0800, zhong jiang wrote:
> mempool_destroy has taken null pointer check into account. so remove the
> redundant check.
> 
> Signed-off-by: zhong jiang 
> ---
>  drivers/s390/scsi/zfcp_aux.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
> index 94f4d8f..e06c3f2 100644
> --- a/drivers/s390/scsi/zfcp_aux.c
> +++ b/drivers/s390/scsi/zfcp_aux.c
> @@ -248,20 +248,13 @@ static int zfcp_allocate_low_mem_buffers(struct 
> zfcp_adapter *adapter)
>  
>  static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
>  {
> - if (adapter->pool.erp_req)
> - mempool_destroy(adapter->pool.erp_req);
> - if (adapter->pool.scsi_req)
> - mempool_destroy(adapter->pool.scsi_req);
> - if (adapter->pool.scsi_abort)
> - mempool_destroy(adapter->pool.scsi_abort);
> - if (adapter->pool.qtcb_pool)
> - mempool_destroy(adapter->pool.qtcb_pool);
> - if (adapter->pool.status_read_req)
> - mempool_destroy(adapter->pool.status_read_req);
> - if (adapter->pool.sr_data)
> - mempool_destroy(adapter->pool.sr_data);
> - if (adapter->pool.gid_pn)
> - mempool_destroy(adapter->pool.gid_pn);
> + mempool_destroy(adapter->pool.erp_req);
> + mempool_destroy(adapter->pool.scsi_req);
> + mempool_destroy(adapter->pool.scsi_abort);
> + mempool_destroy(adapter->pool.qtcb_pool);
> + mempool_destroy(adapter->pool.status_read_req);
> + mempool_destroy(adapter->pool.sr_data);
> + mempool_destroy(adapter->pool.gid_pn);
>  }
>  

Acked-by: Benjamin Block 

Thank you. We'll send it along when we next send patches upstream.

--
With Best Regards, Benjamin Block  /  Linux on IBM Z Kernel Development
IBM Systems & Technology Group   /  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz   /  Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-24 Thread Benjamin Block

On Thu, Aug 24, 2017 at 10:45:56AM +0200, Christoph Hellwig wrote:
> >  /**
> > - * bsg_destroy_job - routine to teardown/delete a bsg job
> > + * bsg_teardown_job - routine to teardown a bsg job
> >   * @job: bsg_job that is to be torn down
> >   */
> > -static void bsg_destroy_job(struct kref *kref)
> > +static void bsg_teardown_job(struct kref *kref)
> 
> Why this rename?  The destroy name seems to be one of the most
> common patterns for the kref_put callbacks.
>

Hmm, I did it mostly so it is symmetric with bsg_prepare_job() and it
doesn't really itself destroy the job-struct anymore. If there are other
thing amiss I can change that along with them, if it bothers poeple.


Beste Grüße / Best regards,
      - Benjamin Block

> 
> Otherwise this looks fine:
> 
> Reviewed-by: Christoph Hellwig <h...@lst.de>
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-24 Thread Benjamin Block

On Thu, Aug 24, 2017 at 10:45:56AM +0200, Christoph Hellwig wrote:
> >  /**
> > - * bsg_destroy_job - routine to teardown/delete a bsg job
> > + * bsg_teardown_job - routine to teardown a bsg job
> >   * @job: bsg_job that is to be torn down
> >   */
> > -static void bsg_destroy_job(struct kref *kref)
> > +static void bsg_teardown_job(struct kref *kref)
> 
> Why this rename?  The destroy name seems to be one of the most
> common patterns for the kref_put callbacks.
>

Hmm, I did it mostly so it is symmetric with bsg_prepare_job() and it
doesn't really itself destroy the job-struct anymore. If there are other
thing amiss I can change that along with them, if it bothers poeple.


Beste Grüße / Best regards,
      - Benjamin Block

> 
> Otherwise this looks fine:
> 
> Reviewed-by: Christoph Hellwig 
> 

-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

[PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-23 Thread Benjamin Block

Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.

Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.

This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..dd56d7460cb9 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -29,26 +29,25 @@
 #include 
 
 /**
- * bsg_destroy_job - routine to teardown/delete a bsg job
+ * bsg_teardown_job - routine to teardown a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+static void bsg_teardown_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
 
-   blk_end_request_all(rq, BLK_STS_OK);
-
put_device(job->dev);   /* release reference for the request */
 
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
-   kfree(job);
+
+   blk_end_request_all(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
 {
-   kref_put(>kref, bsg_destroy_job);
+   kref_p

[PATCH v2 1/1] bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer

2017-08-23 Thread Benjamin Block

Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.

Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.

This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.

Reported-by: Steffen Maier 
Suggested-by: Christoph Hellwig 
Signed-off-by: Benjamin Block 
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc:  #4.11+
---
 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..dd56d7460cb9 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -29,26 +29,25 @@
 #include 
 
 /**
- * bsg_destroy_job - routine to teardown/delete a bsg job
+ * bsg_teardown_job - routine to teardown a bsg job
  * @job: bsg_job that is to be torn down
  */
-static void bsg_destroy_job(struct kref *kref)
+static void bsg_teardown_job(struct kref *kref)
 {
struct bsg_job *job = container_of(kref, struct bsg_job, kref);
struct request *rq = job->req;
 
-   blk_end_request_all(rq, BLK_STS_OK);
-
put_device(job->dev);   /* release reference for the request */
 
kfree(job->request_payload.sg_list);
kfree(job->reply_payload.sg_list);
-   kfree(job);
+
+   blk_end_request_all(rq, BLK_STS_OK);
 }
 
 void bsg_job_put(struct bsg_job *job)
 {
-   kref_put(>kref, bsg_destroy_job);
+   kref_put(>kref, bsg_teardown_job);
 }
 EXPORT_SYMBOL_GPL(bsg_job_put);
 
@@ -100,7 +99,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);

[PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG

2017-08-23 Thread Benjamin Block

Hello all,

This is the second try for fixing the regression in the BSG-interface that
exists since v4.11 (for more infos see the first series).

I separated my other changes from the bug-fix so that it is easier to apply
if judged good. I will rebase my cleanups I sent in v1 and send them when I
get a bit more time. But the regression-fix is more important, so here's
that.

I did some more tests on it than on v1, including some heavy parallel I/O
on the same blk-queue using both BSG and the normal SCSI-stack at the same
time (throwing some intentional bad commands in it too). That seemed to
work all well enough - i.e. it didn't crash and got the expected results. I
haven't done any external error-inject, but IMO that would be beyond the
scope right now.

The fix is based on Christoph's idea, I discussed this with him off-list
already.

I rebased the series on Jens' for-next.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (1):
  bsg-lib: fix kernel panic resulting from missing allocation of
reply-buffer

 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

-- 
Linux on z Systems Development / IBM Systems & Technology Group
   IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

[PATCH v2 0/1] bsg: fix regression resulting in panics when sending commands via BSG

2017-08-23 Thread Benjamin Block

Hello all,

This is the second try for fixing the regression in the BSG-interface that
exists since v4.11 (for more infos see the first series).

I separated my other changes from the bug-fix so that it is easier to apply
if judged good. I will rebase my cleanups I sent in v1 and send them when I
get a bit more time. But the regression-fix is more important, so here's
that.

I did some more tests on it than on v1, including some heavy parallel I/O
on the same blk-queue using both BSG and the normal SCSI-stack at the same
time (throwing some intentional bad commands in it too). That seemed to
work all well enough - i.e. it didn't crash and got the expected results. I
haven't done any external error-inject, but IMO that would be beyond the
scope right now.

The fix is based on Christoph's idea, I discussed this with him off-list
already.

I rebased the series on Jens' for-next.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (1):
  bsg-lib: fix kernel panic resulting from missing allocation of
reply-buffer

 block/bsg-lib.c | 74 +
 include/linux/blkdev.h  |  1 -
 include/linux/bsg-lib.h |  2 ++
 3 files changed, 46 insertions(+), 31 deletions(-)

-- 
Linux on z Systems Development / IBM Systems & Technology Group
   IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block

On Sun, Aug 13, 2017 at 04:39:40PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 06:01:42PM +0200, Benjamin Block wrote:
> > When the BSG interface is used with bsg-lib, and the user sends a
> > Bidirectional command - so when he gives an input- and output-buffer
> > (most users of our interface will likely do that, if they wanna get the
> > transport-level response data) - bsg will allocate two requests from the
> > queue. The first request's bio is used to map the input and the second
> > request's bio for the output (see bsg_map_hdr() in the if-statement with
> > (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).
> > 
> > When we now allocate the full space of bsg_job, sense, dd_data for each
> > request, these will be wasted on the (linked) second request. They will
> > go unused all the time, as only the first request's bsg_job, sense and
> > dd_data is used by the LLDs and BSG itself.
> > 
> > Right now, because we don't allocate this on each request, those spaces
> > are only allocated for the first request in bsg-lib.
> > 
> > Maybe we can ignore this, if it gets to complicated, I don't wanne
> > prolong this unnecessary.
> 
> We have the same 'issue' with bidirection scsi commands - it's a side
> effect of having two request structures for these commands, and the
> only real fix would be to have a single request structure, which would
> be nice especially if we can't do it without growing struct request.
> 

Alright, I was not aware of that. That is fair then. Thx.


    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block

On Sun, Aug 13, 2017 at 04:39:40PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 06:01:42PM +0200, Benjamin Block wrote:
> > When the BSG interface is used with bsg-lib, and the user sends a
> > Bidirectional command - so when he gives an input- and output-buffer
> > (most users of our interface will likely do that, if they wanna get the
> > transport-level response data) - bsg will allocate two requests from the
> > queue. The first request's bio is used to map the input and the second
> > request's bio for the output (see bsg_map_hdr() in the if-statement with
> > (op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).
> > 
> > When we now allocate the full space of bsg_job, sense, dd_data for each
> > request, these will be wasted on the (linked) second request. They will
> > go unused all the time, as only the first request's bsg_job, sense and
> > dd_data is used by the LLDs and BSG itself.
> > 
> > Right now, because we don't allocate this on each request, those spaces
> > are only allocated for the first request in bsg-lib.
> > 
> > Maybe we can ignore this, if it gets to complicated, I don't wanne
> > prolong this unnecessary.
> 
> We have the same 'issue' with bidirection scsi commands - it's a side
> effect of having two request structures for these commands, and the
> only real fix would be to have a single request structure, which would
> be nice especially if we can't do it without growing struct request.
> 

Alright, I was not aware of that. That is fair then. Thx.


    Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block

Hey Christoph,

I looked over the patch in detail, see below.

> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
>
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
>
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
>
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
>
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
>
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);

What is the reason for moving that last line? Just wondering whether
that might change the behavior somehow, although it doesn't look like it
from the code.

>  }
>
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
>
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
>
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
>* allocated */
>   if (req->bio) {
> @@ -187,7 +172,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->queuedata;
>   struct request *req;
> - struct bsg_job *job;
>   int ret;
>
>   if (!get_device(dev))
> @@ -207,8 +191,7 @@ static void bsg_request_fn(struct request_queue *q)
>   continue;
>   }
>
> - job = req->special;
> - ret = q->bsg_job_fn(job);
> + ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
>   spin_lock_irq(q->queue_lock);
>   if (ret)
>   break;
> @@ -219,6 +202,27 @@ static void bsg_request_fn(struct request_queue *q)
>   spin_lock_irq(q->queue_lock);
>  }
>
> +static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t 
> gfp)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
> +
> + memset(job, 0, sizeof(*job));
> + job->req = req;
> + job->request = job->sreq.cmd;

That doesn't work with bsg.c if the request submitted by the user is
bigger than BLK_MAX_CDB. There is code in blk_fill_sgv4_hdr_rq() that
will reassign the req->cmd point in that case to something else.

This is maybe wrong in the same vein as my Patch 1 is. If not for the
legacy code in bsg.c, setting this here, will miss changes to that
pointer between request-allocation and job-submission.

So maybe just move this to bsg_create_job().

> + job->dd_data = job + 1;
> + job

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-14 Thread Benjamin Block

Hey Christoph,

I looked over the patch in detail, see below.

> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
>
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
>
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
>
> Reported-by: Steffen Maier 
> Signed-off-by: Benjamin Block 
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc:  #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
>
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
>
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
>
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);

What is the reason for moving that last line? Just wondering whether
that might change the behavior somehow, although it doesn't look like it
from the code.

>  }
>
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
>
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
>
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
>* allocated */
>   if (req->bio) {
> @@ -187,7 +172,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->queuedata;
>   struct request *req;
> - struct bsg_job *job;
>   int ret;
>
>   if (!get_device(dev))
> @@ -207,8 +191,7 @@ static void bsg_request_fn(struct request_queue *q)
>   continue;
>   }
>
> - job = req->special;
> - ret = q->bsg_job_fn(job);
> + ret = q->bsg_job_fn(blk_mq_rq_to_pdu(req));
>   spin_lock_irq(q->queue_lock);
>   if (ret)
>   break;
> @@ -219,6 +202,27 @@ static void bsg_request_fn(struct request_queue *q)
>   spin_lock_irq(q->queue_lock);
>  }
>
> +static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t 
> gfp)
> +{
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
> +
> + memset(job, 0, sizeof(*job));
> + job->req = req;
> + job->request = job->sreq.cmd;

That doesn't work with bsg.c if the request submitted by the user is
bigger than BLK_MAX_CDB. There is code in blk_fill_sgv4_hdr_rq() that
will reassign the req->cmd point in that case to something else.

This is maybe wrong in the same vein as my Patch 1 is. If not for the
legacy code in bsg.c, setting this here, will miss changes to that
pointer between request-allocation and job-submission.

So maybe just move this to bsg_create_job().

> + job->dd_data = job + 1;
> + job->reply = job->sreq.sense = kzalloc(job->reply_len, gfp);

job->reply_len will be 0 here, won't it? You pro

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

On Fri, Aug 11, 2017 at 05:35:53PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 05:32:03PM +0200, Benjamin Block wrote:
> > So when the bsg interface is used with something different than the
> > bsg-lib request queue?
> 
> Yes.
> 
> > I haven't actually thought about that (presuming
> > the bsg-lib queue was the only one being used). Fair enough, I haven't
> > completely read that code now, but that seems bad then, to reassign a
> > space allocated in someone else's request queue. 
> > 
> > That still leaves open that we now over-allocate space in bsg-lib, or?
> 
> Which space do we over-allocate?

When the BSG interface is used with bsg-lib, and the user sends a
Bidirectional command - so when he gives an input- and output-buffer
(most users of our interface will likely do that, if they wanna get the
transport-level response data) - bsg will allocate two requests from the
queue. The first request's bio is used to map the input and the second
request's bio for the output (see bsg_map_hdr() in the if-statement with
(op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).

When we now allocate the full space of bsg_job, sense, dd_data for each
request, these will be wasted on the (linked) second request. They will
go unused all the time, as only the first request's bsg_job, sense and
dd_data is used by the LLDs and BSG itself.

Right now, because we don't allocate this on each request, those spaces
are only allocated for the first request in bsg-lib.

Maybe we can ignore this, if it gets to complicated, I don't wanne
prolong this unnecessary.

> 
> > My diff tells that this was the same patch as before.
> 
> Next try:

I will have a look when I am back in the office next week.


Beste Grüße / Best regards,
  - Benjamin Block
> 
> ---
> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
> 
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
> 
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);
>  }
> 
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

On Fri, Aug 11, 2017 at 05:35:53PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 05:32:03PM +0200, Benjamin Block wrote:
> > So when the bsg interface is used with something different than the
> > bsg-lib request queue?
> 
> Yes.
> 
> > I haven't actually thought about that (presuming
> > the bsg-lib queue was the only one being used). Fair enough, I haven't
> > completely read that code now, but that seems bad then, to reassign a
> > space allocated in someone else's request queue. 
> > 
> > That still leaves open that we now over-allocate space in bsg-lib, or?
> 
> Which space do we over-allocate?

When the BSG interface is used with bsg-lib, and the user sends a
Bidirectional command - so when he gives an input- and output-buffer
(most users of our interface will likely do that, if they wanna get the
transport-level response data) - bsg will allocate two requests from the
queue. The first request's bio is used to map the input and the second
request's bio for the output (see bsg_map_hdr() in the if-statement with
(op == REQ_OP_SCSI_OUT && hdr->din_xfer_len)).

When we now allocate the full space of bsg_job, sense, dd_data for each
request, these will be wasted on the (linked) second request. They will
go unused all the time, as only the first request's bsg_job, sense and
dd_data is used by the LLDs and BSG itself.

Right now, because we don't allocate this on each request, those spaces
are only allocated for the first request in bsg-lib.

Maybe we can ignore this, if it gets to complicated, I don't wanne
prolong this unnecessary.

> 
> > My diff tells that this was the same patch as before.
> 
> Next try:

I will have a look when I am back in the office next week.


Beste Grüße / Best regards,
  - Benjamin Block
> 
> ---
> From f5b03b82df0569c035022c1c2535696186907f1a Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier 
> Signed-off-by: Benjamin Block 
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc:  #4.11+
> ---
>  block/bsg-lib.c | 51 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 30 insertions(+), 24 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..c07333c3b785 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -37,13 +37,11 @@ static void bsg_destroy_job(struct kref *kref)
>   struct bsg_job *job = container_of(kref, struct bsg_job, kref);
>   struct request *rq = job->req;
> 
> - blk_end_request_all(rq, BLK_STS_OK);
> -
>   put_device(job->dev);   /* release reference for the request */
> 
>   kfree(job->request_payload.sg_list);
>   kfree(job->reply_payload.sg_list);
> - kfree(job);
> + blk_end_request_all(rq, BLK_STS_OK);
>  }
> 
>  void bsg_job_put(struct bsg_job *job)
> @@ -100,7 +98,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,24 +127,11 @@ static int bsg_map_buffer(struct bsg_buffer *buf, 
> struct request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
>   job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
>   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
>

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

On Fri, Aug 11, 2017 at 04:36:49PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 03:49:29PM +0200, Benjamin Block wrote:
> > On Fri, Aug 11, 2017 at 11:14:15AM +0200, Christoph Hellwig wrote:
> > > But patch 1 still creates an additional copy of the sense data for
> > > all bsg users.
> > >
> > 
> > Huh? What additional copy? There is one reply-buffer and that is copied
> > into the user-buffer should it contain valid data. Just like in your
> > patch, neither you, nor me touches any of the copy-code. There is also
> > no changes to how the driver get their data into that buffer, it will
> > still be copied in both cases.
> 
> You're right - I misread your patch.  But that does make it worse as
> this means that with your patch we re-assign the scsi_request.sense
> pointer when using bsg.  That will lead to crashes if using the bsg
> code against e.g. a normal scsi device using bsg when that request
> later gets reused for something that is not bsg.
>

So when the bsg interface is used with something different than the
bsg-lib request queue? I haven't actually thought about that (presuming
the bsg-lib queue was the only one being used). Fair enough, I haven't
completely read that code now, but that seems bad then, to reassign a
space allocated in someone else's request queue. 

That still leaves open that we now over-allocate space in bsg-lib, or?

> 
> > 
> > > 
> > > Can you test the patch below which implements my suggestion?  Your
> > > other patches should still apply fine on top modulo minor context
> > > changes.
> > 
> > Only your patch on top of 4.13-rc4. din_xferp (D) is also empty, which is
> > not taken from the sense-buffer.
> 
> Can't parse this.
> 
> > =
> > BUG kmalloc-1024 (Not tainted): Invalid object pointer 0x4ad9e0f0
> > -
> 
> Oops - if we don't allocate the job separately we should not free it either.
> Updated patch for that below:
>

My diff tells that this was the same patch as before.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

On Fri, Aug 11, 2017 at 04:36:49PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 11, 2017 at 03:49:29PM +0200, Benjamin Block wrote:
> > On Fri, Aug 11, 2017 at 11:14:15AM +0200, Christoph Hellwig wrote:
> > > But patch 1 still creates an additional copy of the sense data for
> > > all bsg users.
> > >
> > 
> > Huh? What additional copy? There is one reply-buffer and that is copied
> > into the user-buffer should it contain valid data. Just like in your
> > patch, neither you, nor me touches any of the copy-code. There is also
> > no changes to how the driver get their data into that buffer, it will
> > still be copied in both cases.
> 
> You're right - I misread your patch.  But that does make it worse as
> this means that with your patch we re-assign the scsi_request.sense
> pointer when using bsg.  That will lead to crashes if using the bsg
> code against e.g. a normal scsi device using bsg when that request
> later gets reused for something that is not bsg.
>

So when the bsg interface is used with something different than the
bsg-lib request queue? I haven't actually thought about that (presuming
the bsg-lib queue was the only one being used). Fair enough, I haven't
completely read that code now, but that seems bad then, to reassign a
space allocated in someone else's request queue. 

That still leaves open that we now over-allocate space in bsg-lib, or?

> 
> > 
> > > 
> > > Can you test the patch below which implements my suggestion?  Your
> > > other patches should still apply fine on top modulo minor context
> > > changes.
> > 
> > Only your patch on top of 4.13-rc4. din_xferp (D) is also empty, which is
> > not taken from the sense-buffer.
> 
> Can't parse this.
> 
> > =
> > BUG kmalloc-1024 (Not tainted): Invalid object pointer 0x4ad9e0f0
> > -
> 
> Oops - if we don't allocate the job separately we should not free it either.
> Updated patch for that below:
>

My diff tells that this was the same patch as before.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

d: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9d650 not freed
=
BUG kmalloc-1024 (Tainted: GB  ): Invalid object pointer 
0x4ad9eb90
-

INFO: Slab 0x03d1012b6600 objects=24 used=17 fp=0x4ad99548 
flags=0x3fffc008101
CPU: 2 PID: 20 Comm: ksoftirqd/2 Tainted: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9eb90 not freed

> 
> ---
> From 4cd32ee48e334b62b55bff0d380833b978454040 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <h...@lst.de>
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc: <sta...@vger.kernel.org> #4.11+
> ---
>  block/bsg-lib.c | 53 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..215893dbd038 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,26 +129,9 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct 
> request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> - struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
> - job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
> - job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
> -  * allocated */
>   if (req->bio) {
>   ret = bsg_map_buffer(>request_payload, req);
>   if (ret)
> @@ -187,7 +170,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->qu

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-11 Thread Benjamin Block

d: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9d650 not freed
=
BUG kmalloc-1024 (Tainted: GB  ): Invalid object pointer 
0x4ad9eb90
-

INFO: Slab 0x03d1012b6600 objects=24 used=17 fp=0x4ad99548 
flags=0x3fffc008101
CPU: 2 PID: 20 Comm: ksoftirqd/2 Tainted: GB   
4.13.0-rc4-bsg-regression+ #2
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
Call Trace:
([<00117532>] show_stack+0x8a/0xe0)
 [<00bcbaee>] dump_stack+0x96/0xd8
 [<003cd5fc>] slab_err+0xac/0xc0
 [<003d68e4>] free_debug_processing+0x554/0x570
 [<003d69ae>] __slab_free+0xae/0x618
 [<003d7dce>] kfree+0x44e/0x4a0
 [<00859b4e>] blk_done_softirq+0x146/0x160
 [<00bf4ec0>] __do_softirq+0x3d0/0x840
 [<001662a6>] run_ksoftirqd+0x3e/0xb8
 [<001957fc>] smpboot_thread_fn+0x2f4/0x318
 [<0018f6f6>] kthread+0x166/0x178
 [<00bf3cf2>] kernel_thread_starter+0x6/0xc
 [<00bf3cec>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
FIX kmalloc-1024: Object at 0x4ad9eb90 not freed

> 
> ---
> From 4cd32ee48e334b62b55bff0d380833b978454040 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Fri, 11 Aug 2017 11:03:29 +0200
> Subject: bsg-lib: allocate sense data for each request
> 
> Since we split the scsi_request out of the request the driver is supposed
> to provide storage for the sense buffer.  The bsg-lib code failed to do so,
> though and will crash anytime it is used.
> 
> This patch moves bsg-lib to allocate and setup the bsg_job ahead of time,
> and allocate the sense data, which is used as reply buffer in bsg.
> 
> Reported-by: Steffen Maier 
> Signed-off-by: Benjamin Block 
> Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
> Cc:  #4.11+
> ---
>  block/bsg-lib.c | 53 
> +++--
>  include/linux/blkdev.h  |  1 -
>  include/linux/bsg-lib.h |  2 ++
>  3 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/block/bsg-lib.c b/block/bsg-lib.c
> index c4513b23f57a..215893dbd038 100644
> --- a/block/bsg-lib.c
> +++ b/block/bsg-lib.c
> @@ -100,7 +100,7 @@ EXPORT_SYMBOL_GPL(bsg_job_done);
>   */
>  static void bsg_softirq_done(struct request *rq)
>  {
> - struct bsg_job *job = rq->special;
> + struct bsg_job *job = blk_mq_rq_to_pdu(rq);
> 
>   bsg_job_put(job);
>  }
> @@ -129,26 +129,9 @@ static int bsg_map_buffer(struct bsg_buffer *buf, struct 
> request *req)
>  static int bsg_create_job(struct device *dev, struct request *req)
>  {
>   struct request *rsp = req->next_rq;
> - struct request_queue *q = req->q;
> - struct scsi_request *rq = scsi_req(req);
> - struct bsg_job *job;
> + struct bsg_job *job = blk_mq_rq_to_pdu(req);
>   int ret;
> 
> - BUG_ON(req->special);
> -
> - job = kzalloc(sizeof(struct bsg_job) + q->bsg_job_size, GFP_KERNEL);
> - if (!job)
> - return -ENOMEM;
> -
> - req->special = job;
> - job->req = req;
> - if (q->bsg_job_size)
> - job->dd_data = (void *)[1];
> - job->request = rq->cmd;
> - job->request_len = rq->cmd_len;
> - job->reply = rq->sense;
> - job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
> -  * allocated */
>   if (req->bio) {
>   ret = bsg_map_buffer(>request_payload, req);
>   if (ret)
> @@ -187,7 +170,6 @@ static void bsg_request_fn(struct request_queue *q)
>  {
>   struct device *dev = q->queuedata;
>   struct request *req;
> - struct bsg_job *job;
>   int ret;
> 
>   if (!g

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block

On Fri, Aug 11, 2017 at 12:10:38AM +0200, Benjamin Block wrote:
> On Thu, Aug 10, 2017 at 11:32:17AM +0200, Christoph Hellwig wrote:
> > We can't use an on-stack buffer for the sense data, as drivers will
> > dma to it.  So we should reuse the SCSI init_rq_fn() for the BSG
> > queues and/or implement the same scheme.
> > 
> 
...
> 
>  struct sg_io_v4
>  +--+
>  |  |  
>  | request>++
>  |   + _len |  ||
>  |   (A)|  | BSG Request|
>  |  |  | e.g. struct fc_bsg_request | Depends on BSG 
> implementation
>  |  |  || FC vs. iSCSI vs. ...
>  |  |  ++
>  |  |
>  | response--->++ Used as _Output_
>  |   + max_len  |  || User doesn't initialize
>  |   (B)|  | BSG Reply  | User provides (optional)
>  |  |  | e.g. struct fc_bsg_reply   |   memory; May be NULL.
>  |  |  ||
>  |  |  ++
>  |  |
>  | dout_xferp->+---+ Stuff send on the wire by
>  |   + _len |  |   |   the LLD
>  |   (C)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  |  |
>  | din_xferp-->+---+ Buffer for response data by
>  |   + _len |  |   |   the LLD
>  |   (D)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  +--+
> 
...
> 
>  struct request (E)
>  +--+
>  |  |  struct scsi_request
>  | scsi_request--->+-+
>  |  |  | |
>  |  |  | cmd-> Copy of (A)
>  |  |  |  + _len | Space in struct or kzalloc
>  |  |  |  (G)|
>  |  |  | |
>  |  |  | sense---> Space for BSG Reply
>  |  |  |  + _len | Same Data-Structure as (B)
>  |  |  |  (H)| NOT actually pointer (B)
>  |  |  | | 'reply_buffer' in my patch 
>  |  |  +-+
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (C) dout_xferp
>  |  |
>  | next_rq-+
>  |  |  |
>  +--+  |
>|
>  struct request (F)|(if used)
>  +--+<-+
>  |  |
>  | scsi_request---> Unused here
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (D) din_xferp
>  |  |
>  +--+
> 
...
> 
>  struct bsg_job
>  +-+
>  | |
>  | request---> (G) scsi_request->cmd -> Copy of (A)
>  |   + _len|   e.g. struct fc_bsg_request
>  | |
>  | reply-> (H) scsi_request->sense -> 'reply_buffer'
>  |   + _len|   e.g. struct fc_bsg_reply
>  | |
>  | request_payload---> struct scatterlist ... map (E)->bio
>  |   + _len|
>  |   (I)   |
>  | |
>  | reply_payload-> struct scatterlist ... map (F)->bio
>  |   + _len|
>  |   (J)   |
>  | |
>  +-+
> 

> 
> This worked till it broke. Right now every driver that tries to access
> (H) will panic the system, or cause very undefined behavior. I suspect
> no driver right now tries to do any DMA into (H); before the regression,
> this has been also an on-stack variable (I suspect since BSG was
> introduced, haven't checked though).
> 
> The asymmetries between the first struct request (E) and the following
> (F) also makes it hard to use the same scheme as in other drivers, where
> init_rq_fn() gets to initialize each request in the same'ish way. Or?
> Just looking at it right now, this would require some bigger rework that
> is not appropriate for a stable bug-fix.
> 

Just some more brain-dump here.

One more problem for direct DMA into (H) in the current BSG setup is
probably, that the transport classes have each their own private format
for the BSG reply (

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block

On Fri, Aug 11, 2017 at 12:10:38AM +0200, Benjamin Block wrote:
> On Thu, Aug 10, 2017 at 11:32:17AM +0200, Christoph Hellwig wrote:
> > We can't use an on-stack buffer for the sense data, as drivers will
> > dma to it.  So we should reuse the SCSI init_rq_fn() for the BSG
> > queues and/or implement the same scheme.
> > 
> 
...
> 
>  struct sg_io_v4
>  +--+
>  |  |  
>  | request>++
>  |   + _len |  ||
>  |   (A)|  | BSG Request|
>  |  |  | e.g. struct fc_bsg_request | Depends on BSG 
> implementation
>  |  |  || FC vs. iSCSI vs. ...
>  |  |  ++
>  |  |
>  | response--->++ Used as _Output_
>  |   + max_len  |  || User doesn't initialize
>  |   (B)|  | BSG Reply  | User provides (optional)
>  |  |  | e.g. struct fc_bsg_reply   |   memory; May be NULL.
>  |  |  ||
>  |  |  ++
>  |  |
>  | dout_xferp->+---+ Stuff send on the wire by
>  |   + _len |  |   |   the LLD
>  |   (C)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  |  |
>  | din_xferp-->+---+ Buffer for response data by
>  |   + _len |  |   |   the LLD
>  |   (D)|  | Transport Protocol IU | Aligned on PAGE_SIZE
>  |  |  | e.g. FC-GS-7 CT_IU|
>  |  |  |   |
>  |  |  +---+
>  +--+
> 
...
> 
>  struct request (E)
>  +--+
>  |  |  struct scsi_request
>  | scsi_request--->+-+
>  |  |  | |
>  |  |  | cmd-> Copy of (A)
>  |  |  |  + _len | Space in struct or kzalloc
>  |  |  |  (G)|
>  |  |  | |
>  |  |  | sense---> Space for BSG Reply
>  |  |  |  + _len | Same Data-Structure as (B)
>  |  |  |  (H)| NOT actually pointer (B)
>  |  |  | | 'reply_buffer' in my patch 
>  |  |  +-+
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (C) dout_xferp
>  |  |
>  | next_rq-+
>  |  |  |
>  +--+  |
>|
>  struct request (F)|(if used)
>  +--+<-+
>  |  |
>  | scsi_request---> Unused here
>  |  |
>  | bio> Mapped via blk_rq_map_user() to (D) din_xferp
>  |  |
>  +--+
> 
...
> 
>  struct bsg_job
>  +-+
>  | |
>  | request---> (G) scsi_request->cmd -> Copy of (A)
>  |   + _len|   e.g. struct fc_bsg_request
>  | |
>  | reply-> (H) scsi_request->sense -> 'reply_buffer'
>  |   + _len|   e.g. struct fc_bsg_reply
>  | |
>  | request_payload---> struct scatterlist ... map (E)->bio
>  |   + _len|
>  |   (I)   |
>  | |
>  | reply_payload-> struct scatterlist ... map (F)->bio
>  |   + _len|
>  |   (J)   |
>  | |
>  +-+
> 

> 
> This worked till it broke. Right now every driver that tries to access
> (H) will panic the system, or cause very undefined behavior. I suspect
> no driver right now tries to do any DMA into (H); before the regression,
> this has been also an on-stack variable (I suspect since BSG was
> introduced, haven't checked though).
> 
> The asymmetries between the first struct request (E) and the following
> (F) also makes it hard to use the same scheme as in other drivers, where
> init_rq_fn() gets to initialize each request in the same'ish way. Or?
> Just looking at it right now, this would require some bigger rework that
> is not appropriate for a stable bug-fix.
> 

Just some more brain-dump here.

One more problem for direct DMA into (H) in the current BSG setup is
probably, that the transport classes have each their own private format
for the BSG reply (

Re: [RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-10 Thread Benjamin Block

On Thu, Aug 10, 2017 at 11:35:31AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 09, 2017 at 04:11:19PM +0200, Benjamin Block wrote:
> > Since struct bsg_command is now used in every calling case, we don't
> > need separation of arguments anymore that are contained in the same
> > bsg_command.
> > 
> > Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
> > ---
> >  block/bsg.c | 13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/bsg.c b/block/bsg.c
> > index 8517361a9b3f..6ee2ffca808a 100644
> > --- a/block/bsg.c
> > +++ b/block/bsg.c
> > @@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
> >   * map sg_io_v4 to a request.
> >   */
> >  static struct request *
> > -bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
> > has_write_perm,
> > -   u8 *reply_buffer)
> > +bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
> > +   fmode_t has_write_perm)
> 
> I wish we could just rename the argument to mode and pass on the
> whole file->f_mode while you are cleaning up this code.  That should
> be a separate patch, though.
> 

Hmm, I did a quick pass through the code and the only place this seems
to be used, is to pass it to blk_verify_command() if the subcommand used
in the BSG request is a SCSI Command. And this has the same semantics.
So I guess this would require adjustments to the whole stack, as this is
also used from the 'normal' SG side of the world.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-10 Thread Benjamin Block

On Thu, Aug 10, 2017 at 11:35:31AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 09, 2017 at 04:11:19PM +0200, Benjamin Block wrote:
> > Since struct bsg_command is now used in every calling case, we don't
> > need separation of arguments anymore that are contained in the same
> > bsg_command.
> > 
> > Signed-off-by: Benjamin Block 
> > ---
> >  block/bsg.c | 13 ++---
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/block/bsg.c b/block/bsg.c
> > index 8517361a9b3f..6ee2ffca808a 100644
> > --- a/block/bsg.c
> > +++ b/block/bsg.c
> > @@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
> >   * map sg_io_v4 to a request.
> >   */
> >  static struct request *
> > -bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
> > has_write_perm,
> > -   u8 *reply_buffer)
> > +bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
> > +   fmode_t has_write_perm)
> 
> I wish we could just rename the argument to mode and pass on the
> whole file->f_mode while you are cleaning up this code.  That should
> be a separate patch, though.
> 

Hmm, I did a quick pass through the code and the only place this seems
to be used, is to pass it to blk_verify_command() if the subcommand used
in the BSG request is a SCSI Command. And this has the same semantics.
So I guess this would require adjustments to the whole stack, as this is
also used from the 'normal' SG side of the world.



Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-10 Thread Benjamin Block

On Thu, Aug 10, 2017 at 10:24:56AM +0200, Johannes Thumshirn wrote:
> On Wed, Aug 09, 2017 at 04:11:18PM +0200, Benjamin Block wrote:
> > +   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
> 
> return !(bc->hdr.flags & BSG_FLAG_Q_AT_TAIL); and make the function return
> bool? I have to admit, this is the 1st time I have seen the above construct.
> 

Hmm yeah, odd. To be honest, I just copied the expression so that it is
obvious that no behavior changed, but just the place. I'll replace it
with what you suggest.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-10 Thread Benjamin Block

On Thu, Aug 10, 2017 at 10:24:56AM +0200, Johannes Thumshirn wrote:
> On Wed, Aug 09, 2017 at 04:11:18PM +0200, Benjamin Block wrote:
> > +   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
> 
> return !(bc->hdr.flags & BSG_FLAG_Q_AT_TAIL); and make the function return
> bool? I have to admit, this is the 1st time I have seen the above construct.
> 

Hmm yeah, odd. To be honest, I just copied the expression so that it is
obvious that no behavior changed, but just the place. I'll replace it
with what you suggest.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block

 on (H), the LLD fills in some information for
accounting and such. In case of FC, there is also some
protocol-information, but this is by no means a standard IU.

This is then passed up the stack pretty quick and if the user passed
something with (B), data from (H) is copied into (B). But this is
optional. The main payload is transferred to the user via (J) which is a
remap of (D).

So this is my understanding here. (I don't wanna say that this is
necessarily completely correct ;-), pleas correct me, if I am wrong. The
lack of proper documentation is also not helping at all.)

This worked till it broke. Right now every driver that tries to access
(H) will panic the system, or cause very undefined behavior. I suspect
no driver right now tries to do any DMA into (H); before the regression,
this has been also an on-stack variable (I suspect since BSG was
introduced, haven't checked though).

The asymmetries between the first struct request (E) and the following
(F) also makes it hard to use the same scheme as in other drivers, where
init_rq_fn() gets to initialize each request in the same'ish way. Or?
Just looking at it right now, this would require some bigger rework that
is not appropriate for a stable bug-fix.

I think it would be best if we fix the possible panics every user of
this is gonna experience and after that we can still think about
improving it further beyond what the rest of the patch set already does,
if that is necessary.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-10 Thread Benjamin Block

 on (H), the LLD fills in some information for
accounting and such. In case of FC, there is also some
protocol-information, but this is by no means a standard IU.

This is then passed up the stack pretty quick and if the user passed
something with (B), data from (H) is copied into (B). But this is
optional. The main payload is transferred to the user via (J) which is a
remap of (D).

So this is my understanding here. (I don't wanna say that this is
necessarily completely correct ;-), pleas correct me, if I am wrong. The
lack of proper documentation is also not helping at all.)

This worked till it broke. Right now every driver that tries to access
(H) will panic the system, or cause very undefined behavior. I suspect
no driver right now tries to do any DMA into (H); before the regression,
this has been also an on-stack variable (I suspect since BSG was
introduced, haven't checked though).

The asymmetries between the first struct request (E) and the following
(F) also makes it hard to use the same scheme as in other drivers, where
init_rq_fn() gets to initialize each request in the same'ish way. Or?
Just looking at it right now, this would require some bigger rework that
is not appropriate for a stable bug-fix.

I think it would be best if we fix the possible panics every user of
this is gonna experience and after that we can still think about
improving it further beyond what the rest of the patch set already does,
if that is necessary.


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

[RFC PATCH 2/6] bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE

2017-08-09 Thread Benjamin Block

We do set rq->sense_len when we assigne the reply-buffer in
blk_fill_sgv4_hdr_rq(). No point in possibly deviating from this value
later on.

bsg-lib.h specifies:
unsigned int reply_len;
/*
 * On entry : reply_len indicates the buffer size allocated for
 * the reply.
 *
 * ...
 */

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg-lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..c7c2c6bbb5ae 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -147,8 +147,8 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
-   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
-* allocated */
+   job->reply_len = rq->sense_len;
+
if (req->bio) {
ret = bsg_map_buffer(>request_payload, req);
if (ret)
-- 
2.12.2

[RFC PATCH 2/6] bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE

2017-08-09 Thread Benjamin Block

We do set rq->sense_len when we assigne the reply-buffer in
blk_fill_sgv4_hdr_rq(). No point in possibly deviating from this value
later on.

bsg-lib.h specifies:
unsigned int reply_len;
/*
 * On entry : reply_len indicates the buffer size allocated for
 * the reply.
 *
 * ...
 */

Signed-off-by: Benjamin Block 
---
 block/bsg-lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index c4513b23f57a..c7c2c6bbb5ae 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -147,8 +147,8 @@ static int bsg_create_job(struct device *dev, struct 
request *req)
job->request = rq->cmd;
job->request_len = rq->cmd_len;
job->reply = rq->sense;
-   job->reply_len = SCSI_SENSE_BUFFERSIZE; /* Size of sense buffer
-* allocated */
+   job->reply_len = rq->sense_len;
+
if (req->bio) {
ret = bsg_map_buffer(>request_payload, req);
if (ret)
-- 
2.12.2

[RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-09 Thread Benjamin Block

In contrast to the normal SCSI-lib, the BSG block-queue doesn't make use of
any extra init_rq_fn() to make additional allocations during
request-creation, and the request sense-pointer is not used to transport
SCSI sense data, but is used as backing for the bsg_job->reply pointer;
that in turn is used in the LLDs to store protocol IUs or similar stuff.
This 're-purposing' of the sense-pointer is done in the BSG blk-lib
(bsg_create_job()), during the queue-processing.

Failing to allocate/assign it results in illegal dereferences because LLDs
use this pointer unquestioned, as can be seen in the various
BSG-implementations:

drivers/scsi/libfc/fc_lport.c:  fc_lport_bsg_request()
drivers/scsi/qla2xxx/qla_bsg.c: qla24xx_bsg_request()
drivers/scsi/qla4xxx/ql4_bsg.c: qla4xxx_process_vendor_specific()
drivers/s390/scsi/zfcp_fc.c:zfcp_fc_ct_els_job_handler()
...

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

To prevent this, allocate a buffer when the BSG blk-request is setup, and
before it is queued for LLD processing.

Reported-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <sta...@vger.kernel.org> #4.11+
---
 block/bsg.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 37663b664666..285b1b8126c3 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,8 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 /*
  * our internal command type
  */
@@ -85,6 +87,7 @@ struct bsg_command {
struct bio *bidi_bio;
int err;
struct sg_io_v4 hdr;
+   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE];
 };
 
 static void bsg_free_command(struct bsg_command *bc)
@@ -137,7 +140,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
 
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_devic

[RFC PATCH 1/6] bsg: fix kernel panic resulting from missing allocation of a reply-buffer

2017-08-09 Thread Benjamin Block

In contrast to the normal SCSI-lib, the BSG block-queue doesn't make use of
any extra init_rq_fn() to make additional allocations during
request-creation, and the request sense-pointer is not used to transport
SCSI sense data, but is used as backing for the bsg_job->reply pointer;
that in turn is used in the LLDs to store protocol IUs or similar stuff.
This 're-purposing' of the sense-pointer is done in the BSG blk-lib
(bsg_create_job()), during the queue-processing.

Failing to allocate/assign it results in illegal dereferences because LLDs
use this pointer unquestioned, as can be seen in the various
BSG-implementations:

drivers/scsi/libfc/fc_lport.c:  fc_lport_bsg_request()
drivers/scsi/qla2xxx/qla_bsg.c: qla24xx_bsg_request()
drivers/scsi/qla4xxx/ql4_bsg.c: qla4xxx_process_vendor_specific()
drivers/s390/scsi/zfcp_fc.c:zfcp_fc_ct_els_job_handler()
...

An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):

Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:01590007 R3:0024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: 
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 65cb0100 task.stack: 65cb4000
Krnl PSW : 0704e0018000 03ff801e4156 
(zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0001 5fa9d0d0 5fa9d078 00e16866
   03ff0290 6b6b6b6b6b6b6b6b 59f78f00 000f
   593a0958 593a0958 60d88800 5ddd4c38
   58b50100 0700659cba08 03ff801e8556 659cb9a8
Krnl Code: 03ff801e4146: e3102054lg  %r1,80(%r2)
   03ff801e414c: 58402040   l   %r4,64(%r2)
  #03ff801e4150: e3502024   lg  %r5,32(%r2)
  >03ff801e4156: 50405004   st  %r4,4(%r5)
   03ff801e415a: e54c5008   mvhi8(%r5),0
   03ff801e4160: e33010280012   lt  %r3,40(%r1)
   03ff801e4166: a718fffb   lhi %r1,-5
   03ff801e416a: 1803   lr  %r0,%r3
Call Trace:
([<03ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
 [<03ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
 [<03ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
 [<009b91b6>] qdio_kick_handler+0x2ae/0x2c8
 [<009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
 [<001684c2>] tasklet_action+0x15a/0x1d8
 [<00bd28ec>] __do_softirq+0x3ec/0x848
 [<001675a4>] irq_exit+0x74/0xf8
 [<0010dd6a>] do_IRQ+0xba/0xf0
 [<00bd19e8>] io_int_handler+0x104/0x2d4
 [<001033b6>] enabled_wait+0xb6/0x188
([<0010339e>] enabled_wait+0x9e/0x188)
 [<0010396a>] arch_cpu_idle+0x32/0x50
 [<00bd0112>] default_idle_call+0x52/0x68
 [<001cd0fa>] do_idle+0x102/0x188
 [<001cd41e>] cpu_startup_entry+0x3e/0x48
 [<00118c64>] smp_start_secondary+0x11c/0x130
 [<00bd2016>] restart_int_handler+0x62/0x78
 [<>]   (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
 [<03ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]

Kernel panic - not syncing: Fatal exception in interrupt

To prevent this, allocate a buffer when the BSG blk-request is setup, and
before it is queued for LLD processing.

Reported-by: Steffen Maier 
Signed-off-by: Benjamin Block 
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc:  #4.11+
---
 block/bsg.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 37663b664666..285b1b8126c3 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -74,6 +74,8 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 /*
  * our internal command type
  */
@@ -85,6 +87,7 @@ struct bsg_command {
struct bio *bidi_bio;
int err;
struct sg_io_v4 hdr;
+   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE];
 };
 
 static void bsg_free_command(struct bsg_command *bc)
@@ -137,7 +140,7 @@ static inline struct hlist_head *bsg_dev_idx_hash(int index)
 
 static int blk_fill_sgv4_hdr_rq(struct request_queue *q, struct request *rq,
struct sg_io_v4 *hdr, struct bsg_device *bd,
-   fmode_t has_write_perm)
+

[RFC PATCH 6/6] bsg: reduce unnecessary arguments for blk_complete_sgv4_hdr_rq()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 6ee2ffca808a..09f767cdf816 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -399,13 +399,14 @@ static struct bsg_command *bsg_get_done_cmd(struct 
bsg_device *bd)
return bc;
 }
 
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
+static int blk_complete_sgv4_hdr_rq(struct bsg_command *bc)
 {
+   struct sg_io_v4 *hdr = >hdr;
+   struct request *rq = bc->rq;
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
+   dprintk("rq %p bio %p 0x%x\n", rq, bc->bio, req->result);
/*
 * fill in all the output members
 */
@@ -432,7 +433,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (rq->next_rq) {
hdr->dout_resid = req->resid_len;
hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   blk_rq_unmap_user(bidi_bio);
+   blk_rq_unmap_user(bc->bidi_bio);
blk_put_request(rq->next_rq);
} else if (rq_data_dir(rq) == READ)
hdr->din_resid = req->resid_len;
@@ -448,7 +449,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (!ret && req->result < 0)
ret = req->result;
 
-   blk_rq_unmap_user(bio);
+   blk_rq_unmap_user(bc->bio);
scsi_req_free_cmd(req);
blk_put_request(rq);
 
@@ -507,8 +508,7 @@ static int bsg_complete_all_commands(struct bsg_device *bd)
if (IS_ERR(bc))
break;
 
-   tret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-   bc->bidi_bio);
+   tret = blk_complete_sgv4_hdr_rq(bc);
if (!ret)
ret = tret;
 
@@ -542,8 +542,7 @@ __bsg_read(char __user *buf, size_t count, struct 
bsg_device *bd,
 * after completing the request. so do that here,
 * bsg_complete_work() cannot do that for us
 */
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(buf, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
@@ -944,8 +943,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(uarg, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
-- 
2.12.2

[RFC PATCH 6/6] bsg: reduce unnecessary arguments for blk_complete_sgv4_hdr_rq()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 6ee2ffca808a..09f767cdf816 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -399,13 +399,14 @@ static struct bsg_command *bsg_get_done_cmd(struct 
bsg_device *bd)
return bc;
 }
 
-static int blk_complete_sgv4_hdr_rq(struct request *rq, struct sg_io_v4 *hdr,
-   struct bio *bio, struct bio *bidi_bio)
+static int blk_complete_sgv4_hdr_rq(struct bsg_command *bc)
 {
+   struct sg_io_v4 *hdr = >hdr;
+   struct request *rq = bc->rq;
struct scsi_request *req = scsi_req(rq);
int ret = 0;
 
-   dprintk("rq %p bio %p 0x%x\n", rq, bio, req->result);
+   dprintk("rq %p bio %p 0x%x\n", rq, bc->bio, req->result);
/*
 * fill in all the output members
 */
@@ -432,7 +433,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (rq->next_rq) {
hdr->dout_resid = req->resid_len;
hdr->din_resid = scsi_req(rq->next_rq)->resid_len;
-   blk_rq_unmap_user(bidi_bio);
+   blk_rq_unmap_user(bc->bidi_bio);
blk_put_request(rq->next_rq);
} else if (rq_data_dir(rq) == READ)
hdr->din_resid = req->resid_len;
@@ -448,7 +449,7 @@ static int blk_complete_sgv4_hdr_rq(struct request *rq, 
struct sg_io_v4 *hdr,
if (!ret && req->result < 0)
ret = req->result;
 
-   blk_rq_unmap_user(bio);
+   blk_rq_unmap_user(bc->bio);
scsi_req_free_cmd(req);
blk_put_request(rq);
 
@@ -507,8 +508,7 @@ static int bsg_complete_all_commands(struct bsg_device *bd)
if (IS_ERR(bc))
break;
 
-   tret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-   bc->bidi_bio);
+   tret = blk_complete_sgv4_hdr_rq(bc);
if (!ret)
ret = tret;
 
@@ -542,8 +542,7 @@ __bsg_read(char __user *buf, size_t count, struct 
bsg_device *bd,
 * after completing the request. so do that here,
 * bsg_complete_work() cannot do that for us
 */
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(buf, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
@@ -944,8 +943,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   ret = blk_complete_sgv4_hdr_rq(bc->rq, >hdr, bc->bio,
-  bc->bidi_bio);
+   ret = blk_complete_sgv4_hdr_rq(bc);
 
if (copy_to_user(uarg, >hdr, sizeof(bc->hdr)))
ret = -EFAULT;
-- 
2.12.2

[RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8517361a9b3f..6ee2ffca808a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
  * map sg_io_v4 to a request.
  */
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
has_write_perm,
-   u8 *reply_buffer)
+bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
+   fmode_t has_write_perm)
 {
struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL;
+   struct sg_io_v4 *hdr = >hdr;
int ret;
unsigned int op, dxfer_len;
void __user *dxferp = NULL;
@@ -244,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, 
fmode_t has_write_perm,
if (IS_ERR(rq))
return rq;
 
-   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, reply_buffer,
+   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, bc->reply_buffer,
   has_write_perm);
if (ret)
goto out;
@@ -633,8 +634,7 @@ static int __bsg_write(struct bsg_device *bd, const char 
__user *buf,
/*
 * get a request, fill in the blanks, and add to request queue
 */
-   rq = bsg_map_hdr(bd, >hdr, has_write_perm,
-bc->reply_buffer);
+   rq = bsg_map_hdr(bd, bc, has_write_perm);
if (IS_ERR(rq)) {
ret = PTR_ERR(rq);
rq = NULL;
@@ -934,8 +934,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
goto sg_io_out;
}
 
-   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
-bc->reply_buffer);
+   bc->rq = bsg_map_hdr(bd, bc, file->f_mode & FMODE_WRITE);
if (IS_ERR(bc->rq)) {
ret = PTR_ERR(bc->rq);
goto sg_io_out;
-- 
2.12.2

[RFC PATCH 5/6] bsg: reduce unnecessary arguments for bsg_map_hdr()

2017-08-09 Thread Benjamin Block

Since struct bsg_command is now used in every calling case, we don't
need separation of arguments anymore that are contained in the same
bsg_command.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 8517361a9b3f..6ee2ffca808a 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -212,11 +212,12 @@ bsg_validate_sgv4_hdr(struct sg_io_v4 *hdr, int *op)
  * map sg_io_v4 to a request.
  */
 static struct request *
-bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, fmode_t 
has_write_perm,
-   u8 *reply_buffer)
+bsg_map_hdr(struct bsg_device *bd, struct bsg_command *bc,
+   fmode_t has_write_perm)
 {
struct request_queue *q = bd->queue;
struct request *rq, *next_rq = NULL;
+   struct sg_io_v4 *hdr = >hdr;
int ret;
unsigned int op, dxfer_len;
void __user *dxferp = NULL;
@@ -244,7 +245,7 @@ bsg_map_hdr(struct bsg_device *bd, struct sg_io_v4 *hdr, 
fmode_t has_write_perm,
if (IS_ERR(rq))
return rq;
 
-   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, reply_buffer,
+   ret = blk_fill_sgv4_hdr_rq(q, rq, hdr, bd, bc->reply_buffer,
   has_write_perm);
if (ret)
goto out;
@@ -633,8 +634,7 @@ static int __bsg_write(struct bsg_device *bd, const char 
__user *buf,
/*
 * get a request, fill in the blanks, and add to request queue
 */
-   rq = bsg_map_hdr(bd, >hdr, has_write_perm,
-bc->reply_buffer);
+   rq = bsg_map_hdr(bd, bc, has_write_perm);
if (IS_ERR(rq)) {
ret = PTR_ERR(rq);
rq = NULL;
@@ -934,8 +934,7 @@ static long bsg_ioctl(struct file *file, unsigned int cmd, 
unsigned long arg)
goto sg_io_out;
}
 
-   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
-bc->reply_buffer);
+   bc->rq = bsg_map_hdr(bd, bc, file->f_mode & FMODE_WRITE);
if (IS_ERR(bc->rq)) {
ret = PTR_ERR(bc->rq);
goto sg_io_out;
-- 
2.12.2

[RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-09 Thread Benjamin Block

Before, the SG_IO ioctl for BSG devices used to use its own on-stack data
to assemble and send the specified command. The read and write calls use
their own infrastructure build around the struct bsg_command and a custom
slab-pool for that.

Rafactor this, so that SG_IO ioctl also uses struct bsg_command and the
surrounding infrastructure. This way we use global defines like
BSG_COMMAND_REPLY_BUFFERSIZE only in one place, rather than two, the
handling of BSG commands gets more consistent, and it reduces some code-
duplications (the bio-pointer handling). It also reduces the stack
footprint by 320 to 384 bytes (depending on how large pointers are), and
uses the active slab-implementation for efficient alloc/free.

There are two other side-effects:
 - the 'duration' field in the sg header is now also filled for SG_IO
   calls, unlike before were it was always zero.
 - the BSG device queue-limit is also applied to SG_IO, unlike before were
   you could flood one BSG device with as many commands as you'd like. If
   one can trust older SG documentation this limit is applicable to either
   normal writes, or SG_IO calls; but this wasn't enforced before for
   SG_IO.

A complete unification is not possible, as it then would also enqueue SG_IO
commands in the BGS devices's command list, but this is only for the read-
and write-calls.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 60 
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b924f1c23c58..8517361a9b3f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -320,6 +320,17 @@ static void bsg_rq_end_io(struct request *rq, blk_status_t 
status)
wake_up(>wq_done);
 }
 
+static int bsg_prep_add_command(struct bsg_command *bc, struct request *rq)
+{
+   bc->rq = rq;
+   bc->bio = rq->bio;
+   if (rq->next_rq)
+   bc->bidi_bio = rq->next_rq->bio;
+   bc->hdr.duration = jiffies;
+
+   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
+}
+
 /*
  * do final setup of a 'bc' and submit the matching 'rq' to the block
  * layer for io
@@ -327,16 +338,11 @@ static void bsg_rq_end_io(struct request *rq, 
blk_status_t status)
 static void bsg_add_command(struct bsg_device *bd, struct request_queue *q,
struct bsg_command *bc, struct request *rq)
 {
-   int at_head = (0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   int at_head = bsg_prep_add_command(bc, rq);
 
/*
 * add bc command to busy queue and submit rq for io
 */
-   bc->rq = rq;
-   bc->bio = rq->bio;
-   if (rq->next_rq)
-   bc->bidi_bio = rq->next_rq->bio;
-   bc->hdr.duration = jiffies;
spin_lock_irq(>lock);
list_add_tail(>list, >busy_list);
spin_unlock_irq(>lock);
@@ -916,31 +922,37 @@ static long bsg_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return scsi_cmd_ioctl(bd->queue, NULL, file->f_mode, cmd, uarg);
}
case SG_IO: {
-   struct request *rq;
-   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE] = { 0, };
-   struct bio *bio, *bidi_bio = NULL;
-   struct sg_io_v4 hdr;
+   struct bsg_command *bc;
int at_head;
 
-   if (copy_from_user(, uarg, sizeof(hdr)))
-   return -EFAULT;
+   bc = bsg_alloc_command(bd);
+   if (IS_ERR(bc))
+   return PTR_ERR(bc);
 
-   rq = bsg_map_hdr(bd, , file->f_mode & FMODE_WRITE,
-reply_buffer);
-   if (IS_ERR(rq))
-   return PTR_ERR(rq);
+   if (copy_from_user(>hdr, uarg, sizeof(bc->hdr))) {
+   ret = -EFAULT;
+   goto sg_io_out;
+   }
 
-   bio = rq->bio;
-   if (rq->next_rq)
-   bidi_bio = rq->next_rq->bio;
+   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
+bc->reply_buffer);
+   if (IS_ERR(bc->rq)) {
+   ret = PTR_ERR(bc->rq);
+   goto sg_io_out;
+   }
 
-   at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
-   blk_execute_rq(bd->queue, NULL, rq, at_head);
-   ret = blk_complete_sgv4_hdr_rq(rq, , bio, bidi_bio);
+   at_head = bsg_prep_add_command(bc, bc->rq);
+   blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
+   bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   if (copy_to_user(uarg, , sizeof(hdr)))
-

[RFC PATCH 4/6] bsg: refactor ioctl to use regular BSG-command infrastructure for SG_IO

2017-08-09 Thread Benjamin Block

Before, the SG_IO ioctl for BSG devices used to use its own on-stack data
to assemble and send the specified command. The read and write calls use
their own infrastructure build around the struct bsg_command and a custom
slab-pool for that.

Rafactor this, so that SG_IO ioctl also uses struct bsg_command and the
surrounding infrastructure. This way we use global defines like
BSG_COMMAND_REPLY_BUFFERSIZE only in one place, rather than two, the
handling of BSG commands gets more consistent, and it reduces some code-
duplications (the bio-pointer handling). It also reduces the stack
footprint by 320 to 384 bytes (depending on how large pointers are), and
uses the active slab-implementation for efficient alloc/free.

There are two other side-effects:
 - the 'duration' field in the sg header is now also filled for SG_IO
   calls, unlike before were it was always zero.
 - the BSG device queue-limit is also applied to SG_IO, unlike before were
   you could flood one BSG device with as many commands as you'd like. If
   one can trust older SG documentation this limit is applicable to either
   normal writes, or SG_IO calls; but this wasn't enforced before for
   SG_IO.

A complete unification is not possible, as it then would also enqueue SG_IO
commands in the BGS devices's command list, but this is only for the read-
and write-calls.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 60 
 1 file changed, 36 insertions(+), 24 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index b924f1c23c58..8517361a9b3f 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -320,6 +320,17 @@ static void bsg_rq_end_io(struct request *rq, blk_status_t 
status)
wake_up(>wq_done);
 }
 
+static int bsg_prep_add_command(struct bsg_command *bc, struct request *rq)
+{
+   bc->rq = rq;
+   bc->bio = rq->bio;
+   if (rq->next_rq)
+   bc->bidi_bio = rq->next_rq->bio;
+   bc->hdr.duration = jiffies;
+
+   return 0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL);
+}
+
 /*
  * do final setup of a 'bc' and submit the matching 'rq' to the block
  * layer for io
@@ -327,16 +338,11 @@ static void bsg_rq_end_io(struct request *rq, 
blk_status_t status)
 static void bsg_add_command(struct bsg_device *bd, struct request_queue *q,
struct bsg_command *bc, struct request *rq)
 {
-   int at_head = (0 == (bc->hdr.flags & BSG_FLAG_Q_AT_TAIL));
+   int at_head = bsg_prep_add_command(bc, rq);
 
/*
 * add bc command to busy queue and submit rq for io
 */
-   bc->rq = rq;
-   bc->bio = rq->bio;
-   if (rq->next_rq)
-   bc->bidi_bio = rq->next_rq->bio;
-   bc->hdr.duration = jiffies;
spin_lock_irq(>lock);
list_add_tail(>list, >busy_list);
spin_unlock_irq(>lock);
@@ -916,31 +922,37 @@ static long bsg_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
return scsi_cmd_ioctl(bd->queue, NULL, file->f_mode, cmd, uarg);
}
case SG_IO: {
-   struct request *rq;
-   u8 reply_buffer[BSG_COMMAND_REPLY_BUFFERSIZE] = { 0, };
-   struct bio *bio, *bidi_bio = NULL;
-   struct sg_io_v4 hdr;
+   struct bsg_command *bc;
int at_head;
 
-   if (copy_from_user(, uarg, sizeof(hdr)))
-   return -EFAULT;
+   bc = bsg_alloc_command(bd);
+   if (IS_ERR(bc))
+   return PTR_ERR(bc);
 
-   rq = bsg_map_hdr(bd, , file->f_mode & FMODE_WRITE,
-reply_buffer);
-   if (IS_ERR(rq))
-   return PTR_ERR(rq);
+   if (copy_from_user(>hdr, uarg, sizeof(bc->hdr))) {
+   ret = -EFAULT;
+   goto sg_io_out;
+   }
 
-   bio = rq->bio;
-   if (rq->next_rq)
-   bidi_bio = rq->next_rq->bio;
+   bc->rq = bsg_map_hdr(bd, >hdr, file->f_mode & FMODE_WRITE,
+bc->reply_buffer);
+   if (IS_ERR(bc->rq)) {
+   ret = PTR_ERR(bc->rq);
+   goto sg_io_out;
+   }
 
-   at_head = (0 == (hdr.flags & BSG_FLAG_Q_AT_TAIL));
-   blk_execute_rq(bd->queue, NULL, rq, at_head);
-   ret = blk_complete_sgv4_hdr_rq(rq, , bio, bidi_bio);
+   at_head = bsg_prep_add_command(bc, bc->rq);
+   blk_execute_rq(bd->queue, NULL, bc->rq, at_head);
+   bc->hdr.duration = jiffies_to_msecs(jiffies - bc->hdr.duration);
 
-   if (copy_to_user(uarg, , sizeof(hdr)))
-   return -EFAULT;
+   ret = blk_complete_s

[RFC PATCH 3/6] bsg: scsi-transport: add compile-tests to prevent reply-buffer overflows

2017-08-09 Thread Benjamin Block

The BSG implementations use the bsg_job's reply buffer as storage for their
own custom reply structures (e.g.: struct fc_bsg_reply or
struct iscsi_bsg_reply). The size of bsg_job's reply buffer and those of
the implementations is not dependent in any way the compiler can currently
check.

To make it easier to notice accidental violations add an explicit compile-
time check that tests whether the implementations' reply buffer is at most
as large as bsg_job's.

To do so, we have to move the size-define from bsg.c to a common header.

Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 block/bsg.c | 3 +--
 drivers/scsi/scsi_transport_fc.c| 3 +++
 drivers/scsi/scsi_transport_iscsi.c | 3 +++
 include/linux/bsg-lib.h | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 285b1b8126c3..b924f1c23c58 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -74,8 +75,6 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
-#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
-
 /*
  * our internal command type
  */
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 892fbd9800d9..ce6654b5d329 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3736,6 +3736,9 @@ static int fc_bsg_dispatch(struct bsg_job *job)
 {
struct Scsi_Host *shost = fc_bsg_to_shost(job);
 
+   BUILD_BUG_ON(sizeof(struct fc_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
if (scsi_is_fc_rport(job->dev))
return fc_bsg_rport_dispatch(shost, job);
else
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..4e021c949ad7 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1483,6 +1483,9 @@ static int iscsi_bsg_host_dispatch(struct bsg_job *job)
int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
int ret;
 
+   BUILD_BUG_ON(sizeof(struct iscsi_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
/* check if we have the msgcode value at least */
if (job->request_len < sizeof(uint32_t)) {
ret = -ENOMSG;
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..85d7c7678cc6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -25,6 +25,8 @@
 
 #include 
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 struct request;
 struct device;
 struct scatterlist;
-- 
2.12.2

[RFC PATCH 3/6] bsg: scsi-transport: add compile-tests to prevent reply-buffer overflows

2017-08-09 Thread Benjamin Block

The BSG implementations use the bsg_job's reply buffer as storage for their
own custom reply structures (e.g.: struct fc_bsg_reply or
struct iscsi_bsg_reply). The size of bsg_job's reply buffer and those of
the implementations is not dependent in any way the compiler can currently
check.

To make it easier to notice accidental violations add an explicit compile-
time check that tests whether the implementations' reply buffer is at most
as large as bsg_job's.

To do so, we have to move the size-define from bsg.c to a common header.

Signed-off-by: Benjamin Block 
---
 block/bsg.c | 3 +--
 drivers/scsi/scsi_transport_fc.c| 3 +++
 drivers/scsi/scsi_transport_iscsi.c | 3 +++
 include/linux/bsg-lib.h | 2 ++
 4 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 285b1b8126c3..b924f1c23c58 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -74,8 +75,6 @@ static int bsg_major;
 
 static struct kmem_cache *bsg_cmd_cachep;
 
-#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
-
 /*
  * our internal command type
  */
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 892fbd9800d9..ce6654b5d329 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3736,6 +3736,9 @@ static int fc_bsg_dispatch(struct bsg_job *job)
 {
struct Scsi_Host *shost = fc_bsg_to_shost(job);
 
+   BUILD_BUG_ON(sizeof(struct fc_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
if (scsi_is_fc_rport(job->dev))
return fc_bsg_rport_dispatch(shost, job);
else
diff --git a/drivers/scsi/scsi_transport_iscsi.c 
b/drivers/scsi/scsi_transport_iscsi.c
index a424eaeafeb0..4e021c949ad7 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1483,6 +1483,9 @@ static int iscsi_bsg_host_dispatch(struct bsg_job *job)
int cmdlen = sizeof(uint32_t);  /* start with length of msgcode */
int ret;
 
+   BUILD_BUG_ON(sizeof(struct iscsi_bsg_reply) >
+BSG_COMMAND_REPLY_BUFFERSIZE);
+
/* check if we have the msgcode value at least */
if (job->request_len < sizeof(uint32_t)) {
ret = -ENOMSG;
diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h
index e34dde2da0ef..85d7c7678cc6 100644
--- a/include/linux/bsg-lib.h
+++ b/include/linux/bsg-lib.h
@@ -25,6 +25,8 @@
 
 #include 
 
+#define BSG_COMMAND_REPLY_BUFFERSIZE   SCSI_SENSE_BUFFERSIZE
+
 struct request;
 struct device;
 struct scatterlist;
-- 
2.12.2

[RFC PATCH 0/6] bsg: fix regression resulting in panics when sending commands via BSG and some sanity cleanups

2017-08-09 Thread Benjamin Block

Hello all,

Steffen noticed recently that we have a regression in the BSG code that
prevents us from sending any traffic over this interface. After I
researched this a bit, it turned out that this affects not only zFCP, but
likely all LLDs that implements the BSG API. This was introduced in 4.11
(details in Patch 1 of this series).

I imagine the regression happened because of some very "unfortunate"
variable- namings. I can not fix them, as they involve the base struct
request, but I tried to add some cleanups that should make the
relationships between stuff more visible in the future I hope.

Patch 1   - Regression Fix; Also tagged for stable
Patch 2-6 - Cleanups

I tagged this as RFC. Patches 2-6 are a 'nice to have' IMO, Patch 1 is
obviously necessary, and if it is OK, I can re-send it separately if
necessary. If you don't like the changes in the other patches, I don't mind
dropping them.

I am not sure about Patch 4. It certainly works, but it changes
user-visible behavior, in what I believe is within the behavior described
by the SG interface. It makes the different methods of how BSG passes
commands down to the LLDs more conform with each other - even though I
can't make them the exact same. More details in the patch description.

I rebased the series on Jens' for-next and I have function-tested the
series on s390x's zFCP with the tools provided in the zfcp HBA API library
(https://www.ibm.com/developerworks/linux/linux390/zfcp-hbaapi.html) and
some custom code to test the read/write interface of BSG.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (6):
  bsg: fix kernel panic resulting from missing allocation of a
reply-buffer
  bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE
  bsg: scsi-transport: add compile-tests to prevent reply-buffer
overflows
  bsg: refactor ioctl to use regular BSG-command infrastructure for
SG_IO
  bsg: reduce unecessary arguments for bsg_map_hdr()
  bsg: reduce unecessary arguments for blk_complete_sgv4_hdr_rq()

 block/bsg-lib.c |  4 +-
 block/bsg.c | 90 ++---
 drivers/scsi/scsi_transport_fc.c|  3 ++
 drivers/scsi/scsi_transport_iscsi.c |  3 ++
 include/linux/bsg-lib.h |  2 +
 5 files changed, 65 insertions(+), 37 deletions(-)

--
2.12.2

[RFC PATCH 0/6] bsg: fix regression resulting in panics when sending commands via BSG and some sanity cleanups

2017-08-09 Thread Benjamin Block

Hello all,

Steffen noticed recently that we have a regression in the BSG code that
prevents us from sending any traffic over this interface. After I
researched this a bit, it turned out that this affects not only zFCP, but
likely all LLDs that implements the BSG API. This was introduced in 4.11
(details in Patch 1 of this series).

I imagine the regression happened because of some very "unfortunate"
variable- namings. I can not fix them, as they involve the base struct
request, but I tried to add some cleanups that should make the
relationships between stuff more visible in the future I hope.

Patch 1   - Regression Fix; Also tagged for stable
Patch 2-6 - Cleanups

I tagged this as RFC. Patches 2-6 are a 'nice to have' IMO, Patch 1 is
obviously necessary, and if it is OK, I can re-send it separately if
necessary. If you don't like the changes in the other patches, I don't mind
dropping them.

I am not sure about Patch 4. It certainly works, but it changes
user-visible behavior, in what I believe is within the behavior described
by the SG interface. It makes the different methods of how BSG passes
commands down to the LLDs more conform with each other - even though I
can't make them the exact same. More details in the patch description.

I rebased the series on Jens' for-next and I have function-tested the
series on s390x's zFCP with the tools provided in the zfcp HBA API library
(https://www.ibm.com/developerworks/linux/linux390/zfcp-hbaapi.html) and
some custom code to test the read/write interface of BSG.

Reviews are more than welcome :)


Beste Grüße / Best regards,
  - Benjamin Block

Benjamin Block (6):
  bsg: fix kernel panic resulting from missing allocation of a
reply-buffer
  bsg: assign sense_len instead of fixed SCSI_SENSE_BUFFERSIZE
  bsg: scsi-transport: add compile-tests to prevent reply-buffer
overflows
  bsg: refactor ioctl to use regular BSG-command infrastructure for
SG_IO
  bsg: reduce unecessary arguments for bsg_map_hdr()
  bsg: reduce unecessary arguments for blk_complete_sgv4_hdr_rq()

 block/bsg-lib.c |  4 +-
 block/bsg.c | 90 ++---
 drivers/scsi/scsi_transport_fc.c|  3 ++
 drivers/scsi/scsi_transport_iscsi.c |  3 ++
 include/linux/bsg-lib.h |  2 +
 5 files changed, 65 insertions(+), 37 deletions(-)

--
2.12.2

[PATCH] MAINTAINERS: Add myself to S390 ZFCP DRIVER as a co-maintainer

2017-07-28 Thread Benjamin Block

I have been working with Steffen on zFCP for quite a while now and we
decided adding me as a co-maintainer might be a good thing.

Acked-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Signed-off-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f6ec1be48609..2c1a52d1376c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11342,6 +11342,7 @@ F:  drivers/s390/crypto/
 
 S390 ZFCP DRIVER
 M: Steffen Maier <ma...@linux.vnet.ibm.com>
+M: Benjamin Block <bbl...@linux.vnet.ibm.com>
 L: linux-s...@vger.kernel.org
 W: http://www.ibm.com/developerworks/linux/linux390/
 S: Supported
-- 
2.11.2

[PATCH] MAINTAINERS: Add myself to S390 ZFCP DRIVER as a co-maintainer

2017-07-28 Thread Benjamin Block

I have been working with Steffen on zFCP for quite a while now and we
decided adding me as a co-maintainer might be a good thing.

Acked-by: Steffen Maier 
Signed-off-by: Benjamin Block 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f6ec1be48609..2c1a52d1376c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11342,6 +11342,7 @@ F:  drivers/s390/crypto/
 
 S390 ZFCP DRIVER
 M: Steffen Maier 
+M: Benjamin Block 
 L: linux-s...@vger.kernel.org
 W: http://www.ibm.com/developerworks/linux/linux390/
 S: Supported
-- 
2.11.2

Re: A bug in scsi_alloc_target of drivers/scsi/scsi_scan.c

2017-05-16 Thread Benjamin Block

Hello Dashi,

On Tue, May 09, 2017 at 09:08:14AM +, Dashi DS1 Cao wrote:
> When debugging a race condition in scsi_remove_target of 3.12, I ran into 
> this possible bug within scsi_alloc_target.
> When an existing "struct scsi_target" is found and used, the starget just got 
> through kzmalloc should be freed, rather than dong a "put_device(dev)".

But that is exactly what is done when put_device is called and the
internal ref-count drops below 1. It will go through the kobj-core and
end up in scsi_target_dev_release().

Also this specific code was changed in 12fb8c1574d7d in 2010, see the
commit message there.

Beste Grüße / Best regards,
  - Benjamin Block
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 81d4151..96795d4 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -483,7 +483,7 @@ static struct scsi_target *scsi_alloc_target(struct 
> device *parent,
> 
> spin_unlock_irqrestore(shost->host_lock, flags);
> if (ref_got) {
> -   put_device(dev);
> +   kfree(starget);
> return found_target;
> }
> /*
> --
> 
> Dashi Cao
> 
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: A bug in scsi_alloc_target of drivers/scsi/scsi_scan.c

2017-05-16 Thread Benjamin Block

Hello Dashi,

On Tue, May 09, 2017 at 09:08:14AM +, Dashi DS1 Cao wrote:
> When debugging a race condition in scsi_remove_target of 3.12, I ran into 
> this possible bug within scsi_alloc_target.
> When an existing "struct scsi_target" is found and used, the starget just got 
> through kzmalloc should be freed, rather than dong a "put_device(dev)".

But that is exactly what is done when put_device is called and the
internal ref-count drops below 1. It will go through the kobj-core and
end up in scsi_target_dev_release().

Also this specific code was changed in 12fb8c1574d7d in 2010, see the
commit message there.

Beste Grüße / Best regards,
  - Benjamin Block
> 
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 81d4151..96795d4 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -483,7 +483,7 @@ static struct scsi_target *scsi_alloc_target(struct 
> device *parent,
> 
> spin_unlock_irqrestore(shost->host_lock, flags);
> if (ref_got) {
> -   put_device(dev);
> +   kfree(starget);
> return found_target;
> }
> /*
> --
> 
> Dashi Cao
> 
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH 21/29] drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-06 Thread Benjamin Block

On Mon, Mar 06, 2017 at 04:27:11PM +0100, Johannes Thumshirn wrote:
> On 03/06/2017 03:21 PM, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> 
> The subject is wrong, should be something like "scsi: libfc convert
> fc_fcp_pkt.ref_cnt from atomic_t to refcount_t" but not s390.
> 

Yes please, I was extremely confused for a moment here.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [PATCH 21/29] drivers, s390: convert fc_fcp_pkt.ref_cnt from atomic_t to refcount_t

2017-03-06 Thread Benjamin Block

On Mon, Mar 06, 2017 at 04:27:11PM +0100, Johannes Thumshirn wrote:
> On 03/06/2017 03:21 PM, Elena Reshetova wrote:
> > refcount_t type and corresponding API should be
> > used instead of atomic_t when the variable is used as
> > a reference counter. This allows to avoid accidental
> > refcounter overflows that might lead to use-after-free
> > situations.
> 
> The subject is wrong, should be something like "scsi: libfc convert
> fc_fcp_pkt.ref_cnt from atomic_t to refcount_t" but not s390.
> 

Yes please, I was extremely confused for a moment here.



Beste Grüße / Best regards,
      - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block

On 10:39 Mon 08 Aug , Jens Axboe wrote:
> On 08/08/2016 10:32 AM, Benjamin Block wrote:
> >On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> >>On Fri, Aug 05 2016 at 11:54am -0400,
> >>Jens Axboe <ax...@kernel.dk> wrote:
> >>
> >>>On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> >>>>On Fri, Aug 05 2016 at 11:33P -0400,
> >>>>Jens Axboe <ax...@kernel.dk> wrote:
> >>>>
> >>>>>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> >>>>>>On Wed, Aug 03 2016 at 11:35am -0400,
> >>>>>>Benjamin Block <bbl...@linux.vnet.ibm.com> wrote:
> >>>>>>
> >>>>>>>Hej Mike,
> >>>>>>>
> >>>>>>>when running a debug-kernel today with several multipath-devices using
> >>>>>>>the round-robin path selector I noticed that the kernel throws these
> >>>>>>>warnings here:
> >>>>>>>
> >>>>>>>BUG: using smp_processor_id() in preemptible [] code: 
> >>>>>>>kdmwork-252:0/881
> >>>>>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> >>>>>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >>>>>>> 617679b8 61767a48 0002 
> >>>>>>> 
> >>>>>>> 61767ae8 61767a60 61767a60 
> >>>>>>> 001145d0
> >>>>>>>  00b962ae 00bb291e 
> >>>>>>> 000b
> >>>>>>> 61767aa8 61767a48  
> >>>>>>> 
> >>>>>>> 07b962ae 001145d0 61767a48 
> >>>>>>> 61767aa8
> >>>>>>>Call Trace:
> >>>>>>>([<001144a2>] show_trace+0x8a/0xe0)
> >>>>>>>([<00114586>] show_stack+0x8e/0xf0)
> >>>>>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> >>>>>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> >>>>>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> >>>>>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> >>>>>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> >>>>>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> >>>>>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> >>>>>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> >>>>>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> >>>>>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> >>>>>>>([<001681da>] kthread+0x112/0x120)
> >>>>>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> >>>>>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> >>>>>>>no locks held by kdmwork-252:0/881.
> >>>>>>>
> >[:snip:]
> >>>
> >>>I always forget the details (if this confuses lockdep or not), but you
> >>>could potentially turn it into:
> >>>
> >>>local_irq_save(flags);
> >>>x = this_cpu_ptr();
> >>>[...]
> >>>
> >>>spin_lock(>lock);
> >>>[...]
> >>>
> >>>instead.
> >>
> >>Cool, I've coded up the patch (compile tested only).
> >>
> >>Benjamin, any chance you could test this against your v4.7 kernel and
> >>report back?
> >>
> >>Thanks,
> >>Mike
> >>
> >>diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> >>index 4ace1da..ed446f8 100644
> >>--- a/drivers/md/dm-round-robin.c
> >>+++ b/drivers/md/dm-round-robin.c
> >>@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> >>path_selector *ps, size_t nr_bytes)
> >>struct path_info *pi = NULL;
> >>struct dm_path *current_path = NULL;
> >>
> >>+   local_irq_save(flags);
> >>current_path = *this_cpu_ptr(s->current_path);
> >>if (current_path) {
> &

Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block

On 10:39 Mon 08 Aug , Jens Axboe wrote:
> On 08/08/2016 10:32 AM, Benjamin Block wrote:
> >On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> >>On Fri, Aug 05 2016 at 11:54am -0400,
> >>Jens Axboe  wrote:
> >>
> >>>On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> >>>>On Fri, Aug 05 2016 at 11:33P -0400,
> >>>>Jens Axboe  wrote:
> >>>>
> >>>>>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> >>>>>>On Wed, Aug 03 2016 at 11:35am -0400,
> >>>>>>Benjamin Block  wrote:
> >>>>>>
> >>>>>>>Hej Mike,
> >>>>>>>
> >>>>>>>when running a debug-kernel today with several multipath-devices using
> >>>>>>>the round-robin path selector I noticed that the kernel throws these
> >>>>>>>warnings here:
> >>>>>>>
> >>>>>>>BUG: using smp_processor_id() in preemptible [] code: 
> >>>>>>>kdmwork-252:0/881
> >>>>>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> >>>>>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> >>>>>>> 617679b8 61767a48 0002 
> >>>>>>> 
> >>>>>>> 61767ae8 61767a60 61767a60 
> >>>>>>> 001145d0
> >>>>>>>  00b962ae 00bb291e 
> >>>>>>> 000b
> >>>>>>> 61767aa8 61767a48  
> >>>>>>> 
> >>>>>>> 07b962ae 001145d0 61767a48 
> >>>>>>> 61767aa8
> >>>>>>>Call Trace:
> >>>>>>>([<001144a2>] show_trace+0x8a/0xe0)
> >>>>>>>([<00114586>] show_stack+0x8e/0xf0)
> >>>>>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> >>>>>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> >>>>>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> >>>>>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> >>>>>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> >>>>>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> >>>>>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> >>>>>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> >>>>>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> >>>>>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> >>>>>>>([<001681da>] kthread+0x112/0x120)
> >>>>>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> >>>>>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> >>>>>>>no locks held by kdmwork-252:0/881.
> >>>>>>>
> >[:snip:]
> >>>
> >>>I always forget the details (if this confuses lockdep or not), but you
> >>>could potentially turn it into:
> >>>
> >>>local_irq_save(flags);
> >>>x = this_cpu_ptr();
> >>>[...]
> >>>
> >>>spin_lock(>lock);
> >>>[...]
> >>>
> >>>instead.
> >>
> >>Cool, I've coded up the patch (compile tested only).
> >>
> >>Benjamin, any chance you could test this against your v4.7 kernel and
> >>report back?
> >>
> >>Thanks,
> >>Mike
> >>
> >>diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> >>index 4ace1da..ed446f8 100644
> >>--- a/drivers/md/dm-round-robin.c
> >>+++ b/drivers/md/dm-round-robin.c
> >>@@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> >>path_selector *ps, size_t nr_bytes)
> >>struct path_info *pi = NULL;
> >>struct dm_path *current_path = NULL;
> >>
> >>+   local_irq_save(flags);
> >>current_path = *this_cpu_ptr(s->current_path);
> >>if (current_path) {
> >>percpu_counter_dec(>repeat_count);
> >>-

Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block

On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe <ax...@kernel.dk> wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe <ax...@kernel.dk> wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block <bbl...@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
[:snip:]
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> > 
> > local_irq_save(flags);
> > x = this_cpu_ptr();
> > [...]
> > 
> > spin_lock(>lock);
> > [...]
> > 
> > instead.
> 
> Cool, I've coded up the patch (compile tested only).
> 
> Benjamin, any chance you could test this against your v4.7 kernel and
> report back?
> 
> Thanks,
> Mike
> 
> diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> index 4ace1da..ed446f8 100644
> --- a/drivers/md/dm-round-robin.c
> +++ b/drivers/md/dm-round-robin.c
> @@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   struct path_info *pi = NULL;
>   struct dm_path *current_path = NULL;
> 
> + local_irq_save(flags);
>   current_path = *this_cpu_ptr(s->current_path);
>   if (current_path) {
>   percpu_counter_dec(>repeat_count);
> - if (percpu_counter_read_positive(>repeat_count) > 0)
> + if (percpu_counter_read_positive(>repeat_count) > 0) {
> + local_irq_restore(flags);
>   return current_path;
> + }
>   }
> 
> - spin_lock_irqsave(>lock, flags);
> + spin_lock(>lock);
>   if (!list_empty(>valid_paths)) {
>   pi = list_entry(s->valid_paths.next, struct path_info, list);
>   list_move_tail(>list, >valid_paths);
> @@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct 
&g

Re: [dm-devel] bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-08 Thread Benjamin Block

On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe  wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe  wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block  wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
[:snip:]
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> > 
> > local_irq_save(flags);
> > x = this_cpu_ptr();
> > [...]
> > 
> > spin_lock(>lock);
> > [...]
> > 
> > instead.
> 
> Cool, I've coded up the patch (compile tested only).
> 
> Benjamin, any chance you could test this against your v4.7 kernel and
> report back?
> 
> Thanks,
> Mike
> 
> diff --git a/drivers/md/dm-round-robin.c b/drivers/md/dm-round-robin.c
> index 4ace1da..ed446f8 100644
> --- a/drivers/md/dm-round-robin.c
> +++ b/drivers/md/dm-round-robin.c
> @@ -210,14 +210,17 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   struct path_info *pi = NULL;
>   struct dm_path *current_path = NULL;
> 
> + local_irq_save(flags);
>   current_path = *this_cpu_ptr(s->current_path);
>   if (current_path) {
>   percpu_counter_dec(>repeat_count);
> - if (percpu_counter_read_positive(>repeat_count) > 0)
> + if (percpu_counter_read_positive(>repeat_count) > 0) {
> + local_irq_restore(flags);
>   return current_path;
> + }
>   }
> 
> - spin_lock_irqsave(>lock, flags);
> + spin_lock(>lock);
>   if (!list_empty(>valid_paths)) {
>   pi = list_entry(s->valid_paths.next, struct path_info, list);
>   list_move_tail(>list, >valid_paths);
> @@ -225,7 +228,8 @@ static struct dm_path *rr_select_path(struct 
> path_selector *ps, size_t nr_bytes)
>   set_percpu_curr

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-07 Thread Benjamin Block

On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe <ax...@kernel.dk> wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe <ax...@kernel.dk> wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block <bbl...@linux.vnet.ibm.com> wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
> > >>>>
> > >>>>There are several more of these warnings, but all have the same
> > >>>>stack-trace (this is on s390x, but this looks like its only common code)
> > >>>>- sometimes the process-context is multipath.
> > >>>>
> > >>>>Looking at the changes in this function, it rather looks like this is
> > >>>>caused by changes made in commit
> > >>>>b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> > >>>>'repeat_count' and 'current_path').
> > >>>>
> > >>>>The kernel is a stock v4.7 with some debug options enabled (prominently
> > >>>>DEBUG_PREEMPT). Need any more info?
> > >>>
> > >>>As you can see from commit b0b477c7e0dd9 the round-robin path selector
> > >>>is now using percpu data (pointer) and a percpu_counter.
> > >>>
> > >>>I'm really not sure how else to access this percpu data.
> > >>>
> > >>>Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> > >>>of getting an answer to how to fix this.
> > >>
> > >>From a quick look, looks like you are using this_cpu_ptr() without
> > >>having preemption disabled.
> > >
> > >Right, that is what it looked like to me too.
> > >
> > >I'm just not sure on what the proper pattern is to fix this.
> > >
> > >I'll look closer though.
> > 
> > I always forget the details (if this confuses lockdep or not), b

Re: bug: using smp_processor_id() in preemptible code in rr_select_path()

2016-08-07 Thread Benjamin Block

On 12:06 Fri 05 Aug , Mike Snitzer wrote:
> On Fri, Aug 05 2016 at 11:54am -0400,
> Jens Axboe  wrote:
> 
> > On 08/05/2016 09:42 AM, Mike Snitzer wrote:
> > >On Fri, Aug 05 2016 at 11:33P -0400,
> > >Jens Axboe  wrote:
> > >
> > >>On 08/05/2016 09:27 AM, Mike Snitzer wrote:
> > >>>On Wed, Aug 03 2016 at 11:35am -0400,
> > >>>Benjamin Block  wrote:
> > >>>
> > >>>>Hej Mike,
> > >>>>
> > >>>>when running a debug-kernel today with several multipath-devices using
> > >>>>the round-robin path selector I noticed that the kernel throws these
> > >>>>warnings here:
> > >>>>
> > >>>>BUG: using smp_processor_id() in preemptible [] code: 
> > >>>>kdmwork-252:0/881
> > >>>>caller is rr_select_path+0x36/0x108 [dm_round_robin]
> > >>>>CPU: 1 PID: 881 Comm: kdmwork-252:0 Not tainted 4.7.0-debug #4
> > >>>>  617679b8 61767a48 0002 
> > >>>> 
> > >>>>  61767ae8 61767a60 61767a60 
> > >>>> 001145d0
> > >>>>   00b962ae 00bb291e 
> > >>>> 000b
> > >>>>  61767aa8 61767a48  
> > >>>> 
> > >>>>  07b962ae 001145d0 61767a48 
> > >>>> 61767aa8
> > >>>>Call Trace:
> > >>>>([<001144a2>] show_trace+0x8a/0xe0)
> > >>>>([<00114586>] show_stack+0x8e/0xf0)
> > >>>>([<006c7fdc>] dump_stack+0x9c/0xe0)
> > >>>>([<006fbbc0>] check_preemption_disabled+0x108/0x130)
> > >>>>([<03ff80268646>] rr_select_path+0x36/0x108 [dm_round_robin])
> > >>>>([<03ff80259a42>] choose_path_in_pg+0x42/0xc8 [dm_multipath])
> > >>>>([<03ff80259b62>] choose_pgpath+0x9a/0x1a0 [dm_multipath])
> > >>>>([<03ff8025b51a>] __multipath_map.isra.5+0x72/0x228 [dm_multipath])
> > >>>>([<03ff8025b75e>] multipath_map+0x3e/0x50 [dm_multipath])
> > >>>>([<03ff80225eb6>] map_request+0x66/0x458 [dm_mod])
> > >>>>([<03ff802262ec>] map_tio_request+0x44/0x70 [dm_mod])
> > >>>>([<0016835a>] kthread_worker_fn+0xf2/0x1d8)
> > >>>>([<001681da>] kthread+0x112/0x120)
> > >>>>([<0098378a>] kernel_thread_starter+0x6/0xc)
> > >>>>([<00983784>] kernel_thread_starter+0x0/0xc)
> > >>>>no locks held by kdmwork-252:0/881.
> > >>>>
> > >>>>
> > >>>>There are several more of these warnings, but all have the same
> > >>>>stack-trace (this is on s390x, but this looks like its only common code)
> > >>>>- sometimes the process-context is multipath.
> > >>>>
> > >>>>Looking at the changes in this function, it rather looks like this is
> > >>>>caused by changes made in commit
> > >>>>b0b477c7e0dd93f8916d106018ded1331b81bf61 (dm round robin: use percpu
> > >>>>'repeat_count' and 'current_path').
> > >>>>
> > >>>>The kernel is a stock v4.7 with some debug options enabled (prominently
> > >>>>DEBUG_PREEMPT). Need any more info?
> > >>>
> > >>>As you can see from commit b0b477c7e0dd9 the round-robin path selector
> > >>>is now using percpu data (pointer) and a percpu_counter.
> > >>>
> > >>>I'm really not sure how else to access this percpu data.
> > >>>
> > >>>Cc'ing LKML, linux-block, Jens and hch to cast a wider net in the hopes
> > >>>of getting an answer to how to fix this.
> > >>
> > >>From a quick look, looks like you are using this_cpu_ptr() without
> > >>having preemption disabled.
> > >
> > >Right, that is what it looked like to me too.
> > >
> > >I'm just not sure on what the proper pattern is to fix this.
> > >
> > >I'll look closer though.
> > 
> > I always forget the details (if this confuses lockdep or not), but you
> > could potentially turn it into:
> >

Re: [PATCH] SCSI-zfcp: Delete unnecessary checks before the function call "mempool_destroy"

2015-12-04 Thread Benjamin Block

Hej Markus,

On 14:23 Mon 16 Nov , SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Mon, 16 Nov 2015 14:19:03 +0100
> 
> The mempool_destroy() function tests whether its argument is NULL
> and then returns immediately. Thus the test around the calls is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  drivers/s390/scsi/zfcp_aux.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
> index c00ac46..abc9c93 100644
> --- a/drivers/s390/scsi/zfcp_aux.c
> +++ b/drivers/s390/scsi/zfcp_aux.c
> @@ -248,20 +248,13 @@ static int zfcp_allocate_low_mem_buffers(struct 
> zfcp_adapter *adapter)
> 
>  static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
>  {
> - if (adapter->pool.erp_req)
> - mempool_destroy(adapter->pool.erp_req);
> - if (adapter->pool.scsi_req)
> - mempool_destroy(adapter->pool.scsi_req);
> - if (adapter->pool.scsi_abort)
> - mempool_destroy(adapter->pool.scsi_abort);
> - if (adapter->pool.qtcb_pool)
> - mempool_destroy(adapter->pool.qtcb_pool);
> - if (adapter->pool.status_read_req)
> - mempool_destroy(adapter->pool.status_read_req);
> - if (adapter->pool.sr_data)
> - mempool_destroy(adapter->pool.sr_data);
> - if (adapter->pool.gid_pn)
> - mempool_destroy(adapter->pool.gid_pn);
> + mempool_destroy(adapter->pool.erp_req);
> + mempool_destroy(adapter->pool.scsi_req);
> + mempool_destroy(adapter->pool.scsi_abort);
> + mempool_destroy(adapter->pool.qtcb_pool);
> + mempool_destroy(adapter->pool.status_read_req);
> + mempool_destroy(adapter->pool.sr_data);
> + mempool_destroy(adapter->pool.gid_pn);
>  }
> 
>  /**
> -- 
> 2.6.2

Looks good to me, will have to wait though till Steffen is back around.

Reviewed-by: Benjamin Block 


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294


pgpGumO4lbIOX.pgp
Description: PGP signature

Re: [PATCH] SCSI-zfcp: Delete unnecessary checks before the function call "mempool_destroy"

2015-12-04 Thread Benjamin Block

Hej Markus,

On 14:23 Mon 16 Nov , SF Markus Elfring wrote:
> From: Markus Elfring <elfr...@users.sourceforge.net>
> Date: Mon, 16 Nov 2015 14:19:03 +0100
> 
> The mempool_destroy() function tests whether its argument is NULL
> and then returns immediately. Thus the test around the calls is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring <elfr...@users.sourceforge.net>
> ---
>  drivers/s390/scsi/zfcp_aux.c | 21 +++--
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
> index c00ac46..abc9c93 100644
> --- a/drivers/s390/scsi/zfcp_aux.c
> +++ b/drivers/s390/scsi/zfcp_aux.c
> @@ -248,20 +248,13 @@ static int zfcp_allocate_low_mem_buffers(struct 
> zfcp_adapter *adapter)
> 
>  static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
>  {
> - if (adapter->pool.erp_req)
> - mempool_destroy(adapter->pool.erp_req);
> - if (adapter->pool.scsi_req)
> - mempool_destroy(adapter->pool.scsi_req);
> - if (adapter->pool.scsi_abort)
> - mempool_destroy(adapter->pool.scsi_abort);
> - if (adapter->pool.qtcb_pool)
> - mempool_destroy(adapter->pool.qtcb_pool);
> - if (adapter->pool.status_read_req)
> - mempool_destroy(adapter->pool.status_read_req);
> - if (adapter->pool.sr_data)
> - mempool_destroy(adapter->pool.sr_data);
> - if (adapter->pool.gid_pn)
> - mempool_destroy(adapter->pool.gid_pn);
> + mempool_destroy(adapter->pool.erp_req);
> + mempool_destroy(adapter->pool.scsi_req);
> + mempool_destroy(adapter->pool.scsi_abort);
> + mempool_destroy(adapter->pool.qtcb_pool);
> + mempool_destroy(adapter->pool.status_read_req);
> + mempool_destroy(adapter->pool.sr_data);
> + mempool_destroy(adapter->pool.gid_pn);
>  }
> 
>  /**
> -- 
> 2.6.2

Looks good to me, will have to wait though till Steffen is back around.

Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>


Beste Grüße / Best regards,
  - Benjamin Block
-- 
Linux on z Systems Development / IBM Systems & Technology Group
  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz /Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294


pgpGumO4lbIOX.pgp
Description: PGP signature

Re: [PATCH] ACPI: Run fixed button devices' notify callback in the process context

2014-08-22 Thread Benjamin Block

On 08/22/2014 07:33 PM, Rafael J. Wysocki wrote:
> On Friday, August 22, 2014 03:37:55 PM Lan Tianyu wrote:
>> Currently fixed button devices' notify callbacks are running in the
>> interrupt context. It's not necessary and prevent calling functions
>> with mutex lock(E,G evaluating ACPI method). Otherwise, it's different
>> with non-fixed button device whose notify callback is running in the process
>> context. This patch is to make fixed button device's notify
>> callback in the process context and this also can avoid dead lock
>> when using netlink to report button event to user space.
> 
> I guess this is the main reason for the patch?
> 
> Is there a bug report regarding this?
>

There is: https://lkml.org/lkml/2014/8/21/606

> 
>> Signed-off-by: Lan Tianyu 
>> ---
>>  drivers/acpi/scan.c | 9 +++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
>> index 0a817ad..bfb7fc5 100644
>> --- a/drivers/acpi/scan.c
>> +++ b/drivers/acpi/scan.c
>> @@ -922,12 +922,17 @@ static void acpi_device_notify(acpi_handle handle, u32 
>> event, void *data)
>>  device->driver->ops.notify(device, event);
>>  }
>>  
>> -static acpi_status acpi_device_notify_fixed(void *data)
>> +static void acpi_device_notify_fixed_run(void *data)
>>  {
>>  struct acpi_device *device = data;
>>  
>> -/* Fixed hardware devices have no handles */
>>  acpi_device_notify(NULL, ACPI_FIXED_HARDWARE_EVENT, device);
>> +}
>> +
>> +static acpi_status acpi_device_notify_fixed(void *data)
>> +{
>> +/* Fixed hardware devices have no handles */
>> +acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_device_notify_fixed_run, data);
>>  return AE_OK;
>>  }
>>  
>>
> 




signature.asc
Description: OpenPGP digital signature

Re: [PATCH] ACPI: Run fixed button devices' notify callback in the process context

2014-08-22 Thread Benjamin Block

On 08/22/2014 07:33 PM, Rafael J. Wysocki wrote:
 On Friday, August 22, 2014 03:37:55 PM Lan Tianyu wrote:
 Currently fixed button devices' notify callbacks are running in the
 interrupt context. It's not necessary and prevent calling functions
 with mutex lock(E,G evaluating ACPI method). Otherwise, it's different
 with non-fixed button device whose notify callback is running in the process
 context. This patch is to make fixed button device's notify
 callback in the process context and this also can avoid dead lock
 when using netlink to report button event to user space.
 
 I guess this is the main reason for the patch?
 
 Is there a bug report regarding this?


There is: https://lkml.org/lkml/2014/8/21/606

 
 Signed-off-by: Lan Tianyu tianyu@intel.com
 ---
  drivers/acpi/scan.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

 diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
 index 0a817ad..bfb7fc5 100644
 --- a/drivers/acpi/scan.c
 +++ b/drivers/acpi/scan.c
 @@ -922,12 +922,17 @@ static void acpi_device_notify(acpi_handle handle, u32 
 event, void *data)
  device-driver-ops.notify(device, event);
  }
  
 -static acpi_status acpi_device_notify_fixed(void *data)
 +static void acpi_device_notify_fixed_run(void *data)
  {
  struct acpi_device *device = data;
  
 -/* Fixed hardware devices have no handles */
  acpi_device_notify(NULL, ACPI_FIXED_HARDWARE_EVENT, device);
 +}
 +
 +static acpi_status acpi_device_notify_fixed(void *data)
 +{
 +/* Fixed hardware devices have no handles */
 +acpi_os_execute(OSL_NOTIFY_HANDLER, acpi_device_notify_fixed_run, data);
  return AE_OK;
  }
  

 




signature.asc
Description: OpenPGP digital signature

Re: BUG: lockdep (inconsistent usage) in netlink

2014-08-21 Thread Benjamin Block

On 08/21/2014 08:52 PM, Benjamin Block wrote:
> Hello,
> 
> while rebooting one of my dev-machines I stumbled over this
> lockdep-mess-up:
> 
>> =
>> [ INFO: inconsistent lock state ]
>> 3.17.0-rc1-1-gb83ca8c #2 Tainted: G   O  
>> -
>> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>> swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>>  (&(>lock)->rlock#3){?.-...}, at: [] 
>> skb_queue_tail+0x2b/0x60
>> {HARDIRQ-ON-W} state was registered at:
>>   [] __lock_acquire+0x877/0x1c90
>>   [] lock_acquire+0xca/0x120
>>   [] _raw_spin_lock_bh+0x44/0x80
>>   [] netlink_poll+0xf8/0x1c0
>>   [] sock_poll+0x161/0x190
>>   [] SyS_epoll_ctl+0x51b/0xd10
>>   [] system_call_fastpath+0x16/0x1b
>> irq event stamp: 1699744
>> hardirqs last  enabled at (1699741): [] 
>> cpuidle_enter_state+0xc4/0x190
>> hardirqs last disabled at (1699742): [] 
>> common_interrupt+0x6a/0x6f
>> softirqs last  enabled at (1699744): [] 
>> _local_bh_enable+0x4a/0x50
>> softirqs last disabled at (1699743): [] irq_enter+0x30/0x70
>>
>> other info that might help us debug this:
>>  Possible unsafe locking scenario:
>>
>>CPU0
>>
>>   lock(&(>lock)->rlock#3);
>>   
>> lock(&(>lock)->rlock#3);
>>
>>  *** DEADLOCK ***
>>
>> no locks held by swapper/0/0.
>>
>> stack backtrace:
>> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O   
>> 3.17.0-rc1-1-gb83ca8c #2
>> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>>  8295a5b0 8802158039a8 81af20fa 
>>  822164e0 880215803a08 81aee400 
>>   88020001 8105ac0f 82d2abe0
>> Call Trace:
>>[] dump_stack+0x4e/0x68
>>  [] print_usage_bug+0x1ec/0x1fd
>>  [] ? save_stack_trace+0x2f/0x50
>>  [] ? print_irq_inversion_bug+0x200/0x200
>>  [] mark_lock+0x191/0x2b0
>>  [] __lock_acquire+0x7ea/0x1c90
>>  [] ? __lock_acquire+0x914/0x1c90
>>  [] ? print_irq_inversion_bug+0x200/0x200
>>  [] ? __lock_acquire+0x914/0x1c90
>>  [] lock_acquire+0xca/0x120
>>  [] ? skb_queue_tail+0x2b/0x60
>>  [] _raw_spin_lock_irqsave+0x50/0x90
>>  [] ? skb_queue_tail+0x2b/0x60
>>  [] skb_queue_tail+0x2b/0x60
>>  [] __netlink_sendskb+0x21f/0x250
>>  [] netlink_broadcast_filtered+0x273/0x3b0
>>  [] netlink_broadcast+0x1d/0x20
>>  [] ? nla_reserve+0x2a/0x40
>>  [] acpi_bus_generate_netlink_event+0x160/0x178
>>  [] acpi_button_notify+0xe1/0xec
>>  [] acpi_device_notify+0x19/0x1b
>>  [] acpi_device_notify_fixed+0x18/0x1c
>>  [] acpi_ev_fixed_event_detect+0xe6/0x10d
>>  [] acpi_ev_sci_xrupt_handler+0x19/0x3f
>>  [] acpi_irq+0x16/0x31
>>  [] handle_irq_event_percpu+0x6a/0x1d0
>>  [] handle_irq_event+0x48/0x70
>>  [] ? handle_fasteoi_irq+0x2f/0x160
>>  [] handle_fasteoi_irq+0xc7/0x160
>>  [] handle_irq+0x134/0x150
>>  [] ? atomic_notifier_call_chain+0x16/0x20
>>  [] ? __exit_idle+0x2c/0x30
>>  [] do_IRQ+0x5e/0x100
>>  [] common_interrupt+0x6f/0x6f
>>[] ? cpuidle_enter_state+0xcf/0x190
>>  [] ? cpuidle_enter_state+0xc4/0x190
>>  [] cpuidle_enter+0x17/0x20
>>  [] cpu_startup_entry+0x3a1/0x3c0
>>  [] rest_init+0xc4/0xd0
>>  [] ? rest_init+0x5/0xd0
>>  [] ? ftrace_init+0xa8/0x13b
>>  [] start_kernel+0x461/0x46e
>>  [] ? set_init_arg+0x57/0x57
>>  [] x86_64_start_reservations+0x2a/0x2c
>>  [] x86_64_start_kernel+0xfd/0x101
> 
> Sadly I couldn't reproduce it. This looks all to be very general
> functions and my best guess is, netlink_poll() needs to be irq-save.
> Thing is, the corresponding code is quite old and I can't really bisec
> it, because the none-reproducibility.
>

Thinking more about it.. this seems to be unlikely. More like the
acpi-irq chain should not do netlink-events still in irq-context - just
guessing here, sry :).

I tracked around a little and came up with more recent commits in that
call-chain:

commit 0bf6368ee8f25826d0645c0f7a4f17c8845356a4
- adds acpi_bus_generate_netlink_event to the chain

Again, all other places around the chain seems quit old or unrelated.

> 
> There is only the small ipv6-fib patch applied, I send in earlier today
> (https://lkml.org/lkml/2014/8/21/506). This should have nothing to do
> with this here.
> 

- Benjamin



signature.asc
Description: OpenPGP digital signature

BUG: lockdep (inconsistent usage) in netlink

2014-08-21 Thread Benjamin Block

Hello,

while rebooting one of my dev-machines I stumbled over this
lockdep-mess-up:

> =
> [ INFO: inconsistent lock state ]
> 3.17.0-rc1-1-gb83ca8c #2 Tainted: G   O  
> -
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>  (&(>lock)->rlock#3){?.-...}, at: [] 
> skb_queue_tail+0x2b/0x60
> {HARDIRQ-ON-W} state was registered at:
>   [] __lock_acquire+0x877/0x1c90
>   [] lock_acquire+0xca/0x120
>   [] _raw_spin_lock_bh+0x44/0x80
>   [] netlink_poll+0xf8/0x1c0
>   [] sock_poll+0x161/0x190
>   [] SyS_epoll_ctl+0x51b/0xd10
>   [] system_call_fastpath+0x16/0x1b
> irq event stamp: 1699744
> hardirqs last  enabled at (1699741): [] 
> cpuidle_enter_state+0xc4/0x190
> hardirqs last disabled at (1699742): [] 
> common_interrupt+0x6a/0x6f
> softirqs last  enabled at (1699744): [] 
> _local_bh_enable+0x4a/0x50
> softirqs last disabled at (1699743): [] irq_enter+0x30/0x70
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>CPU0
>
>   lock(&(>lock)->rlock#3);
>   
> lock(&(>lock)->rlock#3);
>
>  *** DEADLOCK ***
>
> no locks held by swapper/0/0.
>
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O   
> 3.17.0-rc1-1-gb83ca8c #2
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>  8295a5b0 8802158039a8 81af20fa 
>  822164e0 880215803a08 81aee400 
>   88020001 8105ac0f 82d2abe0
> Call Trace:
>[] dump_stack+0x4e/0x68
>  [] print_usage_bug+0x1ec/0x1fd
>  [] ? save_stack_trace+0x2f/0x50
>  [] ? print_irq_inversion_bug+0x200/0x200
>  [] mark_lock+0x191/0x2b0
>  [] __lock_acquire+0x7ea/0x1c90
>  [] ? __lock_acquire+0x914/0x1c90
>  [] ? print_irq_inversion_bug+0x200/0x200
>  [] ? __lock_acquire+0x914/0x1c90
>  [] lock_acquire+0xca/0x120
>  [] ? skb_queue_tail+0x2b/0x60
>  [] _raw_spin_lock_irqsave+0x50/0x90
>  [] ? skb_queue_tail+0x2b/0x60
>  [] skb_queue_tail+0x2b/0x60
>  [] __netlink_sendskb+0x21f/0x250
>  [] netlink_broadcast_filtered+0x273/0x3b0
>  [] netlink_broadcast+0x1d/0x20
>  [] ? nla_reserve+0x2a/0x40
>  [] acpi_bus_generate_netlink_event+0x160/0x178
>  [] acpi_button_notify+0xe1/0xec
>  [] acpi_device_notify+0x19/0x1b
>  [] acpi_device_notify_fixed+0x18/0x1c
>  [] acpi_ev_fixed_event_detect+0xe6/0x10d
>  [] acpi_ev_sci_xrupt_handler+0x19/0x3f
>  [] acpi_irq+0x16/0x31
>  [] handle_irq_event_percpu+0x6a/0x1d0
>  [] handle_irq_event+0x48/0x70
>  [] ? handle_fasteoi_irq+0x2f/0x160
>  [] handle_fasteoi_irq+0xc7/0x160
>  [] handle_irq+0x134/0x150
>  [] ? atomic_notifier_call_chain+0x16/0x20
>  [] ? __exit_idle+0x2c/0x30
>  [] do_IRQ+0x5e/0x100
>  [] common_interrupt+0x6f/0x6f
>[] ? cpuidle_enter_state+0xcf/0x190
>  [] ? cpuidle_enter_state+0xc4/0x190
>  [] cpuidle_enter+0x17/0x20
>  [] cpu_startup_entry+0x3a1/0x3c0
>  [] rest_init+0xc4/0xd0
>  [] ? rest_init+0x5/0xd0
>  [] ? ftrace_init+0xa8/0x13b
>  [] start_kernel+0x461/0x46e
>  [] ? set_init_arg+0x57/0x57
>  [] x86_64_start_reservations+0x2a/0x2c
>  [] x86_64_start_kernel+0xfd/0x101

Sadly I couldn't reproduce it. This looks all to be very general
functions and my best guess is, netlink_poll() needs to be irq-save.
Thing is, the corresponding code is quite old and I can't really bisec
it, because the none-reproducibility.

There is only the small ipv6-fib patch applied, I send in earlier today
(https://lkml.org/lkml/2014/8/21/506). This should have nothing to do
with this here.

-- 
If a listener nods his head when you're explaining your program, wake him up.
--
  best regards,
- Benjamin Block


signature.asc
Description: Digital signature

[PATCH v2] net: ipv6: fib: don't sleep inside atomic lock

2014-08-21 Thread Benjamin Block

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
>  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
>  #1:  (>tb6_lock){++--+.}, at: [] 
> ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>  0008 8800c8f13858 81af135a 
>  880212202430 8800c8f13878 810f8d3a 880212202c98
>  0010 8800c8f138c8 8121ad0e 0001
> Call Trace:
>  [] dump_stack+0x4e/0x68
>  [] __might_sleep+0x10a/0x120
>  [] kmem_cache_alloc_trace+0x4e/0x190
>  [] ? fib6_commit_metrics+0x66/0x110
>  [] fib6_commit_metrics+0x66/0x110
>  [] fib6_add+0x883/0xa80
>  [] ? ip6_route_add+0x65a/0x800
>  [] ip6_route_add+0x675/0x800
>  [] ? ip6_route_add+0x6a/0x800
>  [] inet6_rtm_newroute+0x5c/0x80
>  [] rtnetlink_rcv_msg+0x211/0x260
>  [] ? rtnl_lock+0x17/0x20
>  [] ? lock_release_holdtime+0x28/0x180
>  [] ? rtnl_lock+0x17/0x20
>  [] ? __rtnl_unlock+0x20/0x20
>  [] netlink_rcv_skb+0x6e/0xd0
>  [] rtnetlink_rcv+0x25/0x40
>  [] netlink_unicast+0xd9/0x180
>  [] netlink_sendmsg+0x700/0x770
>  [] ? local_clock+0x25/0x30
>  [] sock_sendmsg+0x6c/0x90
>  [] ? might_fault+0xa3/0xb0
>  [] ? verify_iovec+0x7d/0xf0
>  [] ___sys_sendmsg+0x37e/0x3b0
>  [] ? trace_hardirqs_on_caller+0x185/0x220
>  [] ? mutex_unlock+0xe/0x10
>  [] ? netlink_insert+0xbc/0xe0
>  [] ? netlink_autobind.isra.30+0x125/0x150
>  [] ? netlink_autobind.isra.30+0x60/0x150
>  [] ? netlink_bind+0x159/0x230
>  [] ? might_fault+0x5a/0xb0
>  [] ? SYSC_bind+0x7e/0xd0
>  [] __sys_sendmsg+0x4d/0x80
>  [] SyS_sendmsg+0x12/0x20
>  [] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.

Signed-off-by: Benjamin Block 
---
 net/ipv6/ip6_fib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cb4459b..76b7f5e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
if (dst->flags & DST_HOST) {
mp = dst_metrics_write_ptr(dst);
} else {
-   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
if (!mp)
return -ENOMEM;
dst_init_metrics(dst, mp, 0);
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] net: ipv6: fib: don't sleep inside atomic lock

2014-08-21 Thread Benjamin Block

On 14:54 Thu 21 Aug , Hannes Frederic Sowa wrote:
> On Mi, 2014-08-20 at 18:16 +0200, Benjamin Block wrote:
> > The function fib6_commit_metrics() allocates a piece of memory in mode
> > GFP_KERNEL while holding an atomic lock from higher up in the stack, in
> > the function __ip6_ins_rt(). This produces the following BUG:
> > 
> > > BUG: sleeping function called from invalid context at mm/slub.c:1250
> > > in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> > > 2 locks held by dhcpcd/2909:
> > >  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> > >  #1:  (>tb6_lock){++--+.}, at: [] 
> > > ip6_route_add+0x65a/0x800
> > > CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> > > Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
> > >  0008 8800c8f13858 81af135a 
> > >  880212202430 8800c8f13878 810f8d3a 880212202c98
> > >  0010 8800c8f138c8 8121ad0e 0001
> > > Call Trace:
> > >  [] dump_stack+0x4e/0x68
> > >  [] __might_sleep+0x10a/0x120
> > >  [] kmem_cache_alloc_trace+0x4e/0x190
> > >  [] ? fib6_commit_metrics+0x66/0x110
> > >  [] fib6_commit_metrics+0x66/0x110
> > >  [] fib6_add+0x883/0xa80
> > >  [] ? ip6_route_add+0x65a/0x800
> > >  [] ip6_route_add+0x675/0x800
> > >  [] ? ip6_route_add+0x6a/0x800
> > >  [] inet6_rtm_newroute+0x5c/0x80
> > >  [] rtnetlink_rcv_msg+0x211/0x260
> > >  [] ? rtnl_lock+0x17/0x20
> > >  [] ? lock_release_holdtime+0x28/0x180
> > >  [] ? rtnl_lock+0x17/0x20
> > >  [] ? __rtnl_unlock+0x20/0x20
> > >  [] netlink_rcv_skb+0x6e/0xd0
> > >  [] rtnetlink_rcv+0x25/0x40
> > >  [] netlink_unicast+0xd9/0x180
> > >  [] netlink_sendmsg+0x700/0x770
> > >  [] ? local_clock+0x25/0x30
> > >  [] sock_sendmsg+0x6c/0x90
> > >  [] ? might_fault+0xa3/0xb0
> > >  [] ? verify_iovec+0x7d/0xf0
> > >  [] ___sys_sendmsg+0x37e/0x3b0
> > >  [] ? trace_hardirqs_on_caller+0x185/0x220
> > >  [] ? mutex_unlock+0xe/0x10
> > >  [] ? netlink_insert+0xbc/0xe0
> > >  [] ? netlink_autobind.isra.30+0x125/0x150
> > >  [] ? netlink_autobind.isra.30+0x60/0x150
> > >  [] ? netlink_bind+0x159/0x230
> > >  [] ? might_fault+0x5a/0xb0
> > >  [] ? SYSC_bind+0x7e/0xd0
> > >  [] __sys_sendmsg+0x4d/0x80
> > >  [] SyS_sendmsg+0x12/0x20
> > >  [] system_call_fastpath+0x16/0x1b
> > 
> > Fixing this by replacing the mode GFP_KERNEL with GFP_NOWAIT.
> > 
> > Signed-off-by: Benjamin Block 
> > ---
> >  net/ipv6/ip6_fib.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> > index cb4459b..7eef9fc 100644
> > --- a/net/ipv6/ip6_fib.c
> > +++ b/net/ipv6/ip6_fib.c
> > @@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
> > if (dst->flags & DST_HOST) {
> > mp = dst_metrics_write_ptr(dst);
> > } else {
> > -   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
> > +   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_NOWAIT);
> > if (!mp)
> > return -ENOMEM;
> > dst_init_metrics(dst, mp, 0);
> 
> I actually think we should go with GFP_ATOMIC like the rest of the
> stack. The code path we are talking about here mostly uses GFP_ATOMIC,
> so in case this one GFP_NOWAITit bails out via error path, all those
> GFP_ATOMIC allocations before were useless.
> 

Fair enough, I was thinking about it and was more along the line, the
original author obviously wasn't concerned with any delay, and thus
GFP_NOWAIT would suffice as well.

-- 
BOFH Excuse #174:
Backbone adjustment
--
  best regards,
- Benjamin Block


signature.asc
Description: Digital signature

Re: [PATCH] net: ipv6: fib: don't sleep inside atomic lock

2014-08-21 Thread Benjamin Block

On 14:54 Thu 21 Aug , Hannes Frederic Sowa wrote:
 On Mi, 2014-08-20 at 18:16 +0200, Benjamin Block wrote:
  The function fib6_commit_metrics() allocates a piece of memory in mode
  GFP_KERNEL while holding an atomic lock from higher up in the stack, in
  the function __ip6_ins_rt(). This produces the following BUG:
  
   BUG: sleeping function called from invalid context at mm/slub.c:1250
   in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
   2 locks held by dhcpcd/2909:
#0:  (rtnl_mutex){+.+.+.}, at: [81978e67] rtnl_lock+0x17/0x20
#1:  (tb-tb6_lock){++--+.}, at: [81a6951a] 
   ip6_route_add+0x65a/0x800
   CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
   Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
0008 8800c8f13858 81af135a 
880212202430 8800c8f13878 810f8d3a 880212202c98
0010 8800c8f138c8 8121ad0e 0001
   Call Trace:
[81af135a] dump_stack+0x4e/0x68
[810f8d3a] __might_sleep+0x10a/0x120
[8121ad0e] kmem_cache_alloc_trace+0x4e/0x190
[81a6bcd6] ? fib6_commit_metrics+0x66/0x110
[81a6bcd6] fib6_commit_metrics+0x66/0x110
[81a6cbf3] fib6_add+0x883/0xa80
[81a6951a] ? ip6_route_add+0x65a/0x800
[81a69535] ip6_route_add+0x675/0x800
[81a68f2a] ? ip6_route_add+0x6a/0x800
[81a6990c] inet6_rtm_newroute+0x5c/0x80
[8197cf01] rtnetlink_rcv_msg+0x211/0x260
[81978e67] ? rtnl_lock+0x17/0x20
[81119708] ? lock_release_holdtime+0x28/0x180
[81978e67] ? rtnl_lock+0x17/0x20
[8197ccf0] ? __rtnl_unlock+0x20/0x20
[819a989e] netlink_rcv_skb+0x6e/0xd0
[81978ee5] rtnetlink_rcv+0x25/0x40
[819a8e59] netlink_unicast+0xd9/0x180
[819a9600] netlink_sendmsg+0x700/0x770
[81103735] ? local_clock+0x25/0x30
[8194e83c] sock_sendmsg+0x6c/0x90
[811f98e3] ? might_fault+0xa3/0xb0
[8195ca6d] ? verify_iovec+0x7d/0xf0
[8194ec3e] ___sys_sendmsg+0x37e/0x3b0
[8111ef15] ? trace_hardirqs_on_caller+0x185/0x220
[81af979e] ? mutex_unlock+0xe/0x10
[819a55ec] ? netlink_insert+0xbc/0xe0
[819a65e5] ? netlink_autobind.isra.30+0x125/0x150
[819a6520] ? netlink_autobind.isra.30+0x60/0x150
[819a84f9] ? netlink_bind+0x159/0x230
[811f989a] ? might_fault+0x5a/0xb0
[8194f25e] ? SYSC_bind+0x7e/0xd0
[8194f8cd] __sys_sendmsg+0x4d/0x80
[8194f912] SyS_sendmsg+0x12/0x20
[81afc692] system_call_fastpath+0x16/0x1b
  
  Fixing this by replacing the mode GFP_KERNEL with GFP_NOWAIT.
  
  Signed-off-by: Benjamin Block b...@mageta.org
  ---
   net/ipv6/ip6_fib.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
  index cb4459b..7eef9fc 100644
  --- a/net/ipv6/ip6_fib.c
  +++ b/net/ipv6/ip6_fib.c
  @@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
  if (dst-flags  DST_HOST) {
  mp = dst_metrics_write_ptr(dst);
  } else {
  -   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
  +   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_NOWAIT);
  if (!mp)
  return -ENOMEM;
  dst_init_metrics(dst, mp, 0);
 
 I actually think we should go with GFP_ATOMIC like the rest of the
 stack. The code path we are talking about here mostly uses GFP_ATOMIC,
 so in case this one GFP_NOWAITit bails out via error path, all those
 GFP_ATOMIC allocations before were useless.
 

Fair enough, I was thinking about it and was more along the line, the
original author obviously wasn't concerned with any delay, and thus
GFP_NOWAIT would suffice as well.

-- 
BOFH Excuse #174:
Backbone adjustment
--
  best regards,
- Benjamin Block


signature.asc
Description: Digital signature

[PATCH v2] net: ipv6: fib: don't sleep inside atomic lock

2014-08-21 Thread Benjamin Block

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

 BUG: sleeping function called from invalid context at mm/slub.c:1250
 in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
 2 locks held by dhcpcd/2909:
  #0:  (rtnl_mutex){+.+.+.}, at: [81978e67] rtnl_lock+0x17/0x20
  #1:  (tb-tb6_lock){++--+.}, at: [81a6951a] 
 ip6_route_add+0x65a/0x800
 CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
 Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
  0008 8800c8f13858 81af135a 
  880212202430 8800c8f13878 810f8d3a 880212202c98
  0010 8800c8f138c8 8121ad0e 0001
 Call Trace:
  [81af135a] dump_stack+0x4e/0x68
  [810f8d3a] __might_sleep+0x10a/0x120
  [8121ad0e] kmem_cache_alloc_trace+0x4e/0x190
  [81a6bcd6] ? fib6_commit_metrics+0x66/0x110
  [81a6bcd6] fib6_commit_metrics+0x66/0x110
  [81a6cbf3] fib6_add+0x883/0xa80
  [81a6951a] ? ip6_route_add+0x65a/0x800
  [81a69535] ip6_route_add+0x675/0x800
  [81a68f2a] ? ip6_route_add+0x6a/0x800
  [81a6990c] inet6_rtm_newroute+0x5c/0x80
  [8197cf01] rtnetlink_rcv_msg+0x211/0x260
  [81978e67] ? rtnl_lock+0x17/0x20
  [81119708] ? lock_release_holdtime+0x28/0x180
  [81978e67] ? rtnl_lock+0x17/0x20
  [8197ccf0] ? __rtnl_unlock+0x20/0x20
  [819a989e] netlink_rcv_skb+0x6e/0xd0
  [81978ee5] rtnetlink_rcv+0x25/0x40
  [819a8e59] netlink_unicast+0xd9/0x180
  [819a9600] netlink_sendmsg+0x700/0x770
  [81103735] ? local_clock+0x25/0x30
  [8194e83c] sock_sendmsg+0x6c/0x90
  [811f98e3] ? might_fault+0xa3/0xb0
  [8195ca6d] ? verify_iovec+0x7d/0xf0
  [8194ec3e] ___sys_sendmsg+0x37e/0x3b0
  [8111ef15] ? trace_hardirqs_on_caller+0x185/0x220
  [81af979e] ? mutex_unlock+0xe/0x10
  [819a55ec] ? netlink_insert+0xbc/0xe0
  [819a65e5] ? netlink_autobind.isra.30+0x125/0x150
  [819a6520] ? netlink_autobind.isra.30+0x60/0x150
  [819a84f9] ? netlink_bind+0x159/0x230
  [811f989a] ? might_fault+0x5a/0xb0
  [8194f25e] ? SYSC_bind+0x7e/0xd0
  [8194f8cd] __sys_sendmsg+0x4d/0x80
  [8194f912] SyS_sendmsg+0x12/0x20
  [81afc692] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_ATOMIC.

Signed-off-by: Benjamin Block b...@mageta.org
---
 net/ipv6/ip6_fib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cb4459b..76b7f5e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
if (dst-flags  DST_HOST) {
mp = dst_metrics_write_ptr(dst);
} else {
-   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_ATOMIC);
if (!mp)
return -ENOMEM;
dst_init_metrics(dst, mp, 0);
-- 
1.8.5.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

BUG: lockdep (inconsistent usage) in netlink

2014-08-21 Thread Benjamin Block

Hello,

while rebooting one of my dev-machines I stumbled over this
lockdep-mess-up:

 =
 [ INFO: inconsistent lock state ]
 3.17.0-rc1-1-gb83ca8c #2 Tainted: G   O  
 -
 inconsistent {HARDIRQ-ON-W} - {IN-HARDIRQ-W} usage.
 swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ((list-lock)-rlock#3){?.-...}, at: [819580db] 
 skb_queue_tail+0x2b/0x60
 {HARDIRQ-ON-W} state was registered at:
   [8111c9f7] __lock_acquire+0x877/0x1c90
   [8111e45a] lock_acquire+0xca/0x120
   [81afc744] _raw_spin_lock_bh+0x44/0x80
   [819a8918] netlink_poll+0xf8/0x1c0
   [8194e031] sock_poll+0x161/0x190
   [81271ffb] SyS_epoll_ctl+0x51b/0xd10
   [81afd452] system_call_fastpath+0x16/0x1b
 irq event stamp: 1699744
 hardirqs last  enabled at (1699741): [8189b1d4] 
 cpuidle_enter_state+0xc4/0x190
 hardirqs last disabled at (1699742): [81afdfaa] 
 common_interrupt+0x6a/0x6f
 softirqs last  enabled at (1699744): [810d7fda] 
 _local_bh_enable+0x4a/0x50
 softirqs last disabled at (1699743): [810d88f0] irq_enter+0x30/0x70
 
 other info that might help us debug this:
  Possible unsafe locking scenario:

CPU0

   lock((list-lock)-rlock#3);
   Interrupt
 lock((list-lock)-rlock#3);

  *** DEADLOCK ***

 no locks held by swapper/0/0.

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O   
 3.17.0-rc1-1-gb83ca8c #2
 Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
  8295a5b0 8802158039a8 81af20fa 
  822164e0 880215803a08 81aee400 
   88020001 8105ac0f 82d2abe0
 Call Trace:
  IRQ  [81af20fa] dump_stack+0x4e/0x68
  [81aee400] print_usage_bug+0x1ec/0x1fd
  [8105ac0f] ? save_stack_trace+0x2f/0x50
  [8111b600] ? print_irq_inversion_bug+0x200/0x200
  [8111c061] mark_lock+0x191/0x2b0
  [8111c96a] __lock_acquire+0x7ea/0x1c90
  [8111ca94] ? __lock_acquire+0x914/0x1c90
  [8111b600] ? print_irq_inversion_bug+0x200/0x200
  [8111ca94] ? __lock_acquire+0x914/0x1c90
  [8111e45a] lock_acquire+0xca/0x120
  [819580db] ? skb_queue_tail+0x2b/0x60
  [81afc590] _raw_spin_lock_irqsave+0x50/0x90
  [819580db] ? skb_queue_tail+0x2b/0x60
  [819580db] skb_queue_tail+0x2b/0x60
  [819a774f] __netlink_sendskb+0x21f/0x250
  [819a7d63] netlink_broadcast_filtered+0x273/0x3b0
  [819a7ebd] netlink_broadcast+0x1d/0x20
  [8152fb8a] ? nla_reserve+0x2a/0x40
  [81589728] acpi_bus_generate_netlink_event+0x160/0x178
  [815a8db9] acpi_button_notify+0xe1/0xec
  [81580648] acpi_device_notify+0x19/0x1b
  [81580662] acpi_device_notify_fixed+0x18/0x1c
  [8158f039] acpi_ev_fixed_event_detect+0xe6/0x10d
  [8159157a] acpi_ev_sci_xrupt_handler+0x19/0x3f
  [8157c1a9] acpi_irq+0x16/0x31
  [81131e2a] handle_irq_event_percpu+0x6a/0x1d0
  [81131fd8] handle_irq_event+0x48/0x70
  [8113534f] ? handle_fasteoi_irq+0x2f/0x160
  [811353e7] handle_fasteoi_irq+0xc7/0x160
  [8104cd94] handle_irq+0x134/0x150
  [810f4876] ? atomic_notifier_call_chain+0x16/0x20
  [81054dec] ? __exit_idle+0x2c/0x30
  [81affe7e] do_IRQ+0x5e/0x100
  [81afdfaf] common_interrupt+0x6f/0x6f
  EOI  [8189b1df] ? cpuidle_enter_state+0xcf/0x190
  [8189b1d4] ? cpuidle_enter_state+0xc4/0x190
  [8189b387] cpuidle_enter+0x17/0x20
  [8ae1] cpu_startup_entry+0x3a1/0x3c0
  [81ae92a4] rest_init+0xc4/0xd0
  [81ae91e5] ? rest_init+0x5/0xd0
  [825718a1] ? ftrace_init+0xa8/0x13b
  [8255103a] start_kernel+0x461/0x46e
  [82550939] ? set_init_arg+0x57/0x57
  [825505af] x86_64_start_reservations+0x2a/0x2c
  [825506ae] x86_64_start_kernel+0xfd/0x101

Sadly I couldn't reproduce it. This looks all to be very general
functions and my best guess is, netlink_poll() needs to be irq-save.
Thing is, the corresponding code is quite old and I can't really bisec
it, because the none-reproducibility.

There is only the small ipv6-fib patch applied, I send in earlier today
(https://lkml.org/lkml/2014/8/21/506). This should have nothing to do
with this here.

-- 
If a listener nods his head when you're explaining your program, wake him up.
--
  best regards,
- Benjamin Block


signature.asc
Description: Digital signature

Re: BUG: lockdep (inconsistent usage) in netlink

2014-08-21 Thread Benjamin Block

On 08/21/2014 08:52 PM, Benjamin Block wrote:
 Hello,
 
 while rebooting one of my dev-machines I stumbled over this
 lockdep-mess-up:
 
 =
 [ INFO: inconsistent lock state ]
 3.17.0-rc1-1-gb83ca8c #2 Tainted: G   O  
 -
 inconsistent {HARDIRQ-ON-W} - {IN-HARDIRQ-W} usage.
 swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  ((list-lock)-rlock#3){?.-...}, at: [819580db] 
 skb_queue_tail+0x2b/0x60
 {HARDIRQ-ON-W} state was registered at:
   [8111c9f7] __lock_acquire+0x877/0x1c90
   [8111e45a] lock_acquire+0xca/0x120
   [81afc744] _raw_spin_lock_bh+0x44/0x80
   [819a8918] netlink_poll+0xf8/0x1c0
   [8194e031] sock_poll+0x161/0x190
   [81271ffb] SyS_epoll_ctl+0x51b/0xd10
   [81afd452] system_call_fastpath+0x16/0x1b
 irq event stamp: 1699744
 hardirqs last  enabled at (1699741): [8189b1d4] 
 cpuidle_enter_state+0xc4/0x190
 hardirqs last disabled at (1699742): [81afdfaa] 
 common_interrupt+0x6a/0x6f
 softirqs last  enabled at (1699744): [810d7fda] 
 _local_bh_enable+0x4a/0x50
 softirqs last disabled at (1699743): [810d88f0] irq_enter+0x30/0x70

 other info that might help us debug this:
  Possible unsafe locking scenario:

CPU0

   lock((list-lock)-rlock#3);
   Interrupt
 lock((list-lock)-rlock#3);

  *** DEADLOCK ***

 no locks held by swapper/0/0.

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G   O   
 3.17.0-rc1-1-gb83ca8c #2
 Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
  8295a5b0 8802158039a8 81af20fa 
  822164e0 880215803a08 81aee400 
   88020001 8105ac0f 82d2abe0
 Call Trace:
  IRQ  [81af20fa] dump_stack+0x4e/0x68
  [81aee400] print_usage_bug+0x1ec/0x1fd
  [8105ac0f] ? save_stack_trace+0x2f/0x50
  [8111b600] ? print_irq_inversion_bug+0x200/0x200
  [8111c061] mark_lock+0x191/0x2b0
  [8111c96a] __lock_acquire+0x7ea/0x1c90
  [8111ca94] ? __lock_acquire+0x914/0x1c90
  [8111b600] ? print_irq_inversion_bug+0x200/0x200
  [8111ca94] ? __lock_acquire+0x914/0x1c90
  [8111e45a] lock_acquire+0xca/0x120
  [819580db] ? skb_queue_tail+0x2b/0x60
  [81afc590] _raw_spin_lock_irqsave+0x50/0x90
  [819580db] ? skb_queue_tail+0x2b/0x60
  [819580db] skb_queue_tail+0x2b/0x60
  [819a774f] __netlink_sendskb+0x21f/0x250
  [819a7d63] netlink_broadcast_filtered+0x273/0x3b0
  [819a7ebd] netlink_broadcast+0x1d/0x20
  [8152fb8a] ? nla_reserve+0x2a/0x40
  [81589728] acpi_bus_generate_netlink_event+0x160/0x178
  [815a8db9] acpi_button_notify+0xe1/0xec
  [81580648] acpi_device_notify+0x19/0x1b
  [81580662] acpi_device_notify_fixed+0x18/0x1c
  [8158f039] acpi_ev_fixed_event_detect+0xe6/0x10d
  [8159157a] acpi_ev_sci_xrupt_handler+0x19/0x3f
  [8157c1a9] acpi_irq+0x16/0x31
  [81131e2a] handle_irq_event_percpu+0x6a/0x1d0
  [81131fd8] handle_irq_event+0x48/0x70
  [8113534f] ? handle_fasteoi_irq+0x2f/0x160
  [811353e7] handle_fasteoi_irq+0xc7/0x160
  [8104cd94] handle_irq+0x134/0x150
  [810f4876] ? atomic_notifier_call_chain+0x16/0x20
  [81054dec] ? __exit_idle+0x2c/0x30
  [81affe7e] do_IRQ+0x5e/0x100
  [81afdfaf] common_interrupt+0x6f/0x6f
  EOI  [8189b1df] ? cpuidle_enter_state+0xcf/0x190
  [8189b1d4] ? cpuidle_enter_state+0xc4/0x190
  [8189b387] cpuidle_enter+0x17/0x20
  [8ae1] cpu_startup_entry+0x3a1/0x3c0
  [81ae92a4] rest_init+0xc4/0xd0
  [81ae91e5] ? rest_init+0x5/0xd0
  [825718a1] ? ftrace_init+0xa8/0x13b
  [8255103a] start_kernel+0x461/0x46e
  [82550939] ? set_init_arg+0x57/0x57
  [825505af] x86_64_start_reservations+0x2a/0x2c
  [825506ae] x86_64_start_kernel+0xfd/0x101
 
 Sadly I couldn't reproduce it. This looks all to be very general
 functions and my best guess is, netlink_poll() needs to be irq-save.
 Thing is, the corresponding code is quite old and I can't really bisec
 it, because the none-reproducibility.


Thinking more about it.. this seems to be unlikely. More like the
acpi-irq chain should not do netlink-events still in irq-context - just
guessing here, sry :).

I tracked around a little and came up with more recent commits in that
call-chain:

commit 0bf6368ee8f25826d0645c0f7a4f17c8845356a4
- adds acpi_bus_generate_netlink_event to the chain

Again, all other places around the chain seems quit old or unrelated.

 
 There is only the small ipv6-fib patch applied, I send in earlier today
 (https://lkml.org/lkml/2014/8/21/506). This should have nothing to do
 with this here.
 

- Benjamin



signature.asc
Description: OpenPGP

[PATCH] net: ipv6: fib: don't sleep inside atomic lock

2014-08-20 Thread Benjamin Block

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

> BUG: sleeping function called from invalid context at mm/slub.c:1250
> in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
> 2 locks held by dhcpcd/2909:
>  #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
>  #1:  (>tb6_lock){++--+.}, at: [] 
> ip6_route_add+0x65a/0x800
> CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
> Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
>  0008 8800c8f13858 81af135a 
>  880212202430 8800c8f13878 810f8d3a 880212202c98
>  0010 8800c8f138c8 8121ad0e 0001
> Call Trace:
>  [] dump_stack+0x4e/0x68
>  [] __might_sleep+0x10a/0x120
>  [] kmem_cache_alloc_trace+0x4e/0x190
>  [] ? fib6_commit_metrics+0x66/0x110
>  [] fib6_commit_metrics+0x66/0x110
>  [] fib6_add+0x883/0xa80
>  [] ? ip6_route_add+0x65a/0x800
>  [] ip6_route_add+0x675/0x800
>  [] ? ip6_route_add+0x6a/0x800
>  [] inet6_rtm_newroute+0x5c/0x80
>  [] rtnetlink_rcv_msg+0x211/0x260
>  [] ? rtnl_lock+0x17/0x20
>  [] ? lock_release_holdtime+0x28/0x180
>  [] ? rtnl_lock+0x17/0x20
>  [] ? __rtnl_unlock+0x20/0x20
>  [] netlink_rcv_skb+0x6e/0xd0
>  [] rtnetlink_rcv+0x25/0x40
>  [] netlink_unicast+0xd9/0x180
>  [] netlink_sendmsg+0x700/0x770
>  [] ? local_clock+0x25/0x30
>  [] sock_sendmsg+0x6c/0x90
>  [] ? might_fault+0xa3/0xb0
>  [] ? verify_iovec+0x7d/0xf0
>  [] ___sys_sendmsg+0x37e/0x3b0
>  [] ? trace_hardirqs_on_caller+0x185/0x220
>  [] ? mutex_unlock+0xe/0x10
>  [] ? netlink_insert+0xbc/0xe0
>  [] ? netlink_autobind.isra.30+0x125/0x150
>  [] ? netlink_autobind.isra.30+0x60/0x150
>  [] ? netlink_bind+0x159/0x230
>  [] ? might_fault+0x5a/0xb0
>  [] ? SYSC_bind+0x7e/0xd0
>  [] __sys_sendmsg+0x4d/0x80
>  [] SyS_sendmsg+0x12/0x20
>  [] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_NOWAIT.

Signed-off-by: Benjamin Block 
---
 net/ipv6/ip6_fib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cb4459b..7eef9fc 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
if (dst->flags & DST_HOST) {
mp = dst_metrics_write_ptr(dst);
} else {
-   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_NOWAIT);
if (!mp)
return -ENOMEM;
dst_init_metrics(dst, mp, 0);
-- 
1.8.5.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH] net: ipv6: fib: don't sleep inside atomic lock

2014-08-20 Thread Benjamin Block

The function fib6_commit_metrics() allocates a piece of memory in mode
GFP_KERNEL while holding an atomic lock from higher up in the stack, in
the function __ip6_ins_rt(). This produces the following BUG:

 BUG: sleeping function called from invalid context at mm/slub.c:1250
 in_atomic(): 1, irqs_disabled(): 0, pid: 2909, name: dhcpcd
 2 locks held by dhcpcd/2909:
  #0:  (rtnl_mutex){+.+.+.}, at: [81978e67] rtnl_lock+0x17/0x20
  #1:  (tb-tb6_lock){++--+.}, at: [81a6951a] 
 ip6_route_add+0x65a/0x800
 CPU: 1 PID: 2909 Comm: dhcpcd Not tainted 3.17.0-rc1 #1
 Hardware name: ASUS All Series/Q87T, BIOS 0216 10/16/2013
  0008 8800c8f13858 81af135a 
  880212202430 8800c8f13878 810f8d3a 880212202c98
  0010 8800c8f138c8 8121ad0e 0001
 Call Trace:
  [81af135a] dump_stack+0x4e/0x68
  [810f8d3a] __might_sleep+0x10a/0x120
  [8121ad0e] kmem_cache_alloc_trace+0x4e/0x190
  [81a6bcd6] ? fib6_commit_metrics+0x66/0x110
  [81a6bcd6] fib6_commit_metrics+0x66/0x110
  [81a6cbf3] fib6_add+0x883/0xa80
  [81a6951a] ? ip6_route_add+0x65a/0x800
  [81a69535] ip6_route_add+0x675/0x800
  [81a68f2a] ? ip6_route_add+0x6a/0x800
  [81a6990c] inet6_rtm_newroute+0x5c/0x80
  [8197cf01] rtnetlink_rcv_msg+0x211/0x260
  [81978e67] ? rtnl_lock+0x17/0x20
  [81119708] ? lock_release_holdtime+0x28/0x180
  [81978e67] ? rtnl_lock+0x17/0x20
  [8197ccf0] ? __rtnl_unlock+0x20/0x20
  [819a989e] netlink_rcv_skb+0x6e/0xd0
  [81978ee5] rtnetlink_rcv+0x25/0x40
  [819a8e59] netlink_unicast+0xd9/0x180
  [819a9600] netlink_sendmsg+0x700/0x770
  [81103735] ? local_clock+0x25/0x30
  [8194e83c] sock_sendmsg+0x6c/0x90
  [811f98e3] ? might_fault+0xa3/0xb0
  [8195ca6d] ? verify_iovec+0x7d/0xf0
  [8194ec3e] ___sys_sendmsg+0x37e/0x3b0
  [8111ef15] ? trace_hardirqs_on_caller+0x185/0x220
  [81af979e] ? mutex_unlock+0xe/0x10
  [819a55ec] ? netlink_insert+0xbc/0xe0
  [819a65e5] ? netlink_autobind.isra.30+0x125/0x150
  [819a6520] ? netlink_autobind.isra.30+0x60/0x150
  [819a84f9] ? netlink_bind+0x159/0x230
  [811f989a] ? might_fault+0x5a/0xb0
  [8194f25e] ? SYSC_bind+0x7e/0xd0
  [8194f8cd] __sys_sendmsg+0x4d/0x80
  [8194f912] SyS_sendmsg+0x12/0x20
  [81afc692] system_call_fastpath+0x16/0x1b

Fixing this by replacing the mode GFP_KERNEL with GFP_NOWAIT.

Signed-off-by: Benjamin Block b...@mageta.org
---
 net/ipv6/ip6_fib.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index cb4459b..7eef9fc 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -643,7 +643,7 @@ static int fib6_commit_metrics(struct dst_entry *dst,
if (dst-flags  DST_HOST) {
mp = dst_metrics_write_ptr(dst);
} else {
-   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_KERNEL);
+   mp = kzalloc(sizeof(u32) * RTAX_MAX, GFP_NOWAIT);
if (!mp)
return -ENOMEM;
dst_init_metrics(dst, mp, 0);
-- 
1.8.5.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

78 matches

Mail list logo