[GIT PULL] Block fixes for 4.12-rc2

2017-05-20 Thread Jens Axboe
Hi Linus,

A small collection of fixes that should go into this cycle. This pull
request contains:

- A pull request from Christoph for NVMe, which ended up being manually
  applied to avoid pulling in newer bits in master. Mostly fibre channel
  fixes from James, but also a few fixes from Jon and Vijay

- A pull request from Konrad, with just a single fix for xen-blkback
  from Gustavo.

- A fuseblk bdi fix from Jan, fixing a regression in this series with
  the dynamic backing devices.

- A blktrace fix from Shaohua, replacing sscanf() with kstrtoull().

- A request leak fix for drbd from Lars, fixing a regression in the last
  series with the kref changes. This will go to stable as well.

Please pull!


  git://git.kernel.dk/linux-block.git for-linus



Gustavo A. R. Silva (1):
  block: xen-blkback: add null check to avoid null pointer dereference

James Smart (4):
  nvme-fc: correct port role bits
  nvme-fc: require target or discovery role for fc-nvme targets
  nvme-fc: stop queues on error detection
  nvmet-fc: remove target cpu scheduling flag

Jan Kara (1):
  fuseblk: Fix warning in super_setup_bdi_name()

Jens Axboe (1):
  Merge branch 'stable/for-jens-4.12' of 
git://git.kernel.org/.../konrad/xen into for-linus

Jon Derrick (1):
  nvme: unmap CMB and remove sysfs file in reset path

Lars Ellenberg (1):
  drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()

Shaohua Li (1):
  blktrace: fix integer parse

Vijay Immanuel (1):
  nvmet: release the sq ref on rdma read errors

 drivers/block/drbd/drbd_req.c  | 27 +++
 drivers/block/xen-blkback/xenbus.c |  8 +---
 drivers/nvme/host/fc.c | 10 ++
 drivers/nvme/host/pci.c|  7 ++-
 drivers/nvme/target/core.c |  6 ++
 drivers/nvme/target/fc.c   |  4 +---
 drivers/nvme/target/fcloop.c   |  1 -
 drivers/nvme/target/nvmet.h|  1 +
 drivers/nvme/target/rdma.c |  1 +
 drivers/scsi/lpfc/lpfc_nvmet.c |  1 -
 fs/fuse/inode.c|  9 -
 include/linux/nvme-fc-driver.h | 16 
 kernel/trace/blktrace.c|  4 ++--
 13 files changed, 59 insertions(+), 36 deletions(-)

-- 
Jens Axboe



Re: [GIT PULL] nvme fixes for 4.12-rc2

2017-05-20 Thread Jens Axboe
On 05/20/2017 07:10 AM, Christoph Hellwig wrote:
> Hi Jens,
> 
> a few fixes for 4.12-rc2.  There are a few more that we need to get into
> 4.12, but I'd rather get the current queue to you ASAP.
> 
> 
> The following changes since commit 56868a460b83c0f93d339256a81064d89aadae8e:
> 
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2017-05-09 
> 15:56:58 -0700)
> 
> are available in the git repository at:
> 
>   git://git.infradead.org/nvme.git nvme-4.12
> 
> for you to fetch changes up to 3d5d1207d61b522b6aac24f896e53401e876b26e:
> 
>   nvmet: release the sq ref on rdma read errors (2017-05-11 12:52:46 +0200)
> 
> 
> James Smart (4):
>   nvme-fc: correct port role bits
>   nvme-fc: require target or discovery role for fc-nvme targets
>   nvme-fc: stop queues on error detection
>   nvmet-fc: remove target cpu scheduling flag
> 
> Jon Derrick (1):
>   nvme: unmap CMB and remove sysfs file in reset path
> 
> Vijay Immanuel (1):
>   nvmet: release the sq ref on rdma read errors

Thanks - I hand applied these, your base is much further ahead than my
for-linus.

-- 
Jens Axboe



[GIT PULL] nvme fixes for 4.12-rc2

2017-05-20 Thread Christoph Hellwig
Hi Jens,

a few fixes for 4.12-rc2.  There are a few more that we need to get into
4.12, but I'd rather get the current queue to you ASAP.


The following changes since commit 56868a460b83c0f93d339256a81064d89aadae8e:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide (2017-05-09 
15:56:58 -0700)

are available in the git repository at:

  git://git.infradead.org/nvme.git nvme-4.12

for you to fetch changes up to 3d5d1207d61b522b6aac24f896e53401e876b26e:

  nvmet: release the sq ref on rdma read errors (2017-05-11 12:52:46 +0200)


James Smart (4):
  nvme-fc: correct port role bits
  nvme-fc: require target or discovery role for fc-nvme targets
  nvme-fc: stop queues on error detection
  nvmet-fc: remove target cpu scheduling flag

Jon Derrick (1):
  nvme: unmap CMB and remove sysfs file in reset path

Vijay Immanuel (1):
  nvmet: release the sq ref on rdma read errors

 drivers/nvme/host/fc.c | 10 ++
 drivers/nvme/host/pci.c|  7 ++-
 drivers/nvme/target/core.c |  6 ++
 drivers/nvme/target/fc.c   |  4 +---
 drivers/nvme/target/fcloop.c   |  1 -
 drivers/nvme/target/nvmet.h|  1 +
 drivers/nvme/target/rdma.c |  1 +
 drivers/scsi/lpfc/lpfc_nvmet.c |  1 -
 include/linux/nvme-fc-driver.h | 16 
 9 files changed, 29 insertions(+), 18 deletions(-)


Re: [PATCH BUGFIX] block, bfq: access and cache blkg data only when safe

2017-05-20 Thread Paolo Valente

> Il giorno 19 mag 2017, alle ore 16:54, Tejun Heo  ha scritto:
> 
> Hello, Paolo.
> 
> On Fri, May 19, 2017 at 10:39:08AM +0200, Paolo Valente wrote:
>> Operations on blkg objects in blk-cgroup are protected with the
>> request_queue lock, which is no more the lock that protects
>> I/O-scheduler operations in blk-mq. The latter are now protected with
>> finer-grained per-scheduler-instance locks. As a consequence, if blkg
>> and blkg-related objects are accessed in a blk-mq I/O scheduler, it is
>> possible to have races, unless proper care is taken for these
>> accesses. BFQ does access these objects, and does incur these races.
>> 
>> This commit addresses this issue without introducing further locks, by
>> exploiting the following facts.  Destroy operations on a blkg invoke,
>> as a first step, hooks of the scheduler associated with the blkg. And
>> these hooks are executed with bfqd->lock held for BFQ. As a
>> consequence, for any blkg associated with the request queue an
>> instance of BFQ is attached to, we are guaranteed that such a blkg is
>> not destroyed and that all the pointers it contains are consistent,
>> (only) while that instance is holding its bfqd->lock. A blkg_lookup
>> performed with bfqd->lock held then returns a fully consistent blkg,
>> which remains consistent until this lock is held.
>> 
>> In view of these facts, this commit caches any needed blkg data (only)
>> when it (safely) detects a parent-blkg change for an internal entity,
>> and, to cache these data safely, it gets the new blkg, through a
>> blkg_lookup, and copies data while keeping the bfqd->lock held. As of
>> now, BFQ needs to cache only the path of the blkg, which is used in
>> the bfq_log_* functions.
>> 
>> This commit also removes or updates some stale comments on locking
>> issues related to blk-cgroup operations.
> 
> For a quick fix, this is fine but I think it'd be much better to
> update blkcg core so that we protect lookups with rcu and refcnt the
> blkg with percpu refs so that we can use blkcg correctly for all
> purposes with blk-mq.  There's no reason to hold up the immediate fix
> for that but it'd be nice to at least note what we should be doing in
> the longer term in a comment.
> 

Ok.

I have started thinking of a blk-cgroup-wide solution, but, Tejun and
Jens, the more I think about it, the more I see a more structural bug
:( The bug seems to affect CFQ too, even if CFQ still uses the
request_queue lock.  Hoping that the bug is only in my mind, here is
first my understanding of how the data structures related to the bug
are performed, and, second, why handling them this way apparently
leads to the bug.

For a given instance of [B|C]FQ (i.e., of BFQ or CFQ), a [b|c]fq_group
(descriptor of a group in [B|C]FQ) is created on the creation of each
blkg associated with the same request queue that instance of [B|C]FQ
is attached to.  The schedulable entities that belong to this blkg
(only queues in CFQ, or both queues and generic entities in BFQ), are
then associated with this [b|c]fq_group on the arrival on new I/O
requests for them: these entities contain a pointer to a
[b|c]fq_group, and the pointer is assigned the address of this new
[b|c]fq_group.  The functions where the association occurs are
bfq_get_rq_private for BFQ and cfq_set_request for CFQ.  Both hooks
are executed before the hook for actually enqueueing the request.  Any
access to group information is performed through this [b|c]fq_group
field.  The associated blkg is accessed through the policy_data
pointer in the bfq_group (the policy data in its turn contains a
pointer to the blkg)

Consider a process or a group that is moved from a given source group
to a different group, or simply removed from a group (although I
didn't yet succeed in just removing a process from a group :) ).  The
pointer to the [b|c]fq_group contained in the schedulable entity
belonging to the source group *is not* updated, in BFQ, if the entity
is idle, and *is not* updated *unconditionally* in CFQ.  The update
will happen in bfq_get_rq_private or cfq_set_request, on the arrival
of a new request.  But, if the move happens right after the arrival of
a request, then all the scheduler functions executed until a new
request arrives for that entity will see a stale [b|c]fq_group.  Much
worse, if also a blkcg_deactivate_policy or a blkg_destroy are
executed right after the move, then both the policy data pointed by
the [b|c]fq_group and the [b|c]fq_group itself may be deallocated.
So, all the functions of the scheduler invoked before next request
arrival may use dangling references!

The symptom reported by BFQ users has been actually the dereference of
dangling bfq_group or policy data pointers in a request_insert

What do you think, have I been mistaken in some step?

Thanks,
Paolo

> Thanks.
> 
> -- 
> tejun