Re: [PATCH v4 0/2] ext4: fix DAX dma vs truncate/hole-punch

2018-08-06 Thread Christoph Hellwig
> > > This allows the direct I/O path to do I/O and raise & lower page->_refcount
> > > while we're executing a truncate/hole punch.  This leads to us trying to free
> > > a page with an elevated refcount.
> 
> I don't see how this is possible in XFS - maybe I'm missing
> something, but "direct IO submission during truncate" is not
> something that should ever be happening in XFS, DAX or not.

The pages involved in a direct I/O are not those of the file that
the direct I/O read/write syscalls are called on, but those of the
memory regions the read/write syscalls operate on.  Those pages can
themselves be file backed and undergo a truncate at the same time.
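
To illustrate the scenario, a minimal userspace sketch (the paths and
sizes here are made up): the O_DIRECT read pins the pages of the
mmap()ed buffer file for the duration of the I/O, and a concurrent
truncate of that buffer file then races with the elevated refcounts.

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* the I/O buffer is itself a file-backed MAP_SHARED mapping */
	int buf_fd = open("/mnt/dax/buffer-file", O_RDWR);
	void *buf = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
			 MAP_SHARED, buf_fd, 0);

	/* direct I/O on another file uses those pages as its target */
	int data_fd = open("/mnt/dax/data-file", O_RDONLY | O_DIRECT);
	read(data_fd, buf, 1 << 20);

	/*
	 * Meanwhile another task may do ftruncate(buf_fd, 0), which
	 * tries to free pages still referenced by the direct I/O.
	 * (Error handling omitted for brevity.)
	 */
	return 0;
}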


Re: [PATCH v5 01/13] PCI/P2PDMA: Support peer-to-peer memory

2018-09-01 Thread Christoph Hellwig
On Fri, Aug 31, 2018 at 09:48:40AM -0600, Logan Gunthorpe wrote:
> Pretty easy. P2P detection is pretty much just pci_p2pdma_distance() ,
> which has nothing to do with the ZONE_DEVICE support.
> 
> (And the distance function makes use of a number of static functions
> which could be combined into a simpler interface, should we need it.)

I'd say let's get things merged as-is, so that we can review the
non-ZONE_DEVICE users.  I'm a little curious how that is going to work,
so having it as a full series would be useful.


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-01 Thread Christoph Hellwig
On Thu, Aug 30, 2018 at 01:11:18PM -0600, Jens Axboe wrote:
> I think this belongs in the caller - both the validity check, and
> passing in NOMERGE for this type of request. I don't want to impose
> this overhead on everything, for a pretty niche case.

It is just a single branch, which will be predicted as not taken
for non-P2P users.  The benefit is that we get proper error checking
by doing it in the block code.
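
As a rough sketch of the check in question (placement and helper names
here follow this series - is_pci_p2pdma_page() and the QUEUE_FLAG_PCI_P2P
queue flag with its blk_queue_pci_p2pdma() accessor - and are not meant
as the final merged form):

	/* in the generic submission path, e.g. generic_make_request() */
	if (bio_has_data(bio) &&
	    is_pci_p2pdma_page(bio->bi_io_vec[0].bv_page) &&
	    !blk_queue_pci_p2pdma(q)) {
		/* one predictable branch; only P2P bios ever take it */
		bio->bi_status = BLK_STS_NOTSUPP;
		bio_endio(bio);
		return BLK_QC_T_NONE;
	}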


Re: [PATCH v5 10/13] nvme-pci: Add support for P2P memory in requests

2018-09-05 Thread Christoph Hellwig
On Tue, Sep 04, 2018 at 09:47:07AM -0600, Logan Gunthorpe wrote:
> 
> 
> On 04/09/18 09:16 AM, Jason Gunthorpe wrote:
> >>if (iod->nents) {
> >> -  dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> >> +  /* P2PDMA requests do not need to be unmapped */
> >> +  if (!is_pci_p2pdma_page(sg_page(iod->sg)))
> >> +  dma_unmap_sg(dev->dev, iod->sg, iod->nents, dma_dir);
> > 
> > This seems like a poor direction, if we add IOMMU hairpin support we
> > will need unmapping.
> 
> It can always be added later. In any case, you'll have to convince
> Christoph who requested the change; I'm not that invested in this decision.

Yes, no point in adding dead code here.  In the long run we should
aim to hide the P2P address translation behind the normal DMA API
anyway, but we're not quite ready for that yet.


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-05 Thread Christoph Hellwig
On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
> The point is that the caller doesn't necessarily know where the bio
> will end up, hence the caller can't fully check if the whole stack
> supports P2P.

The caller must necessarily know where the bio will end up, because for
P2P support we need to query whether the bio target is P2P capable with
respect to the source of the P2P memory.
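
In other words the orchestrator - nvmet in this series - holds both ends
in its hands before any bio is built; a rough sketch of that query (the
helper shown is the distance function mentioned earlier in the thread,
with the multi-client signature it has in mainline, and the device
variables are placeholders):

	struct device *clients[] = { p2p_client_dev, backing_bdev_dev };

	/* a negative distance means no usable P2P path between the devices */
	if (pci_p2pdma_distance_many(p2p_provider_pdev, clients,
				     ARRAY_SIZE(clients), true) < 0)
		use_p2p_memory = false;	/* fall back to system memory */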


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-05 Thread Christoph Hellwig
On Wed, Sep 05, 2018 at 01:54:31PM -0600, Jens Axboe wrote:
> On 9/5/18 1:56 PM, Christoph Hellwig wrote:
> > On Wed, Sep 05, 2018 at 01:45:04PM -0600, Jens Axboe wrote:
> >> The point is that the caller doesn't necessarily know where the bio
> >> will end up, hence the caller can't fully check if the whole stack
> >> supports P2P.
> > 
> > The caller must necessarily know where the bio will end up, as for P2P
> > support we need to query if the bio target is P2P capable vs the
> > source of the P2P memory.
> 
> Then what's the point of having the check at all?

Just an additional little safeguard.  If you think it isn't worth
it I guess we can just drop it for now.


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-05 Thread Christoph Hellwig
On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
> There is no special p2p submission process. In the nvme-of case we are
> using the existing process and with the code in blk-core it didn't
> change it's process at all. Creating a helper will create one and I can
> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
> screws up and still calls the regular submit_bio() things will only be
> very subtly broken and that won't be obvious.

I thought about that when reviewing the previous series, and even
started hacking it up.  In the end I decided against it for the above
reason - it just adds code, but doesn't actually help with anything
as it is trivial to forget, and not using it will in fact just work.


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-10 Thread Christoph Hellwig
On Wed, Sep 05, 2018 at 03:03:18PM -0600, Logan Gunthorpe wrote:
> There is no special p2p submission process. In the nvme-of case we are
> using the existing process and with the code in blk-core it didn't
> change it's process at all. Creating a helper will create one and I can
> look at making a pci_p2pdma_submit_bio() for v6; but if the developer
> screws up and still calls the regular submit_bio() things will only be
> very subtly broken and that won't be obvious.

I just saw you added that "helper" in your tree.  Please don't, it is
a negative value add as it doesn't help anything with the checking.


Re: [PATCH v5 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-09-11 Thread Christoph Hellwig
On Mon, Sep 10, 2018 at 12:11:01PM -0600, Logan Gunthorpe wrote:
> > I just saw you added that "helper" in your tree.  Please don't, it is
> > a negative value add as it doesn't help anything with the checking.
> 
> Alright, so what's the consensus then? Just have a check in
> nvmet_bdev_execute_rw() to add REQ_NOMERGE when appropriate?

Yes.
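
For reference, the check that ends up in nvmet_bdev_execute_rw() looks
roughly like this (a sketch; field and helper names follow the series,
the exact merged form may differ):

	/* P2P-backed requests must not be merged with normal-memory bios */
	if (req->sg_cnt && is_pci_p2pdma_page(sg_page(req->sg)))
		op_flags |= REQ_NOMERGE;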


Re: [PATCH v6 01/13] PCI/P2PDMA: Support peer-to-peer memory

2018-09-21 Thread Christoph Hellwig
On Fri, Sep 21, 2018 at 09:37:33AM -0600, Logan Gunthorpe wrote:
> On 2018-09-21 7:00 AM, Bjorn Helgaas wrote:
> > On Thu, Sep 20, 2018 at 04:47:49PM -0600, Logan Gunthorpe wrote:
> >> On 2018-09-20 4:38 PM, Bjorn Helgaas wrote:
>  +#define pr_fmt(fmt) "pci-p2pdma: " fmt
> >>>
> >>> Is pr_fmt() actually used anywhere?
> >>
> >> It's used in a Patch 4 to print errors in the configfs helpers.
> > 
> > Ideally pr_fmt() would be added in the same patch that uses it, but
> > obviously pretty trivial.
> 
> Hmm, well Christoph asked me to do it this way. ;)

Sorry.  I somehow was under the impression we already had a printk
in the earlier patch, but it looks like I was confused.
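
For context, pr_fmt() only affects pr_*() calls in the same file, so the
define is inert until patch 4 adds one; a sketch (the message text here
is made up):

#define pr_fmt(fmt) "pci-p2pdma: " fmt

	/* later, in the configfs helpers added by patch 4 */
	pr_err("invalid value for p2pmem attribute\n");
	/* prints: pci-p2pdma: invalid value for p2pmem attribute */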


Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

2018-10-02 Thread Christoph Hellwig
On Tue, Oct 02, 2018 at 04:29:59PM +0200, Jan Kara wrote:
> > OK naive question from me, how do we want an application to be able to
> > check if it is running on a DAX mapping?
> 
> The question from me is: Should application really care?

No, it should not.  DAX is an implementation detail that may change
or go away at any time.


Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

2018-10-02 Thread Christoph Hellwig
On Tue, Oct 02, 2018 at 04:20:10PM +0200, Johannes Thumshirn wrote:
> Provide a F_GETDAX fcntl(2) command so an application can query
> whether it can make use of DAX or not.

How does an application "make use of DAX"?  What actual user visible
semantics are associated with a file that has this flag set?


Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

2018-10-02 Thread Christoph Hellwig
On Tue, Oct 02, 2018 at 04:44:13PM +0200, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 07:37:13AM -0700, Christoph Hellwig wrote:
> > No, it should not.  DAX is an implementation detail thay may change
> > or go away at any time.
> 
> Well we had an issue with an application checking for dax, this is how
> we landed here in the first place.

So what exactly is that "DAX" they are querying about (and no, I'm not
joking, nor being philosophical)?


Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

2018-10-02 Thread Christoph Hellwig
On Tue, Oct 02, 2018 at 05:01:24PM +0200, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 07:45:47AM -0700, Christoph Hellwig wrote:
> > How does an application "make use of DAX"?  What actual user visible
> > semantics are associated with a file that has this flag set?
> 
> There may not be any user visible semantics of DAX, but there are
> promises we gave to application developers praising DAX as _the_
> method to map data on persistent memory and get around "the penalty of
> the page cache" (however big this is).

Who is "we"?  As someone involved with the DAX code I think it is a steaming
pile of *, and we are still looking for cases where it actually
works without bugs.  That's why the experimental tag is still on it,
for example.

> As I said in another mail to this thread, applications have started to
> poke in procfs to see whether they can use DAX or not.

And what are they actually doing with that?

> 
> Party A has promised party B

We have never promised anyone anything.

> So technically e1fb4a086495 is a user visible regression and in the
> past we have reverted patches introducing these, even if the patch is
> generally correct and poking in /proc/self/smaps is a bad idea.

What actually stops working here, and why?  If some stupid app doesn't work
without MIXEDMAP and we want to apply the "don't break userspace" mantra
hard, we should just always expose it.

> I just wanted to give them a documented way to check for this
> promise. Being neutral if this promise is right or wrong, good or bad,
> or whatever. That's not my call, but I prefer not having angry users,
> yelling at me because of broken applications.

There is no promise, sorry.


Re: Problems with VM_MIXEDMAP removal from /proc/<pid>/smaps

2018-10-04 Thread Christoph Hellwig
On Thu, Oct 04, 2018 at 12:09:49PM +0200, Johannes Thumshirn wrote:
> On Tue, Oct 02, 2018 at 08:06:34AM -0700, Christoph Hellwig wrote:
> > There is no promise, sorry.
> 
> Well there have been lots of articles on, for instance, lwn.net [1] [2]
> [3] describing how to avoid the "overhead" of the page cache when
> running on persistent memory.

Since when is an article on some website a promise (of what, exactly)
by Linux kernel developers?


Re: [PATCH v9 13/13] nvmet: Optionally use PCI P2P memory

2018-10-05 Thread Christoph Hellwig
On Thu, Oct 04, 2018 at 04:29:19PM -0600, Logan Gunthorpe wrote:
> 
> 
> On 2018-10-04 4:20 p.m., Sagi Grimberg wrote:
> >> +static inline struct device *nvmet_ns_dev(struct nvmet_ns *ns)
> >> +{
> >> +  return disk_to_dev(ns->bdev->bd_disk);
> >> +}
> > 
> > This needs to handle non bdev namespaces.
> 
> As it's coded now the helper never gets called unless ns->bdev is not
> null. But in general, yes you are right, we should probably return NULL
> if ns->bdev is NULL.

I'd rather skip that for now.

> > index ef286b72d958..3d12f5f4568d 100644
> > --- a/drivers/nvme/target/fc.c
> > +++ b/drivers/nvme/target/fc.c
> > @@ -2280,6 +2280,7 @@ nvmet_fc_handle_fcp_rqst(struct nvmet_fc_tgtport *tgtport,
> >  fod->req.cmd = &fod->cmdiubuf.sqe;
> >  fod->req.rsp = &fod->rspiubuf.cqe;
> >  fod->req.port = tgtport->pe->port;
> > +   fod->req.p2p_client = tgtport->dev;
> > 
> >  /* clear any response payload */
> >  memset(&fod->rspiubuf, 0, sizeof(fod->rspiubuf));
> > --
> 
> Sure, I guess that makes sense. I've never tried it with fc hardware but
> I assume there's no reason it wouldn't work.
> 
> I'll queue these changes up for a v10.

And I'd wait until someone has actually tested this case.


Re: [PATCH v9 02/13] PCI/P2PDMA: Add sysfs group to display p2pmem stats

2018-10-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 04/13] PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers

2018-10-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 03/13] PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset

2018-10-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 07/13] block: Add PCI P2P flag for request queue and check support for requests

2018-10-05 Thread Christoph Hellwig
On Thu, Oct 04, 2018 at 03:27:41PM -0600, Logan Gunthorpe wrote:
> QUEUE_FLAG_PCI_P2P is introduced meaning a driver's request queue
> supports targeting P2P memory. This will be used by P2P providers and
> orchestrators (in subsequent patches) to ensure block devices can
> support P2P memory before submitting P2P backed pages to submit_bio().
> 
> Signed-off-by: Logan Gunthorpe 

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 08/13] IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()

2018-10-05 Thread Christoph Hellwig
Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 09/13] nvme-pci: Use PCI p2pmem subsystem to manage the CMB

2018-10-05 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v9 11/13] nvme-pci: Add a quirk for a pseudo CMB

2018-10-05 Thread Christoph Hellwig
On Thu, Oct 04, 2018 at 03:27:45PM -0600, Logan Gunthorpe wrote:
> Introduce a quirk to use CMB-like memory on older devices that have
> an exposed BAR but do not advertise support for using CMBLOC and
> CMBSIZE.
> 
> We'd like to use some of these older cards to test P2P memory.

I can't say that I really like adding support for non-standardized
"features" (vs bugs) in the nvme driver.


Re: [PATCH v9 12/13] nvmet: Introduce helper functions to allocate and free request SGLs

2018-10-05 Thread Christoph Hellwig
Looks fine,

Reviewed-by: Christoph Hellwig 


Re: Snapshot target and DAX-capable devices

2018-12-12 Thread Christoph Hellwig
Does it really make sense to enhance dm-snapshot?  I thought all serious
users of snapshots had moved on to dm-thinp?


Re: [RFC PATCH 4/5] acpi/hmat: Register special purpose memory as a device

2019-04-09 Thread Christoph Hellwig
On Thu, Apr 04, 2019 at 12:08:49PM -0700, Dan Williams wrote:
> Memory that has been tagged EFI_SPECIAL_PURPOSE, and has performance
> properties described by the ACPI HMAT is expected to have an application
> specific consumer.
> 
> Those consumers may want 100% of the memory capacity to be reserved from
> any usage by the kernel. By default, with this enabling, a platform
> device is created to represent this differentiated resource.

This sounds more than weird.  Since when did we let the firmware decide
who can use the memory?


Re: [PATCH v5 1/6] libnvdimm: nd_region flush callback support

2019-04-18 Thread Christoph Hellwig
On Thu, Apr 18, 2019 at 09:05:05AM -0700, Dan Williams wrote:
> > > I'd either add a comment about avoiding retpoline overhead here or just
> > > make ->flush == NULL mean generic_nvdimm_flush(). Just so that people don't
> > > get confused by the code.
> >
> > Isn't this premature optimization?  I really don't like adding things
> > like this without some numbers to show it's worth it.
> 
> I don't think it's premature given this optimization technique is
> already being deployed elsewhere, see:
> 
> https://lwn.net/Articles/774347/

For one, that one was backed by numbers, and second, after feedback
from Linus we switched to the NULL pointer check instead.
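
The resulting pattern in nvdimm_flush() then looks roughly like this
(a sketch): the common generic case is a direct call, so no retpoline
is involved, and a driver callback is only dereferenced when one was
actually registered.

	/* ->flush == NULL means "use the generic implementation" */
	if (nd_region->flush)
		rc = nd_region->flush(nd_region, bio);
	else
		rc = generic_nvdimm_flush(nd_region);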


Re: [PATCH 01/19] dax: remove block device dependencies

2020-01-07 Thread Christoph Hellwig
On Mon, Dec 16, 2019 at 01:10:14PM -0500, Vivek Goyal wrote:
> > Agree. In retrospect it was my laziness in the dax-device
> > implementation to expect the block-device to be available.
> > 
> > It looks like fs_dax_get_by_bdev() is an intercept point where a
> > dax_device could be dynamically created to represent the subset range
> > indicated by the block-device partition. That would open up more
> > cleanup opportunities.
> 
> Hi Dan,
> 
> After a long time I got time to look at it again. Want to work on this
> cleanup so that I can make progress with virtiofs DAX paches.
> 
> I am not sure I understand the requirements fully. I see that right now
> dax_device is created per device and all block partitions refer to it. If
> we want to create one dax_device per partition, then it looks like this
> will be structured more along the lines how block layer handles disk and
> partitions. (One gendisk for disk and block_devices for partitions,
> including partition 0). That probably means state belong to whole device
> will be in common structure say dax_device_common, and per partition state
> will be in dax_device and dax_device can carry a pointer to
> dax_device_common.
> 
> I am also not sure what does it mean to partition dax devices. How will
> partitions be exported to user space.

Dan, last time we talked you agreed that partitioned dax devices are
rather pointless IIRC.  Should we just deprecate partitions on DAX
devices and then remove them after a cycle or two?


Re: dax: Get rid of fs_dax_get_by_host() helper

2020-01-10 Thread Christoph Hellwig
On Mon, Jan 06, 2020 at 01:11:17PM -0500, Vivek Goyal wrote:
> Looks like nobody is using fs_dax_get_by_host() except fs_dax_get_by_bdev()
> and it can easily use dax_get_by_host() instead.
> 
> IIUC, fs_dax_get_by_host() was only introduced so that one could compile
> with CONFIG_FS_DAX=n and CONFIG_DAX=m. fs_dax_get_by_bdev() achieves
> the same purpose and hence it looks like fs_dax_get_by_host() is not
> needed anymore.
>  
> Signed-off-by: Vivek Goyal 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 01/19] dax: remove block device dependencies

2020-01-10 Thread Christoph Hellwig
On Thu, Jan 09, 2020 at 12:03:01PM -0800, Dan Williams wrote:
> > So I'd find two options reasonably consistent:
> > 1) Keep status quo where partitions are created and support DAX.
> > 2) Stop partition creation altogether, if anyone wants to split a pmem
> > device further, he can use dm-linear for that (i.e., kpartx).
> >
> > But I'm not sure if the ship hasn't already sailed for option 2) to be
> > feasible without angry users and Linus reverting the change.
> 
> Christoph? I feel myself leaning more and more to the "keep pmem
> partitions" camp.
> 
> I don't see "drop partition support" effort ending well given the long
> standing "ext4 fails to mount when dax is not available" precedent.

Do we have any evidence of existing setups with DAX and partitions?
Can we just throw in a patch to reject that case for now, before actually
removing the code, and see if anyone screams?  And fix ext4 up while
we are at it.
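
Such a reject-for-now patch could be as small as the following in
fs_dax_get_by_bdev() (a sketch only; using bd_partno as the partition
test and placing the check there are assumptions, not the actual patch):

	/* refuse DAX on partitions until someone shows a real user */
	if (bdev->bd_partno) {
		pr_info("%pg: DAX is not supported on partitions\n", bdev);
		return NULL;
	}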

> I think the next least bad option is to have a dax_get_by_host()
> variant that passes an offset and length pair rather than requiring a
> later bdev_dax_pgoff() to recall the offset. This also prevents
> needing to add another dax-device object representation.

IFF we have to keep partition support, yes.  But keeping it just seems
like a really bad idea.


Re: [PATCH 1/5] mm/memremap_pages: Kill unused __devm_memremap_pages()

2020-01-30 Thread Christoph Hellwig
On Thu, Jan 30, 2020 at 12:06:01PM -0800, Dan Williams wrote:
> Kill this definition that was introduced in commit 41e94a851304 ("add
> devm_memremap_pages") and never used.
> 
> Cc: Christoph Hellwig 
> Signed-off-by: Dan Williams 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [RFC] dax,pmem: Provide a dax operation to zero range of memory

2020-01-30 Thread Christoph Hellwig
On Thu, Jan 23, 2020 at 11:52:49AM -0500, Vivek Goyal wrote:
> Hi,
> 
> This is an RFC patch to provide a dax operation to zero a range of memory.
> It will also clear poison in the process. This is primarily compile tested
> patch. I don't have real hardware to test the poison logic. I am posting
> this to figure out if this is the right direction or not.
> 
> Motivation from this patch comes from Christoph's feedback that he will
> rather prefer a dax way to zero a range instead of relying on having to
> call blkdev_issue_zeroout() in __dax_zero_page_range().
> 
> https://lkml.org/lkml/2019/8/26/361
> 
> My motivation for this change is virtiofs DAX support. There we use DAX
> but we don't have a block device. So any dax code which has the assumption
> that there is always a block device associated is a problem. So this
> is more of a cleanup of one of the places where dax has this dependency
> on block device and if we add a dax operation for zeroing a range, it
> can help with not having to call blkdev_issue_zeroout() in dax path.
> 
> I have yet to take care of stacked block drivers (dm/md).
> 
> Current poison clearing logic is primarily written with assumption that
> I/O is sector aligned. With this new method, this assumption is broken
> and one can pass any range of memory to zero. I have fixed few places
> in existing logic to be able to handle an arbitrary start/end. I am
> not sure are there other dependencies which might need fixing or
> prohibit us from providing this method.
> 
> Any feedback or comment is welcome.
> 
> Thanks
> Vivek
> 
> ---
>  drivers/dax/super.c   |   13 +
>  drivers/nvdimm/pmem.c |   67 ++
>  fs/dax.c  |   39 -
>  include/linux/dax.h   |3 ++
>  4 files changed, 85 insertions(+), 37 deletions(-)
> 
> Index: rhvgoyal-linux/drivers/nvdimm/pmem.c
> ===
> --- rhvgoyal-linux.orig/drivers/nvdimm/pmem.c 2020-01-23 11:32:11.075139183 -0500
> +++ rhvgoyal-linux/drivers/nvdimm/pmem.c  2020-01-23 11:32:28.660139183 -0500
> @@ -52,8 +52,8 @@ static void hwpoison_clear(struct pmem_d
>   if (is_vmalloc_addr(pmem->virt_addr))
>   return;
>  
> - pfn_start = PHYS_PFN(phys);
> - pfn_end = pfn_start + PHYS_PFN(len);
> + pfn_start = PFN_UP(phys);
> + pfn_end = PFN_DOWN(phys + len);
>   for (pfn = pfn_start; pfn < pfn_end; pfn++) {
>   struct page *page = pfn_to_page(pfn);
>  

This change looks unrelated to the rest.

> + sector_end = ALIGN_DOWN((offset - pmem->data_offset + len), 512)/512;
> + nr_sectors =  sector_end - sector_start;
>  
>   cleared = nvdimm_clear_poison(dev, pmem->phys_addr + offset, len);
>   if (cleared < len)
>   rc = BLK_STS_IOERR;
> - if (cleared > 0 && cleared / 512) {
> + if (cleared > 0 && nr_sectors > 0) {
>   hwpoison_clear(pmem, pmem->phys_addr + offset, cleared);
> - cleared /= 512;
> - dev_dbg(dev, "%#llx clear %ld sector%s\n",
> - (unsigned long long) sector, cleared,
> - cleared > 1 ? "s" : "");
> - badblocks_clear(&pmem->bb, sector, cleared);
> + dev_dbg(dev, "%#llx clear %d sector%s\n",
> + (unsigned long long) sector_start, nr_sectors,
> + nr_sectors > 1 ? "s" : "");
> + badblocks_clear(&pmem->bb, sector_start, nr_sectors);
>   if (pmem->bb_state)
>   sysfs_notify_dirent(pmem->bb_state);
>   }

As does this one?

>  int __dax_zero_page_range(struct block_device *bdev,
>   struct dax_device *dax_dev, sector_t sector,
>   unsigned int offset, unsigned int size)
>  {
> + pgoff_t pgoff;
> + long rc, id;
>  
> + rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
> + if (rc)
> + return rc;
> +
> + id = dax_read_lock();
> + rc = dax_zero_page_range(dax_dev, pgoff, offset, size);
> + if (rc == -EOPNOTSUPP) {
>   void *kaddr;
>  
> + /* If driver does not implement zero page range, fallback */

I think we'll want to restructure this a bit.  First make the new
method mandatory, and just provide a generic_dax_zero_page_range or
similar for the non-pmem instances.

Then __dax_zero_page_range and iomap_dax_zero should merge, and maybe
eventually iomap_zero_range_actor and iomap_zero_range should be split
into a pagecache and DAX variant, lifting the IS_DAX check into the
callers.
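
A generic fallback along those lines could look something like this
(a sketch only; it assumes the range fits within a single page and that
the non-pmem instances have no poison to clear):

int generic_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
				unsigned int offset, size_t len)
{
	void *kaddr;
	long rc;

	/* map one page of the dax device and zero the requested sub-range */
	rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
	if (rc < 0)
		return rc;
	memset(kaddr + offset, 0, len);
	dax_flush(dax_dev, kaddr + offset, len);
	return 0;
}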


Re: [RFC][PATCH] dax: Do not try to clear poison for partial pages

2020-01-30 Thread Christoph Hellwig
On Wed, Jan 29, 2020 at 04:03:37PM -0500, Vivek Goyal wrote:
> I am looking into getting rid of dependency on block device in dax
> path. One such place is __dax_zero_page_range() which checks if
> range being zeroed is aligned to block device logical size, then
> it calls blkdev_issue_zeroout() instead of doing memset(). Calling
> blkdev_issue_zeroout() also clears bad blocks and poison if any
> in that range.
> 
> This path is used by iomap_zero_range() which in-turn is being
> used by filesystems to zero partial filesystem system blocks.
> For zeroing full filesystem blocks we seem to be calling
> blkdev_issue_zeroout() which clears bad blocks and poison in that
> range.
> 
> So this code currently only seems to help with partial filesystem
> block zeroing. That too only if truncation/hole_punch happens on
> device logical block size boundary.
> 
> To avoid using blkdev_issue_zeroout() in this path, I proposed another
> patch here which adds another dax operation to zero out a range and
> clear poison.

I'll have to defer to Dan and others on the poison clearing issue,
as it keeps confusing me every time I look into how it is supposed
to work.  But in the end we'll need a path that doesn't rely on
the block device to clear poison anyway, so I think a method will
ultimately be needed.  That being said, this patch looks nice :)


Re: [RFC] dax,pmem: Provide a dax operation to zero range of memory

2020-02-03 Thread Christoph Hellwig
On Fri, Jan 31, 2020 at 03:31:58PM -0800, Dan Williams wrote:
> > Should we (XFS) make fallocate(ZERO_RANGE) detect when it's operating on
> > a written extent in a DAX file and call this instead of what it does now
> > (punch range and reallocate unwritten)?
> 
> If it eliminates more block assumptions, then yes. In general I think
> there are opportunities to use "native" direct_access instead of
> block-i/o for other areas too, like metadata i/o.

Yes, and at least for XFS there aren't too many places where we rely
on block I/O after this.  It is the buffer cache and the log code,
and I actually have a WIP conversion for the latter here:


http://git.infradead.org/users/hch/xfs.git/shortlog/refs/heads/xfs-log-dax

which I need to dust off, similar with the cache flushing changes.

But more importantly with just the patch in this thread we should be
able to stop using the block device pointer in struct iomap for DAX file
systems, and thus be able to union the bdev, dax_dev and inline data
fields, which should make their usage much more clear, and reduce the
stack footprint.

> (d) dax fsync is just cache flush, so it can't fail, or are you
> talking about errors in metadata?

And based on our discussion even that cache flush sounds like a bad
idea, and might be a reason why all the file system bypass or
weirdo file systems are faster than XFS.


Re: [PATCH 1/5] dax, pmem: Add a dax operation zero_page_range

2020-02-05 Thread Christoph Hellwig
> + /*
> +  * There are no users as of now. Once users are there, fix dm code
> +  * to be able to split a long range across targets.
> +  */

This comment confused me.  I think this wants to say something like:

/*
 * There are no callers that want to zero across a page boundary as of
 * now.  Once there are users this check can be removed after the
 * device mapper code has been updated to split ranges across targets.
 */

> +static int pmem_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
> + unsigned int offset, size_t len)
> +{
> + int rc = 0;
> + phys_addr_t phys_pos = pgoff * PAGE_SIZE + offset;

Any reason not to pass a phys_addr_t in the calling convention for the
method and maybe also for dax_zero_page_range itself?

> + sector_start = ALIGN(phys_pos, 512)/512;
> + sector_end = ALIGN_DOWN(phys_pos + bytes, 512)/512;

Missing whitespaces.  Also this could use DIV_ROUND_UP and
DIV_ROUND_DOWN.

> + if (sector_end > sector_start)
> + nr_sectors = sector_end - sector_start;
> +
> + if (nr_sectors &&
> + unlikely(is_bad_pmem(&pmem->bb, sector_start,
> +  nr_sectors * 512)))
> + bad_pmem = true;

How could nr_sectors be zero?

> + write_pmem(pmem_addr, page, 0, bytes);
> + if (unlikely(bad_pmem)) {
> + /*
> +  * Pass block aligned offset and length. That seems
> +  * to work as of now. Other finer grained alignment
> +  * cases can be addressed later if need be.
> +  */
> + rc = pmem_clear_poison(pmem, ALIGN(pmem_off, 512),
> +nr_sectors * 512);
> + write_pmem(pmem_addr, page, 0, bytes);
> + }

This code largely duplicates the write side of pmem_do_bvec.  I
think it might make sense to split pmem_do_bvec into a read and a write
side as a prep patch, and then reuse the write side here.

> +int generic_dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
> +  unsigned int offset, size_t len);

This should probably go into a separate area of the header and have a
comment noting that it is a section for generic helpers for drivers.


Re: [PATCH 2/5] s390,dax: Add dax zero_page_range operation to dcssblk driver

2020-02-05 Thread Christoph Hellwig
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 63502ca537eb..f6709200bcd0 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -62,6 +62,7 @@ static const struct dax_operations dcssblk_dax_ops = {
>   .dax_supported = generic_fsdax_supported,
>   .copy_from_iter = dcssblk_dax_copy_from_iter,
>   .copy_to_iter = dcssblk_dax_copy_to_iter,
> + .zero_page_range = dcssblk_dax_zero_page_range,
>  };
>  
>  struct dcssblk_dev_info {
> @@ -941,6 +942,12 @@ dcssblk_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
>   return __dcssblk_direct_access(dev_info, pgoff, nr_pages, kaddr, pfn);
>  }
>  
> +static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,pgoff_t pgoff,
> +unsigned offset, size_t len)
> +{
> + return generic_dax_zero_page_range(dax_dev, pgoff, offset, len);
> +}

Wouldn't this need a forward declaration?  Then again given that dcssblk
is the only caller of generic_dax_zero_page_range we might as well merge
the two.  If you want to keep the generic one it could be wired up to
dcssblk_dax_ops directly, though.
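
That is, instead of the one-line wrapper, just (a sketch, reusing the
ops already shown in the quoted patch):

static const struct dax_operations dcssblk_dax_ops = {
	.direct_access = dcssblk_dax_direct_access,
	.dax_supported = generic_fsdax_supported,
	.copy_from_iter = dcssblk_dax_copy_from_iter,
	.copy_to_iter = dcssblk_dax_copy_to_iter,
	.zero_page_range = generic_dax_zero_page_range,
};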


Re: [PATCH 3/5] dm,dax: Add dax zero_page_range operation

2020-02-05 Thread Christoph Hellwig
On Mon, Feb 03, 2020 at 03:00:27PM -0500, Vivek Goyal wrote:
> This patch adds support for dax zero_page_range operation to dm targets.

Any way to share the code with the dax copy iter here?


Re: [PATCH 4/5] dax,iomap: Start using dax native zero_page_range()

2020-02-05 Thread Christoph Hellwig
On Mon, Feb 03, 2020 at 03:00:28PM -0500, Vivek Goyal wrote:
> + id = dax_read_lock();
> + rc = dax_zero_page_range(dax_dev, pgoff, offset, size);
> + dax_read_unlock(id);
> + return rc;

Is there a good reason not to move the locking into dax_zero_page_range?
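
That is, let the exported wrapper take the dax read lock itself so
callers don't have to (a sketch, assuming the wrapper lives in
drivers/dax/super.c next to the other dax_operations wrappers):

int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
			unsigned int offset, size_t len)
{
	int rc, id;

	id = dax_read_lock();
	rc = dax_dev->ops->zero_page_range(dax_dev, pgoff, offset, len);
	dax_read_unlock(id);
	return rc;
}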


Re: [PATCH 5/5] dax,iomap: Add helper dax_iomap_zero() to zero a range

2020-02-05 Thread Christoph Hellwig
> +int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> +   struct iomap *iomap)
>  {
>   pgoff_t pgoff;
>   long rc, id;
> + sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
>  
> - rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
> + rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
>   if (rc)
>   return rc;
>  
>   id = dax_read_lock();
> - rc = dax_zero_page_range(dax_dev, pgoff, offset, size);
> + rc = dax_zero_page_range(iomap->dax_dev, pgoff, offset, size);
>   dax_read_unlock(id);
>   return rc;
>  }
> -EXPORT_SYMBOL_GPL(__dax_zero_page_range);
> +EXPORT_SYMBOL_GPL(dax_iomap_zero);

This function is only used by fs/iomap/buffered-io.c, so no need to
export it.

>  #ifdef CONFIG_FS_DAX
> -int __dax_zero_page_range(struct block_device *bdev,
> - struct dax_device *dax_dev, sector_t sector,
> - unsigned int offset, unsigned int length);
> +int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> +   struct iomap *iomap);
>  #else
> -static inline int __dax_zero_page_range(struct block_device *bdev,
> - struct dax_device *dax_dev, sector_t sector,
> - unsigned int offset, unsigned int length)
> +static inline int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
> +  struct iomap *iomap)
>  {
>   return -ENXIO;
>  }

Given that the only caller is under an IS_DAX() check you could just
declare the function unconditionally and let the compiler optimize
away the guaranteed dead call for the !CONFIG_FS_DAX case, like we
do with various other functions.
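
The pattern referred to here, as a sketch: with CONFIG_FS_DAX=n, S_DAX
is defined as 0, so IS_DAX() is compile-time false and the compiler
drops the call entirely - no stub definition is needed.

/* include/linux/dax.h: declared unconditionally */
int dax_iomap_zero(loff_t pos, unsigned offset, unsigned size,
		   struct iomap *iomap);

/* caller, e.g. in iomap_zero_range_actor() */
	if (IS_DAX(inode))
		return dax_iomap_zero(pos, offset, bytes, iomap);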


Re: [patch] dax: pass NOWAIT flag to iomap_apply

2020-02-05 Thread Christoph Hellwig
On Wed, Feb 05, 2020 at 02:15:58PM -0500, Jeff Moyer wrote:
> fstests generic/471 reports a failure when run with MOUNT_OPTIONS="-o
> dax".  The reason is that the initial pwrite to an empty file with the
> RWF_NOWAIT flag set does not return -EAGAIN.  It turns out that
> dax_iomap_rw doesn't pass that flag through to iomap_apply.
> 
> With this patch applied, generic/471 passes for me.
> 
> Signed-off-by: Jeff Moyer 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/5] dax, pmem: Add a dax operation zero_page_range

2020-02-05 Thread Christoph Hellwig
On Wed, Feb 05, 2020 at 04:40:44PM -0800, Dan Williams wrote:
> > I don't have any reason not to pass phys_addr_t. If that sounds better,
> > will make changes.
> 
> The problem is device-mapper. That wants to use offset to route
> through the map to the leaf device. If it weren't for the firmware
> communication requirement you could do:
> 
> dax_direct_access(...)
> generic_dax_zero_page_range(...)
> 
> ...but as long as the firmware error clearing path is required I think
> we need to do pass the pgoff through the interface and do the pgoff to
> virt / phys translation inside the ops handler.

Maybe phys_addr_t was the wrong type - but why do we split the offset
into the block device into a pgoff argument and an offset into the page,
instead of using a single 64-bit value?


Re: [PATCH v3 1/7] pmem: Add functions for reading/writing page to/from pmem

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 07, 2020 at 03:26:46PM -0500, Vivek Goyal wrote:
> +static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
> + unsigned int len, unsigned int off, unsigned int op,
> + sector_t sector)
> +{
> + if (!op_is_write(op))
> + return pmem_do_read(pmem, page, off, sector, len);
> +
> + return pmem_do_write(pmem, page, off, sector, len);

Why not:

	if (op_is_write(op))
		return pmem_do_write(pmem, page, off, sector, len);
	return pmem_do_read(pmem, page, off, sector, len);

that being said I don't see the point of this pmem_do_bvec helper given
that it only has two callers.

The rest looks good to me.


Re: [PATCH v3 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 07, 2020 at 03:26:47PM -0500, Vivek Goyal wrote:
> Currently pmem_do_write() is written with assumption that all I/O is
> sector aligned. Soon I want to use this function in zero_page_range()
> where range passed in does not have to be sector aligned.
> 
> Modify this function to be able to deal with an arbitrary range. Which
> is specified by pmem_off and len.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  drivers/nvdimm/pmem.c | 30 ++
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 9ad07cb8c9fc..281fe04d25fd 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
>  
>  static blk_status_t pmem_do_write(struct pmem_device *pmem,
>   struct page *page, unsigned int page_off,
> - sector_t sector, unsigned int len)
> + u64 pmem_off, unsigned int len)
>  {
>   blk_status_t rc = BLK_STS_OK;
>   bool bad_pmem = false;
> - phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> - void *pmem_addr = pmem->virt_addr + pmem_off;
> -
> - if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> - bad_pmem = true;
> + phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> + void *pmem_addr = pmem->virt_addr + pmem_real_off;
> + sector_t sector_start, sector_end;
> + unsigned nr_sectors;
> +
> + sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> + sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> + if (sector_end > sector_start) {
> + nr_sectors = sector_end - sector_start;
> + if (unlikely(is_bad_pmem(&pmem->bb, sector_start,
> +  nr_sectors << SECTOR_SHIFT)))
> + bad_pmem = true;

I don't think an unlikely annotation makes much sense for assigning
a boolean value to a flag variable.

> + /*
> +  * Pass sector aligned offset and length. That seems
> +  * to work as of now. Other finer grained alignment
> +  * cases can be addressed later if need be.
> +  */

This comment seems pretty scary.  What other cases can you think of?


Re: [PATCH v3 3/7] dax, pmem: Add a dax operation zero_page_range

2020-02-17 Thread Christoph Hellwig
> + int rc;
> + struct pmem_device *pmem = dax_get_private(dax_dev);
> + struct page *page = ZERO_PAGE(0);

Nit: I tend to find code easier to read if variable declarations
with assignments are above those without.

Also I don't think we need the page variable here.

> + rc = pmem_do_write(pmem, page, 0, offset, len);
> + if (rc > 0)
> + return -EIO;

pmem_do_write returns a blk_status_t, so the type of rc and the > check
seem odd.  But I think pmem_do_write (and pmem_do_read) might be better
off returning a normal errno anyway.


Re: [PATCH v3 6/7] dax,iomap: Start using dax native zero_page_range()

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 07, 2020 at 03:26:51PM -0500, Vivek Goyal wrote:
> Get rid of calling block device interface for zeroing in iomap dax
> zeroing path and use dax native zeroing interface instead.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Vivek Goyal 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v3 7/7] dax,iomap: Add helper dax_iomap_zero() to zero a range

2020-02-17 Thread Christoph Hellwig
On Fri, Feb 07, 2020 at 03:26:52PM -0500, Vivek Goyal wrote:
> Add a helper dax_iomap_zero() to zero a range. This patch basically
> merges __dax_zero_page_range() and iomap_dax_zero().
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Vivek Goyal 
> ---
>  fs/dax.c   | 12 ++--
>  fs/iomap/buffered-io.c |  9 +
>  include/linux/dax.h| 17 +++--
>  3 files changed, 10 insertions(+), 28 deletions(-)

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 1/6] dax: Define a helper dax_pgoff() which takes in dax_offset as argument

2020-02-17 Thread Christoph Hellwig
On Wed, Feb 12, 2020 at 12:07:28PM -0500, Vivek Goyal wrote:
> Create a new helper dax_pgoff() which will replace bdev_dax_pgoff(). 
> Difference
> between two is that dax_pgoff() takes in "sector_t dax_offset" as an argument
> instead of "struct block_device".
> 
> dax_offset specifies any offset into dax device which should be added to
> sector while calculating pgoff.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  drivers/dax/super.c | 12 
>  include/linux/dax.h |  1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 0aa4b6bc5101..e9daa30e4250 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -56,6 +56,18 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
>  }
>  EXPORT_SYMBOL(bdev_dax_pgoff);
>  
> +int dax_pgoff(sector_t dax_offset, sector_t sector, size_t size, pgoff_t *pgoff)

Please add a kerneldoc comment.  I can't really make sense of what
dax_offset and sector mean here and why they are passed separately.

> +{
> + phys_addr_t phys_off = (dax_offset + sector) * 512;

<< SECTOR_SHIFT;

> +
> + if (pgoff)
> + *pgoff = PHYS_PFN(phys_off);

What is the use case of not passing a pgoff argument?

> + if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> + return -EINVAL;
> + return 0;
> +}
> +EXPORT_SYMBOL(dax_pgoff);

EXPORT_SYMBOL_GPL, please.
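
For illustration, the requested kerneldoc might look something like this
(a sketch, with the semantics inferred from the patch itself rather than
confirmed by its author):

/**
 * dax_pgoff() - convert a sector relative to a dax mapping into a pgoff
 * @dax_offset:	start of the mapped range within the dax device, in
 *		512-byte sectors (what bdev_dax_pgoff() derives from the
 *		partition start)
 * @sector:	sector number relative to @dax_offset
 * @size:	size of the access, used only for the alignment check
 * @pgoff:	if non-NULL, returns the page offset for dax_direct_access()
 *
 * Returns 0 on success, or -EINVAL if the resulting address or @size
 * is not page aligned.
 */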


Re: [PATCH 2/6] dax,iomap,ext4,ext2,xfs: Save dax_offset in "struct iomap"

2020-02-17 Thread Christoph Hellwig
On Wed, Feb 12, 2020 at 12:07:29PM -0500, Vivek Goyal wrote:
> Add a new field "sector_t dax_offset" to "struct iomap". This will be
> filled by filesystems and dax code will make use of this to convert
> sector into page offset (dax_pgoff()), instead of bdev_dax_pgoff(). This
> removes the dependency of having to pass in block device for dax operations.

NAK.  The iomap is not a structure to bolt random layering violations
onto.


Re: [dm-devel] [PATCH v4 1/7] pmem: Add functions for reading/writing page to/from pmem

2020-02-18 Thread Christoph Hellwig
On Mon, Feb 17, 2020 at 01:16:47PM -0500, Vivek Goyal wrote:
> This splits pmem_do_bvec() into pmem_do_read() and pmem_do_write().
> pmem_do_write() will be used by pmem zero_page_range() as well. Hence
> sharing the same code.
> 
> Suggested-by: Christoph Hellwig 
> Signed-off-by: Vivek Goyal 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [dm-devel] [PATCH v4 2/7] pmem: Enable pmem_do_write() to deal with arbitrary ranges

2020-02-18 Thread Christoph Hellwig
On Mon, Feb 17, 2020 at 01:16:48PM -0500, Vivek Goyal wrote:
> Currently pmem_do_write() is written with assumption that all I/O is
> sector aligned. Soon I want to use this function in zero_page_range()
> where range passed in does not have to be sector aligned.
> 
> Modify this function to be able to deal with an arbitrary range. Which
> is specified by pmem_off and len.
> 
> Signed-off-by: Vivek Goyal 
> ---
>  drivers/nvdimm/pmem.c | 32 +++-
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 075b11682192..fae8f67da9de 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -154,15 +154,23 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem,
>  
>  static blk_status_t pmem_do_write(struct pmem_device *pmem,
>   struct page *page, unsigned int page_off,
> - sector_t sector, unsigned int len)
> + u64 pmem_off, unsigned int len)
>  {
>   blk_status_t rc = BLK_STS_OK;
>   bool bad_pmem = false;
> - phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> - void *pmem_addr = pmem->virt_addr + pmem_off;
> -
> - if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> - bad_pmem = true;
> + phys_addr_t pmem_real_off = pmem_off + pmem->data_offset;
> + void *pmem_addr = pmem->virt_addr + pmem_real_off;
> + sector_t sector_start, sector_end;
> + unsigned nr_sectors;
> +
> + sector_start = DIV_ROUND_UP(pmem_off, SECTOR_SIZE);
> + sector_end = (pmem_off + len) >> SECTOR_SHIFT;
> + if (sector_end > sector_start) {
> + nr_sectors = sector_end - sector_start;
> + if (is_bad_pmem(&pmem->bb, sector_start,
> + nr_sectors << SECTOR_SHIFT))
> + bad_pmem = true;
> + }
>  
>   /*
>* Note that we write the data both before and after
> @@ -181,7 +189,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem,
>   flush_dcache_page(page);
>   write_pmem(pmem_addr, page, page_off, len);
>   if (unlikely(bad_pmem)) {
> - rc = pmem_clear_poison(pmem, pmem_off, len);
> + /*
> +  * Pass sector aligned offset and length. That seems
> +  * to work as of now. Other finer grained alignment
> +  * cases can be addressed later if need be.
> +  */
> + rc = pmem_clear_poison(pmem, ALIGN(pmem_real_off, SECTOR_SIZE),
> +nr_sectors << SECTOR_SHIFT);
>   write_pmem(pmem_addr, page, page_off, len);

I'm still scared about the "as of now" comment.  If the interface for
clearing poison is page aligned I think we should document that in the
actual pmem_clear_poison function, and make that take the unaligned
offset.  I also think we want some feedback from Dan or others on what the
official interface is instead of "seems to work".


Re: [dm-devel] [PATCH v4 3/7] dax, pmem: Add a dax operation zero_page_range

2020-02-18 Thread Christoph Hellwig
> +static int pmem_dax_zero_page_range(struct dax_device *dax_dev, u64 offset,
> + size_t len)
> +{
> + struct pmem_device *pmem = dax_get_private(dax_dev);
> + blk_status_t rc;
> +
> + rc = pmem_do_write(pmem, ZERO_PAGE(0), 0, offset, len);
> + return blk_status_to_errno(rc);

No real need for the rc variable here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 3/8] pmem: Enable pmem_do_write() to deal with arbitrary ranges

2020-02-20 Thread Christoph Hellwig
On Tue, Feb 18, 2020 at 04:48:36PM -0500, Vivek Goyal wrote:
> Currently pmem_do_write() is written with assumption that all I/O is
> sector aligned. Soon I want to use this function in zero_page_range()
> where range passed in does not have to be sector aligned.
> 
> Modify this function to be able to deal with an arbitrary range. Which
> is specified by pmem_off and len.
> 
> Signed-off-by: Vivek Goyal 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len

2020-02-20 Thread Christoph Hellwig
On Tue, Feb 18, 2020 at 04:48:35PM -0500, Vivek Goyal wrote:
> Currently pmem_clear_poison() expects offset and len to be sector aligned.
> At least that seems to be the assumption with which the code has been written.
> It is called only from pmem_do_bvec() which is called only from pmem_rw_page()
> and pmem_make_request() which will only pass sector aligned offset and len.
> 
> Soon we want to use this function from the dax_zero_page_range() code path which
> can try to zero arbitrary range of memory with-in a page. So update this
> function to assume that offset and length can be arbitrary and do the
> necessary alignments as needed.
> 
> nvdimm_clear_poison() seems to assume offset and len to be aligned to
> clear_err_unit boundary. But this is currently internal detail and is
> not exported for others to use. So for now, continue to align offset and
> length to SECTOR_SIZE boundary. Improving it further and to align it
> to clear_err_unit boundary is a TODO item for future.
> 
> Signed-off-by: Vivek Goyal 

This looks sensible to me, but I'd really like to have Dan take a look.


Re: [PATCH v5 2/8] drivers/pmem: Allow pmem_clear_poison() to accept arbitrary offset and len

2020-02-28 Thread Christoph Hellwig
On Thu, Feb 27, 2020 at 07:28:41PM -0800, Dan Williams wrote:
> "don't perform generic memory-error-handling, there's an fs that owns
> this daxdev and wants to disposition errors". The fs/dax.c
> infrastructure that sets up ->index and ->mapping would still need to
> be there for ext4 until its ready to take on the same responsibility.
> Last I checked the ext4 reverse mapping implementation was not as
> capable as XFS. This goes back to why the initial design avoided
> relying on not universally available / stable reverse-mapping
> facilities and opted for extending the generic mm/memory-failure.c
> implementation.

The important bit is that we stop using ->index and ->mapping when XFS
is used, as that enables using DAX+reflink, which actually is the most
requested DAX feature on XFS (way more than silly runtime flag switches,
for example).


Re: [PATCH v6 1/6] pmem: Add functions for reading/writing page to/from pmem

2020-03-04 Thread Christoph Hellwig
On Sat, Feb 29, 2020 at 09:04:00AM +0100, Pankaj Gupta wrote:
> > +   phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> 
> minor nit, maybe 512 could be replaced by a macro? Looks like it's used at multiple
> places, maybe we can keep it as is for now.

That would be the existing SECTOR_SIZE macro.


Re: [PATCH 8/9] dax: fix empty-body warnings in bus.c

2020-04-19 Thread Christoph Hellwig
On Sat, Apr 18, 2020 at 11:41:10AM -0700, Randy Dunlap wrote:
>   rc = -ENOMEM;
>   } else
> - /* nothing to remove */;
> + do_empty(); /* nothing to remove */
>   } else if (action == ID_REMOVE) {
>   list_del(&dax_id->list);
>   kfree(dax_id);
>   } else
> - /* dax_id already added */;
> + do_empty(); /* dax_id already added */

This is just nasty.  Please just always turn this bogus warning off
as the existing code is a perfectly readable idiom while the new code
is just nasty crap for no good reason at all.


[PATCH 02/15] simdisk: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/platforms/iss/simdisk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c
index 49322b66cda93..31b5020077a05 100644
--- a/arch/xtensa/platforms/iss/simdisk.c
+++ b/arch/xtensa/platforms/iss/simdisk.c
@@ -103,7 +103,7 @@ static void simdisk_transfer(struct simdisk *dev, unsigned long sector,
 
 static blk_qc_t simdisk_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct simdisk *dev = q->queuedata;
+   struct simdisk *dev = bio->bi_disk->private_data;
struct bio_vec bvec;
struct bvec_iter iter;
sector_t sector = bio->bi_iter.bi_sector;
@@ -273,8 +273,6 @@ static int __init simdisk_setup(struct simdisk *dev, int which,
goto out_alloc_queue;
}
 
-   dev->queue->queuedata = dev;
-
dev->gd = alloc_disk(SIMDISK_MINORS);
if (dev->gd == NULL) {
pr_err("alloc_disk failed\n");
-- 
2.26.2


[PATCH 01/15] nfblock: use gendisk private_data

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/m68k/emu/nfblock.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index c3a630440512e..5c6f04b00e866 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -61,7 +61,7 @@ struct nfhd_device {
 
 static blk_qc_t nfhd_make_request(struct request_queue *queue, struct bio *bio)
 {
-   struct nfhd_device *dev = queue->queuedata;
+   struct nfhd_device *dev = bio->bi_disk->private_data;
struct bio_vec bvec;
struct bvec_iter iter;
int dir, len, shift;
@@ -122,7 +122,6 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
if (dev->queue == NULL)
goto free_dev;
 
-   dev->queue->queuedata = dev;
blk_queue_logical_block_size(dev->queue, bsize);
 
dev->disk = alloc_disk(16);
@@ -136,6 +135,7 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
sprintf(dev->disk->disk_name, "nfhd%u", dev_id);
set_capacity(dev->disk, (sector_t)blocks * (bsize / 512));
dev->disk->queue = dev->queue;
+   dev->disk->private_data = dev;
 
add_disk(dev->disk);
 
-- 
2.26.2


[PATCH 08/15] zram: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/zram/zram_drv.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ebb234f36909c..e1a6c74c7a4ba 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1593,7 +1593,7 @@ static void __zram_make_request(struct zram *zram, struct 
bio *bio)
  */
 static blk_qc_t zram_make_request(struct request_queue *queue, struct bio *bio)
 {
-   struct zram *zram = queue->queuedata;
+   struct zram *zram = bio->bi_disk->private_data;
 
if (!valid_io_request(zram, bio->bi_iter.bi_sector,
bio->bi_iter.bi_size)) {
@@ -1916,7 +1916,6 @@ static int zram_add(void)
zram->disk->first_minor = device_id;
zram->disk->fops = &zram_devops;
zram->disk->queue = queue;
-   zram->disk->queue->queuedata = zram;
zram->disk->private_data = zram;
snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
 
-- 
2.26.2


[PATCH 09/15] lightnvm: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/lightnvm/core.c  | 1 -
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk.h  | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index db38a68abb6c0..85c5490cdfd2e 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -400,7 +400,6 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct 
nvm_ioctl_create *create)
}
 
tdisk->private_data = targetdata;
-   tqueue->queuedata = targetdata;
 
mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
if (dev->geo.mdts) {
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9a967a2e83dd7..bec904ec0f7c0 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -49,7 +49,7 @@ struct bio_set pblk_bio_set;
 
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
-   struct pblk *pblk = q->queuedata;
+   struct pblk *pblk = bio->bi_disk->private_data;
 
if (bio_op(bio) == REQ_OP_DISCARD) {
pblk_discard(pblk, bio);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 86ffa875bfe16..ed364afaed0d8 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1255,7 +1255,7 @@ static inline int pblk_boundary_ppa_checks(struct 
nvm_tgt_dev *tgt_dev,
continue;
}
 
-   print_ppa(tgt_dev->q->queuedata, ppa, "boundary", i);
+   print_ppa(tgt_dev->disk->private_data, ppa, "boundary", i);
 
return 1;
}
-- 
2.26.2


[PATCH 11/15] dm: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 0eb93da44ea2a..2aaae6c1ed312 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static blk_qc_t dm_process_bio(struct mapped_device *md,
 
 static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct mapped_device *md = q->queuedata;
+   struct mapped_device *md = bio->bi_disk->private_data;
blk_qc_t ret = BLK_QC_T_NONE;
int srcu_idx;
struct dm_table *map;
@@ -1980,7 +1980,6 @@ static struct mapped_device *alloc_dev(int minor)
md->queue = blk_alloc_queue(dm_make_request, numa_node_id);
if (!md->queue)
goto bad;
-   md->queue->queuedata = md;
 
md->disk = alloc_disk_node(1, md->numa_node_id);
if (!md->disk)
-- 
2.26.2


[PATCH 04/15] null_blk: stop using ->queuedata for bio mode

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/null_blk_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c
index 8efd8778e2095..d14df79feca89 100644
--- a/drivers/block/null_blk_main.c
+++ b/drivers/block/null_blk_main.c
@@ -1365,7 +1365,7 @@ static blk_qc_t null_queue_bio(struct request_queue *q, 
struct bio *bio)
 {
sector_t sector = bio->bi_iter.bi_sector;
sector_t nr_sectors = bio_sectors(bio);
-   struct nullb *nullb = q->queuedata;
+   struct nullb *nullb = bio->bi_disk->private_data;
struct nullb_queue *nq = nullb_to_queue(nullb);
struct nullb_cmd *cmd;
 
-- 
2.26.2


[PATCH 12/15] md: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/md/md.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 271e8a5873549..c079ecf77c564 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -466,7 +466,7 @@ static blk_qc_t md_make_request(struct request_queue *q, 
struct bio *bio)
 {
const int rw = bio_data_dir(bio);
const int sgrp = op_stat_group(bio_op(bio));
-   struct mddev *mddev = q->queuedata;
+   struct mddev *mddev = bio->bi_disk->private_data;
unsigned int sectors;
 
if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
@@ -5626,7 +5626,6 @@ static int md_alloc(dev_t dev, char *name)
mddev->queue = blk_alloc_queue(md_make_request, NUMA_NO_NODE);
if (!mddev->queue)
goto abort;
-   mddev->queue->queuedata = mddev;
 
blk_set_stacking_limits(&mddev->queue->limits);
 
-- 
2.26.2


[PATCH 14/15] nvdimm/btt: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/nvdimm/btt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 3b09419218d6f..eca3e48fefda1 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1442,7 +1442,7 @@ static int btt_do_bvec(struct btt *btt, struct 
bio_integrity_payload *bip,
 static blk_qc_t btt_make_request(struct request_queue *q, struct bio *bio)
 {
struct bio_integrity_payload *bip = bio_integrity(bio);
-   struct btt *btt = q->queuedata;
+   struct btt *btt = bio->bi_disk->private_data;
struct bvec_iter iter;
unsigned long start;
struct bio_vec bvec;
@@ -1543,7 +1543,6 @@ static int btt_blk_init(struct btt *btt)
blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
blk_queue_max_hw_sectors(btt->btt_queue, UINT_MAX);
blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_queue);
-   btt->btt_queue->queuedata = btt;
 
if (btt_meta_size(btt)) {
int rc = nd_integrity_init(btt->btt_disk, btt_meta_size(btt));
-- 
2.26.2


[PATCH 03/15] drbd: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_main.c | 1 -
 drivers/block/drbd/drbd_req.c  | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index c094c3c2c5d4d..be787b268f044 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2805,7 +2805,6 @@ enum drbd_ret_code drbd_create_device(struct 
drbd_config_context *adm_ctx, unsig
if (!q)
goto out_no_q;
device->rq_queue = q;
-   q->queuedata   = device;
 
disk = alloc_disk(1);
if (!disk)
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 840c3aef3c5c9..02c104a0c45e0 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1614,7 +1614,7 @@ void do_submit(struct work_struct *ws)
 
 blk_qc_t drbd_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct drbd_device *device = (struct drbd_device *) q->queuedata;
+   struct drbd_device *device = bio->bi_disk->private_data;
unsigned long start_jif;
 
blk_queue_split(q, &bio);
-- 
2.26.2


[PATCH 06/15] rsxx: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/rsxx/dev.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 8ffa8260dcafe..6dde80b096c62 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -133,7 +133,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 
 static blk_qc_t rsxx_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct rsxx_cardinfo *card = q->queuedata;
+   struct rsxx_cardinfo *card = bio->bi_disk->private_data;
struct rsxx_bio_meta *bio_meta;
blk_status_t st = BLK_STS_IOERR;
 
@@ -282,8 +282,6 @@ int rsxx_setup_dev(struct rsxx_cardinfo *card)
card->queue->limits.discard_alignment   = RSXX_HW_BLK_SIZE;
}
 
-   card->queue->queuedata = card;
-
snprintf(card->gendisk->disk_name, sizeof(card->gendisk->disk_name),
 "rsxx%d", card->disk_id);
card->gendisk->major = card->major;
@@ -304,7 +302,6 @@ void rsxx_destroy_dev(struct rsxx_cardinfo *card)
card->gendisk = NULL;
 
blk_cleanup_queue(card->queue);
-   card->queue->queuedata = NULL;
unregister_blkdev(card->major, DRIVER_NAME);
 }
 
-- 
2.26.2


[PATCH 13/15] nvdimm/blk: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/nvdimm/blk.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 43751fab9d36a..ffe4728bad8b1 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -165,7 +165,7 @@ static int nsblk_do_bvec(struct nd_namespace_blk *nsblk,
 static blk_qc_t nd_blk_make_request(struct request_queue *q, struct bio *bio)
 {
struct bio_integrity_payload *bip;
-   struct nd_namespace_blk *nsblk;
+   struct nd_namespace_blk *nsblk = bio->bi_disk->private_data;
struct bvec_iter iter;
unsigned long start;
struct bio_vec bvec;
@@ -176,7 +176,6 @@ static blk_qc_t nd_blk_make_request(struct request_queue 
*q, struct bio *bio)
return BLK_QC_T_NONE;
 
bip = bio_integrity(bio);
-   nsblk = q->queuedata;
rw = bio_data_dir(bio);
do_acct = nd_iostat_start(bio, &start);
bio_for_each_segment(bvec, bio, iter) {
@@ -258,7 +257,6 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
blk_queue_max_hw_sectors(q, UINT_MAX);
blk_queue_logical_block_size(q, nsblk_sector_size(nsblk));
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-   q->queuedata = nsblk;
 
disk = alloc_disk(0);
if (!disk)
@@ -268,6 +266,7 @@ static int nsblk_attach_disk(struct nd_namespace_blk *nsblk)
disk->fops  = &nd_blk_fops;
disk->queue = q;
disk->flags = GENHD_FL_EXT_DEVT;
+   disk->private_data  = nsblk;
nvdimm_namespace_disk_name(&nsblk->common, disk->disk_name);
 
if (devm_add_action_or_reset(dev, nd_blk_release_disk, disk))
-- 
2.26.2


[PATCH 05/15] ps3vram: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/ps3vram.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index 821d4d8b1d763..5a1d1d137c724 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -587,7 +587,7 @@ static struct bio *ps3vram_do_bio(struct 
ps3_system_bus_device *dev,
 
 static blk_qc_t ps3vram_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct ps3_system_bus_device *dev = q->queuedata;
+   struct ps3_system_bus_device *dev = bio->bi_disk->private_data;
struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
int busy;
 
@@ -745,7 +745,6 @@ static int ps3vram_probe(struct ps3_system_bus_device *dev)
}
 
priv->queue = queue;
-   queue->queuedata = dev;
blk_queue_max_segments(queue, BLK_MAX_SEGMENTS);
blk_queue_max_segment_size(queue, BLK_MAX_SEGMENT_SIZE);
blk_queue_max_hw_sectors(queue, BLK_SAFE_MAX_SECTORS);
-- 
2.26.2


[PATCH 10/15] bcache: stop setting ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/md/bcache/super.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index d98354fa28e3e..a0fb5af2beeda 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -871,7 +871,6 @@ static int bcache_device_init(struct bcache_device *d, 
unsigned int block_size,
return -ENOMEM;
 
d->disk->queue  = q;
-   q->queuedata= d;
q->backing_dev_info->congested_data = d;
q->limits.max_hw_sectors= UINT_MAX;
q->limits.max_sectors   = UINT_MAX;
-- 
2.26.2


remove a few uses of ->queuedata

2020-05-08 Thread Christoph Hellwig
Hi all,

various bio based drivers use queue->queuedata despite already having
set up disk->private_data, which can be used just as easily.  This
series cleans them up to only use a single private data pointer.

blk-mq based drivers that have code paths that can't easily get at
the gendisk are unaffected by this series.
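
To make the pattern concrete, here is a minimal before/after sketch for a
made-up bio based driver; struct foo_dev and foo_handle_bio() are
hypothetical, but the shape matches the patches in this series:

/* before: the private data hangs off the request_queue,
 * set up at probe time with foo->queue->queuedata = foo
 */
static blk_qc_t foo_make_request_old(struct request_queue *q, struct bio *bio)
{
	struct foo_dev *foo = q->queuedata;

	return foo_handle_bio(foo, bio);
}

/* after: reuse the pointer the driver already stores in the gendisk
 * (foo->disk->private_data = foo), so q->queuedata is not needed at all
 */
static blk_qc_t foo_make_request_new(struct request_queue *q, struct bio *bio)
{
	struct foo_dev *foo = bio->bi_disk->private_data;

	return foo_handle_bio(foo, bio);
}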


[PATCH 15/15] nvdimm/pmem: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/nvdimm/pmem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf836..f8dc5941215bf 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -196,7 +196,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
unsigned long start;
struct bio_vec bvec;
struct bvec_iter iter;
-   struct pmem_device *pmem = q->queuedata;
+   struct pmem_device *pmem = bio->bi_disk->private_data;
struct nd_region *nd_region = to_region(pmem);
 
if (bio->bi_opf & REQ_PREFLUSH)
@@ -231,7 +231,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
 static int pmem_rw_page(struct block_device *bdev, sector_t sector,
   struct page *page, unsigned int op)
 {
-   struct pmem_device *pmem = bdev->bd_queue->queuedata;
+   struct pmem_device *pmem = bdev->bd_disk->private_data;
blk_status_t rc;
 
if (op_is_write(op))
@@ -464,7 +464,6 @@ static int pmem_attach_disk(struct device *dev,
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
if (pmem->pfn_flags & PFN_MAP)
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
-   q->queuedata = pmem;
 
disk = alloc_disk_node(0, nid);
if (!disk)
@@ -474,6 +473,7 @@ static int pmem_attach_disk(struct device *dev,
disk->fops  = &pmem_fops;
disk->queue = q;
disk->flags = GENHD_FL_EXT_DEVT;
+   disk->private_data  = pmem;
disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
-- 
2.26.2


[PATCH 07/15] umem: stop using ->queuedata

2020-05-08 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 drivers/block/umem.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index d84e8a878df24..e59bff24e02cf 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -521,7 +521,8 @@ static int mm_check_plugged(struct cardinfo *card)
 
 static blk_qc_t mm_make_request(struct request_queue *q, struct bio *bio)
 {
-   struct cardinfo *card = q->queuedata;
+   struct cardinfo *card = bio->bi_disk->private_data;
+
pr_debug("mm_make_request %llu %u\n",
 (unsigned long long)bio->bi_iter.bi_sector,
 bio->bi_iter.bi_size);
@@ -888,7 +889,6 @@ static int mm_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
card->queue = blk_alloc_queue(mm_make_request, NUMA_NO_NODE);
if (!card->queue)
goto failed_alloc;
-   card->queue->queuedata = card;
 
tasklet_init(&card->tasklet, process_page, (unsigned long)card);
 
-- 
2.26.2


Re: remove a few uses of ->queuedata

2020-05-09 Thread Christoph Hellwig
On Fri, May 08, 2020 at 11:04:45AM -0700, Dan Williams wrote:
> On Fri, May 8, 2020 at 9:16 AM Christoph Hellwig  wrote:
> >
> > Hi all,
> >
> > various bio based drivers use queue->queuedata despite already having
> > set up disk->private_data, which can be used just as easily.  This
> > series cleans them up to only use a single private data pointer.
> 
> ...but isn't the queue pretty much guaranteed to be cache hot and the
> gendisk cache cold? I'm not immediately seeing what else needs the
> gendisk in the I/O path. Is there another motivation I'm missing?

->private_data is right next to the ->queue pointer, part0 and part_tbl,
which are all used in the I/O submission path (generic_make_request /
generic_make_request_checks).  This is mostly a prep cleanup for also
removing the pointless queue argument from ->make_request - at that point
->queue would just be an extra dereference and extra churn.
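
For context, a trimmed-down excerpt of struct gendisk showing the fields
mentioned above; the exact member list and ordering vary by kernel version,
so treat this as illustrative rather than authoritative:

struct gendisk {
	/* ... */
	struct disk_part_tbl __rcu *part_tbl;	/* partition lookup */
	struct hd_struct part0;			/* whole-disk accounting */

	const struct block_device_operations *fops;
	struct request_queue *queue;
	void *private_data;			/* right next to ->queue */
	/* ... */
};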


Re: remove a few uses of ->queuedata

2020-05-09 Thread Christoph Hellwig
On Sat, May 09, 2020 at 06:13:21AM +0800, Ming Lei wrote:
> On Fri, May 08, 2020 at 06:15:02PM +0200, Christoph Hellwig wrote:
> > Hi all,
> > 
> > various bio based drivers use queue->queuedata despite already having
> > set up disk->private_data, which can be used just as easily.  This
> > series cleans them up to only use a single private data pointer.
> > 
> > blk-mq based drivers that have code pathes that can't easily get at
> > the gendisk are unaffected by this series.
> 
> Yeah, before adding disk, there still may be requests queued to LLD
> for blk-mq based drivers.
> 
> So are there this similar situation for these bio based drivers?

bio submission is based on the gendisk, so we can't submit before
it is added.  The passthrough request based path obviously doesn't apply
here.


[PATCH 09/15, fixed] lightnvm: stop using ->queuedata

2020-05-09 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---

fixed the compilation error in the print_ppa arguments

 drivers/lightnvm/core.c  | 1 -
 drivers/lightnvm/pblk-init.c | 2 +-
 drivers/lightnvm/pblk.h  | 2 +-
 3 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/lightnvm/core.c b/drivers/lightnvm/core.c
index db38a68abb6c0..85c5490cdfd2e 100644
--- a/drivers/lightnvm/core.c
+++ b/drivers/lightnvm/core.c
@@ -400,7 +400,6 @@ static int nvm_create_tgt(struct nvm_dev *dev, struct 
nvm_ioctl_create *create)
}
 
tdisk->private_data = targetdata;
-   tqueue->queuedata = targetdata;
 
mdts = (dev->geo.csecs >> 9) * NVM_MAX_VLBA;
if (dev->geo.mdts) {
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
index 9a967a2e83dd7..bec904ec0f7c0 100644
--- a/drivers/lightnvm/pblk-init.c
+++ b/drivers/lightnvm/pblk-init.c
@@ -49,7 +49,7 @@ struct bio_set pblk_bio_set;
 
 static blk_qc_t pblk_make_rq(struct request_queue *q, struct bio *bio)
 {
-   struct pblk *pblk = q->queuedata;
+   struct pblk *pblk = bio->bi_disk->private_data;
 
if (bio_op(bio) == REQ_OP_DISCARD) {
pblk_discard(pblk, bio);
diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
index 86ffa875bfe16..49718105bc0dc 100644
--- a/drivers/lightnvm/pblk.h
+++ b/drivers/lightnvm/pblk.h
@@ -1255,7 +1255,7 @@ static inline int pblk_boundary_ppa_checks(struct 
nvm_tgt_dev *tgt_dev,
continue;
}
 
-   print_ppa(tgt_dev->q->queuedata, ppa, "boundary", i);
+   print_ppa(tgt_dev->disk->q->private_data, ppa, "boundary", i);
 
return 1;
}
-- 
2.26.2


Re: remove a few uses of ->queuedata

2020-05-12 Thread Christoph Hellwig
On Sat, May 09, 2020 at 08:07:14AM -0700, Dan Williams wrote:
> > which are all used in the I/O submission path (generic_make_request /
> > generic_make_request_checks).  This is mostly a prep cleanup patch to
> > also remove the pointless queue argument from ->make_request - then
> > ->queue is an extra dereference and extra churn.
> 
> Ah ok. If the changelogs had been filled in with something like "In
> preparation for removing @q from make_request_fn, stop using
> ->queuedata", I probably wouldn't have looked twice.
> 
> For the nvdimm/ driver updates you can add:
> 
> Reviewed-by: Dan Williams 
> 
> ...or just let me know if you want me to pick those up through the nvdimm 
> tree.

I'd love you to pick them up through the nvdimm tree.  Do you want
to fix up the commit message yourself?


Re: [PATCH 12/15] md: stop using ->queuedata

2020-05-13 Thread Christoph Hellwig
On Wed, May 13, 2020 at 11:29:17AM -0700, Song Liu wrote:
> On Fri, May 8, 2020 at 9:17 AM Christoph Hellwig  wrote:
> >
> > Signed-off-by: Christoph Hellwig 
> 
> Thanks for the cleanup. IIUC, you want this go through md tree?

Yes, please pick it up through the md tree.


Re: [RESEND PATCH v7 2/5] seq_buf: Export seq_buf_printf() to external modules

2020-05-20 Thread Christoph Hellwig
s/seq_buf: Export seq_buf_printf() to external modules/
  seq_buf: export seq_buf_printf/


[PATCH 06/16] dm: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch dm to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f215b86664484..3f39fa1ac756e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -681,11 +681,7 @@ static void start_io_acct(struct dm_io *io)
struct mapped_device *md = io->md;
struct bio *bio = io->orig_bio;
 
-   io->start_time = jiffies;
-
-   generic_start_io_acct(md->queue, bio_op(bio), bio_sectors(bio),
- &dm_disk(md)->part0);
-
+   io->start_time = bio_start_io_acct(bio);
if (unlikely(dm_stats_used(&md->stats)))
dm_stats_account_io(&md->stats, bio_data_dir(bio),
bio->bi_iter.bi_sector, bio_sectors(bio),
@@ -698,8 +694,7 @@ static void end_io_acct(struct dm_io *io)
struct bio *bio = io->orig_bio;
unsigned long duration = jiffies - io->start_time;
 
-   generic_end_io_acct(md->queue, bio_op(bio), &dm_disk(md)->part0,
-   io->start_time);
+   bio_end_io_acct(bio, io->start_time);
 
if (unlikely(dm_stats_used(&md->stats)))
dm_stats_account_io(&md->stats, bio_data_dir(bio),
-- 
2.26.2


[PATCH 07/16] nvdimm: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch the nvdimm drivers to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvdimm/blk.c  |  6 --
 drivers/nvdimm/btt.c  |  6 --
 drivers/nvdimm/nd.h   | 19 ---
 drivers/nvdimm/pmem.c |  6 --
 4 files changed, 12 insertions(+), 25 deletions(-)

diff --git a/drivers/nvdimm/blk.c b/drivers/nvdimm/blk.c
index 43751fab9d36a..036e23aef9b04 100644
--- a/drivers/nvdimm/blk.c
+++ b/drivers/nvdimm/blk.c
@@ -178,7 +178,9 @@ static blk_qc_t nd_blk_make_request(struct request_queue 
*q, struct bio *bio)
bip = bio_integrity(bio);
nsblk = q->queuedata;
rw = bio_data_dir(bio);
-   do_acct = nd_iostat_start(bio, &start);
+   do_acct = blk_queue_io_stat(bio->bi_disk->queue);
+   if (do_acct)
+   start = bio_start_io_acct(bio);
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
 
@@ -195,7 +197,7 @@ static blk_qc_t nd_blk_make_request(struct request_queue 
*q, struct bio *bio)
}
}
if (do_acct)
-   nd_iostat_end(bio, start);
+   bio_end_io_acct(bio, start);
 
bio_endio(bio);
return BLK_QC_T_NONE;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index 3b09419218d6f..90c0c4bbe77b9 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1452,7 +1452,9 @@ static blk_qc_t btt_make_request(struct request_queue *q, 
struct bio *bio)
if (!bio_integrity_prep(bio))
return BLK_QC_T_NONE;
 
-   do_acct = nd_iostat_start(bio, &start);
+   do_acct = blk_queue_io_stat(bio->bi_disk->queue);
+   if (do_acct)
+   start = bio_start_io_acct(bio);
bio_for_each_segment(bvec, bio, iter) {
unsigned int len = bvec.bv_len;
 
@@ -1477,7 +1479,7 @@ static blk_qc_t btt_make_request(struct request_queue *q, 
struct bio *bio)
}
}
if (do_acct)
-   nd_iostat_end(bio, start);
+   bio_end_io_acct(bio, start);
 
bio_endio(bio);
return BLK_QC_T_NONE;
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 85dbb2a322b9b..85c1ae813ea31 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -396,25 +396,6 @@ static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
 #endif
 int nd_blk_region_init(struct nd_region *nd_region);
 int nd_region_activate(struct nd_region *nd_region);
-void __nd_iostat_start(struct bio *bio, unsigned long *start);
-static inline bool nd_iostat_start(struct bio *bio, unsigned long *start)
-{
-   struct gendisk *disk = bio->bi_disk;
-
-   if (!blk_queue_io_stat(disk->queue))
-   return false;
-
-   *start = jiffies;
-   generic_start_io_acct(disk->queue, bio_op(bio), bio_sectors(bio),
- &disk->part0);
-   return true;
-}
-static inline void nd_iostat_end(struct bio *bio, unsigned long start)
-{
-   struct gendisk *disk = bio->bi_disk;
-
-   generic_end_io_acct(disk->queue, bio_op(bio), &disk->part0, start);
-}
 static inline bool is_bad_pmem(struct badblocks *bb, sector_t sector,
unsigned int len)
 {
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 2df6994acf836..97f948f8f4e62 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -202,7 +202,9 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
if (bio->bi_opf & REQ_PREFLUSH)
ret = nvdimm_flush(nd_region, bio);
 
-   do_acct = nd_iostat_start(bio, &start);
+   do_acct = blk_queue_io_stat(bio->bi_disk->queue);
+   if (do_acct)
+   start = bio_start_io_acct(bio);
bio_for_each_segment(bvec, bio, iter) {
if (op_is_write(bio_op(bio)))
rc = pmem_do_write(pmem, bvec.bv_page, bvec.bv_offset,
@@ -216,7 +218,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, 
struct bio *bio)
}
}
if (do_acct)
-   nd_iostat_end(bio, start);
+   bio_end_io_acct(bio, start);
 
if (bio->bi_opf & REQ_FUA)
ret = nvdimm_flush(nd_region, bio);
-- 
2.26.2


[PATCH 08/16] zram: nvdimm: use bio_{start,end}_io_acct and disk_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch zram to use the nicer bio accounting helpers, and as part of that
ensure each bio is counted as a single I/O request.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/zram/zram_drv.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ebb234f36909c..6e2ad90b17a37 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1510,13 +1510,8 @@ static void zram_bio_discard(struct zram *zram, u32 
index,
 static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
int offset, unsigned int op, struct bio *bio)
 {
-   unsigned long start_time = jiffies;
-   struct request_queue *q = zram->disk->queue;
int ret;
 
-   generic_start_io_acct(q, op, bvec->bv_len >> SECTOR_SHIFT,
-   &zram->disk->part0);
-
if (!op_is_write(op)) {
atomic64_inc(&zram->stats.num_reads);
ret = zram_bvec_read(zram, bvec, index, offset, bio);
@@ -1526,8 +1521,6 @@ static int zram_bvec_rw(struct zram *zram, struct bio_vec 
*bvec, u32 index,
ret = zram_bvec_write(zram, bvec, index, offset, bio);
}
 
-   generic_end_io_acct(q, op, &zram->disk->part0, start_time);
-
zram_slot_lock(zram, index);
zram_accessed(zram, index);
zram_slot_unlock(zram, index);
@@ -1548,6 +1541,7 @@ static void __zram_make_request(struct zram *zram, struct 
bio *bio)
u32 index;
struct bio_vec bvec;
struct bvec_iter iter;
+   unsigned long start_time;
 
index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;
offset = (bio->bi_iter.bi_sector &
@@ -1563,6 +1557,7 @@ static void __zram_make_request(struct zram *zram, struct 
bio *bio)
break;
}
 
+   start_time = bio_start_io_acct(bio);
bio_for_each_segment(bvec, bio, iter) {
struct bio_vec bv = bvec;
unsigned int unwritten = bvec.bv_len;
@@ -1571,8 +1566,10 @@ static void __zram_make_request(struct zram *zram, 
struct bio *bio)
bv.bv_len = min_t(unsigned int, PAGE_SIZE - offset,
unwritten);
if (zram_bvec_rw(zram, &bv, index, offset,
-bio_op(bio), bio) < 0)
-   goto out;
+bio_op(bio), bio) < 0) {
+   bio->bi_status = BLK_STS_IOERR;
+   break;
+   }
 
bv.bv_offset += bv.bv_len;
unwritten -= bv.bv_len;
@@ -1580,12 +1577,8 @@ static void __zram_make_request(struct zram *zram, 
struct bio *bio)
update_position(&index, &offset, &bv);
} while (unwritten);
}
-
+   bio_end_io_acct(bio, start_time);
bio_endio(bio);
-   return;
-
-out:
-   bio_io_error(bio);
 }
 
 /*
@@ -1633,6 +1626,7 @@ static int zram_rw_page(struct block_device *bdev, 
sector_t sector,
u32 index;
struct zram *zram;
struct bio_vec bv;
+   unsigned long start_time;
 
if (PageTransHuge(page))
return -ENOTSUPP;
@@ -1651,7 +1645,9 @@ static int zram_rw_page(struct block_device *bdev, 
sector_t sector,
bv.bv_len = PAGE_SIZE;
bv.bv_offset = 0;
 
+   start_time = disk_start_io_acct(bdev->bd_disk, SECTORS_PER_PAGE, op);
ret = zram_bvec_rw(zram, &bv, index, offset, op, NULL);
+   disk_end_io_acct(bdev->bd_disk, op, start_time);
 out:
/*
 * If I/O fails, just return error(ie, non-zero) without
-- 
2.26.2


[PATCH 05/16] bcache: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch bcache to use the nicer bio accounting helpers, and call them at
the point where we also sample the start time so that the accounting
results are coherent.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/bcache/request.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 77d1a26975174..22b483527176b 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -668,9 +668,7 @@ static void backing_request_endio(struct bio *bio)
 static void bio_complete(struct search *s)
 {
if (s->orig_bio) {
-   generic_end_io_acct(s->d->disk->queue, bio_op(s->orig_bio),
-   &s->d->disk->part0, s->start_time);
-
+   bio_end_io_acct(s->orig_bio, s->start_time);
trace_bcache_request_end(s->d, s->orig_bio);
s->orig_bio->bi_status = s->iop.status;
bio_endio(s->orig_bio);
@@ -730,7 +728,7 @@ static inline struct search *search_alloc(struct bio *bio,
s->recoverable  = 1;
s->write= op_is_write(bio_op(bio));
s->read_dirty_data  = 0;
-   s->start_time   = jiffies;
+   s->start_time   = bio_start_io_acct(bio);
 
s->iop.c= d->c;
s->iop.bio  = NULL;
@@ -1082,8 +1080,7 @@ static void detached_dev_end_io(struct bio *bio)
bio->bi_end_io = ddip->bi_end_io;
bio->bi_private = ddip->bi_private;
 
-   generic_end_io_acct(ddip->d->disk->queue, bio_op(bio),
-   &ddip->d->disk->part0, ddip->start_time);
+   bio_end_io_acct(bio, ddip->start_time);
 
if (bio->bi_status) {
struct cached_dev *dc = container_of(ddip->d,
@@ -1108,7 +1105,7 @@ static void detached_dev_do_request(struct bcache_device 
*d, struct bio *bio)
 */
ddip = kzalloc(sizeof(struct detached_dev_io_private), GFP_NOIO);
ddip->d = d;
-   ddip->start_time = jiffies;
+   ddip->start_time = bio_start_io_acct(bio);
ddip->bi_end_io = bio->bi_end_io;
ddip->bi_private = bio->bi_private;
bio->bi_end_io = detached_dev_end_io;
@@ -1190,11 +1187,6 @@ blk_qc_t cached_dev_make_request(struct request_queue 
*q, struct bio *bio)
}
}
 
-   generic_start_io_acct(q,
- bio_op(bio),
- bio_sectors(bio),
- &d->disk->part0);
-
bio_set_dev(bio, dc->bdev);
bio->bi_iter.bi_sector += dc->sb.data_offset;
 
@@ -1311,8 +1303,6 @@ blk_qc_t flash_dev_make_request(struct request_queue *q, 
struct bio *bio)
return BLK_QC_T_NONE;
}
 
-   generic_start_io_acct(q, bio_op(bio), bio_sectors(bio), 
&d->disk->part0);
-
s = search_alloc(bio, d);
cl = &s->cl;
bio = &s->bio.bio;
-- 
2.26.2


[PATCH 01/16] block: add disk/bio-based accounting helpers

2020-05-25 Thread Christoph Hellwig
Add two new helpers to simplify I/O accounting for bio based drivers.
Currently these drivers use the generic_start_io_acct and
generic_end_io_acct helpers which have very cumbersome calling
conventions, don't actually return the time they started accounting,
and try to deal with accounting for partitions, which can't happen
for bio based drivers.  The new helpers will be used to subsequently
replace uses of the old helpers.

The main interface is the bio based wrappers in blkdev.h, but for zram,
which wants to account rw_page based I/O, lower level routines are
provided as well.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c   | 34 ++
 include/linux/blkdev.h | 26 ++
 2 files changed, 60 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 77e57c2e8d602..8973104f88d90 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1432,6 +1432,40 @@ void blk_account_io_start(struct request *rq, bool 
new_io)
part_stat_unlock();
 }
 
+unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
+   unsigned int op)
+{
+   struct hd_struct *part = &disk->part0;
+   const int sgrp = op_stat_group(op);
+   unsigned long now = READ_ONCE(jiffies);
+
+   part_stat_lock();
+   update_io_ticks(part, now, false);
+   part_stat_inc(part, ios[sgrp]);
+   part_stat_add(part, sectors[sgrp], sectors);
+   part_stat_local_inc(part, in_flight[op_is_write(op)]);
+   part_stat_unlock();
+
+   return now;
+}
+EXPORT_SYMBOL(disk_start_io_acct);
+
+void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+   unsigned long start_time)
+{
+   struct hd_struct *part = &disk->part0;
+   const int sgrp = op_stat_group(op);
+   unsigned long now = READ_ONCE(jiffies);
+   unsigned long duration = now - start_time;
+
+   part_stat_lock();
+   update_io_ticks(part, now, true);
+   part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
+   part_stat_local_dec(part, in_flight[op_is_write(op)]);
+   part_stat_unlock();
+}
+EXPORT_SYMBOL(disk_end_io_acct);
+
 /*
  * Steal bios from a request and add them to a bio list.
  * The request must not have been partially completed before.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7d10f4e632325..76d01a8a13b80 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1892,4 +1892,30 @@ static inline void blk_wake_io_task(struct task_struct 
*waiter)
wake_up_process(waiter);
 }
 
+unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
+   unsigned int op);
+void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+   unsigned long start_time);
+
+/**
+ * bio_start_io_acct - start I/O accounting for bio based drivers
+ * @bio:   bio to start account for
+ *
+ * Returns the start time that should be passed back to bio_end_io_acct().
+ */
+static inline unsigned long bio_start_io_acct(struct bio *bio)
+{
+   return disk_start_io_acct(bio->bi_disk, bio_sectors(bio), bio_op(bio));
+}
+
+/**
+ * bio_end_io_acct - end I/O accounting for bio based drivers
+ * @bio:   bio to end account for
+ * @start: start time returned by bio_start_io_acct()
+ */
+static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time)
+{
+   return disk_end_io_acct(bio->bi_disk, bio_op(bio), start_time);
+}
+
 #endif
-- 
2.26.2
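
As a usage sketch of the two layers added above (hypothetical driver code;
struct foo_dev, foo_handle_bio() and foo_do_page() are made up, the helper
calls follow the signatures introduced in this patch):

/* common case: a bio based ->make_request function */
static blk_qc_t foo_make_request(struct request_queue *q, struct bio *bio)
{
	struct foo_dev *foo = bio->bi_disk->private_data;
	unsigned long start = bio_start_io_acct(bio);

	foo_handle_bio(foo, bio);
	bio_end_io_acct(bio, start);
	bio_endio(bio);
	return BLK_QC_T_NONE;
}

/* paths without a bio (e.g. ->rw_page) use the disk-level variants */
static int foo_rw_page(struct block_device *bdev, sector_t sector,
		       struct page *page, unsigned int op)
{
	struct foo_dev *foo = bdev->bd_disk->private_data;
	unsigned long start;
	int ret;

	start = disk_start_io_acct(bdev->bd_disk, PAGE_SIZE >> SECTOR_SHIFT, op);
	ret = foo_do_page(foo, page, sector, op);
	disk_end_io_acct(bdev->bd_disk, op, start);
	return ret;
}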


[PATCH 02/16] drbd: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch drbd to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/drbd/drbd_req.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 840c3aef3c5c9..c80a2f1c3c2a7 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -21,24 +21,6 @@
 
 static bool drbd_may_do_local_read(struct drbd_device *device, sector_t 
sector, int size);
 
-/* Update disk stats at start of I/O request */
-static void _drbd_start_io_acct(struct drbd_device *device, struct 
drbd_request *req)
-{
-   struct request_queue *q = device->rq_queue;
-
-   generic_start_io_acct(q, bio_op(req->master_bio),
-   req->i.size >> 9, &device->vdisk->part0);
-}
-
-/* Update disk stats when completing request upwards */
-static void _drbd_end_io_acct(struct drbd_device *device, struct drbd_request 
*req)
-{
-   struct request_queue *q = device->rq_queue;
-
-   generic_end_io_acct(q, bio_op(req->master_bio),
-   &device->vdisk->part0, req->start_jif);
-}
-
 static struct drbd_request *drbd_req_new(struct drbd_device *device, struct 
bio *bio_src)
 {
struct drbd_request *req;
@@ -263,7 +245,7 @@ void drbd_req_complete(struct drbd_request *req, struct 
bio_and_error *m)
start_new_tl_epoch(first_peer_device(device)->connection);
 
/* Update disk stats */
-   _drbd_end_io_acct(device, req);
+   bio_end_io_acct(req->master_bio, req->start_jif);
 
/* If READ failed,
 * have it be pushed back to the retry work queue,
@@ -1222,16 +1204,15 @@ drbd_request_prepare(struct drbd_device *device, struct 
bio *bio, unsigned long
bio_endio(bio);
return ERR_PTR(-ENOMEM);
}
-   req->start_jif = start_jif;
+
+   /* Update disk stats */
+   req->start_jif = bio_start_io_acct(req->master_bio);
 
if (!get_ldev(device)) {
bio_put(req->private_bio);
req->private_bio = NULL;
}
 
-   /* Update disk stats */
-   _drbd_start_io_acct(device, req);
-
/* process discards always from our submitter thread */
if (bio_op(bio) == REQ_OP_WRITE_ZEROES ||
bio_op(bio) == REQ_OP_DISCARD)
-- 
2.26.2


[PATCH 14/16] block: remove rcu_read_lock() from part_stat_lock()

2020-05-25 Thread Christoph Hellwig
From: Konstantin Khlebnikov 

The RCU lock is required only in disk_map_sector_rcu() to look up the
partition.  After that the request holds a reference to the related hd_struct.

Replace get_cpu() with preempt_disable() - the returned cpu index is unused.

Signed-off-by: Konstantin Khlebnikov 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 block/genhd.c | 11 ---
 include/linux/part_stat.h |  4 ++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 3e7df0a3e6bb0..1a76593276644 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -321,11 +321,12 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk 
*disk, sector_t sector)
struct hd_struct *part;
int i;
 
+   rcu_read_lock();
ptbl = rcu_dereference(disk->part_tbl);
 
part = rcu_dereference(ptbl->last_lookup);
if (part && sector_in_part(part, sector) && hd_struct_try_get(part))
-   return part;
+   goto out_unlock;
 
for (i = 1; i < ptbl->len; i++) {
part = rcu_dereference(ptbl->part[i]);
@@ -339,10 +340,14 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk 
*disk, sector_t sector)
if (!hd_struct_try_get(part))
break;
rcu_assign_pointer(ptbl->last_lookup, part);
-   return part;
+   goto out_unlock;
}
}
-   return &disk->part0;
+
+   part = &disk->part0;
+out_unlock:
+   rcu_read_unlock();
+   return part;
 }
 
 /**
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index 6644197980b92..a6b0938ce82e9 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -21,8 +21,8 @@ struct disk_stats {
  *
  * part_stat_read() can be called at any time.
  */
-#define part_stat_lock()   ({ rcu_read_lock(); get_cpu(); })
-#define part_stat_unlock() do { put_cpu(); rcu_read_unlock(); } while (0)
+#define part_stat_lock()   preempt_disable()
+#define part_stat_unlock() preempt_enable()
 
 #define part_stat_get_cpu(part, field, cpu)\
(per_cpu_ptr((part)->dkstats, (cpu))->field)
-- 
2.26.2
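
To illustrate the commit message, a simplified sketch of the caller pattern
after this change (condensed from blk_account_io_start()/blk_account_io_done(),
not the literal code): the RCU critical section is confined to
disk_map_sector_rcu(), and the later update only needs preemption disabled
plus the hd_struct reference the request already holds.

static void foo_account_start(struct request *rq)
{
	part_stat_lock();		/* now just preempt_disable() */
	/* rcu_read_lock()/unlock() happens inside disk_map_sector_rcu() */
	rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
	update_io_ticks(rq->part, jiffies, false);
	part_stat_unlock();		/* preempt_enable() */
}

static void foo_account_done(struct request *rq, u64 now)
{
	const int sgrp = op_stat_group(req_op(rq));

	part_stat_lock();
	part_stat_inc(rq->part, ios[sgrp]);
	part_stat_add(rq->part, nsecs[sgrp], now - rq->start_time_ns);
	/* rq->part still holds a reference on the hd_struct, no RCU needed */
	hd_struct_put(rq->part);
	part_stat_unlock();
}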


[PATCH 12/16] block: account merge of two requests

2020-05-25 Thread Christoph Hellwig
From: Konstantin Khlebnikov 

Also rename blk_account_io_merge() to blk_account_io_merge_request() to
distinguish it from merging a request and a bio.

Signed-off-by: Konstantin Khlebnikov 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 block/blk-merge.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 6a4538d39efd2..c3beae5c1be71 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -669,18 +669,16 @@ void blk_rq_set_mixed_merge(struct request *rq)
rq->rq_flags |= RQF_MIXED_MERGE;
 }
 
-static void blk_account_io_merge(struct request *req)
+static void blk_account_io_merge_request(struct request *req)
 {
if (blk_do_io_stat(req)) {
-   struct hd_struct *part;
-
part_stat_lock();
-   part = req->part;
-
-   hd_struct_put(part);
+   part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
+   hd_struct_put(req->part);
part_stat_unlock();
}
 }
+
 /*
  * Two cases of handling DISCARD merge:
  * If max_discard_segments > 1, the driver takes every bio
@@ -792,7 +790,7 @@ static struct request *attempt_merge(struct request_queue 
*q,
/*
 * 'next' is going away, so update stats accordingly
 */
-   blk_account_io_merge(next);
+   blk_account_io_merge_request(next);
 
/*
 * ownership of bio passed from next to req, return 'next' for
-- 
2.26.2


[PATCH 10/16] block: move update_io_ticks to blk-core.c

2020-05-25 Thread Christoph Hellwig
All callers are in blk-core.c, so move update_io_ticks over.

Signed-off-by: Christoph Hellwig 
---
 block/bio.c  | 16 
 block/blk-core.c | 15 +++
 block/blk.h  |  1 -
 3 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 3e89c7b37855a..5235da6434aab 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1376,22 +1376,6 @@ void bio_check_pages_dirty(struct bio *bio)
schedule_work(&bio_dirty_work);
 }
 
-void update_io_ticks(struct hd_struct *part, unsigned long now, bool end)
-{
-   unsigned long stamp;
-again:
-   stamp = READ_ONCE(part->stamp);
-   if (unlikely(stamp != now)) {
-   if (likely(cmpxchg(&part->stamp, stamp, now) == stamp)) {
-   __part_stat_add(part, io_ticks, end ? now - stamp : 1);
-   }
-   }
-   if (part->partno) {
-   part = &part_to_disk(part)->part0;
-   goto again;
-   }
-}
-
 static inline bool bio_remaining_done(struct bio *bio)
 {
/*
diff --git a/block/blk-core.c b/block/blk-core.c
index 8973104f88d90..c1675d43c2da0 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1381,6 +1381,21 @@ unsigned int blk_rq_err_bytes(const struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_rq_err_bytes);
 
+static void update_io_ticks(struct hd_struct *part, unsigned long now, bool 
end)
+{
+   unsigned long stamp;
+again:
+   stamp = READ_ONCE(part->stamp);
+   if (unlikely(stamp != now)) {
+   if (likely(cmpxchg(&part->stamp, stamp, now) == stamp))
+   __part_stat_add(part, io_ticks, end ? now - stamp : 1);
+   }
+   if (part->partno) {
+   part = &part_to_disk(part)->part0;
+   goto again;
+   }
+}
+
 static void blk_account_io_completion(struct request *req, unsigned int bytes)
 {
if (req->part && blk_do_io_stat(req)) {
diff --git a/block/blk.h b/block/blk.h
index 5db4ec1e85f7b..bdf5e94467aa2 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -344,7 +344,6 @@ void blk_queue_free_zone_bitmaps(struct request_queue *q);
 static inline void blk_queue_free_zone_bitmaps(struct request_queue *q) {}
 #endif
 
-void update_io_ticks(struct hd_struct *part, unsigned long now, bool end);
 struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector);
 
 int blk_alloc_devt(struct hd_struct *part, dev_t *devt);
-- 
2.26.2


[PATCH 09/16] block: remove generic_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Remove these now unused functions.

Signed-off-by: Christoph Hellwig 
---
 block/bio.c | 39 ---
 include/linux/bio.h |  6 --
 2 files changed, 45 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 9c101a0572ca2..3e89c7b37855a 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1392,45 +1392,6 @@ void update_io_ticks(struct hd_struct *part, unsigned 
long now, bool end)
}
 }
 
-void generic_start_io_acct(struct request_queue *q, int op,
-  unsigned long sectors, struct hd_struct *part)
-{
-   const int sgrp = op_stat_group(op);
-   int rw = op_is_write(op);
-
-   part_stat_lock();
-
-   update_io_ticks(part, jiffies, false);
-   part_stat_inc(part, ios[sgrp]);
-   part_stat_add(part, sectors[sgrp], sectors);
-   part_stat_local_inc(part, in_flight[rw]);
-   if (part->partno)
-   part_stat_local_inc(&part_to_disk(part)->part0, in_flight[rw]);
-
-   part_stat_unlock();
-}
-EXPORT_SYMBOL(generic_start_io_acct);
-
-void generic_end_io_acct(struct request_queue *q, int req_op,
-struct hd_struct *part, unsigned long start_time)
-{
-   unsigned long now = jiffies;
-   unsigned long duration = now - start_time;
-   const int sgrp = op_stat_group(req_op);
-   int rw = op_is_write(req_op);
-
-   part_stat_lock();
-
-   update_io_ticks(part, now, true);
-   part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
-   part_stat_local_dec(part, in_flight[rw]);
-   if (part->partno)
-   part_stat_local_dec(&part_to_disk(part)->part0, in_flight[rw]);
-
-   part_stat_unlock();
-}
-EXPORT_SYMBOL(generic_end_io_acct);
-
 static inline bool bio_remaining_done(struct bio *bio)
 {
/*
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 950c9dc44c4f2..941378ec5b39f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -444,12 +444,6 @@ void bio_release_pages(struct bio *bio, bool mark_dirty);
 extern void bio_set_pages_dirty(struct bio *bio);
 extern void bio_check_pages_dirty(struct bio *bio);
 
-void generic_start_io_acct(struct request_queue *q, int op,
-   unsigned long sectors, struct hd_struct *part);
-void generic_end_io_acct(struct request_queue *q, int op,
-   struct hd_struct *part,
-   unsigned long start_time);
-
 extern void bio_copy_data_iter(struct bio *dst, struct bvec_iter *dst_iter,
   struct bio *src, struct bvec_iter *src_iter);
 extern void bio_copy_data(struct bio *dst, struct bio *src);
-- 
2.26.2


[PATCH 04/16] lightnvm/pblk: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch pblk to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
---
 drivers/lightnvm/pblk-cache.c |  8 +++-
 drivers/lightnvm/pblk-read.c  | 11 ---
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/lightnvm/pblk-cache.c b/drivers/lightnvm/pblk-cache.c
index 5c1034c22197c..f185f1a83 100644
--- a/drivers/lightnvm/pblk-cache.c
+++ b/drivers/lightnvm/pblk-cache.c
@@ -21,16 +21,14 @@
 void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
unsigned long flags)
 {
-   struct request_queue *q = pblk->dev->q;
struct pblk_w_ctx w_ctx;
sector_t lba = pblk_get_lba(bio);
-   unsigned long start_time = jiffies;
+   unsigned long start_time;
unsigned int bpos, pos;
int nr_entries = pblk_get_secs(bio);
int i, ret;
 
-   generic_start_io_acct(q, REQ_OP_WRITE, bio_sectors(bio),
- &pblk->disk->part0);
+   start_time = bio_start_io_acct(bio);
 
/* Update the write buffer head (mem) with the entries that we can
 * write. The write in itself cannot fail, so there is no need to
@@ -79,7 +77,7 @@ void pblk_write_to_cache(struct pblk *pblk, struct bio *bio,
pblk_rl_inserted(&pblk->rl, nr_entries);
 
 out:
-   generic_end_io_acct(q, REQ_OP_WRITE, &pblk->disk->part0, start_time);
+   bio_end_io_acct(bio, start_time);
pblk_write_should_kick(pblk);
 
if (ret == NVM_IO_DONE)
diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c
index 8efd14e683dc4..140927ebf41e9 100644
--- a/drivers/lightnvm/pblk-read.c
+++ b/drivers/lightnvm/pblk-read.c
@@ -187,12 +187,11 @@ static void pblk_end_user_read(struct bio *bio, int error)
 static void __pblk_end_io_read(struct pblk *pblk, struct nvm_rq *rqd,
   bool put_line)
 {
-   struct nvm_tgt_dev *dev = pblk->dev;
struct pblk_g_ctx *r_ctx = nvm_rq_to_pdu(rqd);
struct bio *int_bio = rqd->bio;
unsigned long start_time = r_ctx->start_time;
 
-   generic_end_io_acct(dev->q, REQ_OP_READ, &pblk->disk->part0, 
start_time);
+   bio_end_io_acct(int_bio, start_time);
 
if (rqd->error)
pblk_log_read_err(pblk, rqd);
@@ -263,17 +262,15 @@ static void pblk_read_rq(struct pblk *pblk, struct nvm_rq 
*rqd, struct bio *bio,
 
 void pblk_submit_read(struct pblk *pblk, struct bio *bio)
 {
-   struct nvm_tgt_dev *dev = pblk->dev;
-   struct request_queue *q = dev->q;
sector_t blba = pblk_get_lba(bio);
unsigned int nr_secs = pblk_get_secs(bio);
bool from_cache;
struct pblk_g_ctx *r_ctx;
struct nvm_rq *rqd;
struct bio *int_bio, *split_bio;
+   unsigned long start_time;
 
-   generic_start_io_acct(q, REQ_OP_READ, bio_sectors(bio),
- &pblk->disk->part0);
+   start_time = bio_start_io_acct(bio);
 
rqd = pblk_alloc_rqd(pblk, PBLK_READ);
 
@@ -283,7 +280,7 @@ void pblk_submit_read(struct pblk *pblk, struct bio *bio)
rqd->end_io = pblk_end_io_read;
 
r_ctx = nvm_rq_to_pdu(rqd);
-   r_ctx->start_time = jiffies;
+   r_ctx->start_time = start_time;
r_ctx->lba = blba;
 
if (pblk_alloc_rqd_meta(pblk, rqd)) {
-- 
2.26.2


[PATCH 13/16] block: add a blk_account_io_merge_bio helper

2020-05-25 Thread Christoph Hellwig
From: Konstantin Khlebnikov 

Move the non-"new_io" branch of blk_account_io_start() into a separate
function.  Fix merge accounting for discards (they were counted as write
merges).

Unlike blk_account_io_start(), the new blk_account_io_merge_bio() doesn't
call update_io_ticks(), as there is no reason to do so there.

Signed-off-by: Konstantin Khlebnikov 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c | 25 -
 block/blk-exec.c |  2 +-
 block/blk-mq.c   |  2 +-
 block/blk.h  |  2 +-
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c1675d43c2da0..bf2f7d4bc0c1c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -636,6 +636,16 @@ void blk_put_request(struct request *req)
 }
 EXPORT_SYMBOL(blk_put_request);
 
+static void blk_account_io_merge_bio(struct request *req)
+{
+   if (!blk_do_io_stat(req))
+   return;
+
+   part_stat_lock();
+   part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
+   part_stat_unlock();
+}
+
 bool bio_attempt_back_merge(struct request *req, struct bio *bio,
unsigned int nr_segs)
 {
@@ -656,7 +666,7 @@ bool bio_attempt_back_merge(struct request *req, struct bio 
*bio,
 
bio_crypt_free_ctx(bio);
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 }
 
@@ -682,7 +692,7 @@ bool bio_attempt_front_merge(struct request *req, struct 
bio *bio,
 
bio_crypt_do_front_merge(req, bio);
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 }
 
@@ -704,7 +714,7 @@ bool bio_attempt_discard_merge(struct request_queue *q, 
struct request *req,
req->__data_len += bio->bi_iter.bi_size;
req->nr_phys_segments = segments + 1;
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 no_merge:
req_set_nomerge(q, req);
@@ -1329,7 +1339,7 @@ blk_status_t blk_insert_cloned_request(struct 
request_queue *q, struct request *
return BLK_STS_IOERR;
 
if (blk_queue_io_stat(q))
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 
/*
 * Since we have a scheduler attached on the top device,
@@ -1433,16 +1443,13 @@ void blk_account_io_done(struct request *req, u64 now)
}
 }
 
-void blk_account_io_start(struct request *rq, bool new_io)
+void blk_account_io_start(struct request *rq)
 {
if (!blk_do_io_stat(rq))
return;
 
part_stat_lock();
-   if (!new_io)
-   part_stat_inc(rq->part, merges[rq_data_dir(rq)]);
-   else
-   rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
+   rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
update_io_ticks(rq->part, jiffies, false);
part_stat_unlock();
 }
diff --git a/block/blk-exec.c b/block/blk-exec.c
index e20a852ae432d..85324d53d072f 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -55,7 +55,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
gendisk *bd_disk,
rq->rq_disk = bd_disk;
rq->end_io = done;
 
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 
/*
 * don't check dying flag for MQ because the request won't
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cac11945f6023..c606c74463ccd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1822,7 +1822,7 @@ static void blk_mq_bio_to_request(struct request *rq, 
struct bio *bio,
blk_rq_bio_prep(rq, bio, nr_segs);
blk_crypto_rq_bio_prep(rq, bio, GFP_NOIO);
 
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 }
 
 static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk.h b/block/blk.h
index 0ecba2ab383d6..428f7e5d70a86 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -185,7 +185,7 @@ bool bio_attempt_discard_merge(struct request_queue *q, 
struct request *req,
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq);
 
-void blk_account_io_start(struct request *req, bool new_io);
+void blk_account_io_start(struct request *req);
 void blk_account_io_done(struct request *req, u64 now);
 
 /*
-- 
2.26.2
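
A short illustration of the discard fix mentioned in the commit message (not
upstream code): discards are write-direction operations, so indexing the
merge counter by data direction lumped them in with writes, while
op_stat_group() gives them their own bucket.

	/* old, via blk_account_io_start(req, false):
	 * rq_data_dir(req) == WRITE for REQ_OP_DISCARD, so discard merges
	 * were counted as merges[STAT_WRITE]
	 */
	part_stat_inc(req->part, merges[rq_data_dir(req)]);

	/* new, in blk_account_io_merge_bio():
	 * op_stat_group(REQ_OP_DISCARD) == STAT_DISCARD
	 */
	part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);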


block I/O accounting improvements

2020-05-25 Thread Christoph Hellwig
Hi Jens,

this series contains various improvements for block I/O accounting.  The
first bunch of patches switches the bio based drivers to better accounting
helpers compared to the current mess.  The end contains a fix and various
performance improvements.  Most of this comes from a series Konstantin
sent a few weeks ago, rebased on changes that landed in your tree since
and my change to always use the percpu version of the disk stats.
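
For the bio based drivers the conversion boils down to the following kind of
change (a rough sketch for a hypothetical driver; foo_handle_bio() is made
up, the new helper names are the ones introduced in patch 01):

	/* before: the old helpers need the queue, op, sector count and
	 * hd_struct, and the caller samples the start time itself
	 */
	unsigned long start_time = jiffies;

	generic_start_io_acct(q, bio_op(bio), bio_sectors(bio),
			      &foo->disk->part0);
	foo_handle_bio(foo, bio);
	generic_end_io_acct(q, bio_op(bio), &foo->disk->part0, start_time);

	/* after: everything is derived from the bio itself */
	unsigned long start_time = bio_start_io_acct(bio);

	foo_handle_bio(foo, bio);
	bio_end_io_acct(bio, start_time);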


[PATCH 15/16] block: use __this_cpu_add() instead of access by smp_processor_id()

2020-05-25 Thread Christoph Hellwig
From: Konstantin Khlebnikov 

Most architectures have a fast path to access per-cpu data for the current
cpu.  The required preempt_disable() is provided by part_stat_lock().

Signed-off-by: Konstantin Khlebnikov 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 include/linux/part_stat.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index a6b0938ce82e9..24125778ef3ec 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -54,7 +54,7 @@ static inline void part_stat_set_all(struct hd_struct *part, 
int value)
 part_stat_read(part, field[STAT_DISCARD]))
 
 #define __part_stat_add(part, field, addnd)\
-   (part_stat_get(part, field) += (addnd))
+   __this_cpu_add((part)->dkstats->field, addnd)
 
 #define part_stat_add(part, field, addnd)  do {\
__part_stat_add((part), field, addnd);  \
-- 
2.26.2


[PATCH 11/16] block: always use a percpu variable for disk stats

2020-05-25 Thread Christoph Hellwig
percpu variables have a perfectly fine working stub implementation
for UP kernels, so use that.
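
Illustrative only (the function name is made up, not part of the patch): the
alloc/iterate/free pattern used below behaves identically on UP, where the
percpu stubs fall back to a single allocation and for_each_possible_cpu()
makes exactly one pass:

#include <linux/blkdev.h>
#include <linux/part_stat.h>
#include <linux/percpu.h>

/* Hypothetical example of summing one counter across cpus. */
static int example_sum_read_ios(unsigned long *reads)
{
	struct disk_stats __percpu *stats;
	int cpu;

	stats = alloc_percpu(struct disk_stats);	/* works on SMP and UP */
	if (!stats)
		return -ENOMEM;

	*reads = 0;
	for_each_possible_cpu(cpu)	/* a single iteration on UP kernels */
		*reads += per_cpu_ptr(stats, cpu)->ios[STAT_READ];

	free_percpu(stats);
	return 0;
}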

Signed-off-by: Christoph Hellwig 
---
 block/blk.h   |  2 +-
 block/genhd.c | 12 +++--
 block/partitions/core.c   |  5 ++--
 include/linux/genhd.h | 13 -
 include/linux/part_stat.h | 55 ---
 5 files changed, 18 insertions(+), 69 deletions(-)

diff --git a/block/blk.h b/block/blk.h
index bdf5e94467aa2..0ecba2ab383d6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -378,7 +378,7 @@ static inline void hd_struct_put(struct hd_struct *part)
 
 static inline void hd_free_part(struct hd_struct *part)
 {
-   free_part_stats(part);
+   free_percpu(part->dkstats);
kfree(part->info);
percpu_ref_exit(&part->ref);
 }
diff --git a/block/genhd.c b/block/genhd.c
index 094ed90964964..3e7df0a3e6bb0 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -92,7 +92,6 @@ const char *bdevname(struct block_device *bdev, char *buf)
 }
 EXPORT_SYMBOL(bdevname);
 
-#ifdef CONFIG_SMP
 static void part_stat_read_all(struct hd_struct *part, struct disk_stats *stat)
 {
int cpu;
@@ -112,12 +111,6 @@ static void part_stat_read_all(struct hd_struct *part, 
struct disk_stats *stat)
stat->io_ticks += ptr->io_ticks;
}
 }
-#else /* CONFIG_SMP */
-static void part_stat_read_all(struct hd_struct *part, struct disk_stats *stat)
-{
-   memcpy(stat, &part->dkstats, sizeof(struct disk_stats));
-}
-#endif /* CONFIG_SMP */
 
 static unsigned int part_in_flight(struct request_queue *q,
struct hd_struct *part)
@@ -1688,14 +1681,15 @@ struct gendisk *__alloc_disk_node(int minors, int 
node_id)
 
disk = kzalloc_node(sizeof(struct gendisk), GFP_KERNEL, node_id);
if (disk) {
-   if (!init_part_stats(&disk->part0)) {
+   disk->part0.dkstats = alloc_percpu(struct disk_stats);
+   if (!disk->part0.dkstats) {
kfree(disk);
return NULL;
}
init_rwsem(&disk->lookup_sem);
disk->node_id = node_id;
if (disk_expand_part_tbl(disk, 0)) {
-   free_part_stats(&disk->part0);
+   free_percpu(disk->part0.dkstats);
kfree(disk);
return NULL;
}
diff --git a/block/partitions/core.c b/block/partitions/core.c
index 297004fd22648..78951e33b2d7c 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -387,7 +387,8 @@ static struct hd_struct *add_partition(struct gendisk 
*disk, int partno,
if (!p)
return ERR_PTR(-EBUSY);
 
-   if (!init_part_stats(p)) {
+   p->dkstats = alloc_percpu(struct disk_stats);
+   if (!p->dkstats) {
err = -ENOMEM;
goto out_free;
}
@@ -468,7 +469,7 @@ static struct hd_struct *add_partition(struct gendisk 
*disk, int partno,
 out_free_info:
kfree(p->info);
 out_free_stats:
-   free_part_stats(p);
+   free_percpu(p->dkstats);
 out_free:
kfree(p);
return ERR_PTR(err);
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index a9384449465a3..f0d6d77309a54 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -39,15 +39,6 @@ extern struct class block_class;
 #include 
 #include 
 
-struct disk_stats {
-   u64 nsecs[NR_STAT_GROUPS];
-   unsigned long sectors[NR_STAT_GROUPS];
-   unsigned long ios[NR_STAT_GROUPS];
-   unsigned long merges[NR_STAT_GROUPS];
-   unsigned long io_ticks;
-   local_t in_flight[2];
-};
-
 #define PARTITION_META_INFO_VOLNAMELTH 64
 /*
  * Enough for the string representation of any kind of UUID plus NULL.
@@ -72,11 +63,7 @@ struct hd_struct {
seqcount_t nr_sects_seq;
 #endif
unsigned long stamp;
-#ifdef CONFIG_SMP
struct disk_stats __percpu *dkstats;
-#else
-   struct disk_stats dkstats;
-#endif
struct percpu_ref ref;
 
sector_t alignment_offset;
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h
index ece607607a864..6644197980b92 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -4,19 +4,23 @@
 
 #include 
 
+struct disk_stats {
+   u64 nsecs[NR_STAT_GROUPS];
+   unsigned long sectors[NR_STAT_GROUPS];
+   unsigned long ios[NR_STAT_GROUPS];
+   unsigned long merges[NR_STAT_GROUPS];
+   unsigned long io_ticks;
+   local_t in_flight[2];
+};
+
 /*
  * Macros to operate on percpu disk statistics:
  *
- * {disk|part|all}_stat_{add|sub|inc|dec}() modify the stat counters
- * and should be called between disk_stat_lock() and
- * disk_stat_unlock().
+ * {disk|part|all}_stat_{add|sub|inc|dec}() modify the stat counters and should
+ * be called between disk_stat_lock() and disk_stat_unlock().
  *
  * part_stat_

[PATCH 03/16] rsxx: use bio_{start,end}_io_acct

2020-05-25 Thread Christoph Hellwig
Switch rsxx to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/rsxx/dev.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 8ffa8260dcafe..3ba07ab30c84f 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -96,20 +96,6 @@ static const struct block_device_operations rsxx_fops = {
.ioctl  = rsxx_blkdev_ioctl,
 };
 
-static void disk_stats_start(struct rsxx_cardinfo *card, struct bio *bio)
-{
-   generic_start_io_acct(card->queue, bio_op(bio), bio_sectors(bio),
-&card->gendisk->part0);
-}
-
-static void disk_stats_complete(struct rsxx_cardinfo *card,
-   struct bio *bio,
-   unsigned long start_time)
-{
-   generic_end_io_acct(card->queue, bio_op(bio),
-   &card->gendisk->part0, start_time);
-}
-
 static void bio_dma_done_cb(struct rsxx_cardinfo *card,
void *cb_data,
unsigned int error)
@@ -121,7 +107,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 
if (atomic_dec_and_test(&meta->pending_dmas)) {
if (!card->eeh_state && card->gendisk)
-   disk_stats_complete(card, meta->bio, meta->start_time);
+   bio_end_io_acct(meta->bio, meta->start_time);
 
if (atomic_read(&meta->error))
bio_io_error(meta->bio);
@@ -167,10 +153,9 @@ static blk_qc_t rsxx_make_request(struct request_queue *q, 
struct bio *bio)
bio_meta->bio = bio;
atomic_set(&bio_meta->error, 0);
atomic_set(&bio_meta->pending_dmas, 0);
-   bio_meta->start_time = jiffies;
 
if (!unlikely(card->halt))
-   disk_stats_start(card, bio);
+   bio_meta->start_time = bio_start_io_acct(bio);
 
dev_dbg(CARD_TO_DEV(card), "BIO[%c]: meta: %p addr8: x%llx size: %d\n",
 bio_data_dir(bio) ? 'W' : 'R', bio_meta,
-- 
2.26.2


[PATCH 16/16] block: reduce part_stat_lock() scope

2020-05-25 Thread Christoph Hellwig
We only need the stats lock (aka preempt_disable()) for updating the
stats, not for looking up or dropping the hd_struct reference.

Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c  | 5 +++--
 block/blk-merge.c | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index bf2f7d4bc0c1c..a01fb2b508f0e 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1437,9 +1437,9 @@ void blk_account_io_done(struct request *req, u64 now)
update_io_ticks(part, jiffies, true);
part_stat_inc(part, ios[sgrp]);
part_stat_add(part, nsecs[sgrp], now - req->start_time_ns);
+   part_stat_unlock();
 
hd_struct_put(part);
-   part_stat_unlock();
}
 }
 
@@ -1448,8 +1448,9 @@ void blk_account_io_start(struct request *rq)
if (!blk_do_io_stat(rq))
return;
 
-   part_stat_lock();
rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
+
+   part_stat_lock();
update_io_ticks(rq->part, jiffies, false);
part_stat_unlock();
 }
diff --git a/block/blk-merge.c b/block/blk-merge.c
index c3beae5c1be71..f0b0bae075a0c 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -674,8 +674,9 @@ static void blk_account_io_merge_request(struct request 
*req)
if (blk_do_io_stat(req)) {
part_stat_lock();
part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
-   hd_struct_put(req->part);
part_stat_unlock();
+
+   hd_struct_put(req->part);
}
 }
 
-- 
2.26.2


Re: [PATCH 01/16] block: add disk/bio-based accounting helpers

2020-05-26 Thread Christoph Hellwig
On Mon, May 25, 2020 at 03:28:07PM +0300, Konstantin Khlebnikov wrote:
> I think it would be better to leave this jiffies legacy nonsense in
> callers and pass here request duration in nanoseconds.

jiffies is what the existing interfaces use.  But now that the start times
come from the start helper, fixing this will become easier.  I'll do that
as a follow-on, as I'd rather not bloat this series even more.


[PATCH 03/16] rsxx: use bio_{start,end}_io_acct

2020-05-26 Thread Christoph Hellwig
Switch rsxx to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Konstantin Khlebnikov 
---
 drivers/block/rsxx/dev.c | 19 ++-
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/drivers/block/rsxx/dev.c b/drivers/block/rsxx/dev.c
index 8ffa8260dcafe..3ba07ab30c84f 100644
--- a/drivers/block/rsxx/dev.c
+++ b/drivers/block/rsxx/dev.c
@@ -96,20 +96,6 @@ static const struct block_device_operations rsxx_fops = {
.ioctl  = rsxx_blkdev_ioctl,
 };
 
-static void disk_stats_start(struct rsxx_cardinfo *card, struct bio *bio)
-{
-   generic_start_io_acct(card->queue, bio_op(bio), bio_sectors(bio),
-&card->gendisk->part0);
-}
-
-static void disk_stats_complete(struct rsxx_cardinfo *card,
-   struct bio *bio,
-   unsigned long start_time)
-{
-   generic_end_io_acct(card->queue, bio_op(bio),
-   &card->gendisk->part0, start_time);
-}
-
 static void bio_dma_done_cb(struct rsxx_cardinfo *card,
void *cb_data,
unsigned int error)
@@ -121,7 +107,7 @@ static void bio_dma_done_cb(struct rsxx_cardinfo *card,
 
if (atomic_dec_and_test(&meta->pending_dmas)) {
if (!card->eeh_state && card->gendisk)
-   disk_stats_complete(card, meta->bio, meta->start_time);
+   bio_end_io_acct(meta->bio, meta->start_time);
 
if (atomic_read(&meta->error))
bio_io_error(meta->bio);
@@ -167,10 +153,9 @@ static blk_qc_t rsxx_make_request(struct request_queue *q, 
struct bio *bio)
bio_meta->bio = bio;
atomic_set(&bio_meta->error, 0);
atomic_set(&bio_meta->pending_dmas, 0);
-   bio_meta->start_time = jiffies;
 
if (!unlikely(card->halt))
-   disk_stats_start(card, bio);
+   bio_meta->start_time = bio_start_io_acct(bio);
 
dev_dbg(CARD_TO_DEV(card), "BIO[%c]: meta: %p addr8: x%llx size: %d\n",
 bio_data_dir(bio) ? 'W' : 'R', bio_meta,
-- 
2.26.2


[PATCH 13/16] block: add a blk_account_io_merge_bio helper

2020-05-26 Thread Christoph Hellwig
From: Konstantin Khlebnikov 

Move the non-"new_io" branch of blk_account_io_start() into a separate
function.  Fix merge accounting for discards (they were counted as write
merges).

Unlike blk_account_io_start(), the new blk_account_io_merge_bio() doesn't
call update_io_ticks(), as there is no reason to do so for a merge.
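
For context on the discard half of this (illustration only, not part of the
patch): rq_data_dir() only knows READ and WRITE, so a REQ_OP_DISCARD request
falls into the write bucket, while op_stat_group() gives discards their own
stat group:

#include <linux/blkdev.h>

/* Hypothetical illustration of the two bucket mappings. */
static void example_merge_bucket(struct request *req)
{
	unsigned int old_bucket = rq_data_dir(req);		/* DISCARD -> WRITE */
	unsigned int new_bucket = op_stat_group(req_op(req));	/* DISCARD -> STAT_DISCARD */

	pr_debug("merge bucket: old %u new %u\n", old_bucket, new_bucket);
}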

Signed-off-by: Konstantin Khlebnikov 
[hch: rebased]
Signed-off-by: Christoph Hellwig 
---
 block/blk-core.c | 25 -
 block/blk-exec.c |  2 +-
 block/blk-mq.c   |  2 +-
 block/blk.h  |  2 +-
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c1675d43c2da0..bf2f7d4bc0c1c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -636,6 +636,16 @@ void blk_put_request(struct request *req)
 }
 EXPORT_SYMBOL(blk_put_request);
 
+static void blk_account_io_merge_bio(struct request *req)
+{
+   if (!blk_do_io_stat(req))
+   return;
+
+   part_stat_lock();
+   part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
+   part_stat_unlock();
+}
+
 bool bio_attempt_back_merge(struct request *req, struct bio *bio,
unsigned int nr_segs)
 {
@@ -656,7 +666,7 @@ bool bio_attempt_back_merge(struct request *req, struct bio 
*bio,
 
bio_crypt_free_ctx(bio);
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 }
 
@@ -682,7 +692,7 @@ bool bio_attempt_front_merge(struct request *req, struct 
bio *bio,
 
bio_crypt_do_front_merge(req, bio);
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 }
 
@@ -704,7 +714,7 @@ bool bio_attempt_discard_merge(struct request_queue *q, 
struct request *req,
req->__data_len += bio->bi_iter.bi_size;
req->nr_phys_segments = segments + 1;
 
-   blk_account_io_start(req, false);
+   blk_account_io_merge_bio(req);
return true;
 no_merge:
req_set_nomerge(q, req);
@@ -1329,7 +1339,7 @@ blk_status_t blk_insert_cloned_request(struct 
request_queue *q, struct request *
return BLK_STS_IOERR;
 
if (blk_queue_io_stat(q))
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 
/*
 * Since we have a scheduler attached on the top device,
@@ -1433,16 +1443,13 @@ void blk_account_io_done(struct request *req, u64 now)
}
 }
 
-void blk_account_io_start(struct request *rq, bool new_io)
+void blk_account_io_start(struct request *rq)
 {
if (!blk_do_io_stat(rq))
return;
 
part_stat_lock();
-   if (!new_io)
-   part_stat_inc(rq->part, merges[rq_data_dir(rq)]);
-   else
-   rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
+   rq->part = disk_map_sector_rcu(rq->rq_disk, blk_rq_pos(rq));
update_io_ticks(rq->part, jiffies, false);
part_stat_unlock();
 }
diff --git a/block/blk-exec.c b/block/blk-exec.c
index e20a852ae432d..85324d53d072f 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -55,7 +55,7 @@ void blk_execute_rq_nowait(struct request_queue *q, struct 
gendisk *bd_disk,
rq->rq_disk = bd_disk;
rq->end_io = done;
 
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 
/*
 * don't check dying flag for MQ because the request won't
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cac11945f6023..c606c74463ccd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1822,7 +1822,7 @@ static void blk_mq_bio_to_request(struct request *rq, 
struct bio *bio,
blk_rq_bio_prep(rq, bio, nr_segs);
blk_crypto_rq_bio_prep(rq, bio, GFP_NOIO);
 
-   blk_account_io_start(rq, true);
+   blk_account_io_start(rq);
 }
 
 static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
diff --git a/block/blk.h b/block/blk.h
index 0ecba2ab383d6..428f7e5d70a86 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -185,7 +185,7 @@ bool bio_attempt_discard_merge(struct request_queue *q, 
struct request *req,
 bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs, struct request **same_queue_rq);
 
-void blk_account_io_start(struct request *req, bool new_io);
+void blk_account_io_start(struct request *req);
 void blk_account_io_done(struct request *req, u64 now);
 
 /*
-- 
2.26.2


[PATCH 02/16] drbd: use bio_{start,end}_io_acct

2020-05-26 Thread Christoph Hellwig
Switch drbd to use the nicer bio accounting helpers.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Konstantin Khlebnikov 
---
 drivers/block/drbd/drbd_req.c | 27 ---
 1 file changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 840c3aef3c5c9..c80a2f1c3c2a7 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -21,24 +21,6 @@
 
 static bool drbd_may_do_local_read(struct drbd_device *device, sector_t 
sector, int size);
 
-/* Update disk stats at start of I/O request */
-static void _drbd_start_io_acct(struct drbd_device *device, struct 
drbd_request *req)
-{
-   struct request_queue *q = device->rq_queue;
-
-   generic_start_io_acct(q, bio_op(req->master_bio),
-   req->i.size >> 9, &device->vdisk->part0);
-}
-
-/* Update disk stats when completing request upwards */
-static void _drbd_end_io_acct(struct drbd_device *device, struct drbd_request 
*req)
-{
-   struct request_queue *q = device->rq_queue;
-
-   generic_end_io_acct(q, bio_op(req->master_bio),
-   &device->vdisk->part0, req->start_jif);
-}
-
 static struct drbd_request *drbd_req_new(struct drbd_device *device, struct 
bio *bio_src)
 {
struct drbd_request *req;
@@ -263,7 +245,7 @@ void drbd_req_complete(struct drbd_request *req, struct 
bio_and_error *m)
start_new_tl_epoch(first_peer_device(device)->connection);
 
/* Update disk stats */
-   _drbd_end_io_acct(device, req);
+   bio_end_io_acct(req->master_bio, req->start_jif);
 
/* If READ failed,
 * have it be pushed back to the retry work queue,
@@ -1222,16 +1204,15 @@ drbd_request_prepare(struct drbd_device *device, struct 
bio *bio, unsigned long
bio_endio(bio);
return ERR_PTR(-ENOMEM);
}
-   req->start_jif = start_jif;
+
+   /* Update disk stats */
+   req->start_jif = bio_start_io_acct(req->master_bio);
 
if (!get_ldev(device)) {
bio_put(req->private_bio);
req->private_bio = NULL;
}
 
-   /* Update disk stats */
-   _drbd_start_io_acct(device, req);
-
/* process discards always from our submitter thread */
if (bio_op(bio) == REQ_OP_WRITE_ZEROES ||
bio_op(bio) == REQ_OP_DISCARD)
-- 
2.26.2


[PATCH 01/16] block: add disk/bio-based accounting helpers

2020-05-26 Thread Christoph Hellwig
Add two new helpers to simplify I/O accounting for bio based drivers.
Currently these drivers use the generic_start_io_acct and
generic_end_io_acct helpers which have very cumbersome calling
conventions, don't actually return the time they started accounting,
and try to deal with accounting for partitions, which can't happen
for bio based drivers.  The new helpers will be used to subsequently
replace uses of the old helpers.

The main API is the bio based wrappers in blkdev.h, but for zram,
which wants to account rw_page based I/O, the lower level disk routines
are provided as well.
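
A minimal usage sketch for a bio based driver (the function name is made up;
the helpers are the ones added in the hunks below):

#include <linux/blkdev.h>

/* Hypothetical make_request function showing the intended calling
 * convention: the start helper returns the start time, which is passed
 * back to the end helper on completion. */
static blk_qc_t example_make_request(struct request_queue *q, struct bio *bio)
{
	unsigned long start_time = bio_start_io_acct(bio);

	/* ... perform the actual I/O described by the bio ... */

	bio_end_io_acct(bio, start_time);
	bio_endio(bio);
	return BLK_QC_T_NONE;
}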

Signed-off-by: Christoph Hellwig 
Reviewed-by: Konstantin Khlebnikov 
---
 block/blk-core.c   | 34 ++
 include/linux/blkdev.h | 28 
 2 files changed, 62 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 77e57c2e8d602..8973104f88d90 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1432,6 +1432,40 @@ void blk_account_io_start(struct request *rq, bool 
new_io)
part_stat_unlock();
 }
 
+unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
+   unsigned int op)
+{
+   struct hd_struct *part = &disk->part0;
+   const int sgrp = op_stat_group(op);
+   unsigned long now = READ_ONCE(jiffies);
+
+   part_stat_lock();
+   update_io_ticks(part, now, false);
+   part_stat_inc(part, ios[sgrp]);
+   part_stat_add(part, sectors[sgrp], sectors);
+   part_stat_local_inc(part, in_flight[op_is_write(op)]);
+   part_stat_unlock();
+
+   return now;
+}
+EXPORT_SYMBOL(disk_start_io_acct);
+
+void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+   unsigned long start_time)
+{
+   struct hd_struct *part = &disk->part0;
+   const int sgrp = op_stat_group(op);
+   unsigned long now = READ_ONCE(jiffies);
+   unsigned long duration = now - start_time;
+
+   part_stat_lock();
+   update_io_ticks(part, now, true);
+   part_stat_add(part, nsecs[sgrp], jiffies_to_nsecs(duration));
+   part_stat_local_dec(part, in_flight[op_is_write(op)]);
+   part_stat_unlock();
+}
+EXPORT_SYMBOL(disk_end_io_acct);
+
 /*
  * Steal bios from a request and add them to a bio list.
  * The request must not have been partially completed before.
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7d10f4e632325..6f7ff0fa8fcf8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1892,4 +1892,32 @@ static inline void blk_wake_io_task(struct task_struct 
*waiter)
wake_up_process(waiter);
 }
 
+unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors,
+   unsigned int op);
+void disk_end_io_acct(struct gendisk *disk, unsigned int op,
+   unsigned long start_time);
+
+#ifdef CONFIG_BLOCK
+/**
+ * bio_start_io_acct - start I/O accounting for bio based drivers
+ * @bio:   bio to start account for
+ *
+ * Returns the start time that should be passed back to bio_end_io_acct().
+ */
+static inline unsigned long bio_start_io_acct(struct bio *bio)
+{
+   return disk_start_io_acct(bio->bi_disk, bio_sectors(bio), bio_op(bio));
+}
+
+/**
+ * bio_end_io_acct - end I/O accounting for bio based drivers
+ * @bio:   bio to end account for
+ * @start: start time returned by bio_start_io_acct()
+ */
+static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time)
+{
+   return disk_end_io_acct(bio->bi_disk, bio_op(bio), start_time);
+}
+#endif /* CONFIG_BLOCK */
+
 #endif
-- 
2.26.2

