Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Christoph Hellwig
On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>   /* Running on a P/390 ? */
>   if (cpuinfo->cpu_id.machine == 0x7490)
>   machine_flags |= 4;
> +
> + /* Running under KVM ? */
> + if (cpuinfo->cpu_id.version == 0xfe)
> + machine_flags |= 64;

Shouldn't these have symbolic names?

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [kvm-devel] [RFC/PATCH 14/15] guest: detect when running on kvm

2008-03-20 Thread Christoph Hellwig
On Thu, Mar 20, 2008 at 09:37:19PM +0100, Carsten Otte wrote:
> Christoph Hellwig wrote:
>> On Thu, Mar 20, 2008 at 05:25:26PM +0100, Carsten Otte wrote:
>>> @@ -143,6 +143,10 @@ static noinline __init void detect_machi
>>> /* Running on a P/390 ? */
>>> if (cpuinfo->cpu_id.machine == 0x7490)
>>> machine_flags |= 4;
>>> +
>>> +   /* Running under KVM ? */
>>> +   if (cpuinfo->cpu_id.version == 0xfe)
>>> +   machine_flags |= 64;
>>
>> Shouldn't these have symbolic names?
> You mean symbolics for machine_flags? Or symbolics for cpu ids?

Either.


Re: [dm-devel] [PATCH] block: silently error unsupported empty barriers too

2009-08-06 Thread Christoph Hellwig
On Thu, Aug 06, 2009 at 12:45:50PM +0100, Alasdair G Kergon wrote:
> On Thu, Aug 06, 2009 at 12:14:17PM +0100, Mark McLoughlin wrote:
> > We should error all barriers, even empty barriers, on devices like
> > virtio_blk which don't support them.
>  
> Have you considered whether or not virtio_blk actually needs to
> support empty barriers?

virtio_blk on kvm does not support any barriers at all, similar to many
other drivers out there.  If the queue flags say we don't support
barriers higher layers should not submit any barrier requests.



Re: [patch 00/26] Xen-paravirt_ops: Xen guest implementation for paravirt_ops interface

2007-03-16 Thread Christoph Hellwig
On Fri, Mar 16, 2007 at 10:26:55AM -0700, Jeremy Fitzhardinge wrote:
> +#ifdef CONFIG_HIGHPTE
> + .kmap_atomic_pte = native_kmap_atomic_pte,
> +#else
> + .kmap_atomic_pte = paravirt_nop,
> +#endif

This ifdefing is quite ugly.  Shouldn't native_kmap_atomic_pte
just be a noop in the !CONFIG_HIGHPTE case?

> -void *kmap_atomic(struct page *page, enum km_type type)
> +void *_kmap_atomic(struct page *page, enum km_type type, pgprot_t prot)

We normally call our "special" function __foo, not _foo.  But in this
case it really should have a more meaningful name like
kmap_atomic_prot anyway.

> +void *kmap_atomic(struct page *page, enum km_type type)
> +{
> + return _kmap_atomic(page, type, kmap_prot);

And this one should probably be an inline.

> +static inline void *native_kmap_atomic_pte(struct page *page, enum km_type 
> type)
> +{
> + return kmap_atomic(page, type);
> +}
> +
> +#ifndef CONFIG_PARAVIRT
> +#define kmap_atomic_pte(page, type)  kmap_atomic(page, type)
> +#endif

This is all getting rather ugly just for your pagetable hackery.



Re: [patch 27/32] xen: Add the Xen virtual network device driver.

2007-04-29 Thread Christoph Hellwig
On Sun, Apr 29, 2007 at 10:29:02AM -0700, Jeremy Fitzhardinge wrote:
> +#include 

not needed.

> +#include 
> +#include 
> +#include 
> +#ifdef CONFIG_XEN_BALLOON
> +#include 
> +#endif
> +#include 

Please don't try to put such a fucked up include hierarchy in.
Just move everything under include/xen or you will soon get
problems with the 80-character line length limit for your includes.

Also please make sure that  can be included unconditionally,
as we really don't like ifdefs around includes.

> +struct netfront_info {
> + struct list_head list;
> + struct net_device *netdev;
> +
> + struct net_device_stats stats;
> +
> + struct netif_tx_front_ring tx;
> + struct netif_rx_front_ring rx;
> +
> + spinlock_t   tx_lock;
> + spinlock_t   rx_lock;
> +
> + unsigned int evtchn, irq;
> + unsigned int copying_receiver;
> + unsigned int carrier;

This doesn't look like exactly smart cacheline alignment :)

> + grant_ref_t gref_tx_head;

What's a grant_ref_t?  Should this really be a typedef or better
a struct type?  Also it really wants a xen_ prefix instead of something
so generic.

> + * Implement our own carrier flag: the network stack's version causes delays
> + * when the carrier is re-enabled (in particular, dev_activate() may not
> + * immediately be called, which can cause packet loss).
> + */

Did you talk to the networking folks about these problems?

> +#define netfront_carrier_on(netif)   ((netif)->carrier = 1)
> +#define netfront_carrier_off(netif)  ((netif)->carrier = 0)
> +#define netfront_carrier_ok(netif)   ((netif)->carrier)

Please use proper symbolic names for the carrier states and kill these
wrappers.

> +static int setup_device(struct xenbus_device *, struct netfront_info *);
> +static struct net_device *create_netdev(struct xenbus_device *);
> +
> +static void end_access(int, void *);
> +static void netif_disconnect_backend(struct netfront_info *);
> +
> +static int network_connect(struct net_device *);
> +static void network_tx_buf_gc(struct net_device *);
> +static void network_alloc_rx_buffers(struct net_device *);
> +
> +static irqreturn_t netif_int(int irq, void *dev_id);

Any chance you could avoid these forward-prototypes by reordering
the functions a little?

Also a lot of these names are horribly generic.  A proper xennet_
prefix would probably help.

> +static inline int xennet_can_sg(struct net_device *dev)
> +{
> + return dev->features & NETIF_F_SG;
> +}

totally useless wrapper.

> +static int __devexit netfront_remove(struct xenbus_device *dev)
> +{
> + struct netfront_info *info = dev->dev.driver_data;
> +
> + dev_dbg(&dev->dev, "%s\n", dev->nodename);
> +
> + netif_disconnect_backend(info);
> +
> + del_timer_sync(&info->rx_refill_timer);
> +
> + xennet_sysfs_delif(info->netdev);
> +
> + unregister_netdev(info->netdev);
> +
> + free_netdev(info->netdev);

This looks like very wrong ordering to me.  unregister_netdev should
be the first thing in the remove function.

> + SHARED_RING_INIT(rxs);
> + FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);

Can you replace these shouting macros with proper named functions?

> +  * receive ring. This creates a less bursty demand on the memory
> +  * allocator, so should reduce the chance of failed allocation requests
> +  * both for ourself and for other kernel subsystems.
> +  */
> + batch_target = np->rx_target - (req_prod - np->rx.rsp_cons);
> + for (i = skb_queue_len(&np->rx_batch); i < batch_target; i++) {
> + /*
> +  * Allocate an skb and a page. Do not use __dev_alloc_skb as
> +  * that will allocate page-sized buffers which is not
> +  * necessary here.
> +  * 16 bytes added as necessary headroom for netif_receive_skb.
> +  */
> + skb = alloc_skb(RX_COPY_THRESHOLD + 16 + NET_IP_ALIGN,
> + GFP_ATOMIC | __GFP_NOWARN);

This comment doesn't make any sense, __dev_alloc_skb is:

static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
  gfp_t gfp_mask)
{
struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
if (likely(skb))
skb_reserve(skb, NET_SKB_PAD);
return skb;
}

then again what you really should be using here is __netdev_alloc_skb.

> +#ifdef CONFIG_XEN_BALLOON
> + /* Tell the ballon driver what is going on. */
> + balloon_update_driver_allowance(i);
> +#endif

This should be a noop for !CONFIG_XEN_BALLOON in the header,
so there is no need for ifdefs here.

> + skb->nh.raw = (void *)skb_shinfo(skb)->frags[0].page;
> + skb->h.raw = skb->nh.raw + rx->offset;

Stuff like this won't compile anymore in the current tree.


Re: [patch 26/32] xen: Add Xen virtual block device driver.

2007-04-29 Thread Christoph Hellwig
>  source "drivers/s390/block/Kconfig"
> +source "drivers/block/xen/Kconfig"

Please don't add a new subdirectory for a tiny new driver that
really should be only a single .c file.

> +config XEN_BLKDEV_FRONTEND
> + tristate "Block device frontend driver"
> + depends on XEN
> + default y

Please kill the default statement.  We really should have a script that
autorejects every new driver that wants to be default y.

> +#include 

not needed, as usual.

> +#define BLKIF_STATE_DISCONNECTED 0
> +#define BLKIF_STATE_CONNECTED1
> +#define BLKIF_STATE_SUSPENDED2

enum, please.

> +static void connect(struct blkfront_info *);
> +static void blkfront_closing(struct xenbus_device *);
> +static int blkfront_remove(struct xenbus_device *);
> +static int talk_to_backend(struct xenbus_device *, struct blkfront_info *);
> +static int setup_blkring(struct xenbus_device *, struct blkfront_info *);
> +
> +static void kick_pending_request_queues(struct blkfront_info *);
> +
> +static irqreturn_t blkif_int(int irq, void *dev_id);
> +static void blkif_restart_queue(struct work_struct *work);
> +static void blkif_recover(struct blkfront_info *);
> +static void blkif_completion(struct blk_shadow *);
> +static void blkif_free(struct blkfront_info *, int);

Please get rid of all the forward-declarations by putting the code
into proper order.

> +static inline int GET_ID_FROM_FREELIST(
> + struct blkfront_info *info)
> +{
> + unsigned long free = info->shadow_free;
> + BUG_ON(free > BLK_RING_SIZE);
> + info->shadow_free = info->shadow[free].req.id;
> + info->shadow[free].req.id = 0x0fee; /* debug */
> + return free;
> +}
> +
> +static inline void ADD_ID_TO_FREELIST(
> + struct blkfront_info *info, unsigned long id)
> +{
> + info->shadow[id].req.id  = info->shadow_free;
> + info->shadow[id].request = 0;
> + info->shadow_free = id;
> +}

Please give these proper lowercased names, and while you're at it
please kill all the inline statements you have and let the compiler
do its work.


> +static irqreturn_t blkif_int(int irq, void *dev_id)

Please call interrupt handlers foo_interrupt, that makes people see what's
going on much more easily.

> +static void blkif_recover(struct blkfront_info *info)
> +{
> + int i;
> + struct blkif_request *req;
> + struct blk_shadow *copy;
> + int j;
> +
> + /* Stage 1: Make a safe copy of the shadow state. */
> + copy = kmalloc(sizeof(info->shadow), GFP_KERNEL | __GFP_NOFAIL);

Please don't use __GFP_NOFAIL in any new code.

> +/* ** Driver Registration ** */

No one would have guessed without that comment..

> +extern int blkif_ioctl(struct inode *inode, struct file *filep,
> +unsigned command, unsigned long argument);

doesn't actually exist anywhere.

> +extern int blkif_check(dev_t dev);

ditto

> +extern int blkif_revalidate(dev_t dev);

ditto.

> +/**

Just kill these *, okay?

> + * vbd.c

Please don't put filenames in the top of file comments, they'll get out
of date far too soon while serving no purpose at all.

> +#define BLKIF_MAJOR(dev) ((dev)>>8)
> +#define BLKIF_MINOR(dev) ((dev) & 0xff)

Please make these proper xenblk-prefixed inlines in lower case.

> +
> +static struct xlbd_type_info xvd_type_info = {
> + .partn_shift = 4,
> + .disks_per_major = 16,
> + .devname = "xvd",
> + .diskname = "xvd"
> +};
> +
> +static struct xlbd_major_info xvd_major_info = {
> + .major = XENVBD_MAJOR,
> + .type = &xvd_type_info
> +};

The structures seem quite useless and the code would be much
cleaner without them.  What's the reason for their existence?

> +
> +/* Information about our VBDs. */
> +#define MAX_VBDS 64
> +static LIST_HEAD(vbds_list);
> +
> +static struct block_device_operations xlvbd_block_fops =
> +{
> + .owner = THIS_MODULE,
> + .open = blkif_open,
> + .release = blkif_release,
> + .getgeo = blkif_getgeo
> +};
> +
> +DEFINE_SPINLOCK(blkif_io_lock);
> +
> +int
> +xlvbd_alloc_major(void)
> +{
> + if (register_blkdev(xvd_major_info.major,
> + xvd_major_info.type->devname)) {
> + printk(KERN_WARNING "xen_blk: can't get major %d with name 
> %s\n",
> +xvd_major_info.major, xvd_major_info.type->devname);
> + return -1;
> + }
> + return 0;
> +}

Useless wrapper, please just kill it.

> +
> +static int
> +xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
> +{
> + request_queue_t *rq;
> +
> + rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
> + if (rq == NULL)
> + return -1;
> +
> + elevator_init(rq, "noop");
> +
> + /* Hard sector size and max sectors impersonate the equiv. hardware. */
> + blk_queue_hardsect_size(rq, sector_size);
> + blk_queue_max_sectors(rq, 512);
> +
> + /* Each segment in a request is up to an aligned page in size. */

Re: [patch 25/29] xen: Add the Xen virtual network device driver.

2007-05-05 Thread Christoph Hellwig
On Fri, May 04, 2007 at 04:21:16PM -0700, Jeremy Fitzhardinge wrote:
> +/*
> + * Mutually-exclusive module options to select receive data path:
> + *  rx_copy : Packets are copied by network backend into local memory
> + *  rx_flip : Page containing packet data is transferred to our ownership
> + * For fully-virtualised guests there is no option - copying must be used.
> + * For paravirtualised guests, flipping is the default.
> + */
> +static enum {
> + RX_COPY = 0,
> + RX_FLIP = 1,
> +} rx_mode = RX_FLIP;
> +MODULE_PARM_DESC(rx_mode, "How to get packets from card: 0->copy, 1->flip");

There only seems to be a module description but no actual parameter for
this.  I wish people would have listened to me back then and made the
description part of the module_param statement.

> +
> +#define RX_COPY_THRESHOLD 256
> +
> +#define GRANT_INVALID_REF0
> +
> +#define NET_TX_RING_SIZE __RING_SIZE((struct xen_netif_tx_sring *)0, 
> PAGE_SIZE)
> +#define NET_RX_RING_SIZE __RING_SIZE((struct xen_netif_rx_sring *)0, 
> PAGE_SIZE)

__RING_SIZE is not in my tree, so it seems to be some kind of Xen
addition.  Can you make that clear in the name and give it a less
awkward calling convention, e.g. only pass in the type, not a null
pointer of the given type?


> +/*
> + * Implement our own carrier flag: the network stack's version causes delays
> + * when the carrier is re-enabled (in particular, dev_activate() may not
> + * immediately be called, which can cause packet loss).
> + */
> +#define netfront_carrier_on(netif)   ((netif)->carrier = 1)
> +#define netfront_carrier_off(netif)  ((netif)->carrier = 0)
> +#define netfront_carrier_ok(netif)   ((netif)->carrier)

This doesn't implement my review suggestion despite you ACKing
it.  Didn't you like it in the end or did you simply forget
about it?

> +/*
> + * Access macros for acquiring freeing slots in tx_skbs[].
> + */
> +
> +static void add_id_to_freelist(unsigned *head, union skb_entry *list, 
> unsigned short id)


no lines longer than 80 chars please.



Re: [patch 7/9] lguest: the net driver

2007-05-09 Thread Christoph Hellwig
On Thu, May 10, 2007 at 01:14:55AM +1000, Rusty Russell wrote:
> > > + info->peer = (void *)ioremap(info->peer_phys, info->mapsize);
> > 
> > check for NULL
> 
> Erk, good catch!

Also the cast is bogus, ioremap already returns void *.  Even
more importantly the lack of the __iomem annotations shows that either this
code hasn't been run through sparse or someone decided to ignore its
errors.



Re: [PATCH 0/5] lguest feedback tidyups

2007-05-10 Thread Christoph Hellwig
On Fri, May 11, 2007 at 11:17:26AM +1000, Rusty Russell wrote:
> - But the cost was high: lots of __force casts 8(

That sounds like something is very fishy.



Re: [PATCH 2/5] lguest guest feedback tidyups

2007-05-10 Thread Christoph Hellwig
On Fri, May 11, 2007 at 11:21:30AM +1000, Rusty Russell wrote:
> 1) send-dma and bind-dma hypercall wrappers for drivers to use,
> 2) formalization of the convention that devices can use the irq
>corresponding to their index on the lguest_bus.
> 3) ___force to shut up sparse: guests *can* use ioremap as virtual mem.

No, they can't.  Even if in your case the underlying address spaces
happen to be the same, anything returned by ioremap must use the proper
accessors.  That's the whole point of having this separation, otherwise
you wouldn't need to use ioremap at all.  So instead of sprinkling casts
around, add lguest_read*/lguest_write* accessors that do the __force cast
once and make sure the ioremap return value is always accessed using those.



Re: [PATCH 2/5] lguest guest feedback tidyups

2007-05-11 Thread Christoph Hellwig
On Fri, May 11, 2007 at 05:31:06PM +1000, Rusty Russell wrote:
>   Well, without ioremap, the memory wouldn't normally be mapped.  Is
> there something better to use?

Either use accessors or use your own lguest-specific remapping function that
doesn't return __iomem function

> > So instead of sprinkling cast
> > around add lguest_read*/lguest_write* accessors that do the __force cast
> > once and make sure the ioremap return value is always accessed using those.
> 
> And that's nothing to do with iremap.  They're required because guest
> "physical" == host virtual, and casting a long to a "__user void *"
> seems to require a __force.

Well, it's the same problem really.  You want to treat it as host virtual
in some places and as guest physical in others, but you need to keep
the abstraction clean.  To keep that abstraction clean you introduce
accessors that contain the __force cast.  Now that you have these accessors
instead of random casts you need to think a bit about where the host virtual
abstraction makes more sense and where the guest physical abstraction
makes more sense and use it consistently there with as few as possible
uses of the accessors in between.

Now to something different than the technical content of this mail:

> Hi Christoph!

> I enjoy a good Hellwigging as much as anyone, but your aim is off.

Please stop this crap.  I know who I am, so there's no need to waste
mail estate with saying Hi.  Also please stop being a total fuckass
and abusing my lastname just because you didn't get what was in the
mail.


Re: [kvm-devel] [PATCH RFC 2/3] Virtio draft III: example net driver

2007-06-25 Thread Christoph Hellwig
On Mon, Jun 25, 2007 at 10:26:37AM -0500, Brian King wrote:
> 1. The add_inbuf interface passes an sg list. This causes problems for
>ibmveth since its rx buffers cannot be scatterlists.
> 2. The user of this API does not have access to the sk_buf. This causes
>issues for ibmveth since it needs to reserve the first 8 bytes of the
>rx buffer for use by the firmware. It currently uses skb_reserve to do 
> this.
> 
> Would it be simpler to just pass an sk_buf rather than the scatterlist
> on these interfaces for virtio_net users?

It probably should pass the sk_buff.  Then again the network layer really
should be using struct scatterlist for skb fragments - on a lot of IOMMUs
a dma_map_sg is a lot more efficient than a lot of dma_map_page calls,
not to mention the benefit of a common data structure for things like
network attached storage protocols that have to talk to both worlds.

And yes, this is getting a little out of scope for the virtualization
discussion.



Re: [PATCH vhost v8 01/12] virtio_ring: split: separate dma codes

2023-05-12 Thread Christoph Hellwig
As said before, please don't try to do weird runtime checks based
on the scatterlist.  What you have works for now, but there are
plans to replace the page + offset tuple in the scatterlist with
just a phys_addr_t.  And with that your "clever" scheme will break
instantly.


Re: [PATCH v5 5/6] mm/gup: remove vmas parameter from pin_user_pages()

2023-05-15 Thread Christoph Hellwig
On Sun, May 14, 2023 at 10:26:58PM +0100, Lorenzo Stoakes wrote:
> We are now in a position where no caller of pin_user_pages() requires the
> vmas parameter at all, so eliminate this parameter from the function and
> all callers.
> 
> This clears the way to removing the vmas parameter from GUP altogether.
> 
> Acked-by: David Hildenbrand 
> Acked-by: Dennis Dalessandro  (for 
> qib)
> Signed-off-by: Lorenzo Stoakes 

Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 1/2] virtio: abstract virtqueue related methods

2023-05-17 Thread Christoph Hellwig
On Wed, May 17, 2023 at 10:54:23AM +0800, zhenwei pi wrote:
> All the vring based virtqueue methods could be abstratct in theory,
> MST suggested that add/get bufs and kick functions are quite perfmance
> sensitive, so export these functions from virtio_ring.ko, drivers
> still call them in a fast path.

Who is going to use this?  And why do you think every virtio user
would want to pay for indirect calls just for your shiny new feature?

This seems like an amazingly bad idea to me.


Re: [PATCH v2 1/2] virtio: abstract virtqueue related methods

2023-05-17 Thread Christoph Hellwig
On Wed, May 17, 2023 at 03:43:03PM +0800, zhenwei pi wrote:
> I have a plan to introduce 'Virtio Over Fabrics'(TCP&RDMA) as Virtio
> transport, as mentioned in cover letter of this series:
> 3 weeks ago, I posted a proposal 'Virtio Over Fabrics':
> https://lists.oasis-open.org/archives/virtio-comment/202304/msg00442.html

Just don't do it.  Please define your own protocols over RDMA or TCP
for exactly the operations you need (for many they will already exist)
instead of piggybacking on virtio and making everyone else pay the
price.



Re: [PATCH vhost v9 01/12] virtio_ring: put mapping error check in vring_map_one_sg

2023-05-22 Thread Christoph Hellwig
On Wed, May 17, 2023 at 10:22:38AM +0800, Xuan Zhuo wrote:
> -static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> -struct scatterlist *sg,
> -enum dma_data_direction direction)
> +static int vring_map_one_sg(const struct vring_virtqueue *vq, struct 
> scatterlist *sg,
> + enum dma_data_direction direction, static 
> dma_addr_t *addr)

Please avoid making this unreadable by adding overly long lines.


Re: [PATCH vhost v9 04/12] virtio_ring: virtqueue_add() support premapped

2023-05-22 Thread Christoph Hellwig
On Wed, May 17, 2023 at 10:22:41AM +0800, Xuan Zhuo wrote:
> virtuque_add() adds parameter premapped.

Well, I can see that.  But why?



Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP

2023-06-07 Thread Christoph Hellwig
On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski  wrote:
> > On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > Thanks for the help from Christoph.
> >
> > That said you haven't CCed him on the series, isn't the general rule to
> > CC anyone who was involved in previous discussions?
> 
> 
> Sorry, I forgot to add cc after git format-patch.

So I've been looking for this series elsewhere, but it seems to include
neither lkml nor the iommu list, so I can't find it.  Can you please
repost it?


Re: [PATCH] virtio_pmem: do flush synchronously

2023-06-21 Thread Christoph Hellwig
I think the proper minimal fix is to pass in a REQ_WRITE in addition to
REQ_PREFLUSH.  We can then have a discussion on the merits of this
weird async pmem flush scheme separately.



Re: [PATCH v2] virtio_pmem: add the missing REQ_OP_WRITE for flush bio

2023-06-21 Thread Christoph Hellwig
Please avoid the overly long line.  With that fixed this looks good
to me.



Re: [PATCH v3] virtio_pmem: add the missing REQ_OP_WRITE for flush bio

2023-06-26 Thread Christoph Hellwig
Looks good:

Reviewed-by: Christoph Hellwig 


Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"

2023-07-12 Thread Christoph Hellwig
On Wed, Jul 12, 2023 at 10:28:00AM +0200, Stefano Garzarella wrote:
> The problem is that the SCSI stack does not send this command, so we
> should do it in the driver. In fact we do it for
> VIRTIO_SCSI_EVT_RESET_RESCAN (hotplug), but not for
> VIRTIO_SCSI_EVT_RESET_REMOVED (hotunplug).

No, you should absolutely not do it in the driver.  The fact that
virtio-scsi even tries to do some of its own LUN scanning is
problematic and should have never happened.


Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()

2023-07-13 Thread Christoph Hellwig
On Mon, Jul 10, 2023 at 11:42:30AM +0800, Xuan Zhuo wrote:
> This helper allows the driver change the dma mode to premapped mode.
> Under the premapped mode, the virtio core do not do dma mapping
> internally.
> 
> This just work when the use_dma_api is true. If the use_dma_api is false,
> the dma options is not through the DMA APIs, that is not the standard
> way of the linux kernel.

I have a hard time parsing this.

More importantly, having two modes seems very error prone down
the road.  If the premapping is so important, why don't we do it
always?


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-13 Thread Christoph Hellwig
On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> caller can do dma operation in advance. The purpose is to keep memory
> mapped across multiple add/get buf operations.

This is just poking holes into the abstraction..



Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()

2023-07-19 Thread Christoph Hellwig
On Thu, Jul 13, 2023 at 10:47:23AM -0400, Michael S. Tsirkin wrote:
> There are a gazillion virtio drivers and most of them just use the
> virtio API, without bothering with these micro-optimizations.  virtio
> already tracks addresses so mapping/unmapping them for DMA is easier
> done in the core.  It's only networking and only with XDP where the
> difference becomes measureable.

Yes, but now you have two differing code paths (which then branch into
another two with the fake DMA mappings).  I'm really worried about
the madness that follows, like the USB dma mapping code that is a
constant source of trouble.



Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page

2023-07-19 Thread Christoph Hellwig
Hi Jason,

can you please resend your reply with proper quoting?  I had to give
up after multiple pages of scrolling without finding anything that
you added to the full quote.



Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-19 Thread Christoph Hellwig
On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > 
> > This is just poking holes into the abstraction..
> 
> More specifically?

Because now you expose a device that can't be used for the non-dma
mapping case and should be hidden.


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-19 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
>  virtqueue_dma_dev() return the device that working with the DMA APIs.
>  Then that can be used like other devices. So what is the problem.
> 
>  I always think the code path without the DMA APIs is the trouble for you.

Because we now have an API where the upper level drivers sometimes
see the dma device and sometimes not.  This will be abused and cause
trouble sooner than you can say "layering".


Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 03:41:56PM +0800, Jason Wang wrote:
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to synch to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.

DMA always needs proper syncs, be that for swiotlb or for cache
maintenance, yes.


Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()

2023-07-24 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> Well I think we can add wrappers like virtio_dma_sync and so on.
> There are NOP for non-dma so passing the dma device is harmless.

Yes, please.


Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-01 Thread Christoph Hellwig
On Tue, Sep 26, 2023 at 07:41:44AM -0400, Michael S. Tsirkin wrote:
> 
> Except, there's no reasonable way for virtio to know what is done with
> the device then. You are not using just 2 symbols at all, instead you
> are using the rich vq API which was explicitly designed for the driver
> running the device being responsible for serializing accesses. Which is
> actually loaded and running. And I *think* your use won't conflict ATM
> mostly by luck. Witness the hack in patch 01 as exhibit 1 - nothing
> at all even hints at the fact that the reason for the complicated
> dance is because another driver pokes at some of the vqs.

Fully agreed.  The smart nic vendors are trying to do the same mess
in nvme, and we really need to stop them and agree on proper standardized
live migration features implemented in the core virtio/nvme code.



Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-05 Thread Christoph Hellwig
On Mon, Oct 02, 2023 at 12:13:20PM -0300, Jason Gunthorpe wrote:
> ??? This patch series is an implementation of changes that OASIS
> approved.

I think you are fundamentally missing my point.  This is not about
who publishes a spec, but how we structure Linux code.

And the problem is that we treat vfio as a separate thing, and not an
integral part of the driver.  vfio being separate totally makes sense
for the original purpose of vfio, that is a no-op passthrough of
a device to userspace.

But for all the augmented vfio use cases it doesn't, for them the
augmented vfio functionality is an integral part of the core driver.
That is true for nvme, virtio and I'd argue mlx5 as well.

So we need to stop registering separate pci_drivers for this kind
of functionality, and instead have an interface to the driver to
switch to certain functionalities.

E.g. for this case there should be no new vfio-virtio device, but
instead you should be able to switch the virtio device to an
fake-legacy vfio mode.

Assuming the whole thing actually makes sense, as the use case seems
a bit fishy to start with, but I'll leave that argument to the virtio
maintainers.

Similarly for nvme.  We'll never accept a separate nvme-live migration
vfio driver.  This functionality needs to be part of the nvme driver,
probed there and fully controlled there.


Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-06 Thread Christoph Hellwig
On Thu, Oct 05, 2023 at 08:10:04AM -0300, Jason Gunthorpe wrote:
> > But for all the augmented vfio use cases it doesn't, for them the
> > augmented vfio functionality is an integral part of the core driver.
> > That is true for nvme, virtio and I'd argue mlx5 as well.
> 
> I don't agree with this. I see the extra functionality as being an
> integral part of the VF and VFIO. The PF driver is only providing a
> proxied communication channel.
> 
> It is a limitation of PCI that the PF must act as a proxy.

For anything live-migration related it very fundamentally is not, as a
function that is visible to a guest by definition can't drive the
migration itself.  That isn't really a limitation in PCI, but follows
from the fact that something else must control a live migration that is
transparent to the guest.

> 
> > So we need to stop registering separate pci_drivers for this kind
> > of functionality, and instead have an interface to the driver to
> > switch to certain functionalities.
> 
> ?? We must bind something to the VF's pci_driver, what do you imagine
> that is?

The driver that knows this hardware.  In this case the virtio subsystem,
in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.

> > E.g. for this case there should be no new vfio-virtio device, but
> > instead you should be able to switch the virtio device to an
> > fake-legacy vfio mode.
> 
> Are you aruging about how we reach to vfio_register_XX() and what
> directory the file lives?

No.  That layout logically follows from what codebase the functionality
is part of, though.

> I don't know what "fake-legacy" even means, VFIO is not legacy.

The driver we're talking about in this thread fakes up a virtio_pci
legacy device to the guest on top of a "modern" virtio_pci device.

> There is alot of code in VFIO and the VMM side to take a VF and turn
> it into a vPCI function. You can't just trivially duplicate VFIO in a
> dozen drivers without creating a giant mess.

I do not advocate for duplicating it.  But the code that calls this
functionality belongs into the driver that deals with the compound
device that we're doing this work for.

> Further, userspace wants consistent ways to operate this stuff. If we
> need a dozen ways to activate VFIO for every kind of driver that is
> not a positive direction.

We don't need a dozen ways.  We just need a single attribute on the
pci (or $OTHERBUS) device that switches it to vfio mode.


Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Christoph Hellwig
On Tue, Oct 10, 2023 at 06:43:32PM +0300, Yishai Hadas wrote:
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> > 
> The driver who owns the VF is VFIO, it's not a VIRTIO one.

And to loop back into my previous discussion: that's the fundamental
problem here.  If it were owned by the virtio subsystem, which just
calls into vfio, we would not have this problem, including the
circular loops and exposed APIs.



Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Christoph Hellwig
On Tue, Oct 10, 2023 at 12:59:37PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 10, 2023 at 11:14:56AM -0400, Michael S. Tsirkin wrote:
> 
> > I suggest 3 but call it on the VF. commands will switch to PF
> > internally as needed. For example, intel might be interested in exposing
> > admin commands through a memory BAR of VF itself.
> 
> FWIW, we have been pushing back on such things in VFIO, so it will
> have to be very carefully security justified.
> 
> Probably since that is not standard it should just live in under some
> intel-only vfio driver behavior, not in virtio land.

Btw, what is that intel thing everyone is talking about?  And why
would the virtio core support vendor specific behavior like that?



Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Christoph Hellwig
On Tue, Oct 10, 2023 at 10:10:31AM -0300, Jason Gunthorpe wrote:
> We've talked around ideas like allowing the VF config space to do some
> of the work. For simple devices we could get away with 1 VF config
> space register. (VF config space is owned by the hypervisor, not the
> guest)

Which assumes you're actually using VFs and not multiple PFs, which
is a very limiting assumption.  It also prevents you from actually
using DMA during the live migration process, which again is a major
limitation once you have a non-trivial amount of state.

> SIOVr2 is discussing more a flexible RID mapping - there is a possible
> route where a "VF" could actually have two RIDs, a hypervisor RID and a
> guest RID.

Well, then you go down the SIOV route, which requires a complex driver
actually presenting the guest visible device anyway.

> It really is PCI limitations that force this design of making a PF
> driver do dual duty as a fully functionally normal device and act as a
> communication channel proxy to make a back channel into a SRIOV VF.
> 
> My view has always been that the VFIO live migration operations are
> executed logically within the VF as they only effect the VF.
> 
> So we have a logical design seperation where VFIO world owns the
> commands and the PF driver supplies the communication channel. This
> works well for devices that already have a robust RPC interface to
> their device FW.

Independent of my above points on the doubts on VF-controlled live
migration for PCIe devices, I absolutely agree with you that the Linux
abstraction and user interface should be VF based.  Which further
reinforces my point that the VFIO driver for the controlled function
(PF or VF) and the Linux driver for the controlling function (better
be a PF in practice) must be very tightly integrated.  And the best
way to do that is to export the vfio nodes from the Linux driver
that knows the hardware and not split out into a separate one.

> > The driver that knows this hardware.  In this case the virtio subsystem,
> > in case of nvme the nvme driver, and in case of mlx5 the mlx5 driver.
> 
> But those are drivers operating the HW to create kernel devices. Here
> we need a VFIO device. They can't co-exist, if you switch mlx5 from
> normal to vfio you have to tear down the entire normal driver.

Yes, absolutely.  And if we're smart enough we structure it in a way
that we never even initialize the bits of the driver only needed for
the normal kernel consumers.

> > No.  That layout logically follows from what codebase the functionality
> > is part of, though.
> 
> I don't understand what we are talking about really. Where do you
> imagine the vfio_register_XX() goes?

In the driver controlling the hardware.  E.g. for virtio in
drivers/virtio/ and for nvme in drivers/nvme/ and for mlx5
in the mlx5 driver directory.

> > > I don't know what "fake-legacy" even means, VFIO is not legacy.
> > 
> > The driver we're talking about in this thread fakes up a virtio_pci
> > legacy devie to the guest on top of a "modern" virtio_pci device.
> 
> I'm not sure I'd use the word fake, inb/outb are always trapped
> operations in VMs. If the device provided a real IO BAR then VFIO
> common code would trap and relay inb/outb to the device.
> 
> All this is doing is changing the inb/outb relay from using a physical
> IO BAR to a DMA command ring.
> 
> The motivation is simply because normal IO BAR space is incredibly
> limited and you can't get enough SRIOV functions when using it.

The fake is not meant as a judgement.  But it creates a virtio-legacy
device that in this form does not exist in hardware.  That's what
I call fake.  If you prefer a different term that's fine with me too.

> > > There is alot of code in VFIO and the VMM side to take a VF and turn
> > > it into a vPCI function. You can't just trivially duplicate VFIO in a
> > > dozen drivers without creating a giant mess.
> > 
> > I do not advocate for duplicating it.  But the code that calls this
> > functionality belongs into the driver that deals with the compound
> > device that we're doing this work for.
> 
> On one hand, I don't really care - we can put the code where people
> like.
> 
> However - the Intel GPU VFIO driver is such a bad experiance I don't
> want to encourage people to make VFIO drivers, or code that is only
> used by VFIO drivers, that are not under drivers/vfio review.

We can and should require vfio review for users of the vfio API.
But to be honest code placement was not the problem with i915.  The
problem was that the mdev APIs (under drivers/vfio) were a complete
trainwreck when they were written, and that the driver had a horrible
hypervisor API abstraction.

> Be aware, there is a significant performance concern here. If you want
> to create 1000 VFIO devices (this is a real thing), we *can't* probe a
> normal driver first, it is too slow. We need a path that goes directly
> from creating the RIDs to turning those RIDs into VFIO.

And by calling the vfio functions from m

Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-10 Thread Christoph Hellwig
On Wed, Oct 11, 2023 at 02:43:37AM -0400, Michael S. Tsirkin wrote:
> > Btw, what is that intel thing everyone is talking about?  And why
> > would the virtio core support vendor specific behavior like that?
> 
> It's not a thing it's Zhu Lingshan :) intel is just one of the vendors
> that implemented vdpa support and so Zhu Lingshan from intel is working
> on vdpa and has also proposed virtio spec extensions for migration.
> intel's driver is called ifcvf.  vdpa composes all this stuff that is
> added to vfio in userspace, so it's a different approach.

Well, so let's call it virtio live migration instead of intel.

And please all work together in the virtio committee so that you have
one way of communicating between controlling and controlled functions.
If one extension does it one way and the other a different way that's
just creating a giant mess.



Re: [PATCH vfio 10/11] vfio/virtio: Expose admin commands over virtio device

2023-10-11 Thread Christoph Hellwig
On Wed, Oct 11, 2023 at 10:57:09AM -0300, Jason Gunthorpe wrote:
> > Independent of my above points on the doubts on VF-controlled live
> > migration for PCIe devices, I absolutely agree with you that the Linux
> > abstraction and user interface should be VF based.  Which further
> > reinforces my point that the VFIO driver for the controlled function
> > (PF or VF) and the Linux driver for the controlling function (better
> > be a PF in practice) must be very tightly integrated.  And the best
> > way to do that is to export the vfio nodes from the Linux driver
> > that knows the hardware and not split out into a separate one.
> 
> I'm not sure how we get to "very tightly integrated". We have many
> examples of live migration vfio drivers now and they do not seem to
> require tight integration. The PF driver only has to provide a way to
> execute a small number of proxied operations.

Yes.  And for that I need to know what VF it actually is dealing
with.  Which is tight integration in my book.

> Regardless, I'm not too fussed about what directory the implementation
> lives in, though I do prefer the current arrangement where VFIO only
> stuff is in drivers/vfio. I like the process we have where subsystems
> are responsible for the code that implements the subsystem ops.

I really don't care about where the code lives (in the directory tree)
either.  But as you can see with virtio, trying to split it out into
an arbitrary module causes all kinds of pain.

> 
> E800 also made some significant security mistakes that VFIO side
> caught. I think would have been missed if it went into a netdev
> tree.
> 
> Even unrelated to mdev, Intel GPU is still not using the vfio side
> properly, and the way it hacked into KVM to try to get page tracking
> is totally logically wrong (but Works For Me (tm))
> 
> Aside from technical concerns, I do have a big process worry
> here. vfio is responsible for the security side of the review of
> things implementing its ops.

Yes, anything exposing a vfio node needs vfio review, period.  And
I don't think where the code lived was the i915 problem.  The problem
was that they were the first open user of the mdev API, which was
just a badly designed hook for never-published code at that time, and
they then shoehorned it into a weird hypervisor abstraction.  There's
no good way to succeed with that.


Re: [PATCH RFC 01/17] iommu: Remove struct iommu_ops *iommu from arch_setup_dma_ops()

2023-11-05 Thread Christoph Hellwig
On Fri, Nov 03, 2023 at 01:44:46PM -0300, Jason Gunthorpe wrote:
> This is not being used to pass ops, it is just a way to tell if an
> iommu driver was probed. These days this can be detected directly via
> device_iommu_mapped(). Call device_iommu_mapped() in the two places that
> need to check it and remove the iommu parameter everywhere.

Yes, that's much better than exposing the iommu ops to a place that
should not care about them:

Acked-by: Christoph Hellwig 


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
Just as last time:  This does not make any sense.  ioremap is shared
by definition.



Re: [PATCH v5] virtio-blk: Add validation for block size in config space

2021-10-11 Thread Christoph Hellwig
On Tue, Oct 05, 2021 at 06:42:43AM -0400, Michael S. Tsirkin wrote:
> Stefan also pointed out this duplicates the logic from 
> 
> if (blksize < 512 || blksize > PAGE_SIZE || !is_power_of_2(blksize))
> return -EINVAL;
> 
> 
> and a bunch of other places.
> 
> 
> Would it be acceptable for blk layer to validate the input
> instead of having each driver do it's own thing?
> Maybe inside blk_queue_logical_block_size?

I'm pretty sure we went down that path before.  Let's just add a helper
just for that check for now as part of this series.  Actually validating
it in blk_queue_logical_block_size seems like a good idea, but returning
errors from that has a long tail.


Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()

2021-10-11 Thread Christoph Hellwig
On Mon, Oct 11, 2021 at 03:09:09PM -0400, Michael S. Tsirkin wrote:
> The reason we have trouble is that it's not clear what does the API mean
> outside the realm of TDX.
> If we really, truly want an API that says "ioremap and it's a hardened
> driver" then I guess ioremap_hardened_driver is what you want.

Yes.  And why would we ioremap the BIOS anyway?  It is not I/O memory
in any of the senses we generally use ioremap for.


Re: [PATCH V4 0/8] Use copy_process/create_io_thread in vhost layer

2021-10-11 Thread Christoph Hellwig
The whole series looks good to me:

Reviewed-by: Christoph Hellwig 


[PATCH 02/11] dax: remove CONFIG_DAX_DRIVER

2021-10-17 Thread Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/Kconfig| 4 
 drivers/nvdimm/Kconfig | 2 +-
 drivers/s390/block/Kconfig | 2 +-
 fs/fuse/Kconfig| 2 +-
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d2834c2cfa10d..954ab14ba7778 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,8 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config DAX_DRIVER
-   select DAX
-   bool
-
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d4..347fe7afa5830 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -22,7 +22,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
-   select DAX_DRIVER
+   select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index d0416dbd0cd81..e3710a762abae 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -5,7 +5,7 @@ comment "S/390 block device drivers"
 config DCSSBLK
def_tristate m
select FS_DAX_LIMITED
-   select DAX_DRIVER
+   select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5d..038ed0b9aaa5d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -45,7 +45,7 @@ config FUSE_DAX
select INTERVAL_TREE
depends on VIRTIO_FS
depends on FS_DAX
-   depends on DAX_DRIVER
+   depends on DAX
help
  This allows bypassing guest page cache and allows mapping host page
  cache directly in guest address space.
-- 
2.30.2



futher decouple DAX from block devices

2021-10-17 Thread Christoph Hellwig
Hi Dan,

this series cleans up and simplifies the association between DAX and block
devices in preparation of allowing to mount file systems directly on DAX
devices without a detour through block devices.

Diffstat:
 drivers/dax/Kconfig  |4 
 drivers/dax/bus.c|2 
 drivers/dax/super.c  |  220 +--
 drivers/md/dm-linear.c   |   51 +++--
 drivers/md/dm-log-writes.c   |   44 +++-
 drivers/md/dm-stripe.c   |   65 +++-
 drivers/md/dm-table.c|   22 ++--
 drivers/md/dm-writecache.c   |2 
 drivers/md/dm.c  |   29 -
 drivers/md/dm.h  |4 
 drivers/nvdimm/Kconfig   |2 
 drivers/nvdimm/pmem.c|9 -
 drivers/s390/block/Kconfig   |2 
 drivers/s390/block/dcssblk.c |   12 +-
 fs/dax.c |   13 ++
 fs/erofs/super.c |   11 +-
 fs/ext2/super.c  |6 -
 fs/ext4/super.c  |9 +
 fs/fuse/Kconfig  |2 
 fs/fuse/virtio_fs.c  |2 
 fs/xfs/xfs_super.c   |   54 +-
 include/linux/dax.h  |   30 ++---
 22 files changed, 185 insertions(+), 410 deletions(-)


[PATCH 01/11] dm: make the DAX support dependent on CONFIG_FS_DAX

2021-10-17 Thread Christoph Hellwig
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c| 6 ++
 drivers/md/dm-linear.c | 2 +-
 drivers/md/dm-log-writes.c | 2 +-
 drivers/md/dm-stripe.c | 2 +-
 drivers/md/dm-writecache.c | 2 +-
 drivers/md/dm.c| 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index b882cf8106ea3..e20d0cef10a18 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
-#ifdef CONFIG_BLOCK
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 #include 
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
@@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
 }
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
-#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
return ret;
 }
 EXPORT_SYMBOL_GPL(dax_supported);
-#endif /* CONFIG_FS_DAX */
-#endif /* CONFIG_BLOCK */
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 679b4c0a2eea1..32fbab11bf90c 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -163,7 +163,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index d93a4db235124..6d694526881d0 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
   struct iov_iter *i)
 {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 6660b6b53d5bf..f084607220293 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 18320444fb0a9..4c3a6e33604d3 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -38,7 +38,7 @@
 #define BITMAP_GRANULARITY PAGE_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
 #define DM_WRITECACHE_HAS_PMEM
 #endif
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 7870e6460633f..79737aee516b1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
+   if (IS_ENABLED(CONFIG_FS_DAX)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
if (IS_ERR(md->dax_dev))
-- 
2.30.2



[PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-10-17 Thread Christoph Hellwig
Drivers that register a dax_dev should make sure it works; there is
no need to double-check it from the file system.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 49 +
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9383c11b21853..04fc680542e8d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
-   void *kaddr, *end_kaddr;
-   pfn_t pfn, end_pfn;
sector_t last_page;
-   long len, len2;
-   int err, id;
+   int err;
 
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
-   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
-   if (len < 1 || len2 < 1) {
-   pr_info("%pg: error: dax access failed (%ld)\n",
-   bdev, len < 1 ? len : len2);
-   dax_read_unlock(id);
-   return false;
-   }
-
-   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
-   /*
-* An arch that has enabled the pmem api should also
-* have its drivers support pfn_t_devmap()
-*
-* This is a developer warning and should not trigger in
-* production. dax_flush() will crash since it depends
-* on being able to do (page_address(pfn_to_page())).
-*/
-   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
-   dax_enabled = true;
-   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
-   struct dev_pagemap *pgmap, *end_pgmap;
-
-   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
-   if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
-   && pfn_t_to_page(pfn)->pgmap == pgmap
-   && pfn_t_to_page(end_pfn)->pgmap == pgmap
-   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
-   && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
-   dax_enabled = true;
-   put_dev_pagemap(pgmap);
-   put_dev_pagemap(end_pgmap);
-
-   }
-   dax_read_unlock(id);
-
-   if (!dax_enabled) {
-   pr_info("%pg: error: dax support not enabled\n", bdev);
-   return false;
-   }
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-10-17 Thread Christoph Hellwig
fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 04fc680542e8d..482fe775324a4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
+   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   pr_info("%pg: error: unaligned partition for dax\n", bdev);
+   return NULL;
+   }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   pgoff_t pgoff, pgoff_end;
-   sector_t last_page;
-   int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
-   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
-   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 03/11] dax: simplify the dax_device <-> gendisk association

2021-10-17 Thread Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/bus.c|   2 +-
 drivers/dax/super.c  | 106 +--
 drivers/md/dm.c  |   6 +-
 drivers/nvdimm/pmem.c|   8 ++-
 drivers/s390/block/dcssblk.c |  11 +++-
 fs/fuse/virtio_fs.c  |   2 +-
 include/linux/dax.h  |  19 +--
 7 files changed, 60 insertions(+), 94 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..6d91b0186e3be 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1326,7 +1326,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
 * No 'host' or dax_operations since there is no access to this
 * device outside of mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..9383c11b21853 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -26,10 +24,8 @@
  * @flags: state and boolean properties
  */
 struct dax_device {
-   struct hlist_node list;
struct inode inode;
struct cdev cdev;
-   const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
 int dax_read_lock(void)
 {
return srcu_read_lock(&dax_srcu);
@@ -58,13 +50,22 @@ void dax_read_unlock(int id)
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include 
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
-   return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+   return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(dax_add_host);
 
-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include 
+void dax_remove_host(struct gendisk *disk)
+{
+   xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);
 
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * @bdev: block device to find a dax_device for
  */
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
-   struct dax_device *dax_dev, *found = NULL;
-   int hash, id;
+   struct dax_device *dax_dev;
+   int id;
 
-   if (!host)
+   if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   hash = dax_host_hash(host);
-
id = dax_read_lock();
-   spin_lock(&dax_host_lock);
-   hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
-   if (!dax_alive(dax_dev)
-   || strcmp(host, dax_dev->host) != 0)
-   continue;
-
-   if (igrab(&dax_dev->inode))
-   found = dax_dev;
-   break;
-   }
-   spin_unlock(&dax_host_lock);
+   dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+   if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+   dax_dev = NULL;
dax_read_unlock(id);
 
-   return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
-   if (!blk_queue_dax(bdev->bd_disk->queue))
-   return NULL;
-   return dax_get_by_host(bdev->bd_disk->disk_name);
+   return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
@@ -361,12 +345,7 @@ void kill_dax(struct dax_device *dax_dev)
return;
 
clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
-
synchronize_srcu(&dax_srcu);
-
-   spin_lock(&dax_host_lock);
-   hlist_del_init(&dax_dev->list);
-   spin_unlock(&dax_host_lock);

[PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-linear.c | 49 +-
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 32fbab11bf90c..bf03f73fd0f36 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti,
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
+{
+   struct linear_c *lc = ti->private;
+   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+   return lc->dev->dax_dev;
+}
+
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   long ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2


[PATCH 07/11] dax: remove dax_capable

2021-10-17 Thread Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c  | 36 
 drivers/md/dm-table.c| 22 +++---
 drivers/md/dm.c  | 21 -
 drivers/md/dm.h  |  4 
 drivers/nvdimm/pmem.c|  1 -
 drivers/s390/block/dcssblk.c |  1 -
 fs/erofs/super.c | 11 +++
 fs/ext2/super.c  |  6 --
 fs/ext4/super.c  |  9 ++---
 fs/xfs/xfs_super.c   | 21 -
 include/linux/dax.h  | 14 --
 11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 482fe775324a4..803942586d1b6 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
-   struct block_device *bdev, int blocksize, sector_t start,
-   sector_t sectors)
-{
-   if (blocksize != PAGE_SIZE) {
-   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
-   return false;
-   }
-
-   if (!dax_dev) {
-   pr_debug("%pg: error: dax unsupported by block device\n", bdev);
-   return false;
-   }
-
-   return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
-   int blocksize, sector_t start, sector_t len)
-{
-   bool ret = false;
-   int id;
-
-   if (!dax_dev)
-   return false;
-
-   id = dax_read_lock();
-   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
-   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
-   dax_read_unlock(id);
-   return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 1fa4d5582dca5..4ae671c2168ea 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   if (dev->dax_dev)
+   return false;
 
-   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
+   return true;
 }
 
 /* Check devices support synchronous DAX */
@@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
 }
 
-bool dm_table_supports_dax(struct dm_table *t,
-  iterate_devices_callout_fn iterate_fn, int *blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+  iterate_devices_callout_fn iterate_fn)
 {
struct dm_target *ti;
unsigned i;
@@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;
 
if (!ti->type->iterate_devices ||
-   ti->type->iterate_devices(ti, iterate_fn, blocksize))
+   ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}
 
@@ -863,7 +865,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
-   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -907,7 +908,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+   if (dm_table_supports_dax(t, device_not_dax_capable) ||
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
}
@@ -1981,7 +1982,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct reques

[PATCH 06/11] xfs: factor out a xfs_setup_dax helper

2021-10-17 Thread Christoph Hellwig
Factor out another DAX setup helper to simplify future changes.  Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c4e0cd1c1c8ca..d07020a8eb9e3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
 }
 
+static int
+xfs_setup_dax(
+   struct xfs_mount*mp)
+{
+   struct super_block  *sb = mp->m_super;
+
+   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+   xfs_alert(mp,
+   "DAX unsupported by block device. Turning off DAX.");
+   goto disable_dax;
+   }
+
+   if (xfs_has_reflink(mp)) {
+   xfs_alert(mp, "DAX and reflink cannot be used together!");
+   return -EINVAL;
+   }
+
+   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+   return 0;
+
+disable_dax:
+   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+   return 0;
+}
+
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1592,26 +1618,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;
 
if (xfs_has_dax_always(mp)) {
-   bool rtdev_is_dax = false, datadev_is_dax;
-
-   xfs_warn(mp,
-   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
-   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
-   if (mp->m_rtdev_targp)
-   rtdev_is_dax = xfs_buftarg_is_dax(sb,
-   mp->m_rtdev_targp);
-   if (!rtdev_is_dax && !datadev_is_dax) {
-   xfs_alert(mp,
-   "DAX unsupported by block device. Turning off DAX.");
-   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
-   }
-   if (xfs_has_reflink(mp)) {
-   xfs_alert(mp,
-   "DAX and reflink cannot be used together!");
-   error = -EINVAL;
+   error = xfs_setup_dax(mp);
+   if (error)
goto out_filestream_unmount;
-   }
}
 
if (xfs_has_discard(mp)) {
-- 
2.30.2


[PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-log-writes.c | 42 +++---
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 6d694526881d0..5aac60c1b774c 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -949,17 +949,21 @@ static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
return 0;
 }
 
+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+   pgoff_t *pgoff)
+{
+   struct log_writes_c *lc = ti->private;
+
+   *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+   return lc->dev->dax_dev;
+}
+
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
-   int ret;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
-   return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -968,11 +972,9 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
 {
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -983,34 +985,24 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
return 0;
}
 dax_copy:
-   return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
  pgoff_t pgoff, void *addr, size_t bytes,
  struct iov_iter *i)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-   return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
-    &pgoff);
-   if (ret)
-   return ret;
-   return dax_zero_page_range(lc->dev->dax_dev, pgoff,
-  nr_pages << PAGE_SHIFT);
+   return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
 #else
-- 
2.30.2


[PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

2021-10-17 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-stripe.c | 63 ++
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
struct block_device *bdev;
+   sector_t dev_sector;
uint32_t stripe;
-   long ret;
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
+   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;
 
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+   return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
-
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2


[PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c

2021-10-17 Thread Christoph Hellwig
No functional change, but this will allow for a tighter integration
with the iomap code, including possibly passing the partition offset
in the iomap in the future.  For now it mostly avoids growing more
callers outside of fs/dax.c.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 14 --
 fs/dax.c| 13 +
 include/linux/dax.h |  1 -
 3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 803942586d1b6..c0910687fbcb2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(dax_remove_host);
 
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..eb715363fd667 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
+static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
+   pgoff_t *pgoff)
+{
+   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
+   phys_addr_t phys_off = (start_sect + sector) * 512;
+
+   if (pgoff)
+   *pgoff = PHYS_PFN(phys_off);
+   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
+   return -EINVAL;
+   return 0;
+}
+
 static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
 sector_t sector, struct page *to, unsigned long vaddr)
 {
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 439c3c70e347b..324363b798ecd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 #endif
 
 struct writeback_control;
-int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
 #if IS_ENABLED(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
 void dax_remove_host(struct gendisk *disk);
-- 
2.30.2


Re: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper

2021-10-19 Thread Christoph Hellwig
On Mon, Oct 18, 2021 at 09:43:51AM -0700, Darrick J. Wong wrote:
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
> > bdev_nr_sectors(bt->bt_bdev));
> >  }
> >  
> > +static int
> > +xfs_setup_dax(
> 
> /me wonders if this should be named xfs_setup_dax_always, since this
> doesn't handle the dax=inode mode?

Sure, why not.

> The only reason I bring that up is that Eric reminded me a while ago
> that we don't actually print any kind of EXPERIMENTAL warning for the
> auto-detection behavior.

Yes, I actually noticed that as well when preparing this series.

Re: [PATCH] virtio_blk: correct types for status handling

2021-10-25 Thread Christoph Hellwig
On Mon, Oct 25, 2021 at 11:24:57AM +0300, Max Gurtovoy wrote:
> Maybe we can compare the returned status to BLK_STS_OK. But I see we don't 
> do it also in NVMe subsystem so I guess we can assume BLK_STS_OK == 0 
> forever.

Yes, BLK_STS_OK == 0 is an intentionally allowed shortcut.  It is not
just a block layer design, but part of how the sparse __bitwise__
annotations work.

Re: further decouple DAX from block devices

2021-11-04 Thread Christoph Hellwig
On Wed, Nov 03, 2021 at 12:59:31PM -0500, Eric Sandeen wrote:
> Christoph, can I ask what the end game looks like, here? If dax is completely
> decoupled from block devices, are there user-visible changes?

Yes.

> If I want to
> run fs-dax on a pmem device - what do I point mkfs at, if not a block device?

The rough plan is to use the device dax character devices.  I'll hopefully
have a draft version in the next few days.

Re: further decouple DAX from block devices

2021-11-04 Thread Christoph Hellwig
On Thu, Nov 04, 2021 at 10:34:17AM -0700, Darrick J. Wong wrote:
> /me wonders, are block devices going away?  Will mkfs.xfs have to learn
> how to talk to certain chardevs?  I guess jffs2 and others already do
> that kind of thing... but I suppose I can wait for the real draft to
> show up to ramble further. ;)

Right now I've mostly been looking into the kernel side.  And no, I
do not expect /dev/pmem* to go away, as you'll still need it for a
non-DAX-aware file system and/or application (such as mkfs initially).

But yes, just pointing mkfs to the chardev should be doable with very
little work.  We can point it to a regular file after all.

decouple DAX from block devices

2021-11-09 Thread Christoph Hellwig
Hi Dan,

this series decouples DAX from the block layer so that the
block_device is not needed at all in the DAX I/O path.

[PATCH 02/29] dm: make the DAX support dependent on CONFIG_FS_DAX

2021-11-09 Thread Christoph Hellwig
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax.  Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER.  This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c| 6 ++
 drivers/md/dm-linear.c | 2 +-
 drivers/md/dm-log-writes.c | 2 +-
 drivers/md/dm-stripe.c | 2 +-
 drivers/md/dm-writecache.c | 2 +-
 drivers/md/dm.c| 2 +-
 6 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index b882cf8106ea3..e20d0cef10a18 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,7 +63,7 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
-#ifdef CONFIG_BLOCK
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 #include 
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
@@ -80,7 +80,6 @@ int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
 }
 EXPORT_SYMBOL(bdev_dax_pgoff);
 
-#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -219,8 +218,7 @@ bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
return ret;
 }
 EXPORT_SYMBOL_GPL(dax_supported);
-#endif /* CONFIG_FS_DAX */
-#endif /* CONFIG_BLOCK */
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
/* !alive + rcu grace period == no new operations / mappings */
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 66ba16713f696..0a260c35aeeed 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -162,7 +162,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 46de085a96709..524bc536922eb 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -903,7 +903,7 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
   struct iov_iter *i)
 {
diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index 6660b6b53d5bf..f084607220293 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -300,7 +300,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
 }
 
-#if IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_FS_DAX)
 static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
diff --git a/drivers/md/dm-writecache.c b/drivers/md/dm-writecache.c
index 017806096b91e..0af464a863fe6 100644
--- a/drivers/md/dm-writecache.c
+++ b/drivers/md/dm-writecache.c
@@ -38,7 +38,7 @@
 #define BITMAP_GRANULARITY PAGE_SIZE
 #endif
 
-#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_DAX_DRIVER)
+#if IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API) && IS_ENABLED(CONFIG_FS_DAX)
 #define DM_WRITECACHE_HAS_PMEM
 #endif
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 63aa522636585..893fca738a3d8 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1783,7 +1783,7 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
 
-   if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
+   if (IS_ENABLED(CONFIG_FS_DAX)) {
md->dax_dev = alloc_dax(md, md->disk->disk_name,
&dm_dax_ops, 0);
if (IS_ERR(md->dax_dev))
-- 
2.30.2


[PATCH 05/29] dax: remove the pgmap sanity checks in generic_fsdax_supported

2021-11-09 Thread Christoph Hellwig
Drivers that register a dax_dev should make sure it works; there is no
need to double-check from the file system.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 49 +
 1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9383c11b21853..04fc680542e8d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
-   void *kaddr, *end_kaddr;
-   pfn_t pfn, end_pfn;
sector_t last_page;
-   long len, len2;
-   int err, id;
+   int err;
 
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
-   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
-   if (len < 1 || len2 < 1) {
-   pr_info("%pg: error: dax access failed (%ld)\n",
-   bdev, len < 1 ? len : len2);
-   dax_read_unlock(id);
-   return false;
-   }
-
-   if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
-   /*
-* An arch that has enabled the pmem api should also
-* have its drivers support pfn_t_devmap()
-*
-* This is a developer warning and should not trigger in
-* production. dax_flush() will crash since it depends
-* on being able to do (page_address(pfn_to_page())).
-*/
-   WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
-   dax_enabled = true;
-   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
-   struct dev_pagemap *pgmap, *end_pgmap;
-
-   pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
-   if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
-   && pfn_t_to_page(pfn)->pgmap == pgmap
-   && pfn_t_to_page(end_pfn)->pgmap == pgmap
-   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
-   && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
-   dax_enabled = true;
-   put_dev_pagemap(pgmap);
-   put_dev_pagemap(end_pgmap);
-
-   }
-   dax_read_unlock(id);
-
-   if (!dax_enabled) {
-   pr_info("%pg: error: dax support not enabled\n", bdev);
-   return false;
-   }
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 03/29] dax: remove CONFIG_DAX_DRIVER

2021-11-09 Thread Christoph Hellwig
CONFIG_DAX_DRIVER only selects CONFIG_DAX now, so remove it.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/Kconfig| 4 
 drivers/nvdimm/Kconfig | 2 +-
 drivers/s390/block/Kconfig | 2 +-
 fs/fuse/Kconfig| 2 +-
 4 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index d2834c2cfa10d..954ab14ba7778 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -1,8 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
-config DAX_DRIVER
-   select DAX
-   bool
-
 menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
diff --git a/drivers/nvdimm/Kconfig b/drivers/nvdimm/Kconfig
index b7d1eb38b27d4..347fe7afa5830 100644
--- a/drivers/nvdimm/Kconfig
+++ b/drivers/nvdimm/Kconfig
@@ -22,7 +22,7 @@ if LIBNVDIMM
 config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
-   select DAX_DRIVER
+   select DAX
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
diff --git a/drivers/s390/block/Kconfig b/drivers/s390/block/Kconfig
index d0416dbd0cd81..e3710a762abae 100644
--- a/drivers/s390/block/Kconfig
+++ b/drivers/s390/block/Kconfig
@@ -5,7 +5,7 @@ comment "S/390 block device drivers"
 config DCSSBLK
def_tristate m
select FS_DAX_LIMITED
-   select DAX_DRIVER
+   select DAX
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig
index 40ce9a1c12e5d..038ed0b9aaa5d 100644
--- a/fs/fuse/Kconfig
+++ b/fs/fuse/Kconfig
@@ -45,7 +45,7 @@ config FUSE_DAX
select INTERVAL_TREE
depends on VIRTIO_FS
depends on FS_DAX
-   depends on DAX_DRIVER
+   depends on DAX
help
  This allows bypassing guest page cache and allows mapping host page
  cache directly in guest address space.
-- 
2.30.2



[PATCH 01/29] nvdimm/pmem: move dax_attribute_group from dax to pmem

2021-11-09 Thread Christoph Hellwig
dax_attribute_group is only used by the pmem driver, and can avoid the
completely pointless lookup by the disk name if moved there.  This
leaves just a single caller of dax_get_by_host, so move dax_get_by_host
into the same ifdef block as that caller.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Dan Williams 
Link: https://lore.kernel.org/r/20210922173431.2454024-3-...@lst.de
Signed-off-by: Dan Williams 
---
 drivers/dax/super.c   | 100 --
 drivers/nvdimm/pmem.c |  43 ++
 include/linux/dax.h   |   2 -
 3 files changed, 61 insertions(+), 84 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index fc89e91beea7c..b882cf8106ea3 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -63,6 +63,24 @@ static int dax_host_hash(const char *host)
return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
 }
 
+#ifdef CONFIG_BLOCK
+#include 
+
+int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
+   pgoff_t *pgoff)
+{
+   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
+   phys_addr_t phys_off = (start_sect + sector) * 512;
+
+   if (pgoff)
+   *pgoff = PHYS_PFN(phys_off);
+   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
+   return -EINVAL;
+   return 0;
+}
+EXPORT_SYMBOL(bdev_dax_pgoff);
+
+#if IS_ENABLED(CONFIG_FS_DAX)
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @host: alternate name for the device registered by a dax driver
@@ -94,24 +112,6 @@ static struct dax_device *dax_get_by_host(const char *host)
return found;
 }
 
-#ifdef CONFIG_BLOCK
-#include 
-
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
-#if IS_ENABLED(CONFIG_FS_DAX)
 struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
if (!blk_queue_dax(bdev->bd_disk->queue))
@@ -231,70 +231,6 @@ enum dax_device_flags {
DAXDEV_SYNC,
 };
 
-static ssize_t write_cache_show(struct device *dev,
-   struct device_attribute *attr, char *buf)
-{
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-   ssize_t rc;
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return -ENXIO;
-
-   rc = sprintf(buf, "%d\n", !!dax_write_cache_enabled(dax_dev));
-   put_dax(dax_dev);
-   return rc;
-}
-
-static ssize_t write_cache_store(struct device *dev,
-   struct device_attribute *attr, const char *buf, size_t len)
-{
-   bool write_cache;
-   int rc = strtobool(buf, &write_cache);
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return -ENXIO;
-
-   if (rc)
-   len = rc;
-   else
-   dax_write_cache(dax_dev, write_cache);
-
-   put_dax(dax_dev);
-   return len;
-}
-static DEVICE_ATTR_RW(write_cache);
-
-static umode_t dax_visible(struct kobject *kobj, struct attribute *a, int n)
-{
-   struct device *dev = container_of(kobj, typeof(*dev), kobj);
-   struct dax_device *dax_dev = dax_get_by_host(dev_name(dev));
-
-   WARN_ON_ONCE(!dax_dev);
-   if (!dax_dev)
-   return 0;
-
-#ifndef CONFIG_ARCH_HAS_PMEM_API
-   if (a == &dev_attr_write_cache.attr)
-   return 0;
-#endif
-   return a->mode;
-}
-
-static struct attribute *dax_attributes[] = {
-   &dev_attr_write_cache.attr,
-   NULL,
-};
-
-struct attribute_group dax_attribute_group = {
-   .name = "dax",
-   .attrs = dax_attributes,
-   .is_visible = dax_visible,
-};
-EXPORT_SYMBOL_GPL(dax_attribute_group);
-
 /**
  * dax_direct_access() - translate a device pgoff to an absolute pfn
  * @dax_dev: a dax_device instance representing the logical memory range
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c74d7bceb2224..9cc0d0ebfad16 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -327,6 +327,49 @@ static const struct dax_operations pmem_dax_ops = {
.zero_page_range = pmem_dax_zero_page_range,
 };
 
+static ssize_t write_cache_show(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct pmem_device *pmem = dev_to_disk(dev)->private_data;
+
+   return sprintf(buf, "%d\n", !!dax_write_cache_enabled(pmem->dax_dev));
+}
+
+static ssize_t write_cache_store(struct device *dev,
+   struct device_attribute *attr, const char *buf, size_t len)
+{
+   stru

[PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-11-09 Thread Christoph Hellwig
fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 04fc680542e8d..482fe775324a4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
+   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   pr_info("%pg: error: unaligned partition for dax\n", bdev);
+   return NULL;
+   }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
 {
-   pgoff_t pgoff, pgoff_end;
-   sector_t last_page;
-   int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}
 
-   err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
-   last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
-   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
-   if (err) {
-   pr_info("%pg: error: unaligned partition for dax\n", bdev);
-   return false;
-   }
-
return true;
 }
 EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-- 
2.30.2



[PATCH 07/29] xfs: factor out a xfs_setup_dax_always helper

2021-11-09 Thread Christoph Hellwig
Factor out another DAX setup helper to simplify future changes.  Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_super.c | 47 +++---
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e21459f9923a8..875fd3151d6c9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -340,6 +340,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
 }
 
+static int
+xfs_setup_dax_always(
+   struct xfs_mount*mp)
+{
+   struct super_block  *sb = mp->m_super;
+
+   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+   xfs_alert(mp,
+   "DAX unsupported by block device. Turning off DAX.");
+   goto disable_dax;
+   }
+
+   if (xfs_has_reflink(mp)) {
+   xfs_alert(mp, "DAX and reflink cannot be used together!");
+   return -EINVAL;
+   }
+
+   xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+   return 0;
+
+disable_dax:
+   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+   return 0;
+}
+
 STATIC int
 xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1593,26 +1619,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;
 
if (xfs_has_dax_always(mp)) {
-   bool rtdev_is_dax = false, datadev_is_dax;
-
-   xfs_warn(mp,
-   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
-   datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
-   if (mp->m_rtdev_targp)
-   rtdev_is_dax = xfs_buftarg_is_dax(sb,
-   mp->m_rtdev_targp);
-   if (!rtdev_is_dax && !datadev_is_dax) {
-   xfs_alert(mp,
-   "DAX unsupported by block device. Turning off DAX.");
-   xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
-   }
-   if (xfs_has_reflink(mp)) {
-   xfs_alert(mp,
-   "DAX and reflink cannot be used together!");
-   error = -EINVAL;
+   error = xfs_setup_dax_always(mp);
+   if (error)
goto out_filestream_unmount;
-   }
}
 
if (xfs_has_discard(mp)) {
-- 
2.30.2



[PATCH 04/29] dax: simplify the dax_device <-> gendisk association

2021-11-09 Thread Christoph Hellwig
Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicit calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/bus.c|   6 +-
 drivers/dax/super.c  | 106 +--
 drivers/md/dm.c  |   6 +-
 drivers/nvdimm/pmem.c|   8 ++-
 drivers/s390/block/dcssblk.c |  11 +++-
 fs/fuse/virtio_fs.c  |   2 +-
 include/linux/dax.h  |  19 +--
 7 files changed, 62 insertions(+), 96 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..bd7af2f7c5b0a 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1323,10 +1323,10 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
}
 
/*
-* No 'host' or dax_operations since there is no access to this
-* device outside of mmap of the resulting character device.
+* No dax_operations since there is no access to this device outside of
+* mmap of the resulting character device.
 */
-   dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+   dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..9383c11b21853 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -26,10 +24,8 @@
  * @flags: state and boolean properties
  */
 struct dax_device {
-   struct hlist_node list;
struct inode inode;
struct cdev cdev;
-   const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
 static struct kmem_cache *dax_cache __read_mostly;
 static struct super_block *dax_superblock __read_mostly;
 
-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
 int dax_read_lock(void)
 {
return srcu_read_lock(&dax_srcu);
@@ -58,13 +50,22 @@ void dax_read_unlock(int id)
 }
 EXPORT_SYMBOL_GPL(dax_read_unlock);
 
-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include 
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
-   return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+   return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
 }
+EXPORT_SYMBOL_GPL(dax_add_host);
 
-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include 
+void dax_remove_host(struct gendisk *disk)
+{
+   xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);
 
 int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);
 
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * @bdev: block device to find a dax_device for
  */
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
 {
-   struct dax_device *dax_dev, *found = NULL;
-   int hash, id;
+   struct dax_device *dax_dev;
+   int id;
 
-   if (!host)
+   if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   hash = dax_host_hash(host);
-
id = dax_read_lock();
-   spin_lock(&dax_host_lock);
-   hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
-   if (!dax_alive(dax_dev)
-   || strcmp(host, dax_dev->host) != 0)
-   continue;
-
-   if (igrab(&dax_dev->inode))
-   found = dax_dev;
-   break;
-   }
-   spin_unlock(&dax_host_lock);
+   dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+   if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+   dax_dev = NULL;
dax_read_unlock(id);
 
-   return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
-   if (!blk_queue_dax(bdev->bd_disk->queue))
-   return NULL;
-   return dax_get_by_host(bdev->bd_disk->disk_name);
+   return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
 
@@ -361,12 +345,7 @@ void kill_dax(struct dax_device *dax_dev)
return;
 
clear_bit(DAXDEV_ALIVE, &dax_dev->flags)

[PATCH 09/29] dm-linear: add a linear_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-linear.c | 49 +-
 1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 0a260c35aeeed..90de42f6743ac 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -163,63 +163,44 @@ static int linear_iterate_devices(struct dm_target *ti,
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
+{
+   struct linear_c *lc = ti->private;
+   sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+   *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+   return lc->dev->dax_dev;
+}
+
 static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   long ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
 
-   dev_sector = linear_map_sector(ti, sector);
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct linear_c *lc = ti->private;
-   struct block_device *bdev = lc->dev->bdev;
-   struct dax_device *dax_dev = lc->dev->dax_dev;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
-   dev_sector = linear_map_sector(ti, sector);
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
+   struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2



[PATCH 13/29] fsdax: use a saner calling convention for copy_cow_page_dax

2021-11-09 Thread Christoph Hellwig
Just pass the vm_fault and iomap_iter structures, and figure out the rest
locally.  Note that this requires moving dax_iomap_sector up in the file.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 29 +
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 73bd1439d8089..e51b4129d1b65 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,26 +709,31 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
-sector_t sector, struct page *to, unsigned long vaddr)
+static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
 {
+   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+}
+
+static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
+{
+   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
void *vto, *kaddr;
pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(bdev, sector, PAGE_SIZE, &pgoff);
+   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
if (rc)
return rc;
 
id = dax_read_lock();
-   rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
dax_read_unlock(id);
return rc;
}
-   vto = kmap_atomic(to);
-   copy_user_page(vto, kaddr, vaddr, to);
+   vto = kmap_atomic(vmf->cow_page);
+   copy_user_page(vto, kaddr, vmf->address, vmf->cow_page);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
@@ -1005,11 +1010,6 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
-{
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
-}
-
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
@@ -1332,19 +1332,16 @@ static vm_fault_t dax_fault_synchronous_pfnp(pfn_t *pfnp, pfn_t pfn)
 static vm_fault_t dax_fault_cow_page(struct vm_fault *vmf,
const struct iomap_iter *iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
-   unsigned long vaddr = vmf->address;
vm_fault_t ret;
int error = 0;
 
switch (iter->iomap.type) {
case IOMAP_HOLE:
case IOMAP_UNWRITTEN:
-   clear_user_highpage(vmf->cow_page, vaddr);
+   clear_user_highpage(vmf->cow_page, vmf->address);
break;
case IOMAP_MAPPED:
-   error = copy_cow_page_dax(iter->iomap.bdev, iter->iomap.dax_dev,
- sector, vmf->cow_page, vaddr);
+   error = copy_cow_page_dax(vmf, iter);
break;
default:
WARN_ON_ONCE(1);
-- 
2.30.2



[PATCH 08/29] dax: remove dax_capable

2021-11-09 Thread Christoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/dax/super.c  | 36 
 drivers/md/dm-table.c| 22 +++---
 drivers/md/dm.c  | 21 -
 drivers/md/dm.h  |  4 
 drivers/nvdimm/pmem.c|  1 -
 drivers/s390/block/dcssblk.c |  1 -
 fs/erofs/super.c | 11 +++
 fs/ext2/super.c  |  6 --
 fs/ext4/super.c  |  9 ++---
 fs/xfs/xfs_super.c   | 21 -
 include/linux/dax.h  | 14 --
 11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 482fe775324a4..803942586d1b6 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
return dax_dev;
 }
 EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
-   struct block_device *bdev, int blocksize, sector_t start,
-   sector_t sectors)
-{
-   if (blocksize != PAGE_SIZE) {
-   pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
-   return false;
-   }
-
-   if (!dax_dev) {
-   pr_debug("%pg: error: dax unsupported by block device\n", bdev);
-   return false;
-   }
-
-   return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
-   int blocksize, sector_t start, sector_t len)
-{
-   bool ret = false;
-   int id;
-
-   if (!dax_dev)
-   return false;
-
-   id = dax_read_lock();
-   if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
-   ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
-   dax_read_unlock(id);
-   return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
 #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
 
 enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index bcddc5effd155..f4915a7d5dc84 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -806,12 +806,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
 EXPORT_SYMBOL_GPL(dm_table_set_type);
 
 /* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
 {
-   int blocksize = *(int *) data;
+   if (dev->dax_dev)
+   return false;
 
-   return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+   DMDEBUG("%pg: error: dax unsupported by block device", dev->bdev);
+   return true;
 }
 
 /* Check devices support synchronous DAX */
@@ -821,8 +823,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
 }
 
-bool dm_table_supports_dax(struct dm_table *t,
-  iterate_devices_callout_fn iterate_fn, int 
*blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+  iterate_devices_callout_fn iterate_fn)
 {
struct dm_target *ti;
unsigned i;
@@ -835,7 +837,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;
 
if (!ti->type->iterate_devices ||
-   ti->type->iterate_devices(ti, iterate_fn, blocksize))
+   ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}
 
@@ -862,7 +864,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
-   int page_size = PAGE_SIZE;
 
if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -906,7 +907,7 @@ static int dm_table_determine_type(struct dm_table *t)
 verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
-   if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+   if (dm_table_supports_dax(t, device_not_dax_capable) ||
    (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
}
@@ -1976,7 +1977,6 @@ int dm_table_set_restrictions(struct dm_tab

[PATCH 10/29] dm-log-writes: add a log_writes_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-log-writes.c | 42 +++---
 1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 524bc536922eb..df3cd78223fb2 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -949,17 +949,21 @@ static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
return 0;
 }
 
+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+   pgoff_t *pgoff)
+{
+   struct log_writes_c *lc = ti->private;
+
+   *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+   return lc->dev->dax_dev;
+}
+
 static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
 long nr_pages, void **kaddr, pfn_t *pfn)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
-   int ret;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
-   return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+   return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -968,11 +972,9 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
 {
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -983,34 +985,24 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
return 0;
}
 dax_copy:
-   return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
  pgoff_t pgoff, void *addr, size_t bytes,
  struct iov_iter *i)
 {
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
-   return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+   return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   struct log_writes_c *lc = ti->private;
-   sector_t sector = pgoff * PAGE_SECTORS;
+   struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
 
-   ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
-&pgoff);
-   if (ret)
-   return ret;
-   return dax_zero_page_range(lc->dev->dax_dev, pgoff,
-  nr_pages << PAGE_SHIFT);
+   return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
 }
 
 #else
-- 
2.30.2



[PATCH 12/29] fsdax: remove a pointless __force cast in copy_cow_page_dax

2021-11-09 Thread Christoph Hellwig
Despite its name, copy_user_page expects kernel addresses, which is what
we already have.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..73bd1439d8089 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -728,7 +728,7 @@ static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_d
return rc;
}
vto = kmap_atomic(to);
-   copy_user_page(vto, (void __force *)kaddr, vaddr, to);
+   copy_user_page(vto, kaddr, vaddr, to);
kunmap_atomic(vto);
dax_read_unlock(id);
return 0;
-- 
2.30.2



[PATCH 16/29] fsdax: simplify the offset check in dax_iomap_zero

2021-11-09 Thread Christoph Hellwig
The file relative offset must have the same alignment as the storage
offset, so use that and get rid of the call to iomap_sector.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5364549d67a48..d7a923d152240 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1123,7 +1123,6 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
-   sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
@@ -1131,8 +1130,7 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(sector << SECTOR_SHIFT, PAGE_SIZE) &&
-   (size == PAGE_SIZE))
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
page_aligned = true;
 
id = dax_read_lock();
-- 
2.30.2



[PATCH 11/29] dm-stripe: add a stripe_dax_pgoff helper

2021-11-09 Thread Christoph Hellwig
Add a helper to perform the entire remapping for DAX accesses.  This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig 
Acked-by: Mike Snitzer 
---
 drivers/md/dm-stripe.c | 63 ++
 1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
 }
 
 #if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
-   long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
struct block_device *bdev;
+   sector_t dev_sector;
uint32_t stripe;
-   long ret;
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
+   stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;
 
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
-   if (ret)
-   return ret;
+   *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+   return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+   long nr_pages, void **kaddr, pfn_t *pfn)
+{
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
 }
 
 static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
 {
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
-
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
-   return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
 }
 
 static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
  size_t nr_pages)
 {
-   int ret;
-   sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-   struct stripe_c *sc = ti->private;
-   struct dax_device *dax_dev;
-   struct block_device *bdev;
-   uint32_t stripe;
+   struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
 
-   stripe_map_sector(sc, sector, &stripe, &dev_sector);
-   dev_sector += sc->stripe[stripe].physical_start;
-   dax_dev = sc->stripe[stripe].dev->dax_dev;
-   bdev = sc->stripe[stripe].dev->bdev;
-
-   ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
-   if (ret)
-   return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
 }
 
-- 
2.30.2



[PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-09 Thread Christoph Hellwig
Unshare the DAX and iomap buffered I/O page zeroing code.  This code
previously did an IS_DAX check deep inside the iomap code, which in
fact was the only DAX check in the code.  Instead move these checks
into the callers.  Most callers already have DAX special casing anyway
and XFS will need it for reflink support as well.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c   | 77 ++
 fs/ext2/inode.c|  6 ++--
 fs/ext4/inode.c|  4 +--
 fs/iomap/buffered-io.c | 35 +++
 fs/xfs/xfs_iomap.c |  6 
 include/linux/dax.h|  6 +++-
 6 files changed, 91 insertions(+), 43 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index dc9ebeff850ab..5b52b878124ac 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1135,24 +1135,73 @@ static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff,
return rc;
 }
 
-s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
+static loff_t dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
 {
-   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
-   long rc, id;
-   unsigned offset = offset_in_page(pos);
-   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   const struct iomap *iomap = &iter->iomap;
+   const struct iomap *srcmap = iomap_iter_srcmap(iter);
+   loff_t pos = iter->pos;
+   loff_t length = iomap_length(iter);
+   loff_t written = 0;
+
+   /* already zeroed?  we're done. */
+   if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN)
+   return length;
+
+   do {
+   unsigned offset = offset_in_page(pos);
+   unsigned size = min_t(u64, PAGE_SIZE - offset, length);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
+   long rc;
+   int id;
 
-   id = dax_read_lock();
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
-   else
-   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
-   dax_read_unlock(id);
+   id = dax_read_lock();
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
+   rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
+   else
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
+   dax_read_unlock(id);
 
-   if (rc < 0)
-   return rc;
-   return size;
+   if (rc < 0)
+   return rc;
+   pos += size;
+   length -= size;
+   written += size;
+   if (did_zero)
+   *did_zero = true;
+   } while (length > 0);
+
+   return written;
+}
+
+int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   struct iomap_iter iter = {
+   .inode  = inode,
+   .pos= pos,
+   .len= len,
+   .flags  = IOMAP_ZERO,
+   };
+   int ret;
+
+   while ((ret = iomap_iter(&iter, ops)) > 0)
+   iter.processed = dax_zero_iter(&iter, did_zero);
+   return ret;
+}
+EXPORT_SYMBOL_GPL(dax_zero_range);
+
+int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
+   const struct iomap_ops *ops)
+{
+   unsigned int blocksize = i_blocksize(inode);
+   unsigned int off = pos & (blocksize - 1);
+
+   /* Block boundary? Nothing to do */
+   if (!off)
+   return 0;
+   return dax_zero_range(inode, pos, blocksize - off, did_zero, ops);
 }
+EXPORT_SYMBOL_GPL(dax_truncate_page);
 
 static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
struct iov_iter *iter)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 333fa62661d56..ae9993018a015 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1297,9 +1297,9 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
inode_dio_wait(inode);
 
if (IS_DAX(inode)) {
-   error = iomap_zero_range(inode, newsize,
-PAGE_ALIGN(newsize) - newsize, NULL,
-&ext2_iomap_ops);
+   error = dax_zero_range(inode, newsize,
+  PAGE_ALIGN(newsize) - newsize, NULL,
+  &ext2_iomap_ops);
} else if (test_opt(inode->i_sb, NOBH))
error = nobh_truncate_page(inode->i_mapping,
newsize, ext2_get_block);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0f06305167d5a..8c443b753b815 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3783,8 +3783,8 @@ static int ext4_block_zero_page_range(handle_t *handle,
 

[PATCH 14/29] fsdax: simplify the pgoff calculation

2021-11-09 Thread Christoph Hellwig
Replace the two steps of dax_iomap_sector and bdev_dax_pgoff with a
single dax_iomap_pgoff helper that avoids lots of cumbersome sector
conversions.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 14 --
 fs/dax.c| 35 ++-
 include/linux/dax.h |  1 -
 3 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 803942586d1b6..c0910687fbcb2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
 }
 EXPORT_SYMBOL_GPL(dax_remove_host);
 
-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
-   pgoff_t *pgoff)
-{
-   sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
-   phys_addr_t phys_off = (start_sect + sector) * 512;
-
-   if (pgoff)
-   *pgoff = PHYS_PFN(phys_off);
-   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
-   return -EINVAL;
-   return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index e51b4129d1b65..5364549d67a48 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,23 +709,22 @@ int dax_invalidate_mapping_entry_sync(struct 
address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
 }
 
-static sector_t dax_iomap_sector(const struct iomap *iomap, loff_t pos)
+static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   return (iomap->addr + (pos & PAGE_MASK) - iomap->offset) >> 9;
+   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
+
+   if (iomap->bdev)
+   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
+   return PHYS_PFN(paddr);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
 {
-   sector_t sector = dax_iomap_sector(&iter->iomap, iter->pos);
+   pgoff_t pgoff = dax_iomap_pgoff(&iter->iomap, iter->pos);
void *vto, *kaddr;
-   pgoff_t pgoff;
long rc;
int id;
 
-   rc = bdev_dax_pgoff(iter->iomap.bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
rc = dax_direct_access(iter->iomap.dax_dev, pgoff, 1, &kaddr, NULL);
if (rc < 0) {
@@ -1013,14 +1012,10 @@ EXPORT_SYMBOL_GPL(dax_writeback_mapping_range);
 static int dax_iomap_pfn(const struct iomap *iomap, loff_t pos, size_t size,
 pfn_t *pfnp)
 {
-   const sector_t sector = dax_iomap_sector(iomap, pos);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
int id, rc;
long length;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, size, &pgoff);
-   if (rc)
-   return rc;
id = dax_read_lock();
length = dax_direct_access(iomap->dax_dev, pgoff, PHYS_PFN(size),
   NULL, pfnp);
@@ -1129,7 +1124,7 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
sector_t sector = iomap_sector(iomap, pos & PAGE_MASK);
-   pgoff_t pgoff;
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
void *kaddr;
bool page_aligned = false;
@@ -1140,10 +1135,6 @@ s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
(size == PAGE_SIZE))
page_aligned = true;
 
-   rc = bdev_dax_pgoff(iomap->bdev, sector, PAGE_SIZE, &pgoff);
-   if (rc)
-   return rc;
-
id = dax_read_lock();
 
if (page_aligned)
@@ -1169,7 +1160,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
const struct iomap *iomap = &iomi->iomap;
loff_t length = iomap_length(iomi);
loff_t pos = iomi->pos;
-   struct block_device *bdev = iomap->bdev;
struct dax_device *dax_dev = iomap->dax_dev;
loff_t end = pos + length, done = 0;
ssize_t ret = 0;
@@ -1203,9 +1193,8 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
while (pos < end) {
unsigned offset = pos & (PAGE_SIZE - 1);
const size_t size = ALIGN(length + offset, PAGE_SIZE);
-   const sector_t sector = dax_iomap_sector(iomap, pos);
+   pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
ssize_t map_len;
-   pgoff_t pgoff;
void *kaddr;
 
if (fatal_signal_pending(current)) {
@@ -1213,10 +1202,6 @@ static loff_t dax_iomap_iter(const struct iomap_iter *iomi,
break;
}
 
-   ret = bdev_dax_pgoff(bdev, secto

[PATCH 15/29] xfs: add xfs_zero_range and xfs_truncate_page helpers

2021-11-09 Thread Christoph Hellwig
From: Shiyang Ruan 

Add helpers to prepare for using different DAX operations.

Signed-off-by: Shiyang Ruan 
[hch: split from a larger patch + slight cleanups]
Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_bmap_util.c |  7 +++
 fs/xfs/xfs_file.c  |  3 +--
 fs/xfs/xfs_iomap.c | 25 +
 fs/xfs/xfs_iomap.h |  4 
 fs/xfs/xfs_iops.c  |  7 +++
 fs/xfs/xfs_reflink.c   |  3 +--
 6 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 73a36b7be3bd1..797ea0c8b14e1 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1001,7 +1001,7 @@ xfs_free_file_space(
 
/*
 * Now that we've unmap all full blocks we'll have to zero out any
-* partial block at the beginning and/or end.  iomap_zero_range is smart
+* partial block at the beginning and/or end.  xfs_zero_range is smart
 * enough to skip any holes, including those we just created, but we
 * must take care not to zero beyond EOF and enlarge i_size.
 */
@@ -1009,15 +1009,14 @@ xfs_free_file_space(
return 0;
if (offset + len > XFS_ISIZE(ip))
len = XFS_ISIZE(ip) - offset;
-   error = iomap_zero_range(VFS_I(ip), offset, len, NULL,
-   &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, offset, len, NULL);
if (error)
return error;
 
/*
 * If we zeroed right up to EOF and EOF straddles a page boundary we
 * must make sure that the post-EOF area is also zeroed because the
-* page could be mmap'd and iomap_zero_range doesn't do that for us.
+* page could be mmap'd and xfs_zero_range doesn't do that for us.
 * Writeback of the eof page will do this, albeit clumsily.
 */
if (offset + len >= XFS_ISIZE(ip) && offset_in_page(offset + len) > 0) {
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 27594738b0d18..8d4c5ca261bd7 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -437,8 +437,7 @@ xfs_file_write_checks(
}
 
trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize);
-   error = iomap_zero_range(inode, isize, iocb->ki_pos - isize,
-   NULL, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL);
if (error)
return error;
} else
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 093758440ad53..d6d71ae9f2ae4 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1311,3 +1311,28 @@ xfs_xattr_iomap_begin(
 const struct iomap_ops xfs_xattr_iomap_ops = {
.iomap_begin= xfs_xattr_iomap_begin,
 };
+
+int
+xfs_zero_range(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   loff_t  len,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_zero_range(inode, pos, len, did_zero,
+   &xfs_buffered_write_iomap_ops);
+}
+
+int
+xfs_truncate_page(
+   struct xfs_inode*ip,
+   loff_t  pos,
+   bool*did_zero)
+{
+   struct inode*inode = VFS_I(ip);
+
+   return iomap_truncate_page(inode, pos, did_zero,
+  &xfs_buffered_write_iomap_ops);
+}
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index 7d3703556d0e0..f1a281ab9328c 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -20,6 +20,10 @@ xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
 int xfs_bmbt_to_iomap(struct xfs_inode *, struct iomap *,
struct xfs_bmbt_irec *, u16);
 
+int xfs_zero_range(struct xfs_inode *ip, loff_t pos, loff_t len,
+   bool *did_zero);
+int xfs_truncate_page(struct xfs_inode *ip, loff_t pos, bool *did_zero);
+
 static inline xfs_filblks_t
 xfs_aligned_fsb_count(
xfs_fileoff_t   offset_fsb,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a607d6aca5c4d..ab5ef52b2a9ff 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -911,8 +911,8 @@ xfs_setattr_size(
 */
if (newsize > oldsize) {
trace_xfs_zero_eof(ip, oldsize, newsize - oldsize);
-   error = iomap_zero_range(inode, oldsize, newsize - oldsize,
-   &did_zeroing, &xfs_buffered_write_iomap_ops);
+   error = xfs_zero_range(ip, oldsize, newsize - oldsize,
+   &did_zeroing);
} else {
/*
 * iomap won't detect a dirty page over an unwritten block (or a
@@ -924,8 +924,7 @@ xfs_setattr_size(
 news

[PATCH 21/29] xfs: move dax device handling into xfs_{alloc, free}_buftarg

2021-11-09 Thread Christoph Hellwig
Hide the DAX device lookup from the xfs_super.c code.

Reviewed-by: Christoph Hellwig 
---
 fs/xfs/xfs_buf.c   |  8 
 fs/xfs/xfs_buf.h   |  4 ++--
 fs/xfs/xfs_super.c | 26 +-
 3 files changed, 11 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 631c5a61d89b7..4d4553ffa7050 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1892,6 +1892,7 @@ xfs_free_buftarg(
list_lru_destroy(&btp->bt_lru);
 
blkdev_issue_flush(btp->bt_bdev);
+   fs_put_dax(btp->bt_daxdev);
 
kmem_free(btp);
 }
@@ -1932,11 +1933,10 @@ xfs_setsize_buftarg_early(
return xfs_setsize_buftarg(btp, bdev_logical_block_size(bdev));
 }
 
-xfs_buftarg_t *
+struct xfs_buftarg *
 xfs_alloc_buftarg(
struct xfs_mount*mp,
-   struct block_device *bdev,
-   struct dax_device   *dax_dev)
+   struct block_device *bdev)
 {
xfs_buftarg_t   *btp;
 
@@ -1945,7 +1945,7 @@ xfs_alloc_buftarg(
btp->bt_mount = mp;
btp->bt_dev =  bdev->bd_dev;
btp->bt_bdev = bdev;
-   btp->bt_daxdev = dax_dev;
+   btp->bt_daxdev = fs_dax_get_by_bdev(bdev);
 
/*
 * Buffer IO error rate limiting. Limit it to no more than 10 messages
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 6b0200b8007d1..bd7f709f0d232 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -338,8 +338,8 @@ xfs_buf_update_cksum(struct xfs_buf *bp, unsigned long cksum_offset)
 /*
  * Handling of buftargs.
  */
-extern struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *,
-   struct block_device *, struct dax_device *);
+struct xfs_buftarg *xfs_alloc_buftarg(struct xfs_mount *mp,
+   struct block_device *bdev);
 extern void xfs_free_buftarg(struct xfs_buftarg *);
 extern void xfs_buftarg_wait(struct xfs_buftarg *);
 extern void xfs_buftarg_drain(struct xfs_buftarg *);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3a45d5caa28d5..7262716afb215 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -391,26 +391,19 @@ STATIC void
 xfs_close_devices(
struct xfs_mount*mp)
 {
-   struct dax_device *dax_ddev = mp->m_ddev_targp->bt_daxdev;
-
if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
-   struct dax_device *dax_logdev = mp->m_logdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_logdev_targp);
xfs_blkdev_put(logdev);
-   fs_put_dax(dax_logdev);
}
if (mp->m_rtdev_targp) {
struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
-   struct dax_device *dax_rtdev = mp->m_rtdev_targp->bt_daxdev;
 
xfs_free_buftarg(mp->m_rtdev_targp);
xfs_blkdev_put(rtdev);
-   fs_put_dax(dax_rtdev);
}
xfs_free_buftarg(mp->m_ddev_targp);
-   fs_put_dax(dax_ddev);
 }
 
 /*
@@ -428,8 +421,6 @@ xfs_open_devices(
struct xfs_mount*mp)
 {
struct block_device *ddev = mp->m_super->s_bdev;
-   struct dax_device   *dax_ddev = fs_dax_get_by_bdev(ddev);
-   struct dax_device   *dax_logdev = NULL, *dax_rtdev = NULL;
struct block_device *logdev = NULL, *rtdev = NULL;
int error;
 
@@ -439,8 +430,7 @@ xfs_open_devices(
if (mp->m_logname) {
error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
if (error)
-   goto out;
-   dax_logdev = fs_dax_get_by_bdev(logdev);
+   return error;
}
 
if (mp->m_rtname) {
@@ -454,25 +444,24 @@ xfs_open_devices(
error = -EINVAL;
goto out_close_rtdev;
}
-   dax_rtdev = fs_dax_get_by_bdev(rtdev);
}
 
/*
 * Setup xfs_mount buffer target pointers
 */
error = -ENOMEM;
-   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev, dax_ddev);
+   mp->m_ddev_targp = xfs_alloc_buftarg(mp, ddev);
if (!mp->m_ddev_targp)
goto out_close_rtdev;
 
if (rtdev) {
-   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev, dax_rtdev);
+   mp->m_rtdev_targp = xfs_alloc_buftarg(mp, rtdev);
if (!mp->m_rtdev_targp)
goto out_free_ddev_targ;
}
 
if (logdev && logdev != ddev) {
-   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev, dax_logdev);
+   mp->m_logdev_targp = xfs_alloc_buftarg(mp, logdev);
if (!mp->m_logdev_targp)
goto out_free_rtdev_targ;
} else {
@@ -488,14 +477,9 @@ xfs_open_devices(

[PATCH 20/29] ext4: cleanup the dax handling in ext4_fill_super

2021-11-09 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
---
 fs/ext4/super.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index eb4df43abd76e..b60401bb1c310 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3879,7 +3879,6 @@ static void ext4_setup_csum_trigger(struct super_block *sb,
 
 static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
char *orig_data = kstrdup(data, GFP_KERNEL);
struct buffer_head *bh, **group_desc;
struct ext4_super_block *es = NULL;
@@ -3910,12 +3909,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
if ((data && !orig_data) || !sbi)
goto out_free_base;
 
-   sbi->s_daxdev = dax_dev;
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock)
goto out_free_base;
 
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
sb->s_fs_info = sbi;
sbi->s_sb = sb;
sbi->s_inode_readahead_blks = EXT4_DEF_INODE_READAHEAD_BLKS;
@@ -4300,7 +4299,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount;
}
 
-   if (dax_dev) {
+   if (sbi->s_daxdev) {
if (blocksize == PAGE_SIZE)
set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
else
@@ -5096,10 +5095,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 out_fail:
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
+   fs_put_dax(sbi->s_daxdev);
 out_free_base:
kfree(sbi);
kfree(orig_data);
-   fs_put_dax(dax_dev);
return err ? err : ret;
 }
 
-- 
2.30.2



[PATCH 22/29] iomap: add a IOMAP_DAX flag

2021-11-09 Thread Christoph Hellwig
Add a flag so that the file system can easily detect DAX operations.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c  | 7 ---
 include/linux/iomap.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 5b52b878124ac..0bd6cdcbacfc4 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1180,7 +1180,7 @@ int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
.inode  = inode,
.pos= pos,
.len= len,
-   .flags  = IOMAP_ZERO,
+   .flags  = IOMAP_DAX | IOMAP_ZERO,
};
int ret;
 
@@ -1308,6 +1308,7 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
.inode  = iocb->ki_filp->f_mapping->host,
.pos= iocb->ki_pos,
.len= iov_iter_count(iter),
+   .flags  = IOMAP_DAX,
};
loff_t done = 0;
int ret;
@@ -1461,7 +1462,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
.inode  = mapping->host,
.pos= (loff_t)vmf->pgoff << PAGE_SHIFT,
.len= PAGE_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = 0;
void *entry;
@@ -1570,7 +1571,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
struct iomap_iter iter = {
.inode  = mapping->host,
.len= PMD_SIZE,
-   .flags  = IOMAP_FAULT,
+   .flags  = IOMAP_DAX | IOMAP_FAULT,
};
vm_fault_t ret = VM_FAULT_FALLBACK;
pgoff_t max_pgoff;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6d1b08d0ae930..146a7e3e3ea11 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -141,6 +141,7 @@ struct iomap_page_ops {
 #define IOMAP_NOWAIT   (1 << 5) /* do not block */
 #define IOMAP_OVERWRITE_ONLY   (1 << 6) /* only pure overwrites allowed */
 #define IOMAP_UNSHARE  (1 << 7) /* unshare_file_range */
+#define IOMAP_DAX  (1 << 8) /* DAX mapping */
 
 struct iomap_ops {
/*
-- 
2.30.2



[PATCH 19/29] ext2: cleanup the dax handling in ext2_fill_super

2021-11-09 Thread Christoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove
the need for the dax_dev local variable.

Signed-off-by: Christoph Hellwig 
---
 fs/ext2/super.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index a964066a80aa7..7e23482862e69 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -802,7 +802,6 @@ static unsigned long descriptor_loc(struct super_block *sb,
 
 static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 {
-   struct dax_device *dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
struct buffer_head * bh;
struct ext2_sb_info * sbi;
struct ext2_super_block * es;
@@ -822,17 +821,17 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 
sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
if (!sbi)
-   goto failed;
+   return -ENOMEM;
 
sbi->s_blockgroup_lock =
kzalloc(sizeof(struct blockgroup_lock), GFP_KERNEL);
if (!sbi->s_blockgroup_lock) {
kfree(sbi);
-   goto failed;
+   return -ENOMEM;
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = dax_dev;
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
 
spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
@@ -946,7 +945,7 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
 
if (test_opt(sb, DAX)) {
-   if (!dax_dev) {
+   if (!sbi->s_daxdev) {
ext2_msg(sb, KERN_ERR,
   "DAX unsupported by block device. Turning off DAX.");
clear_opt(sbi->s_mount_opt, DAX);
@@ -1201,11 +1200,10 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 failed_mount:
brelse(bh);
 failed_sbi:
+   fs_put_dax(sbi->s_daxdev);
sb->s_fs_info = NULL;
kfree(sbi->s_blockgroup_lock);
kfree(sbi);
-failed:
-   fs_put_dax(dax_dev);
return ret;
 }
 
-- 
2.30.2



[PATCH 24/29] xfs: use xfs_direct_write_iomap_ops for DAX zeroing

2021-11-09 Thread Christoph Hellwig
While the buffered write iomap ops happen to work because zeroing never
allocates blocks, DAX zeroing should use the direct ops just
like actual DAX I/O.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_iomap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 8cef3b68cba78..704292c6ce0c7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1324,7 +1324,7 @@ xfs_zero_range(
 
if (IS_DAX(inode))
return dax_zero_range(inode, pos, len, did_zero,
- &xfs_buffered_write_iomap_ops);
+ &xfs_direct_write_iomap_ops);
return iomap_zero_range(inode, pos, len, did_zero,
&xfs_buffered_write_iomap_ops);
 }
@@ -1339,7 +1339,7 @@ xfs_truncate_page(
 
if (IS_DAX(inode))
return dax_truncate_page(inode, pos, did_zero,
-   &xfs_buffered_write_iomap_ops);
+   &xfs_direct_write_iomap_ops);
return iomap_truncate_page(inode, pos, did_zero,
   &xfs_buffered_write_iomap_ops);
 }
-- 
2.30.2



[PATCH 17/29] fsdax: factor out a dax_memzero helper

2021-11-09 Thread Christoph Hellwig
Factor out a helper for the "manual" zeroing of a DAX range to clean
up dax_iomap_zero a lot.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index d7a923d152240..dc9ebeff850ab 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1121,34 +1121,36 @@ static vm_fault_t dax_pmd_load_hole(struct xa_state *xas, struct vm_fault *vmf,
 }
 #endif /* CONFIG_FS_DAX_PMD */
 
+static int dax_memzero(struct dax_device *dax_dev, pgoff_t pgoff,
+   unsigned int offset, size_t size)
+{
+   void *kaddr;
+   long rc;
+
+   rc = dax_direct_access(dax_dev, pgoff, 1, &kaddr, NULL);
+   if (rc >= 0) {
+   memset(kaddr + offset, 0, size);
+   dax_flush(dax_dev, kaddr + offset, size);
+   }
+   return rc;
+}
+
 s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
 {
pgoff_t pgoff = dax_iomap_pgoff(iomap, pos);
long rc, id;
-   void *kaddr;
-   bool page_aligned = false;
unsigned offset = offset_in_page(pos);
unsigned size = min_t(u64, PAGE_SIZE - offset, length);
 
-   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
-   page_aligned = true;
-
id = dax_read_lock();
-
-   if (page_aligned)
+   if (IS_ALIGNED(pos, PAGE_SIZE) && size == PAGE_SIZE)
rc = dax_zero_page_range(iomap->dax_dev, pgoff, 1);
else
-   rc = dax_direct_access(iomap->dax_dev, pgoff, 1, &kaddr, NULL);
-   if (rc < 0) {
-   dax_read_unlock(id);
-   return rc;
-   }
-
-   if (!page_aligned) {
-   memset(kaddr + offset, 0, size);
-   dax_flush(iomap->dax_dev, kaddr + offset, size);
-   }
+   rc = dax_memzero(iomap->dax_dev, pgoff, offset, size);
dax_read_unlock(id);
+
+   if (rc < 0)
+   return rc;
return size;
 }
 
-- 
2.30.2



[PATCH 25/29] dax: return the partition offset from fs_dax_get_by_bdev

2021-11-09 Thread Christoph Hellwig
Prepare for removing the block_device from the DAX I/O path by returning
the partition offset from fs_dax_get_by_bdev so that the file systems
have it at hand for use during I/O.

Signed-off-by: Christoph Hellwig 
---
 drivers/dax/super.c | 9 ++---
 drivers/md/dm.c | 4 ++--
 fs/erofs/internal.h | 2 ++
 fs/erofs/super.c| 4 ++--
 fs/ext2/ext2.h  | 1 +
 fs/ext2/super.c | 2 +-
 fs/ext4/ext4.h  | 1 +
 fs/ext4/super.c | 2 +-
 fs/xfs/xfs_buf.c| 2 +-
 fs/xfs/xfs_buf.h| 1 +
 include/linux/dax.h | 6 --
 11 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c0910687fbcb2..cc32dcf71c116 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -70,17 +70,20 @@ EXPORT_SYMBOL_GPL(dax_remove_host);
 /**
  * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
  * @bdev: block device to find a dax_device for
+ * @start_off: returns the byte offset into the dax_device that @bdev starts
  */
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off)
 {
struct dax_device *dax_dev;
+   u64 part_size;
int id;
 
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;
 
-   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
-   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+   *start_off = get_start_sect(bdev) * SECTOR_SIZE;
+   part_size = bdev_nr_sectors(bdev) * SECTOR_SIZE;
+   if (*start_off % PAGE_SIZE || part_size % PAGE_SIZE) {
pr_info("%pg: error: unaligned partition for dax\n", bdev);
return NULL;
}
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 282008afc465f..5ea6115d19bdc 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -637,7 +637,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
 struct mapped_device *md)
 {
struct block_device *bdev;
-
+   u64 part_off;
int r;
 
BUG_ON(td->dm_dev.bdev);
@@ -653,7 +653,7 @@ static int open_table_device(struct table_device *td, dev_t dev,
}
 
td->dm_dev.bdev = bdev;
-   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev);
+   td->dm_dev.dax_dev = fs_dax_get_by_bdev(bdev, &part_off);
return 0;
 }
 
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 3265688af7f9f..c1e65346e9f15 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -51,6 +51,7 @@ struct erofs_device_info {
char *path;
struct block_device *bdev;
struct dax_device *dax_dev;
+   u64 dax_part_off;
 
u32 blocks;
u32 mapped_blkaddr;
@@ -109,6 +110,7 @@ struct erofs_sb_info {
 #endif /* CONFIG_EROFS_FS_ZIP */
struct erofs_dev_context *devs;
struct dax_device *dax_dev;
+   u64 dax_part_off;
u64 total_blocks;
u32 primarydevice_blocks;
 
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 0aed886473c8d..71efce16024d9 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -312,7 +312,7 @@ static int erofs_init_devices(struct super_block *sb,
goto err_out;
}
dif->bdev = bdev;
-   dif->dax_dev = fs_dax_get_by_bdev(bdev);
+   dif->dax_dev = fs_dax_get_by_bdev(bdev, &dif->dax_part_off);
dif->blocks = le32_to_cpu(dis->blocks);
dif->mapped_blkaddr = le32_to_cpu(dis->mapped_blkaddr);
sbi->total_blocks += dif->blocks;
@@ -644,7 +644,7 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
 
sb->s_fs_info = sbi;
sbi->opt = ctx->opt;
-   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->dax_dev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->dax_part_off);
sbi->devs = ctx->devs;
ctx->devs = NULL;
 
diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index 3be9dd6412b78..d4f306aa5aceb 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -118,6 +118,7 @@ struct ext2_sb_info {
spinlock_t s_lock;
struct mb_cache *s_ea_block_cache;
struct dax_device *s_daxdev;
+   u64 s_dax_part_off;
 };
 
 static inline spinlock_t *
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7e23482862e69..94f1fbd7d3ac2 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -831,7 +831,7 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
}
sb->s_fs_info = sbi;
sbi->s_sb_block = sb_block;
-   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev);
+   sbi->s_daxdev = fs_dax_get_by_bdev(sb->s_bdev, &sbi->s_dax_part_off);
 
spin_lock_init(&sbi->s_lock);
ret = -EINVAL;
diff --git a/fs/ext4/ext4.h b/fs/ex
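(The rest of this patch is truncated in the archive.) The alignment rule fs_dax_get_by_bdev() enforces above can be illustrated standalone: a partition is usable for DAX only if both its byte start offset and its byte size are page aligned. A hedged userspace sketch, with the SECTOR_SIZE/PAGE_SIZE values as assumptions and the kernel helpers re-expressed as plain arithmetic:

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u    /* assumption: standard 512-byte sectors */
#define PAGE_SIZE   4096u   /* assumption: 4 KiB pages */

/* Byte offset where the partition starts (get_start_sect() * SECTOR_SIZE). */
static inline uint64_t dax_start_off(uint64_t start_sect)
{
	return start_sect * SECTOR_SIZE;
}

/*
 * Mirror of the check in fs_dax_get_by_bdev(): accept the partition for
 * DAX only when both the byte start offset and the byte size are page
 * aligned.  Returns nonzero when the partition is acceptable.
 */
static inline int dax_partition_ok(uint64_t start_sect, uint64_t nr_sects)
{
	return dax_start_off(start_sect) % PAGE_SIZE == 0 &&
	       (nr_sects * SECTOR_SIZE) % PAGE_SIZE == 0;
}
```

A partition starting at sector 2048 (1 MiB) passes; one starting at the legacy sector 63 is rejected because 63 * 512 = 32256 is not a multiple of 4096.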

[PATCH 23/29] xfs: use IOMAP_DAX to check for DAX mappings

2021-11-09 Thread Christoph Hellwig
Use the explicit DAX flag instead of checking the inode flag in the
iomap code.

Signed-off-by: Christoph Hellwig 
---
 fs/xfs/xfs_iomap.c | 7 ---
 fs/xfs/xfs_iomap.h | 3 ++-
 fs/xfs/xfs_pnfs.c  | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 604000b6243ec..8cef3b68cba78 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -188,6 +188,7 @@ xfs_iomap_write_direct(
struct xfs_inode*ip,
xfs_fileoff_t   offset_fsb,
xfs_fileoff_t   count_fsb,
+   unsigned intflags,
struct xfs_bmbt_irec*imap)
 {
struct xfs_mount*mp = ip->i_mount;
@@ -229,7 +230,7 @@ xfs_iomap_write_direct(
 * the reserve block pool for bmbt block allocation if there is no space
 * left but we need to do unwritten extent conversion.
 */
-   if (IS_DAX(VFS_I(ip))) {
+   if (flags & IOMAP_DAX) {
bmapi_flags = XFS_BMAPI_CONVERT | XFS_BMAPI_ZERO;
if (imap->br_state == XFS_EXT_UNWRITTEN) {
force = true;
@@ -620,7 +621,7 @@ imap_needs_alloc(
imap->br_startblock == DELAYSTARTBLOCK)
return true;
/* we convert unwritten extents before copying the data for DAX */
-   if (IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN)
+   if ((flags & IOMAP_DAX) && imap->br_state == XFS_EXT_UNWRITTEN)
return true;
return false;
 }
@@ -826,7 +827,7 @@ xfs_direct_write_iomap_begin(
xfs_iunlock(ip, lockmode);
 
error = xfs_iomap_write_direct(ip, offset_fsb, end_fsb - offset_fsb,
-   &imap);
+   flags, &imap);
if (error)
return error;
 
diff --git a/fs/xfs/xfs_iomap.h b/fs/xfs/xfs_iomap.h
index f1a281ab9328c..5648262a71736 100644
--- a/fs/xfs/xfs_iomap.h
+++ b/fs/xfs/xfs_iomap.h
@@ -12,7 +12,8 @@ struct xfs_inode;
 struct xfs_bmbt_irec;
 
 int xfs_iomap_write_direct(struct xfs_inode *ip, xfs_fileoff_t offset_fsb,
-   xfs_fileoff_t count_fsb, struct xfs_bmbt_irec *imap);
+   xfs_fileoff_t count_fsb, unsigned int flags,
+   struct xfs_bmbt_irec *imap);
 int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, xfs_off_t, bool);
 xfs_fileoff_t xfs_iomap_eof_align_last_fsb(struct xfs_inode *ip,
xfs_fileoff_t end_fsb);
diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index 5e1d29d8b2e73..e188e1cf97cc5 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -155,7 +155,7 @@ xfs_fs_map_blocks(
xfs_iunlock(ip, lock_flags);
 
error = xfs_iomap_write_direct(ip, offset_fsb,
-   end_fsb - offset_fsb, &imap);
+   end_fsb - offset_fsb, 0, &imap);
if (error)
goto out_unlock;
 
-- 
2.30.2
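The switch above from IS_DAX(inode) to `flags & IOMAP_DAX` means the decision travels with the request instead of being re-derived from inode state at each layer. The pattern can be sketched generically; the flag values below are illustrative assumptions, not the real iomap.h constants:

```c
#include <assert.h>

#define IOMAP_WRITE (1u << 0)   /* assumption: illustrative flag values */
#define IOMAP_DAX   (1u << 8)

/*
 * Dispatch on the request flags rather than on global inode state, as
 * the patch does for the unwritten-extent conversion in
 * imap_needs_alloc(): DAX copies data directly, so unwritten extents
 * must be converted up front.
 */
static inline int needs_unwritten_conversion(unsigned int flags,
					     int extent_unwritten)
{
	return (flags & IOMAP_DAX) && extent_unwritten;
}
```

A non-DAX request never forces the conversion, even for an unwritten extent, which matches the behavior the hunk in imap_needs_alloc() encodes.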



[PATCH 28/29] iomap: build the block based code conditionally

2021-11-09 Thread Christoph Hellwig
Only build the block based iomap code if CONFIG_BLOCK is set.  Currently
that is always the case, but it will change soon.

Signed-off-by: Christoph Hellwig 
---
 fs/Kconfig| 4 ++--
 fs/iomap/Makefile | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index a6313a969bc5f..6d608330a096e 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -15,11 +15,11 @@ config VALIDATE_FS_PARSER
  Enable this to perform validation of the parameter description for a
  filesystem when it is registered.
 
-if BLOCK
-
 config FS_IOMAP
bool
 
+if BLOCK
+
 source "fs/ext2/Kconfig"
 source "fs/ext4/Kconfig"
 source "fs/jbd2/Kconfig"
diff --git a/fs/iomap/Makefile b/fs/iomap/Makefile
index 4143a3ff89dbc..fc070184b7faa 100644
--- a/fs/iomap/Makefile
+++ b/fs/iomap/Makefile
@@ -9,9 +9,9 @@ ccflags-y += -I $(srctree)/$(src)   # needed for trace events
 obj-$(CONFIG_FS_IOMAP) += iomap.o
 
 iomap-y+= trace.o \
-  buffered-io.o \
+  iter.o
+iomap-$(CONFIG_BLOCK)  += buffered-io.o \
   direct-io.o \
   fiemap.o \
-  iter.o \
   seek.o
 iomap-$(CONFIG_SWAP)   += swapfile.o
-- 
2.30.2



[PATCH 29/29] fsdax: don't require CONFIG_BLOCK

2021-11-09 Thread Christoph Hellwig
The file system DAX code no longer requires the block code, so allow
building a kernel with FUSE DAX support but without the block layer.

Signed-off-by: Christoph Hellwig 
---
 fs/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 6d608330a096e..7a2b11c0b8036 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -42,6 +42,8 @@ source "fs/nilfs2/Kconfig"
 source "fs/f2fs/Kconfig"
 source "fs/zonefs/Kconfig"
 
+endif # BLOCK
+
 config FS_DAX
bool "File system based Direct Access (DAX) support"
depends on MMU
@@ -89,8 +91,6 @@ config FS_DAX_PMD
 config FS_DAX_LIMITED
bool
 
-endif # BLOCK
-
 # Posix ACL utility routines
 #
 # Note: Posix ACLs can be implemented without these helpers.  Never use
-- 
2.30.2



[PATCH 26/29] fsdax: shift partition offset handling into the file systems

2021-11-09 Thread Christoph Hellwig
Remove the last user of ->bdev in dax.c by requiring the file system to
pass in an address that already includes the DAX offset.  As part of
this, only set ->bdev or ->daxdev when actually required in the
->iomap_begin methods.

Signed-off-by: Christoph Hellwig 
---
 fs/dax.c |  6 +-
 fs/erofs/data.c  | 11 --
 fs/erofs/internal.h  |  1 +
 fs/ext2/inode.c  |  8 +--
 fs/ext4/inode.c  | 16 +-
 fs/xfs/libxfs/xfs_bmap.c |  4 ++--
 fs/xfs/xfs_aops.c|  2 +-
 fs/xfs/xfs_iomap.c   | 45 +---
 fs/xfs/xfs_iomap.h   |  5 +++--
 fs/xfs/xfs_pnfs.c|  2 +-
 10 files changed, 63 insertions(+), 37 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 0bd6cdcbacfc4..2c13c681edf09 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -711,11 +711,7 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
 
 static pgoff_t dax_iomap_pgoff(const struct iomap *iomap, loff_t pos)
 {
-   phys_addr_t paddr = iomap->addr + (pos & PAGE_MASK) - iomap->offset;
-
-   if (iomap->bdev)
-   paddr += (get_start_sect(iomap->bdev) << SECTOR_SHIFT);
-   return PHYS_PFN(paddr);
+   return PHYS_PFN(iomap->addr + (pos & PAGE_MASK) - iomap->offset);
 }
 
 static int copy_cow_page_dax(struct vm_fault *vmf, const struct iomap_iter *iter)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 0e35ef3f9f3d7..9b1bb177ce303 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -159,6 +159,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
/* primary device by default */
map->m_bdev = sb->s_bdev;
map->m_daxdev = EROFS_SB(sb)->dax_dev;
+   map->m_dax_part_off = EROFS_SB(sb)->dax_part_off;
 
if (map->m_deviceid) {
down_read(&devs->rwsem);
@@ -169,6 +170,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
}
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
up_read(&devs->rwsem);
} else if (devs->extra_devices) {
down_read(&devs->rwsem);
@@ -185,6 +187,7 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
map->m_pa -= startoff;
map->m_bdev = dif->bdev;
map->m_daxdev = dif->dax_dev;
+   map->m_dax_part_off = dif->dax_part_off;
break;
}
}
@@ -215,9 +218,13 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
if (ret)
return ret;
 
-   iomap->bdev = mdev.m_bdev;
-   iomap->dax_dev = mdev.m_daxdev;
iomap->offset = map.m_la;
+   if (flags & IOMAP_DAX) {
+   iomap->dax_dev = mdev.m_daxdev;
+   iomap->offset += mdev.m_dax_part_off;
+   } else {
+   iomap->bdev = mdev.m_bdev;
+   }
iomap->length = map.m_llen;
iomap->flags = 0;
iomap->private = NULL;
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index c1e65346e9f15..5c2a83876220c 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -438,6 +438,7 @@ static inline int z_erofs_map_blocks_iter(struct inode *inode,
 struct erofs_map_dev {
struct block_device *m_bdev;
struct dax_device *m_daxdev;
+   u64 m_dax_part_off;
 
erofs_off_t m_pa;
unsigned int m_deviceid;
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index ae9993018a015..da4c301b43051 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -816,9 +816,11 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
return ret;
 
iomap->flags = 0;
-   iomap->bdev = inode->i_sb->s_bdev;
iomap->offset = (u64)first_block << blkbits;
-   iomap->dax_dev = sbi->s_daxdev;
+   if (flags & IOMAP_DAX)
+   iomap->dax_dev = sbi->s_daxdev;
+   else
+   iomap->bdev = inode->i_sb->s_bdev;
 
if (ret == 0) {
iomap->type = IOMAP_HOLE;
@@ -827,6 +829,8 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
} else {
iomap->type = IOMAP_MAPPED;
iomap->addr = (u64)bno << blkbits;
+   if (flags & IOMAP_DAX)
+   iomap->addr += sbi->s_dax_part_off;
iomap->length = (u64)ret << blkbits;
iomap->flags |= IOMAP_F_MERGED;
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 8c443b753b815..6cbecd7ff

[PATCH 27/29] dax: fix up some of the block device related ifdefs

2021-11-09 Thread Christoph Hellwig
The DAX device <-> block device association is only enabled if
CONFIG_BLOCK is enabled.  Update dax.h to account for that and use
the right conditions for the fs_put_dax stub as well.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dax.h | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 90f95deff504d..5568d3dca941b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -108,28 +108,15 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
 #endif
 
 struct writeback_control;
-#if IS_ENABLED(CONFIG_FS_DAX)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
 int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
 void dax_remove_host(struct gendisk *disk);
-
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
+   u64 *start_off);
 static inline void fs_put_dax(struct dax_device *dax_dev)
 {
put_dax(dax_dev);
 }
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
-   u64 *start_off);
-int dax_writeback_mapping_range(struct address_space *mapping,
-   struct dax_device *dax_dev, struct writeback_control *wbc);
-
-struct page *dax_layout_busy_page(struct address_space *mapping);
-struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
-dax_entry_t dax_lock_page(struct page *page);
-void dax_unlock_page(struct page *page, dax_entry_t cookie);
-int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
-   const struct iomap_ops *ops);
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-   const struct iomap_ops *ops);
 #else
 static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 {
@@ -138,17 +125,29 @@ static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
 static inline void dax_remove_host(struct gendisk *disk)
 {
 }
-
-static inline void fs_put_dax(struct dax_device *dax_dev)
-{
-}
-
 static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev,
u64 *start_off)
 {
return NULL;
 }
+static inline void fs_put_dax(struct dax_device *dax_dev)
+{
+}
+#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
+
+#if IS_ENABLED(CONFIG_FS_DAX)
+int dax_writeback_mapping_range(struct address_space *mapping,
+   struct dax_device *dax_dev, struct writeback_control *wbc);
 
+struct page *dax_layout_busy_page(struct address_space *mapping);
+struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end);
+dax_entry_t dax_lock_page(struct page *page);
+void dax_unlock_page(struct page *page, dax_entry_t cookie);
+int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
+   const struct iomap_ops *ops);
+int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
+   const struct iomap_ops *ops);
+#else
 static inline struct page *dax_layout_busy_page(struct address_space *mapping)
 {
return NULL;
-- 
2.30.2



Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-11-18 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> Applied, fixed the spelling of 'dependent' in the subject and picked
> up Mike's Ack from the previous send:
> 
> https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> 
> Christoph, any particular reason you did not pick up the tags from the
> last posting?

I thought I did, but apparently I've missed some.


Re: [PATCH 01/29] nvdimm/pmem: move dax_attribute_group from dax to pmem

2021-11-18 Thread Christoph Hellwig
On Wed, Nov 17, 2021 at 09:44:25AM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:33 AM Christoph Hellwig  wrote:
> >
> > dax_attribute_group is only used by the pmem driver, and can avoid the
> > completely pointless lookup by the disk name if moved there.  This
> > leaves just a single caller of dax_get_by_host, so move dax_get_by_host
> > into the same ifdef block as that caller.
> >
> > Signed-off-by: Christoph Hellwig 
> > Reviewed-by: Dan Williams 
> > Link: https://lore.kernel.org/r/20210922173431.2454024-3-...@lst.de
> > Signed-off-by: Dan Williams 
> 
> This one already made v5.16-rc1.

Yes, but 5.16-rc1 did not exist yet when I posted the series.

Note that the series also has a conflict against 5.16-rc1 in pmem.c,
and buildbot pointed out the file systems need explicit dax.h
includes in a few files for some configurations.

The current branch is here, I just did not bother to repost without
any comments:

   
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dax-block-cleanup

no functional changes.


Re: [PATCH 02/29] dm: make the DAX support dependend on CONFIG_FS_DAX

2021-11-22 Thread Christoph Hellwig
On Mon, Nov 22, 2021 at 06:54:09PM -0800, Dan Williams wrote:
> On Thu, Nov 18, 2021 at 10:55 PM Christoph Hellwig  wrote:
> >
> > On Wed, Nov 17, 2021 at 09:23:44AM -0800, Dan Williams wrote:
> > > Applied, fixed the spelling of 'dependent' in the subject and picked
> > > up Mike's Ack from the previous send:
> > >
> > > https://lore.kernel.org/r/yyasbvuorceds...@redhat.com
> > >
> > > Christoph, any particular reason you did not pick up the tags from the
> > > last posting?
> >
> > I thought I did, but apparently I've missed some.
> 
> I'll reply with the ones I see missing that need carrying over and add
> my own reviewed-by then you can send me a pull request when ready,
> deal?

Ok.


Re: [PATCH 04/29] dax: simplify the dax_device <-> gendisk association

2021-11-22 Thread Christoph Hellwig
On Mon, Nov 22, 2021 at 07:33:06PM -0800, Dan Williams wrote:
> Is it time to add a "DAX" symbol namespace?

What would be the benefit?


Re: [PATCH 06/29] dax: move the partition alignment check into fs_dax_get_by_bdev

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:25:55PM -0800, Darrick J. Wong wrote:
> > +   if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
> > +   (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
> 
> Do we have to be careful about 64-bit division here, or do we not
> support DAX on 32-bit?

I can't find anything in the Kconfig limiting DAX to 64-bit.  But
then again the existing code has divisions like this, so the compiler
is probably smart enough to turn them into shifts.

> > +   pr_info("%pg: error: unaligned partition for dax\n", bdev);
> 
> I also wonder if this should be ratelimited...?

This happens once (or maybe three times for XFS with rt and log devices)
at mount time, so I see no need for a ratelimit.
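On the 64-bit-division question: PAGE_SIZE is always a power of two, so a `u64 % PAGE_SIZE` reduces to a bit mask and needs no 64-bit division helper even on 32-bit, which is why the compiler can turn these into shifts and masks as noted above. A small userspace sketch of the equivalence (the 4 KiB PAGE_SIZE is an assumption):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KiB pages, always a power of two */

/* Modulo by a power of two is just a mask -- no 64-bit division needed. */
static inline uint64_t mod_page(uint64_t x)
{
	return x & ((uint64_t)PAGE_SIZE - 1);
}

static inline int page_aligned(uint64_t x)
{
	return mod_page(x) == 0;
}
```

The mask form and the `%` form agree for any 64-bit value, including ones far beyond the 32-bit range.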


Re: [PATCH 08/29] dax: remove dax_capable

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:31:23PM -0800, Darrick J. Wong wrote:
> > -   struct super_block  *sb = mp->m_super;
> > -
> > -   if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
> > -  (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
> > +   if (!mp->m_ddev_targp->bt_daxdev &&
> > +  (!mp->m_rtdev_targp || !mp->m_rtdev_targp->bt_daxdev)) {
> 
> Nit: This  ^ paren should be indented one more column because it's a
> sub-clause of the if() test.

Done.

> Nit: xfs_alert() already adds a newline to the end of the format string.

Already done in the current tree.


Re: [PATCH 14/29] fsdax: simplify the pgoff calculation

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:36:42PM -0800, Darrick J. Wong wrote:
> > -   phys_addr_t phys_off = (start_sect + sector) * 512;
> > -
> > -   if (pgoff)
> > -   *pgoff = PHYS_PFN(phys_off);
> > -   if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> 
> AFAICT, we're relying on fs_dax_get_by_bdev to have validated this
> previously, which is why the error return stuff goes away?

Exactly.
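With the alignment validated once in fs_dax_get_by_bdev() at mount time, the per-I/O pgoff computation collapses to pure shift-and-mask arithmetic with no error path left. A hedged userspace sketch of the simplified calculation; the PAGE_SHIFT value and helper names are assumptions, not the kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                  /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Mirror of PHYS_PFN(): physical address to page frame number. */
static inline uint64_t phys_pfn(uint64_t paddr)
{
	return paddr >> PAGE_SHIFT;
}

/*
 * Mirror of the simplified dax_iomap_pgoff(): round pos down to a page
 * boundary, offset it into the mapping, and convert to a pfn.  No
 * alignment check remains because it already happened at mount time.
 */
static inline uint64_t dax_pgoff(uint64_t addr, uint64_t pos, uint64_t offset)
{
	return phys_pfn(addr + (pos & ~(uint64_t)(PAGE_SIZE - 1)) - offset);
}
```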


Re: [PATCH 17/29] fsdax: factor out a dax_memzero helper

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 01:22:13PM -0800, Dan Williams wrote:
> On Tue, Nov 9, 2021 at 12:34 AM Christoph Hellwig  wrote:
> >
> > Factor out a helper for the "manual" zeroing of a DAX range to clean
> > up dax_iomap_zero a lot.
> >
> 
> Small / optional fixup below:

Incorporated.


Re: [PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 01:46:35PM -0800, Dan Williams wrote:
> > +   const struct iomap_ops *ops)
> > +{
> > +   unsigned int blocksize = i_blocksize(inode);
> > +   unsigned int off = pos & (blocksize - 1);
> > +
> > +   /* Block boundary? Nothing to do */
> > +   if (!off)
> > +   return 0;
> 
> It took me a moment to figure out why this was correct. I see it was
> also copied from iomap_truncate_page(). It makes sense for DAX where
> blocksize >= PAGE_SIZE so it's always the case that the amount of
> capacity to zero relative to a page is from @pos to the end of the
> block. Is there something else that protects the blocksize < PAGE_SIZE
> case outside of DAX?
> 
> Nothing to change for this patch, just a question I had while reviewing.

This is a helper for truncate ->setattr, where everything outside the
block is deallocated.  So zeroing is only needed inside the block.
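The reasoning can be made concrete: for truncate, everything past the new EOF is deallocated, so only the tail of the final block, from pos to the next block boundary, ever needs zeroing, and a block-aligned pos needs nothing at all. A sketch of that calculation, assuming a power-of-two block size (this mirrors the `if (!off) return 0` early exit quoted above, but is not the kernel helper itself):

```c
#include <assert.h>
#include <stdint.h>

/*
 * How many bytes must be zeroed at @pos for a truncate, given a
 * power-of-two @blocksize.  Everything beyond the block is deallocated,
 * so a block-aligned @pos needs no zeroing at all.
 */
static inline uint64_t truncate_zero_len(uint64_t pos, unsigned int blocksize)
{
	unsigned int off = pos & (blocksize - 1);

	if (!off)               /* block boundary: nothing to do */
		return 0;
	return blocksize - off; /* zero from pos to the end of the block */
}
```

For DAX the blocksize is at least PAGE_SIZE, so the range to zero is always the remainder of a single block, never a sub-page sliver followed by more data.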


Re: [PATCH 18/29] fsdax: decouple zeroing from the iomap buffered I/O code

2021-11-23 Thread Christoph Hellwig
On Tue, Nov 23, 2021 at 02:53:15PM -0800, Darrick J. Wong wrote:
> > -s64 dax_iomap_zero(loff_t pos, u64 length, struct iomap *iomap)
> > +static loff_t dax_zero_iter(struct iomap_iter *iter, bool *did_zero)
> 
> Shouldn't this return value remain s64 to match iomap_iter.processed?

I'll switch it over.  Given that loff_t is always the same as s64
it shouldn't really matter.

(same for the others)
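The claim that loff_t and s64 are interchangeable can be pinned down with a compile-time assertion; in userspace the same idea looks like this, with the typedefs below standing in for the kernel types (an assumption, since the real definitions live in kernel headers):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's s64 and loff_t -- assumptions for userspace. */
typedef int64_t   s64;
typedef long long kernel_loff_t;

/* If loff_t ever diverged from s64, these would fail to compile. */
_Static_assert(sizeof(kernel_loff_t) == sizeof(s64),
	       "loff_t and s64 must have the same size");
_Static_assert((kernel_loff_t)-1 < 0, "loff_t must be signed");
```

With such an assertion in place, returning loff_t from a function whose caller stores the result in an s64 (as iomap_iter.processed does) is provably lossless.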
  1   2   3   4   5   6   7   8   9   10   >