Re: [PATCH net-next 03/21] ethtool, stats: introduce standard XDP statistics

2021-08-04 Thread David Ahern
On 8/4/21 12:27 PM, Saeed Mahameed wrote:
> 
>> I just ran some quick tests with my setup and measured about 1.2%
>> worst
> 
> 1.2% is a lot! What was the test? What is the change?

I did say "quick test ... not exhaustive" and it was definitely
eyeballing a pps change over a small time window.

If multiple counters are bumped 20-25 million times a second (e.g. XDP
drop case), how measurable is it? I was just trying to ballpark the
overhead - 1%, 5%, more? If it is <~ 1% then there is no performance
argument in which case let's do the right thing for users - export via
existing APIs.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 00/11] Provide offset-adjusted framebuffer mappings

2021-08-04 Thread Thomas Zimmermann

Hi Sam

Am 03.08.21 um 18:21 schrieb Sam Ravnborg:

Hi Thomas,

On Tue, Aug 03, 2021 at 02:59:17PM +0200, Thomas Zimmermann wrote:

A framebuffer's offsets field might be non-zero to make the BO data
start at the specified offset within the BO's memory. Handle this
case in drm_gem_fb_vmap() and update all callers. So far, many drivers
ignore the offsets, which can lead to visual artifacts.

Patch 1 adds an optional argument to drm_gem_fb_vmap() to return the
offset-adjusted data address for use with shadow-buffered planes.

Patches 3 and 11 convert gud and vkms, which are the other callers of
drm_gem_fb_vmap(). For gud, it's just a cleanup. Vkms will handle the
framebuffer offsets correctly for its input and output framebuffers.

The other patches convert users of shadow-buffered planes to use the
data address. After conversion, each driver will use the correct data
for non-zero offsets.




   drm/ast: Use offset-adjusted shadow-plane mappings
   drm/gud: Get offset-adjusted mapping from drm_gem_fb_vmap()
   drm/hyperv: Use offset-adjusted shadow-plane mappings
   drm/mgag200: Use offset-adjusted shadow-plane mappings
   drm/cirrus: Use offset-adjusted shadow-plane mappings
   drm/gm12u320: Use offset-adjusted shadow-plane mappings
   drm/simpledrm: Use offset-adjusted shadow-plane mapping
   drm/udl: Use offset-adjusted shadow-plane mapping
   drm/vbox: Use offset-adjusted shadow-plane mappings
   drm/vkms: Use offset-adjusted shadow-plane mappings and output

Everything looked good while reading through the patches.
I cannot say if everything was properly converted but the patches looked
good.

So they are all:
Acked-by: Sam Ravnborg 


Thanks!



There were a few TODO comments visible about using the mapping API
properly. I assume this is coming in a later patch set...


There are indeed quite a few such comments. When we introduced the 
dma_buf_map type to solve the fbdev issue on sparc64, in many places I 
simply put the existing vaddr pointers into the map structure, and vice 
versa.


The code works as expected, but in the future we may want to use 
dma_buf_map for all framebuffer pointers. This would, for example, 
require format conversion helpers that operate on these structures. 
Adding that will require a number of changes throughout these helpers.


Best regards
Thomas



Sam



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer




Re: [PATCH net-next 03/21] ethtool, stats: introduce standard XDP statistics

2021-08-04 Thread David Ahern
On 8/4/21 10:44 AM, Jakub Kicinski wrote:
> On Wed, 4 Aug 2021 10:17:56 -0600 David Ahern wrote:
>> On 8/4/21 6:36 AM, Jakub Kicinski wrote:
>>>> XDP is going to always be eBPF based ! why not just report such stats
>>>> to a special BPF_MAP ? BPF stack can collect the stats from the driver
>>>> and report them to this special MAP upon user request.
>>> Do you mean replacing the ethtool-netlink / rtnetlink etc. with
>>> a new BPF_MAP? I don't think adding another category of uAPI thru 
>>> which netdevice stats are exposed would do much good :( Plus it 
>>> doesn't address the "yet another cacheline" concern.
>>>
>>> To my understanding the need for stats recognizes the fact that (in
>>> large organizations) fleet monitoring is done by different teams than
>>> XDP development. So XDP team may have all the stats they need, but the
>>> team doing fleet monitoring has no idea how to get to them.
>>>
>>> To bridge the two worlds we need a way for the infra team to ask the
>>> XDP for well-defined stats. Maybe we should take a page from the BPF
>>> iterators book and create a program type for bridging the two worlds?
>>> Called by networking core when duping stats to extract from the
>>> existing BPF maps all the relevant stats and render them into a well
>>> known struct? Users' XDP design can still use a single per-cpu map with
>>> all the stats if they so choose, but there's a way to implement more
>>> optimal designs and still expose well-defined stats.
>>>
>>> Maybe that's too complex, IDK.  
>>
>> I was just explaining to someone internally how to get stats at all of
>> the different points in the stack to track down reasons for dropped packets:
>>
>> ethtool -S for h/w and driver
>> tc -s for drops by the qdisc
>> /proc/net/softnet_stat for drops at the backlog layer
>> netstat -s for network and transport layer
>>
>> yet another command and API just adds to the nightmare of explaining and
>> understanding these stats.
> 
> Are you referring to RTM_GETSTATS when you say "yet another command"?
> RTM_GETSTATS exists and is used by offloads today.
> 
> I'd expect ip -s (-s) to be extended to run GETSTATS and display the xdp
> stats. (Not sure why ip -s was left out of your list :))

It's on my diagram, and yes, forgot to add it here.

> 
>> There is real value in continuing to use ethtool API for XDP stats. Not
>> saying this reorg of the XDP stats is the right thing to do, only that
>> the existing API has real user benefits.
> 
> RTM_GETSTATS is an existing API. New ethtool stats are intended to be HW
> stats. I don't want to go back to ethtool being a dumping ground for all
> stats because that's what the old interface encouraged.

driver stats are important too. e.g., mlx5's cache stats and per-queue
stats.

> 
>> Does anyone have data that shows bumping a properly implemented counter
>> causes a noticeable performance degradation and if so by how much? You
>> mention 'yet another cacheline' but collecting stats on stack and
>> incrementing the driver structs at the end of the napi loop should not
>> have a huge impact versus the value the stats provide.
> 
> Not sure, maybe Jesper has some numbers. Maybe Intel folks do?

I just ran some quick tests with my setup and measured about 1.2% worst
case. Certainly not exhaustive. Perhaps Intel or Mellanox can provide
numbers for their high speed nics - e.g. ConnectX-6 and a saturated host.

> 
> I'm just allergic to situations when there is a decision made and 
> then months later patches are posted disregarding the decision, 
> without analysis on why that decision was wrong. And while the
> maintainer who made the decision is on vacation.
> 

stats is one of the many sensitive topics. I have been consistent in
defending the need to use existing APIs and tooling and not relying on
XDP program writers to add the relevant stats and then provide whatever
tool is needed to extract and print them. Standardization for
fundamental analysis tools.


Re: [PATCH net-next 03/21] ethtool, stats: introduce standard XDP statistics

2021-08-04 Thread David Ahern
On 8/4/21 6:36 AM, Jakub Kicinski wrote:
>> XDP is going to always be eBPF based ! why not just report such stats
>> to a special BPF_MAP ? BPF stack can collect the stats from the driver
>> and report them to this special MAP upon user request.
> Do you mean replacing the ethtool-netlink / rtnetlink etc. with
> a new BPF_MAP? I don't think adding another category of uAPI thru 
> which netdevice stats are exposed would do much good :( Plus it 
> doesn't address the "yet another cacheline" concern.
> 
> To my understanding the need for stats recognizes the fact that (in
> large organizations) fleet monitoring is done by different teams than
> XDP development. So XDP team may have all the stats they need, but the
> team doing fleet monitoring has no idea how to get to them.
> 
> To bridge the two worlds we need a way for the infra team to ask the
> XDP for well-defined stats. Maybe we should take a page from the BPF
> iterators book and create a program type for bridging the two worlds?
> Called by networking core when duping stats to extract from the
> existing BPF maps all the relevant stats and render them into a well
> known struct? Users' XDP design can still use a single per-cpu map with
> all the stats if they so choose, but there's a way to implement more
> optimal designs and still expose well-defined stats.
> 
> Maybe that's too complex, IDK.

I was just explaining to someone internally how to get stats at all of
the different points in the stack to track down reasons for dropped packets:

ethtool -S for h/w and driver
tc -s for drops by the qdisc
/proc/net/softnet_stat for drops at the backlog layer
netstat -s for network and transport layer

yet another command and API just adds to the nightmare of explaining and
understanding these stats.

There is real value in continuing to use ethtool API for XDP stats. Not
saying this reorg of the XDP stats is the right thing to do, only that
the existing API has real user benefits.

Does anyone have data that shows bumping a properly implemented counter
causes a noticeable performance degradation and if so by how much? You
mention 'yet another cacheline' but collecting stats on stack and
incrementing the driver structs at the end of the napi loop should not
have a huge impact versus the value the stats provide.


Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-04 Thread Robin Murphy

On 2021-08-04 06:02, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 6:54 PM Robin Murphy  wrote:


On 2021-08-03 09:54, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:41 PM Jason Wang  wrote:



On 2021/7/29 3:34 PM, Xie Yongji wrote:

Export alloc_iova_fast() and free_iova_fast() so that
some modules can use it to improve iova allocation efficiency.



It's better to explain why alloc_iova() is not sufficient here.



Fine.


What I fail to understand from the later patches is what the IOVA domain
actually represents. If the "device" is a userspace process then
logically the "IOVA" would be the userspace address, so presumably
somewhere you're having to translate between this arbitrary address
space and actual usable addresses - if you're worried about efficiency
surely it would be even better to not do that?



Yes, userspace daemon needs to translate the "IOVA" in a DMA
descriptor to the VA (from mmap(2)). But this actually doesn't affect
performance since it's an identical mapping in most cases.


I'm not familiar with the vhost_iotlb stuff, but it looks suspiciously 
like you're walking yet another tree to make those translations. Even if 
the buffer can be mapped all at once with a fixed offset such that each 
DMA mapping call doesn't need a lookup for each individual "IOVA" - that 
might be what's happening already, but it's a bit hard to follow just 
reading the patches in my mail client - vhost_iotlb_add_range() doesn't 
look like it's super-cheap to call, and you're serialising on a lock for 
that.


My main point, though, is that if you've already got something else 
keeping track of the actual addresses, then the way you're using an 
iova_domain appears to be something you could do with a trivial bitmap 
allocator. That's why I don't buy the efficiency argument. The main 
design points of the IOVA allocator are to manage large address spaces 
while trying to maximise spatial locality to minimise the underlying 
pagetable usage, and allocating with a flexible limit to support 
multiple devices with different addressing capabilities in the same 
address space. If none of those aspects are relevant to the use-case - 
which AFAICS appears to be true here - then as a general-purpose 
resource allocator it's rubbish and has an unreasonably massive memory 
overhead and there are many, many better choices.


FWIW I've recently started thinking about moving all the caching stuff 
out of iova_domain and into the iommu-dma layer since it's now a giant 
waste of space for all the other current IOVA users.



Presumably userspace doesn't have any concern about alignment and the
things we have to worry about for the DMA API in general, so it's pretty
much just allocating slots in a buffer, and there are far more effective
ways to do that than a full-blown address space manager.


Considering iova allocation efficiency, I think the iova allocator is
better here. In most cases, we don't even need to hold a spin lock
during iova allocation.


If you're going
to reuse any infrastructure I'd have expected it to be SWIOTLB rather
than the IOVA allocator. Because, y'know, you're *literally implementing
a software I/O TLB* ;)



But actually what we can reuse in SWIOTLB is the IOVA allocator.


Huh? Those are completely unrelated and orthogonal things - SWIOTLB does 
not use an external allocator (see find_slots()). By SWIOTLB I mean 
specifically the library itself, not dma-direct or any of the other 
users built around it. The functionality for managing slots in a buffer 
and bouncing data in and out can absolutely be reused - that's why users 
like the Xen and iommu-dma code *are* reusing it instead of open-coding 
their own versions.



And
the IOVA management in SWIOTLB is not what we want. For example,
SWIOTLB allocates and uses contiguous memory for bouncing, which is
not necessary in VDUSE case.


alloc_iova() allocates a contiguous (in IOVA address) region of space. 
In vduse_domain_map_page() you use it to allocate a contiguous region of 
space from your bounce buffer. Can you clarify how that is fundamentally 
different from allocating a contiguous region of space from a bounce 
buffer? Nobody's saying the underlying implementation details of where 
the buffer itself comes from can't be tweaked.



And VDUSE needs coherent mapping which is
not supported by the SWIOTLB. Besides, the SWIOTLB works in singleton
mode (designed for platform IOMMU), but VDUSE is based on on-chip
IOMMU (supports multiple instances).

That's not entirely true - the IOMMU bounce buffering scheme introduced 
in intel-iommu and now moved into the iommu-dma layer was already a step 
towards something conceptually similar. It does still rely on stealing 
the underlying pages from the global SWIOTLB pool at the moment, but the 
bouncing is effectively done in a per-IOMMU-domain context.


The next step is currently queued in linux-next, wherein we can now have 
individual per-device SWIOTLB pools. In fact at that point I think you 
might actually 

Re: [External] Re: [PATCH 0/4] Add multi-cid support for vsock driver

2021-08-04 Thread Stefano Garzarella

On Wed, Aug 04, 2021 at 03:09:41PM +0800, 傅关丞 wrote:

Sorry I cannot figure out a good use case.

It is normal for a host to have multiple IP addresses used for
communication.
So I thought it might be nice to have both host and guest use multiple
CIDs for communication.
I know this is not a very strong argument.


Maybe there could be a use case for guests (which I don't see now), but 
for the host it seems pointless. The strength of vsock is that the guest 
knows that using CID=2 always reaches the host.


Moreover we have recently merged VMADDR_FLAG_TO_HOST that when set 
allows you to forward any packet to the host, regardless of the CID (not 
yet supported by vhost-vsock).




The vsock driver does not work if one of the two peers doesn't support
multiple CIDs.


This is absolutely to be avoided.

I think the virtio device feature negotiation can help here.



I have a possible solution here, but there may be some problems with it
that I haven't noticed.

Hypervisors will use different ways to send CIDs setup to the kernel based
on their vsock setup.

--For host---
If the host vsock driver supports multi-CID, the hypervisor will use the
modified VHOST_VSOCK_SET_GUEST_CID call to set its CIDs.
Otherwise, the original call is used.

--For guest---
Now the virtio_vsock_config looks like this:
u64 guest_cid
u32 num_guest_cid;
u32 num_host_cid;
u32 index;
u64 cid;

If the guest vsock driver supports multi-CID, it will read num_guest_cid
and num_host_cid from the device config space.
Then it writes the index register with the index of the CID it wants to
read. After the hypervisor handles that write, the guest can read the CID
from the cid register.

If it does not support multi-cid, it will just read the guest_cid from the
config space, which should work just fine.



Why not add a new device feature to enable or disable multi-cid?




---Communication
For communication issues, we might need to use a new feature bit.  Let's
call it VHOST_VSOCK_SUPPORT_MULTI_CID.
The basic idea is that this feature bit is set when both host and guest
support using multiple CIDs.  After negotiation, if the feature bit
is set, the host can use all the CIDs specified to communicate with the
guest.  Otherwise, the first cid passed in will
be used as the guest_cid to communicate with guests.


I think the same feature bit can be used for the virtio_vsock_config, 
no?




Also, if the bit is set for guests, all the CIDs can be used to communicate
with the host.  Otherwise, the first cid with index 0 will be
used as the guest_cid while the VMADDR_HOST_CID will be used for host cid.


We already have VMADDR_FLAG_TO_HOST to forward all packets to the host; 
we only need to support it in some way in vhost-vsock.


Thanks,
Stefano


Re: [RFC PATCH v1 0/7] virtio/vsock: introduce MSG_EOR flag for SEQPACKET

2021-08-04 Thread Stefano Garzarella

Hi Arseny,

On Mon, Jul 26, 2021 at 07:31:33PM +0300, Arseny Krasnov wrote:

This patchset implements support of the MSG_EOR bit for SEQPACKET
AF_VSOCK sockets over the virtio transport.
The idea is to distinguish the concepts of 'messages' and 'records'.
A message is the result of a single sending call: 'write()', 'send()',
'sendmsg()', etc. It has a fixed maximum length, and its bounds are
visible via the return value of receive calls: 'read()', 'recv()',
'recvmsg()', etc. The current implementation is based on the message
definition above.


Okay, so the implementation we merged is wrong, right?
Should we disable the feature bit in stable kernels that contain it? Or 
maybe we can backport the fixes...



A record has unlimited length; it consists of multiple messages, and
the bounds of a record are visible via the MSG_EOR flag returned from
'recvmsg()'. The sender passes MSG_EOR to the sending system call, and
the receiver will see MSG_EOR when the corresponding message is
processed. To support MSG_EOR, a new bit was added along with the
existing 'VIRTIO_VSOCK_SEQ_EOR': 'VIRTIO_VSOCK_SEQ_EOM' (end-of-message),
which now works in the same way as 'VIRTIO_VSOCK_SEQ_EOR'. But
'VIRTIO_VSOCK_SEQ_EOR' is used to mark the 'MSG_EOR' bit passed from
userspace.


I understand that it makes sense to remap VIRTIO_VSOCK_SEQ_EOR to 
MSG_EOR to make the user understand the boundaries, but why do we need 
EOM as well?


Why do we care about the boundaries of a message within a record?
I mean, if the sender makes 3 calls:
send(A1,0)
send(A2,0)
send(A3, MSG_EOR);

IIUC it should be fine if the receiver, for example, receives it all in 
one single recv() call with MSG_EOR set, so why do we need EOM?


Thanks,
Stefano



[PATCH 15/15] nvme: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/nvme/host/core.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dfd9dec0c1f6..02ce94b2906b 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -968,12 +968,11 @@ void nvme_cleanup_cmd(struct request *req)
 {
if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
-   struct page *page = req->special_vec.bv_page;
 
-   if (page == ctrl->discard_page)
+   if (req->special_vec.bv_page == ctrl->discard_page)
		clear_bit_unlock(0, &ctrl->discard_page_busy);
else
-   kfree(page_address(page) + req->special_vec.bv_offset);
+		kfree(bvec_virt(&req->special_vec));
}
 }
 EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
-- 
2.30.2



Re: [PATCH 11/15] ubd: use bvec_virt

2021-08-04 Thread Anton Ivanov




On 04/08/2021 10:56, Christoph Hellwig wrote:

Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
  arch/um/drivers/ubd_kern.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index e497185dd393..cd9dc0556e91 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -1268,8 +1268,7 @@ static void ubd_map_req(struct ubd *dev, struct 
io_thread_req *io_req,
rq_for_each_segment(bvec, req, iter) {
BUG_ON(i >= io_req->desc_cnt);
  
-			io_req->io_desc[i].buffer =
-				page_address(bvec.bv_page) + bvec.bv_offset;
+			io_req->io_desc[i].buffer = bvec_virt(&bvec);
io_req->io_desc[i].length = bvec.bv_len;
i++;
}


Acked-By: Anton Ivanov 

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/


[PATCH 14/15] dcssblk: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/s390/block/dcssblk.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 29180bdf0977..5be3d1c39a78 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -892,8 +892,7 @@ dcssblk_submit_bio(struct bio *bio)
 
index = (bio->bi_iter.bi_sector >> 3);
bio_for_each_segment(bvec, bio, iter) {
-   page_addr = (unsigned long)
-   page_address(bvec.bv_page) + bvec.bv_offset;
+		page_addr = (unsigned long)bvec_virt(&bvec);
source_addr = dev_info->start + (index<<12) + bytes_done;
if (unlikely((page_addr & 4095) != 0) || (bvec.bv_len & 4095) 
!= 0)
// More paranoia.
-- 
2.30.2



[PATCH 13/15] dasd: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/s390/block/dasd_diag.c |  2 +-
 drivers/s390/block/dasd_eckd.c | 14 +++---
 drivers/s390/block/dasd_fba.c  |  4 ++--
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/s390/block/dasd_diag.c b/drivers/s390/block/dasd_diag.c
index 6bb775236c16..db5987281010 100644
--- a/drivers/s390/block/dasd_diag.c
+++ b/drivers/s390/block/dasd_diag.c
@@ -552,7 +552,7 @@ static struct dasd_ccw_req *dasd_diag_build_cp(struct 
dasd_device *memdev,
dbio = dreq->bio;
recid = first_rec;
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
for (off = 0; off < bv.bv_len; off += blksize) {
memset(dbio, 0, sizeof (struct dasd_diag_bio));
dbio->type = rw_cmd;
diff --git a/drivers/s390/block/dasd_eckd.c b/drivers/s390/block/dasd_eckd.c
index 0de1a463c509..8610ea414322 100644
--- a/drivers/s390/block/dasd_eckd.c
+++ b/drivers/s390/block/dasd_eckd.c
@@ -3267,7 +3267,7 @@ static int dasd_eckd_ese_read(struct dasd_ccw_req *cqr, 
struct irb *irb)
end_blk = (curr_trk + 1) * recs_per_trk;
 
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
for (off = 0; off < bv.bv_len; off += blksize) {
if (first_blk + blk_count >= end_blk) {
cqr->proc_bytes = blk_count * blksize;
@@ -3999,7 +3999,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_single(
  last_rec - recid + 1, cmd, basedev, blksize);
}
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
if (dasd_page_cache) {
char *copy = kmem_cache_alloc(dasd_page_cache,
  GFP_DMA | __GFP_NOWARN);
@@ -4166,7 +4166,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_cmd_track(
idaw_dst = NULL;
idaw_len = 0;
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
seg_len = bv.bv_len;
while (seg_len) {
if (new_track) {
@@ -4509,7 +4509,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_tpm_track(
new_track = 1;
recid = first_rec;
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
seg_len = bv.bv_len;
while (seg_len) {
if (new_track) {
@@ -4542,7 +4542,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_tpm_track(
}
} else {
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
last_tidaw = itcw_add_tidaw(itcw, 0x00,
dst, bv.bv_len);
if (IS_ERR(last_tidaw)) {
@@ -4778,7 +4778,7 @@ static struct dasd_ccw_req *dasd_eckd_build_cp_raw(struct 
dasd_device *startdev,
idaws = idal_create_words(idaws, rawpadpage, PAGE_SIZE);
}
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
seg_len = bv.bv_len;
if (cmd == DASD_ECKD_CCW_READ_TRACK)
memset(dst, 0, seg_len);
@@ -4839,7 +4839,7 @@ dasd_eckd_free_cp(struct dasd_ccw_req *cqr, struct 
request *req)
if (private->uses_cdl == 0 || recid > 2*blk_per_trk)
ccw++;
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
for (off = 0; off < bv.bv_len; off += blksize) {
/* Skip locate record. */
if (private->uses_cdl && recid <= 2*blk_per_trk)
diff --git a/drivers/s390/block/dasd_fba.c b/drivers/s390/block/dasd_fba.c
index 3ad319aee51e..e084f4de 100644
--- a/drivers/s390/block/dasd_fba.c
+++ b/drivers/s390/block/dasd_fba.c
@@ -501,7 +501,7 @@ static struct dasd_ccw_req *dasd_fba_build_cp_regular(
}
recid = first_rec;
rq_for_each_segment(bv, req, iter) {
-   dst = page_address(bv.bv_page) + bv.bv_offset;
+		dst = bvec_virt(&bv);
if (dasd_page_cache) {
char *copy = kmem_cache_alloc(dasd_page_cache,
  

[PATCH 12/15] ps3vram: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/ps3vram.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/ps3vram.c b/drivers/block/ps3vram.c
index 7fbf469651c4..c7b19e128b03 100644
--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -541,7 +541,7 @@ static struct bio *ps3vram_do_bio(struct 
ps3_system_bus_device *dev,
 
bio_for_each_segment(bvec, bio, iter) {
/* PS3 is ppc64, so we don't handle highmem */
-   char *ptr = page_address(bvec.bv_page) + bvec.bv_offset;
+		char *ptr = bvec_virt(&bvec);
size_t len = bvec.bv_len, retlen;
 
dev_dbg(>core, "%s %zu bytes at offset %llu\n", op,
-- 
2.30.2



[PATCH 11/15] ubd: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 arch/um/drivers/ubd_kern.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index e497185dd393..cd9dc0556e91 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -1268,8 +1268,7 @@ static void ubd_map_req(struct ubd *dev, struct 
io_thread_req *io_req,
rq_for_each_segment(bvec, req, iter) {
BUG_ON(i >= io_req->desc_cnt);
 
-   io_req->io_desc[i].buffer =
-   page_address(bvec.bv_page) + bvec.bv_offset;
+   io_req->io_desc[i].buffer = bvec_virt();
io_req->io_desc[i].length = bvec.bv_len;
i++;
}
-- 
2.30.2



[PATCH 10/15] sd: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index b8d55af763f9..5b5b8266e142 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -886,7 +886,7 @@ static blk_status_t sd_setup_unmap_cmnd(struct scsi_cmnd 
*cmd)
cmd->cmnd[0] = UNMAP;
cmd->cmnd[8] = 24;
 
-   buf = page_address(rq->special_vec.bv_page);
+	buf = bvec_virt(&rq->special_vec);
 	put_unaligned_be16(6 + 16, &buf[0]);
 	put_unaligned_be16(16, &buf[2]);
 	put_unaligned_be64(lba, &buf[8]);
-- 
2.30.2



[PATCH 09/15] bcache: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.  Note that the existing code is
fine despite ignoring bv_offset as the bio is known to contain exactly
one page from the page allocator per bio_vec.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/bcache/btree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 183a58c89377..0595559de174 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -378,7 +378,7 @@ static void do_btree_node_write(struct btree *b)
struct bvec_iter_all iter_all;
 
bio_for_each_segment_all(bv, b->bio, iter_all) {
-   memcpy(page_address(bv->bv_page), addr, PAGE_SIZE);
+   memcpy(bvec_virt(bv), addr, PAGE_SIZE);
addr += PAGE_SIZE;
}
 
-- 
2.30.2



[PATCH 08/15] virtio_blk: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/virtio_blk.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4b49df2dfd23..767b4f72a54d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -166,11 +166,8 @@ static inline void virtblk_request_done(struct request 
*req)
 {
struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
 
-   if (req->rq_flags & RQF_SPECIAL_PAYLOAD) {
-   kfree(page_address(req->special_vec.bv_page) +
- req->special_vec.bv_offset);
-   }
-
+   if (req->rq_flags & RQF_SPECIAL_PAYLOAD)
+		kfree(bvec_virt(&req->special_vec));
blk_mq_end_request(req, virtblk_result(vbr));
 }
 
-- 
2.30.2



[PATCH 07/15] rbd: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/block/rbd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 6d596c2c2cd6..e65c9d706f6f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2986,8 +2986,7 @@ static bool is_zero_bvecs(struct bio_vec *bvecs, u32 bytes)
	};
 
	ceph_bvec_iter_advance_step(&it, bytes, ({
-		if (memchr_inv(page_address(bv.bv_page) + bv.bv_offset, 0,
-			       bv.bv_len))
+		if (memchr_inv(bvec_virt(&bv), 0, bv.bv_len))
			return false;
	}));
return true;
-- 
2.30.2



[PATCH 06/15] squashfs: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 fs/squashfs/block.c| 7 +++
 fs/squashfs/lz4_wrapper.c  | 2 +-
 fs/squashfs/lzo_wrapper.c  | 2 +-
 fs/squashfs/xz_wrapper.c   | 2 +-
 fs/squashfs/zlib_wrapper.c | 2 +-
 fs/squashfs/zstd_wrapper.c | 2 +-
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index 855f0e87066d..2db8bcf7ff85 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -49,8 +49,7 @@ static int copy_bio_to_actor(struct bio *bio,
 
bytes_to_copy = min_t(int, bytes_to_copy,
  req_length - copied_bytes);
-   memcpy(actor_addr + actor_offset,
-  page_address(bvec->bv_page) + bvec->bv_offset + offset,
+   memcpy(actor_addr + actor_offset, bvec_virt(bvec) + offset,
   bytes_to_copy);
 
actor_offset += bytes_to_copy;
@@ -177,7 +176,7 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length,
goto out_free_bio;
}
/* Extract the length of the metadata block */
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
length = data[offset];
if (offset < bvec->bv_len - 1) {
length |= data[offset + 1] << 8;
@@ -186,7 +185,7 @@ int squashfs_read_data(struct super_block *sb, u64 index, int length,
res = -EIO;
goto out_free_bio;
}
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
length |= data[0] << 8;
}
bio_free_pages(bio);
diff --git a/fs/squashfs/lz4_wrapper.c b/fs/squashfs/lz4_wrapper.c
index 233d5582fbee..b685b6238316 100644
--- a/fs/squashfs/lz4_wrapper.c
+++ b/fs/squashfs/lz4_wrapper.c
@@ -101,7 +101,7 @@ static int lz4_uncompress(struct squashfs_sb_info *msblk, void *strm,
while (bio_next_segment(bio, _all)) {
int avail = min(bytes, ((int)bvec->bv_len) - offset);
 
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
memcpy(buff, data + offset, avail);
buff += avail;
bytes -= avail;
diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c
index 97bb7d92ddcd..cb510a631968 100644
--- a/fs/squashfs/lzo_wrapper.c
+++ b/fs/squashfs/lzo_wrapper.c
@@ -76,7 +76,7 @@ static int lzo_uncompress(struct squashfs_sb_info *msblk, void *strm,
while (bio_next_segment(bio, _all)) {
int avail = min(bytes, ((int)bvec->bv_len) - offset);
 
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
memcpy(buff, data + offset, avail);
buff += avail;
bytes -= avail;
diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c
index e80419aed862..68f6d09bb3a2 100644
--- a/fs/squashfs/xz_wrapper.c
+++ b/fs/squashfs/xz_wrapper.c
@@ -146,7 +146,7 @@ static int squashfs_xz_uncompress(struct squashfs_sb_info *msblk, void *strm,
}
 
avail = min(length, ((int)bvec->bv_len) - offset);
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
length -= avail;
stream->buf.in = data + offset;
stream->buf.in_size = avail;
diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c
index bcb881ec47f2..a20e9042146b 100644
--- a/fs/squashfs/zlib_wrapper.c
+++ b/fs/squashfs/zlib_wrapper.c
@@ -76,7 +76,7 @@ static int zlib_uncompress(struct squashfs_sb_info *msblk, void *strm,
}
 
avail = min(length, ((int)bvec->bv_len) - offset);
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
length -= avail;
stream->next_in = data + offset;
stream->avail_in = avail;
diff --git a/fs/squashfs/zstd_wrapper.c b/fs/squashfs/zstd_wrapper.c
index b7cb1faa652d..0015cf8b5582 100644
--- a/fs/squashfs/zstd_wrapper.c
+++ b/fs/squashfs/zstd_wrapper.c
@@ -94,7 +94,7 @@ static int zstd_uncompress(struct squashfs_sb_info *msblk, void *strm,
}
 
avail = min(length, ((int)bvec->bv_len) - offset);
-   data = page_address(bvec->bv_page) + bvec->bv_offset;
+   data = bvec_virt(bvec);
length -= avail;
in_buf.src = data + offset;

[PATCH 05/15] dm-integrity: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-integrity.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
index 20f2510db1f6..a9ea361769a7 100644
--- a/drivers/md/dm-integrity.c
+++ b/drivers/md/dm-integrity.c
@@ -1819,7 +1819,7 @@ static void integrity_metadata(struct work_struct *w)
				unsigned this_len;
 
				BUG_ON(PageHighMem(biv.bv_page));
-				tag = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+				tag = bvec_virt(&biv);
				this_len = min(biv.bv_len, data_to_process);
				r = dm_integrity_rw_tag(ic, tag, &dio->metadata_block, &dio->metadata_offset,
							this_len, dio->op == REQ_OP_READ ? TAG_READ : TAG_WRITE);
@@ -2006,7 +2006,7 @@ static bool __journal_read_write(struct dm_integrity_io *dio, struct bio *bio,
				unsigned tag_now = min(biv.bv_len, tag_todo);
				char *tag_addr;
				BUG_ON(PageHighMem(biv.bv_page));
-				tag_addr = lowmem_page_address(biv.bv_page) + biv.bv_offset;
+				tag_addr = bvec_virt(&biv);
				if (likely(dio->op == REQ_OP_WRITE))
					memcpy(tag_ptr, tag_addr, tag_now);
				else
-- 
2.30.2



[PATCH 04/15] dm-ebs: use bvec_virt

2021-08-04 Thread Christoph Hellwig
Use bvec_virt instead of open coding it.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/dm-ebs-target.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-ebs-target.c b/drivers/md/dm-ebs-target.c
index 71475a2410be..0c509dae0ff8 100644
--- a/drivers/md/dm-ebs-target.c
+++ b/drivers/md/dm-ebs-target.c
@@ -74,7 +74,7 @@ static int __ebs_rw_bvec(struct ebs_c *ec, int rw, struct bio_vec *bv, struct bv
if (unlikely(!bv->bv_page || !bv_len))
return -EIO;
 
-   pa = page_address(bv->bv_page) + bv->bv_offset;
+   pa = bvec_virt(bv);
 
/* Handle overlapping page <-> blocks */
while (bv_len) {
-- 
2.30.2



[PATCH 03/15] dm: make EBS depend on !HIGHMEM

2021-08-04 Thread Christoph Hellwig
__ebs_rw_bvec uses page_address on the submitted bio's data, and thus
can't deal with highmem.  Disable the target on highmem configs.

Signed-off-by: Christoph Hellwig 
---
 drivers/md/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 0602e82a9516..ecc559c60d40 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -340,7 +340,7 @@ config DM_WRITECACHE
 
 config DM_EBS
tristate "Emulated block size target (EXPERIMENTAL)"
-   depends on BLK_DEV_DM
+   depends on BLK_DEV_DM && !HIGHMEM
select DM_BUFIO
help
  dm-ebs emulates smaller logical block size on backing devices
-- 
2.30.2



[PATCH 02/15] block: use bvec_virt in bio_integrity_{process,free}

2021-08-04 Thread Christoph Hellwig
Use the bvec_virt helper to clean up the bio integrity processing a
little bit.

Signed-off-by: Christoph Hellwig 
---
 block/bio-integrity.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 8f54d49dc500..6b47cddbbca1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -104,8 +104,7 @@ void bio_integrity_free(struct bio *bio)
struct bio_set *bs = bio->bi_pool;
 
if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
-   kfree(page_address(bip->bip_vec->bv_page) +
- bip->bip_vec->bv_offset);
+   kfree(bvec_virt(bip->bip_vec));
 
__bio_integrity_free(bs, bip);
bio->bi_integrity = NULL;
@@ -163,13 +162,11 @@ static blk_status_t bio_integrity_process(struct bio *bio,
struct bio_vec bv;
struct bio_integrity_payload *bip = bio_integrity(bio);
blk_status_t ret = BLK_STS_OK;
-   void *prot_buf = page_address(bip->bip_vec->bv_page) +
-   bip->bip_vec->bv_offset;
 
iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
iter.interval = 1 << bi->interval_exp;
iter.seed = proc_iter->bi_sector;
-   iter.prot_buf = prot_buf;
+   iter.prot_buf = bvec_virt(bip->bip_vec);
 
__bio_for_each_segment(bv, bio, bviter, *proc_iter) {
		void *kaddr = bvec_kmap_local(&bv);
-- 
2.30.2



[PATCH 01/15] bvec: add a bvec_virt helper

2021-08-04 Thread Christoph Hellwig
Add a helper to get the virtual address for a bvec.  This avoids having
all callers know about the page + offset representation.

Signed-off-by: Christoph Hellwig 
---
 include/linux/bvec.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/bvec.h b/include/linux/bvec.h
index f9fa43b940ff..0e9bdd42dafb 100644
--- a/include/linux/bvec.h
+++ b/include/linux/bvec.h
@@ -229,4 +229,16 @@ static inline void memzero_bvec(struct bio_vec *bvec)
memzero_page(bvec->bv_page, bvec->bv_offset, bvec->bv_len);
 }
 
+/**
+ * bvec_virt - return the virtual address for a bvec
+ * @bvec: bvec to return the virtual address for
+ *
+ * Note: the caller must ensure that @bvec->bv_page is not a highmem page.
+ */
+static inline void *bvec_virt(struct bio_vec *bvec)
+{
+   WARN_ON_ONCE(PageHighMem(bvec->bv_page));
+   return page_address(bvec->bv_page) + bvec->bv_offset;
+}
+
 #endif /* __LINUX_BVEC_H */
-- 
2.30.2



add a bvec_virt helper

2021-08-04 Thread Christoph Hellwig
Hi Jens,

this series adds a bvec_virt helper to return the virtual address of the
data in a bvec, replacing the open-coded calculation, and serves as a
reminder that in general bio/bvec data can be in high memory unless it is
caller controlled or in an architecture-specific driver where highmem is
impossible.


Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Jason Wang


On 2021/8/4 4:50 PM, Yongji Xie wrote:

On Wed, Aug 4, 2021 at 4:32 PM Jason Wang  wrote:


On 2021/8/3 5:38 PM, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:

On 2021/7/29 3:34 PM, Xie Yongji wrote:

The device reset may fail in virtio-vdpa case now, so add checks to
its return value and fail the register_virtio_device().

So the reset() would be called by the driver during remove as well, or
is it sufficient to deal only with the reset during probe?


Actually there is no way to handle failure during removal. And it
should be safe with the protection of software IOTLB even if the
reset() fails.

Thanks,
Yongji


If this is true, does it mean we don't even need to care about reset
failure?


But we need to handle the failure in the vhost-vdpa case, isn't it?



Yes, but:

- This patch is for virtio, not for vhost; if we don't care about virtio,
we can avoid the changes

- For vhost, there could probably be two ways:

1) let set_status report an error
2) require userspace to re-read the status

It looks to me like you want to go with 1) and I'm not sure whether or not
it's too late to go with 2).


Thanks




Thanks,
Yongji




Re: [PATCH v10 05/17] vhost-vdpa: Fail the vhost_vdpa_set_status() on reset failure

2021-08-04 Thread Jason Wang


On 2021/8/3 5:50 PM, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 4:10 PM Jason Wang  wrote:


On 2021/7/29 3:34 PM, Xie Yongji wrote:

Re-read the device status to ensure it's set to zero during
resetting. Otherwise, fail the vhost_vdpa_set_status() after timeout.

Signed-off-by: Xie Yongji 
---
   drivers/vhost/vdpa.c | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b07aa161f7ad..dd05c1e1133c 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -157,7 +157,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
	struct vdpa_device *vdpa = v->vdpa;
	const struct vdpa_config_ops *ops = vdpa->config;
	u8 status, status_old;
-	int nvqs = v->nvqs;
+	int timeout = 0, nvqs = v->nvqs;
	u16 i;
 
	if (copy_from_user(&status, statusp, sizeof(status)))
@@ -173,6 +173,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
		return -EINVAL;
 
	ops->set_status(vdpa, status);
+	if (status == 0) {
+		while (ops->get_status(vdpa)) {
+			timeout += 20;
+			if (timeout > VDPA_RESET_TIMEOUT_MS)
+				return -EIO;
+
+			msleep(20);
+		}


The spec has introduced reset as one of the basic facilities. And
considering we handle reset differently here.

This makes me think whether it's better to introduce a dedicated vdpa op
for reset?


Do you mean replace the ops.set_status(vdev, 0) with the ops.reset()?
Then we can remove the timeout processing which is device specific
stuff.



Exactly.

Thanks




Thanks,
Yongji




Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Jason Wang


On 2021/8/3 5:38 PM, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:


On 2021/7/29 3:34 PM, Xie Yongji wrote:

The device reset may fail in virtio-vdpa case now, so add checks to
its return value and fail the register_virtio_device().


So the reset() would be called by the driver during remove as well, or
is it sufficient to deal only with the reset during probe?


Actually there is no way to handle failure during removal. And it
should be safe with the protection of software IOTLB even if the
reset() fails.

Thanks,
Yongji



If this is true, does it mean we don't even need to care about reset 
failure?


Thanks



Re: [PATCH v10 04/17] vdpa: Fail the vdpa_reset() if fail to set device status to zero

2021-08-04 Thread Jason Wang


On 2021/8/3 5:31 PM, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:58 PM Jason Wang  wrote:


On 2021/7/29 3:34 PM, Xie Yongji wrote:

Re-read the device status to ensure it's set to zero during
resetting. Otherwise, fail the vdpa_reset() after timeout.

Signed-off-by: Xie Yongji 
---
   include/linux/vdpa.h | 15 ++-
   1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 406d53a606ac..d1a80ef05089 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -6,6 +6,7 @@
   #include 
   #include 
   #include 
+#include 

   /**
* struct vdpa_calllback - vDPA callback definition.
@@ -340,12 +341,24 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
	return vdev->dma_dev;
 }
 
-static inline void vdpa_reset(struct vdpa_device *vdev)
+#define VDPA_RESET_TIMEOUT_MS 1000
+
+static inline int vdpa_reset(struct vdpa_device *vdev)
 {
	const struct vdpa_config_ops *ops = vdev->config;
+	int timeout = 0;
 
	vdev->features_valid = false;
	ops->set_status(vdev, 0);
+	while (ops->get_status(vdev)) {
+		timeout += 20;
+		if (timeout > VDPA_RESET_TIMEOUT_MS)
+			return -EIO;
+
+		msleep(20);
+	}


I wonder if it's better to do this in the vDPA parent?

Thanks


Sorry, I didn't get you here. Do you mean vDPA parent driver (e.g.
VDUSE)?



Yes, since how it's expected to behave depends on the specific hardware.

Even for the spec, the behavior is transport specific:

PCI: requires rereading until 0
MMIO: doesn't require it, but it might not work for the hardware, so we
decided to change it
CCW: the success of the ccw command means the success of the reset

Thanks



Actually I didn't find any other place where I can do
set_status() and get_status().

Thanks,
Yongji




Re: [PATCH v10 02/17] file: Export receive_fd() to modules

2021-08-04 Thread Jason Wang


On 2021/8/3 5:01 PM, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:46 PM Jason Wang  wrote:


On 2021/7/29 3:34 PM, Xie Yongji wrote:

Export receive_fd() so that some modules can use
it to pass file descriptors between processes without
missing any security stuff.

Signed-off-by: Xie Yongji 
---
   fs/file.c| 6 ++
   include/linux/file.h | 7 +++
   2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 86dc9956af32..210e540672aa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags)
   return new_fd;
   }

+int receive_fd(struct file *file, unsigned int o_flags)
+{
+ return __receive_fd(file, NULL, o_flags);


Any reason that receive_fd_user() can live in the file.h?


Since no modules use it.

Thanks,
Yongji



Ok.


Acked-by: Jason Wang 







[PATCH virtiofsd 1/3] virtiofsd: expand fuse protocol to support per-file DAX

2021-08-04 Thread Jeffle Xu
Signed-off-by: Jeffle Xu 
---
 include/standard-headers/linux/fuse.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h
index 950d7edb7e..7bd006ffcb 100644
--- a/include/standard-headers/linux/fuse.h
+++ b/include/standard-headers/linux/fuse.h
@@ -356,6 +356,7 @@ struct fuse_file_lock {
 #define FUSE_MAP_ALIGNMENT (1 << 26)
 #define FUSE_SUBMOUNTS (1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2(1 << 28)
+#define FUSE_PERFILE_DAX   (1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -440,6 +441,7 @@ struct fuse_file_lock {
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
  */
 #define FUSE_ATTR_SUBMOUNT  (1 << 0)
+#define FUSE_ATTR_DAX  (1 << 1)
 
 /**
  * Open flags
-- 
2.27.0



[PATCH virtiofsd 2/3] virtiofsd: support per-file DAX negotiation in FUSE_INIT

2021-08-04 Thread Jeffle Xu
Signed-off-by: Jeffle Xu 
---
 tools/virtiofsd/fuse_common.h| 5 +
 tools/virtiofsd/fuse_lowlevel.c  | 6 ++
 tools/virtiofsd/passthrough_ll.c | 3 +++
 3 files changed, 14 insertions(+)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 8a75729be9..ee6fc64c23 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -372,6 +372,11 @@ struct fuse_file_info {
  */
 #define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
 
+/**
+ * Indicates support for per-file DAX.
+ */
+#define FUSE_CAP_PERFILE_DAX (1 << 29)
+
 /**
  * Ioctl flags
  *
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 50fc5c8d5a..60fb82a92a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2065,6 +2065,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
 se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV_V2;
 }
+if (arg->flags & FUSE_PERFILE_DAX) {
+se->conn.capable |= FUSE_CAP_PERFILE_DAX;
+}
 #ifdef HAVE_SPLICE
 #ifdef HAVE_VMSPLICE
 se->conn.capable |= FUSE_CAP_SPLICE_WRITE | FUSE_CAP_SPLICE_MOVE;
@@ -2180,6 +2183,9 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 if (se->conn.want & FUSE_CAP_POSIX_ACL) {
 outarg.flags |= FUSE_POSIX_ACL;
 }
+if (se->conn.want & FUSE_CAP_PERFILE_DAX) {
+outarg.flags |= FUSE_PERFILE_DAX;
+}
 outarg.max_readahead = se->conn.max_readahead;
 outarg.max_write = se->conn.max_write;
 if (se->conn.max_background >= (1 << 16)) {
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b76d878509..da88304253 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -702,6 +702,9 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 conn->want &= ~FUSE_CAP_HANDLE_KILLPRIV_V2;
 lo->killpriv_v2 = 0;
 }
+if (conn->capable & FUSE_CAP_PERFILE_DAX) {
+conn->want |= FUSE_CAP_PERFILE_DAX;
+}
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
-- 
2.27.0



[PATCH virtiofsd 3/3] virtiofsd: support per-file DAX in FUSE_LOOKUP

2021-08-04 Thread Jeffle Xu
Signed-off-by: Jeffle Xu 
---
 tools/virtiofsd/passthrough_ll.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index da88304253..1a472ce7f0 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -1026,6 +1026,9 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
 e->attr_flags |= FUSE_ATTR_SUBMOUNT;
 }
 
+if (e->attr.st_size >= 1048576)
+   e->attr_flags |= FUSE_ATTR_DAX;
+
 inode = lo_find(lo, >attr, mnt_id);
 if (inode) {
 close(newfd);
-- 
2.27.0



[PATCH virtiofsd 0/3] virtiofsd: support per-file DAX

2021-08-04 Thread Jeffle Xu
As discussed with Vivek Goyal, I tried to make virtiofsd support
per-file DAX by checking if the file is marked with the FS_DAX_FL attr.
Thus I need to implement the .ioctl() method for passthrough_ll.c (because
the FS_DAX_FL attr is get/set by the FS_IOC_GETFLAGS/FS_IOC_SETFLAGS
ioctls), something like

```
static struct fuse_lowlevel_ops lo_oper = {
+.ioctl = lo_ioctl,

+static void lo_ioctl(...)
+{
+   ret = ioctl(fd, FS_IOC_SETFLAGS, ...);
}
```

But unfortunately once virtiofsd calls ioctl() syscall, it directly
exits, and qemu also hangs with 'qemu-system-x86_64: Unexpected
end-of-file before all data were read'. I'm not sure if it's because
ioctl() is not permitted at all for virtiofsd or qemu. Many thanks if
someone familiar with virtualization could help.

The code repository of virtiofsd used is:
gitlab.com/virtio-fs/qemu.git virtio-fs-dev

Thus this patch set is still used for testing only; files larger
than 1MB are marked to enable per-file DAX.

Jeffle Xu (3):
  virtiofsd: expand fuse protocol to support per-file DAX
  virtiofsd: support per-file DAX negotiation in FUSE_INIT
  virtiofsd: support per-file DAX in FUSE_LOOKUP

 include/standard-headers/linux/fuse.h | 2 ++
 tools/virtiofsd/fuse_common.h | 5 +
 tools/virtiofsd/fuse_lowlevel.c   | 6 ++
 tools/virtiofsd/passthrough_ll.c  | 6 ++
 4 files changed, 19 insertions(+)

-- 
2.27.0



[PATCH v3 0/8] fuse,virtiofs: support per-file DAX

2021-08-04 Thread Jeffle Xu
changes since v2:
- modify fuse_show_options() accordingly to make it compatible with
  new tri-state mount option (patch 2)
- extract FUSE protocol changes into one seperate patch (patch 3)
- FUSE server/client need to negotiate if they support per-file DAX
  (patch 4)
- extract DONT_CACHE logic into patch 6/7


This patchset adds support of per-file DAX for virtiofs, which is
inspired by Ira Weiny's work on ext4[1] and xfs[2].

Any comment is welcome.

[1] commit 9cb20f94afcd ("fs/ext4: Make DAX mount option a tri-state")
[2] commit 02beb2686ff9 ("fs/xfs: Make DAX mount option a tri-state")

v2: https://www.spinics.net/lists/linux-fsdevel/msg199584.html
v1: https://www.spinics.net/lists/linux-virtualization/msg51008.html

Jeffle Xu (8):
  fuse: add fuse_should_enable_dax() helper
  fuse: Make DAX mount option a tri-state
  fuse: support per-file DAX
  fuse: negotiate if server/client supports per-file DAX
  fuse: enable per-file DAX
  fuse: mark inode DONT_CACHE when per-file DAX indication changes
  fuse: support changing per-file DAX flag inside guest
  fuse: show '-o dax=inode' option only when FUSE server supports

 fs/fuse/dax.c | 32 ++--
 fs/fuse/file.c|  4 ++--
 fs/fuse/fuse_i.h  | 22 ++
 fs/fuse/inode.c   | 27 ++-
 fs/fuse/ioctl.c   | 15 +--
 fs/fuse/virtio_fs.c   | 16 ++--
 include/uapi/linux/fuse.h |  9 -
 7 files changed, 103 insertions(+), 22 deletions(-)

-- 
2.27.0



[PATCH v3 5/8] fuse: enable per-file DAX

2021-08-04 Thread Jeffle Xu
Enable per-file DAX if the FUSE server advertises that the file supports
it.

Currently the state of whether the file enables DAX or not is initialized
only when the inode is instantiated.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/dax.c| 12 
 fs/fuse/file.c   |  4 ++--
 fs/fuse/fuse_i.h |  4 ++--
 fs/fuse/inode.c  |  2 +-
 4 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index fe4e9593a590..30833f8d37dd 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1339,7 +1339,7 @@ static const struct address_space_operations fuse_dax_file_aops = {
.invalidatepage = noop_invalidatepage,
 };
 
-static bool fuse_should_enable_dax(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode, unsigned int flags)
 {
struct fuse_conn *fc = get_fuse_conn(inode);
unsigned int dax_mode = fc->dax_mode;
@@ -1348,12 +1348,16 @@ static bool fuse_should_enable_dax(struct inode *inode)
if (dax_mode == FUSE_DAX_NEVER)
return false;
 
-   return true;
+   if (dax_mode == FUSE_DAX_ALWAYS)
+   return true;
+
+   WARN_ON_ONCE(dax_mode != FUSE_DAX_INODE);
+   return fc->perfile_dax && (flags & FUSE_ATTR_DAX);
 }
 
-void fuse_dax_inode_init(struct inode *inode)
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
 {
-   if (!fuse_should_enable_dax(inode))
+   if (!fuse_should_enable_dax(inode, flags))
return;
 
inode->i_flags |= S_DAX;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 97f860cfc195..b494ff08f08c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -3142,7 +3142,7 @@ static const struct address_space_operations fuse_file_aops = {
.write_end  = fuse_write_end,
 };
 
-void fuse_init_file_inode(struct inode *inode)
+void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 {
struct fuse_inode *fi = get_fuse_inode(inode);
 
@@ -3156,5 +3156,5 @@ void fuse_init_file_inode(struct inode *inode)
fi->writepages = RB_ROOT;
 
if (IS_ENABLED(CONFIG_FUSE_DAX))
-   fuse_dax_inode_init(inode);
+   fuse_dax_inode_init(inode, flags);
 }
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 0b21e76a379a..7b7b4c208af2 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1006,7 +1006,7 @@ int fuse_notify_poll_wakeup(struct fuse_conn *fc,
 /**
  * Initialize file operations on a regular file
  */
-void fuse_init_file_inode(struct inode *inode);
+void fuse_init_file_inode(struct inode *inode, unsigned int flags);
 
 /**
  * Initialize inode operations on regular files and special files
@@ -1258,7 +1258,7 @@ int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
-void fuse_dax_inode_init(struct inode *inode);
+void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index d59aea41d70d..882c376f188b 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -281,7 +281,7 @@ static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
inode->i_ctime.tv_nsec = attr->ctimensec;
if (S_ISREG(inode->i_mode)) {
fuse_init_common(inode);
-   fuse_init_file_inode(inode);
+   fuse_init_file_inode(inode, attr->flags);
} else if (S_ISDIR(inode->i_mode))
fuse_init_dir(inode);
else if (S_ISLNK(inode->i_mode))
-- 
2.27.0



[PATCH v3 8/8] fuse: show '-o dax=inode' option only when FUSE server supports

2021-08-04 Thread Jeffle Xu
Prior to this patch, the mount option would still show '-o dax=inode'
even when the FUSE server advertises that it doesn't support per-file
DAX.
Signed-off-by: Jeffle Xu 
---
 fs/fuse/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 9108d8ab39bc..439f74a041fa 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -697,7 +697,7 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
seq_puts(m, ",dax=always");
else if (fc->dax_mode == FUSE_DAX_NEVER)
seq_puts(m, ",dax=never");
-   else if (fc->dax_mode == FUSE_DAX_INODE)
+   else if ((fc->dax_mode == FUSE_DAX_INODE) && fc->perfile_dax)
seq_puts(m, ",dax=inode");
 #endif
 
-- 
2.27.0



[PATCH v3 4/8] fuse: negotiate if server/client supports per-file DAX

2021-08-04 Thread Jeffle Xu
During the FUSE_INIT phase, server and client shall negotiate whether
per-file DAX is supported.

Requirements for server:
- capable of handling SETFLAGS/FSSETXATTR ioctl and storing
  FS_DAX_FL/FS_XFLAG_DAX persistently.
- set FUSE_ATTR_DAX if the file capable of per-file DAX when replying
  FUSE_LOOKUP request accordingly.

Requirements for client:
- capable of handling per-file DAX when receiving FUSE_ATTR_DAX.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/fuse_i.h |  3 +++
 fs/fuse/inode.c  | 12 +++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index a23dd8d0c181..0b21e76a379a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -770,6 +770,9 @@ struct fuse_conn {
/* Propagate syncfs() to server */
unsigned int sync_fs:1;
 
+   /* Does the filesystem support per-file DAX? */
+   unsigned int perfile_dax:1;
+
/** The number of requests waiting for completion */
atomic_t num_waiting;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 0bc0d8af81e1..d59aea41d70d 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -1087,10 +1087,12 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
min_t(unsigned int, fc->max_pages_limit,
max_t(unsigned int, arg->max_pages, 1));
}
-   if (IS_ENABLED(CONFIG_FUSE_DAX) &&
-   arg->flags & FUSE_MAP_ALIGNMENT &&
-   !fuse_dax_check_alignment(fc, arg->map_alignment)) {
-   ok = false;
+   if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+   if (arg->flags & FUSE_MAP_ALIGNMENT &&
+			    !fuse_dax_check_alignment(fc, arg->map_alignment))
+   ok = false;
+   if (arg->flags & FUSE_PERFILE_DAX)
+   fc->perfile_dax = 1;
}
if (arg->flags & FUSE_HANDLE_KILLPRIV_V2) {
fc->handle_killpriv_v2 = 1;
@@ -1144,7 +1146,7 @@ void fuse_send_init(struct fuse_mount *fm)
FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT;
 #ifdef CONFIG_FUSE_DAX
if (fm->fc->dax)
-   ia->in.flags |= FUSE_MAP_ALIGNMENT;
+   ia->in.flags |= FUSE_MAP_ALIGNMENT | FUSE_PERFILE_DAX;
 #endif
if (fm->fc->auto_submounts)
ia->in.flags |= FUSE_SUBMOUNTS;
-- 
2.27.0



[PATCH v3 3/8] fuse: support per-file DAX

2021-08-04 Thread Jeffle Xu
Expand the fuse protocol to support per-file DAX.

The FUSE_PERFILE_DAX flag is added, indicating whether the FUSE
server/client supports per-file DAX when sending or replying to the
FUSE_INIT request.

Besides, the FUSE_ATTR_DAX flag is added, indicating whether DAX shall
be enabled for the corresponding file when replying to the FUSE_LOOKUP
request.

Signed-off-by: Jeffle Xu 
---
 include/uapi/linux/fuse.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 36ed092227fa..15a1f5fc0797 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -184,6 +184,9 @@
  *
  *  7.34
  *  - add FUSE_SYNCFS
+ *
+ *  7.35
+ *  - add FUSE_PERFILE_DAX, FUSE_ATTR_DAX
  */
 
 #ifndef _LINUX_FUSE_H
@@ -219,7 +222,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 34
+#define FUSE_KERNEL_MINOR_VERSION 35
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -336,6 +339,7 @@ struct fuse_file_lock {
  * write/truncate sgid is killed only if file has group
  * execute permission. (Same as Linux VFS behavior).
  * FUSE_SETXATTR_EXT:  Server supports extended struct fuse_setxattr_in
+ * FUSE_PERFILE_DAX:   kernel supports per-file DAX
  */
 #define FUSE_ASYNC_READ(1 << 0)
 #define FUSE_POSIX_LOCKS   (1 << 1)
@@ -367,6 +371,7 @@ struct fuse_file_lock {
 #define FUSE_SUBMOUNTS (1 << 27)
 #define FUSE_HANDLE_KILLPRIV_V2(1 << 28)
 #define FUSE_SETXATTR_EXT  (1 << 29)
+#define FUSE_PERFILE_DAX   (1 << 30)
 
 /**
  * CUSE INIT request/reply flags
@@ -449,8 +454,10 @@ struct fuse_file_lock {
  * fuse_attr flags
  *
  * FUSE_ATTR_SUBMOUNT: Object is a submount root
+ * FUSE_ATTR_DAX: Enable DAX for this file in per-file DAX mode
  */
 #define FUSE_ATTR_SUBMOUNT  (1 << 0)
+#define FUSE_ATTR_DAX  (1 << 1)
 
 /**
  * Open flags
-- 
2.27.0



[PATCH v3 6/8] fuse: mark inode DONT_CACHE when per-file DAX indication changes

2021-08-04 Thread Jeffle Xu
When the per-file DAX indication changes while the file is still
*opened*, it is quite complicated, and may be fragile, to change the
DAX state dynamically.

Hence mark the inode and corresponding dentries as DONT_CACHE once the
per-file DAX indication changes, so that the inode instance will be
evicted and freed as soon as possible once the file is closed and the
last reference to the inode is put. And then when the file gets reopened
next time, the inode will reflect the new DAX state.

In summary, when the per-file DAX indication changes for an *opened*
file, the state of the file won't be updated until this file is closed
and reopened later.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/dax.c| 9 +
 fs/fuse/fuse_i.h | 1 +
 fs/fuse/inode.c  | 3 +++
 3 files changed, 13 insertions(+)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 30833f8d37dd..14bc5b9a0576 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1364,6 +1364,15 @@ void fuse_dax_inode_init(struct inode *inode, unsigned int flags)
	inode->i_data.a_ops = &fuse_dax_file_aops;
 }
 
+void fuse_dax_dontcache(struct inode *inode, bool newdax)
+{
+   struct fuse_conn *fc = get_fuse_conn(inode);
+
+   if (fc->dax_mode == FUSE_DAX_INODE &&
+   fc->perfile_dax && IS_DAX(inode) != newdax)
+   d_mark_dontcache(inode);
+}
+
 bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment)
 {
if (fc->dax && (map_alignment > FUSE_DAX_SHIFT)) {
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7b7b4c208af2..56fe1c4d2136 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1260,6 +1260,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
 void fuse_dax_inode_cleanup(struct inode *inode);
+void fuse_dax_dontcache(struct inode *inode, bool newdax);
bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
 void fuse_dax_cancel_work(struct fuse_conn *fc);
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 882c376f188b..9108d8ab39bc 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -269,6 +269,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
if (inval)
invalidate_inode_pages2(inode->i_mapping);
}
+
+   if (IS_ENABLED(CONFIG_FUSE_DAX))
+   fuse_dax_dontcache(inode, attr->flags & FUSE_ATTR_DAX);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr)
-- 
2.27.0



[PATCH v3 2/8] fuse: Make DAX mount option a tri-state

2021-08-04 Thread Jeffle Xu
We add 'always', 'never', and 'inode' (default). '-o dax' continues to
operate the same way and is equivalent to 'always'.

At this point 'inode' mode still behaves exactly like 'always' mode;
the two only diverge once the per-file DAX flag is introduced in a
following patch.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/dax.c   |  9 +++--
 fs/fuse/fuse_i.h| 14 --
 fs/fuse/inode.c | 10 +++---
 fs/fuse/virtio_fs.c | 16 ++--
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index c6f4e82e65f3..fe4e9593a590 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1288,11 +1288,14 @@ static int fuse_dax_mem_range_init(struct fuse_conn_dax *fcd)
return ret;
 }
 
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev)
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode dax_mode,
+   struct dax_device *dax_dev)
 {
struct fuse_conn_dax *fcd;
int err;
 
+   fc->dax_mode = dax_mode;
+
if (!dax_dev)
return 0;
 
@@ -1339,8 +1342,10 @@ static const struct address_space_operations fuse_dax_file_aops = {
 static bool fuse_should_enable_dax(struct inode *inode)
 {
struct fuse_conn *fc = get_fuse_conn(inode);
+   unsigned int dax_mode = fc->dax_mode;
 
-   if (!fc->dax)
+	/* With 'dax=always' or 'dax=inode', fc->dax can't be NULL */
+   if (dax_mode == FUSE_DAX_NEVER)
return false;
 
return true;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 07829ce78695..a23dd8d0c181 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -487,6 +487,12 @@ struct fuse_dev {
struct list_head entry;
 };
 
+enum fuse_dax_mode {
+   FUSE_DAX_INODE,
+   FUSE_DAX_ALWAYS,
+   FUSE_DAX_NEVER,
+};
+
 struct fuse_fs_context {
int fd;
unsigned int rootmode;
@@ -503,7 +509,7 @@ struct fuse_fs_context {
bool no_control:1;
bool no_force_umount:1;
bool legacy_opts_show:1;
-   bool dax:1;
+   enum fuse_dax_mode dax_mode;
unsigned int max_read;
unsigned int blksize;
const char *subtype;
@@ -801,6 +807,9 @@ struct fuse_conn {
struct list_head devices;
 
 #ifdef CONFIG_FUSE_DAX
+   /* dax mode: FUSE_DAX_* (always, never or per-file) */
+   enum fuse_dax_mode dax_mode;
+
/* Dax specific conn data, non-NULL if DAX is enabled */
struct fuse_conn_dax *dax;
 #endif
@@ -1242,7 +1251,8 @@ ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to);
 ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from);
 int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma);
 int fuse_dax_break_layouts(struct inode *inode, u64 dmap_start, u64 dmap_end);
-int fuse_dax_conn_alloc(struct fuse_conn *fc, struct dax_device *dax_dev);
+int fuse_dax_conn_alloc(struct fuse_conn *fc, enum fuse_dax_mode mode,
+   struct dax_device *dax_dev);
 void fuse_dax_conn_free(struct fuse_conn *fc);
 bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
 void fuse_dax_inode_init(struct inode *inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index b9beb39a4a18..0bc0d8af81e1 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -690,8 +690,12 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
seq_printf(m, ",blksize=%lu", sb->s_blocksize);
}
 #ifdef CONFIG_FUSE_DAX
-   if (fc->dax)
-   seq_puts(m, ",dax");
+   if (fc->dax_mode == FUSE_DAX_ALWAYS)
+   seq_puts(m, ",dax=always");
+   else if (fc->dax_mode == FUSE_DAX_NEVER)
+   seq_puts(m, ",dax=never");
+   else if (fc->dax_mode == FUSE_DAX_INODE)
+   seq_puts(m, ",dax=inode");
 #endif
 
return 0;
@@ -1434,7 +1438,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
sb->s_subtype = ctx->subtype;
ctx->subtype = NULL;
if (IS_ENABLED(CONFIG_FUSE_DAX)) {
-   err = fuse_dax_conn_alloc(fc, ctx->dax_dev);
+   err = fuse_dax_conn_alloc(fc, ctx->dax_mode, ctx->dax_dev);
if (err)
goto err;
}
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 8f52cdaa8445..0050132e2787 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -88,12 +88,21 @@ struct virtio_fs_req_work {
 static int virtio_fs_enqueue_req(struct virtio_fs_vq *fsvq,
 struct fuse_req *req, bool in_flight);
 
+static const struct constant_table dax_param_enums[] = {
+   {"inode",   FUSE_DAX_INODE },
+   {"always",  FUSE_DAX_ALWAYS },
+   {"never",   FUSE_DAX_NEVER },
+   {}
+};
+
 enum {
OPT_DAX,
+   OPT_DAX_ENUM,
 };
 
 static const struct fs_parameter_spec virtio_fs_parameters[] = {
fsparam_flag("dax", OPT_DAX),
+	fsparam_enum("dax", OPT_DAX_ENUM, dax_param_enums),

[PATCH v3 7/8] fuse: support changing per-file DAX flag inside guest

2021-08-04 Thread Jeffle Xu
The FUSE client can enable or disable per-file DAX inside the
kernel/guest via chattr(1). As with server-initiated changes, the new
state does not take effect until the file is closed and reopened.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/ioctl.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/fuse/ioctl.c b/fs/fuse/ioctl.c
index 546ea3d58fb4..a9ed53c5dbd1 100644
--- a/fs/fuse/ioctl.c
+++ b/fs/fuse/ioctl.c
@@ -469,8 +469,6 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
if (fa->flags_valid) {
err = fuse_priv_ioctl(inode, ff, FS_IOC_SETFLAGS,
			      &flags, sizeof(flags));
-   if (err)
-   goto cleanup;
} else {
		memset(&xfa, 0, sizeof(xfa));
xfa.fsx_xflags = fa->fsx_xflags;
@@ -483,6 +481,19 @@ int fuse_fileattr_set(struct user_namespace *mnt_userns,
			      &xfa, sizeof(xfa));
}
 
+   if (err)
+   goto cleanup;
+
+   if (IS_ENABLED(CONFIG_FUSE_DAX)) {
+   bool newdax;
+
+   if (fa->flags_valid)
+   newdax = flags & FS_DAX_FL;
+   else
+   newdax = fa->fsx_xflags & FS_XFLAG_DAX;
+   fuse_dax_dontcache(inode, newdax);
+   }
+
 cleanup:
fuse_priv_ioctl_cleanup(inode, ff);
 
-- 
2.27.0



[PATCH v3 1/8] fuse: add fuse_should_enable_dax() helper

2021-08-04 Thread Jeffle Xu
This is in preparation for the per-file DAX checks added in the
following patches.

Signed-off-by: Jeffle Xu 
---
 fs/fuse/dax.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index 0e5407f48e6a..c6f4e82e65f3 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -1336,11 +1336,19 @@ static const struct address_space_operations fuse_dax_file_aops = {
.invalidatepage = noop_invalidatepage,
 };
 
-void fuse_dax_inode_init(struct inode *inode)
+static bool fuse_should_enable_dax(struct inode *inode)
 {
struct fuse_conn *fc = get_fuse_conn(inode);
 
if (!fc->dax)
+   return false;
+
+   return true;
+}
+
+void fuse_dax_inode_init(struct inode *inode)
+{
+   if (!fuse_should_enable_dax(inode))
return;
 
inode->i_flags |= S_DAX;
-- 
2.27.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH V4 2/2] gpio: virtio: Add IRQ support

2021-08-04 Thread Viresh Kumar
On 03-08-21, 17:01, Arnd Bergmann wrote:
> On Tue, Aug 3, 2021 at 1:36 PM Viresh Kumar  wrote:
> > +struct vgpio_irq_line {
> > +   u8 type;
> > +   bool masked;
> > +   bool update_pending;
> > +   bool queued;
> > +
> > +   struct virtio_gpio_irq_request ireq;
> > +   struct virtio_gpio_irq_response ires;
> > +};
> 
> I think the last two members should be marked as __cacheline_aligned,
> since they are transferred separately over DMA.

Right.

> > +static void virtio_gpio_irq_eoi(struct irq_data *d)
> > +{
> > +   /*
> > +* Queue buffers, by calling virtio_gpio_irq_prepare(), from
> > +* virtio_gpio_event_vq() itself, after taking into consideration 
> > the
> > +* masking status of the interrupt.
> > +*/
> > +}
> 
> Shouldn't this just requeue the interrupt? There is no reason to
> defer this, and I think we want the normal operation to not have
> to involve any scheduling.
> 
> > +static void virtio_gpio_irq_unmask(struct irq_data *d)
> > +{
> > +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> > +   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> > +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> > +
> > +   irq_line->masked = false;
> > +   irq_line->update_pending = true;
> > +}
> 
> Same here. unmask is probably less important, but it's the
> same operation: if you want interrupts to become active
> again when they are not, just queue the request

We can't, because it's a slow bus. And unmask can be called from
irq-context. That's exactly why we had the irq_bus_lock/unlock
discussion earlier.

> > +static void virtio_gpio_irq_mask(struct irq_data *d)
> > +{
> > +   struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
> > +   struct virtio_gpio *vgpio = gpiochip_get_data(gc);
> > +   struct vgpio_irq_line *irq_line = &vgpio->irq_lines[d->hwirq];
> > +
> > +   irq_line->masked = true;
> > +   irq_line->update_pending = true;
> > +}
> 
> This is of course the tricky bit, I was hoping you had found a solution
> based on what I wrote above for eio() and unmask().
> 
> > +static void vgpio_work_handler(struct work_struct *work)
> > +{
> > +   struct virtio_gpio *vgpio = container_of(work, struct virtio_gpio,
> > +work);
> > +   struct device *dev = &vgpio->vdev->dev;
> > +   struct vgpio_irq_line *irq_line;
> > +   int irq, gpio, ret;
> > +   unsigned int len;
> > +
> > +   mutex_lock(&vgpio->irq_lock);
> > +
> > +   while (true) {
> > +   irq_line = virtqueue_get_buf(vgpio->event_vq, &len);
> > +   if (!irq_line)
> > +   break;
> 
> Related to above, I think all the eventq handling should be moved into the
> virtio_gpio_event_vq() function directly.

You mean without scheduling a work ?

> > +   /* The interrupt may have been disabled by now */
> > +   if (irq_line->update_pending && irq_line->masked)
> > +   update_irq_type(vgpio, gpio, 
> > VIRTIO_GPIO_IRQ_TYPE_NONE);
> 
> This is a part I'm not sure about, and I suppose it's the same part that
> Marc was also confused by.
> 
> As far as I can tell, the update_irq_type() message would lead to the
> interrupt getting delivered when it was armed and is now getting disabled,
> but I don't see why we would call update_irq_type() as a result of the
> eventq notification.

Lemme try to explain answer to all other question together here.

The irq related functions get called in two scenarios:

- request_irq() or irq_set_irq_type(), enable/disable_irq(), etc:

  The call sequence here is like this:

  ->irq_bus_lock()

  ->spin-lock-irqsave
  ->irq_mask()/irq_unmask()/irq_set_type()..
  ->spin-unlock-irqsave

  ->irq_bus_unlock()


  So the mask/unmask/set-type routines can't issue virtio requests and
  we need to do that from irq_bus_unlock(). This shall answer your
  question about mask/unmask, right ? Or maybe I misunderstood them
  then ?


- Interrupt: i.e. buffer sent back by the host over virtio.

  virtio_gpio_event_vq() schedules a work item, which processes the
  items from the eventq virtqueue and eventually calls
  generic_handle_irq(). The irq-core can issue calls to
  ->irq_mask/unmask() here without a prior call to
  irq_bus_lock/unlock(), normally they will balance out by the end,
  but I am not sure if it is guaranteed. Moreover, interrupt should be
  re-enabled only after unmask() is called (for ONESHOT) and not at
  EOI, right ?

  I chose not to queue the buffers back from eoi() as it is possible
  that we won't want to queue them at all, as the interrupt needs to
  be disabled by the time generic_handle_irq() returns. And so I did
  everything from the end of vgpio_work_handler()'s loop, i.e. either
  disable the interrupts with VIRTIO_GPIO_IRQ_TYPE_NONE or enable the
  interrupt again by re-queuing the buffer.

Regarding irq handling using work-item, I had to move to that 

Re: [PATCH v2 0/4] virtiofs,fuse: support per-file DAX

2021-08-04 Thread JeffleXu



On 7/21/21 10:42 PM, Vivek Goyal wrote:
> On Wed, Jul 21, 2021 at 08:48:57AM -0400, Vivek Goyal wrote:
> [..]
 So is "dax=inode" enough for your needs? What's your requirement,
 can you give little bit of more details.
>>>
>>> In our use case, the backend fs is something like SquashFS on host. The
>>> content of the file on host is downloaded *as needed*. When the file is
>>> not completely ready (completely downloaded), the guest will follow the
>>> normal IO routine, i.e., by FUSE_READ/FUSE_WRITE request. While the file
>>> is completely ready, per-file DAX is enabled for this file. IOW the FUSE
>>> server need to dynamically decide if per-file DAX shall be enabled,
>>> depending on if the file is completely downloaded.
>>
>> So you don't want to enable DAX yet because guest might fault on
>> a section of file which has not been downloaded yet?
>>
>> I am wondering if somehow user fault handling can help with this.
>> If we could handle faults for this file in user space, then you
>> should be able to download that particular page[s] and resolve
>> the fault?
> 
> Stefan mentioned that can't we block when fuse mmap request comes
> in and download corresponding section of file. Or do whatever you
> are doing in FUSE_READ. 
> 
> IOW, even if you enable dax in your use case on all files,
> FUSE_SETUPMAPPING request will give you control to make sure 
> file section being mmaped has been downloaded.
> 

Sorry for the late reply. I missed this mail as it was filtered into
the mailing-list folder.

The idea you mentioned may work. Anyway, the implementation details of
the FUSE server are not tightly bound to the FUSE protocol changes in
the kernel. The protocol only requires that the FUSE client be able to
store the FS_DAX_FL attr persistently in *some way*. The kernel changes
shall stay general, no matter whether the FUSE server is based on the
FS_DAX_FL attr or something else.


-- 
Thanks,
Jeffle