Re: [RFC 0/2] virtio-pmem: Asynchronous flush

2021-03-11 Thread Dan Williams
On Thu, Mar 11, 2021 at 8:21 PM Pankaj Gupta wrote:
>
> Hi David,
>
> > >   Jeff reported a preflush ordering issue with the existing
> > >   implementation of the virtio-pmem preflush. Dan suggested[1]
> > >   implementing an asynchronous flush for virtio-pmem using a work
> > >   queue, as done in md/RAID. This patch series intends to solve the
> > >   preflush ordering issue and also makes the flush asynchronous from
> > >   the submitting thread's point of view.
> > >
> > >   I am submitting this patch series for feedback; it is a work in
> > >   progress. I have done basic testing and am currently doing more testing.
> > >
> > > Pankaj Gupta (2):
> > >pmem: make nvdimm_flush asynchronous
> > >virtio_pmem: Async virtio-pmem flush
> > >
> > >   drivers/nvdimm/nd_virtio.c   | 66 ++--
> > >   drivers/nvdimm/pmem.c        | 15 
> > >   drivers/nvdimm/region_devs.c |  3 +-
> > >   drivers/nvdimm/virtio_pmem.c |  9 +
> > >   drivers/nvdimm/virtio_pmem.h | 12 +++
> > >   5 files changed, 78 insertions(+), 27 deletions(-)
> > >
> > > [1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2
> > >
> >
> > Just wondering, was there any follow-up on this, or are we still waiting
> > for feedback? :)
>
> Thank you for bringing this up.
>
> My apologies, I could not follow up on this. I have another version in my
> local tree but could not post it, as I was not sure I had solved the problem
> correctly. I will clean it up and post it for feedback as soon as I can.
>
> P.S.: Due to serious personal/family health issues, alongside other
> professional commitments, I am not able to devote much time to this. I feel
> bad that I have this unfinished task. In the last year things have not been
> stable for me and my family, and they are still not improving :(

No worries Pankaj. Take care of yourself and your family. The
community can handle this for you. I'm open to coaching somebody
through what's involved to get this fix landed.
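
For whoever picks this up, a minimal sketch of the workqueue-deferred flush
pattern referenced in the cover letter ("as done in md/RAID") is below. All
example_* names, the flush_wq field, and the host-flush call are hypothetical
placeholders, not the actual virtio-pmem patch:

/*
 * Minimal sketch of deferring a flush to a workqueue so the submitting
 * thread returns immediately and the bio is completed from the worker.
 * All example_* names and the flush_wq field are hypothetical.
 */
#include <linux/workqueue.h>
#include <linux/blk_types.h>
#include <linux/bio.h>
#include <linux/slab.h>

struct example_pmem_dev {
	struct workqueue_struct *flush_wq;	/* hypothetical per-device workqueue */
	/* virtqueue, locks, etc. elided */
};

int example_host_flush(struct example_pmem_dev *dev);	/* hypothetical host flush */

struct example_flush_req {
	struct work_struct work;
	struct bio *bio;
	struct example_pmem_dev *dev;
};

static void example_flush_worker(struct work_struct *work)
{
	struct example_flush_req *req =
		container_of(work, struct example_flush_req, work);
	int err = example_host_flush(req->dev);

	/* Complete the original bio only after the host acknowledges the flush. */
	req->bio->bi_status = err ? BLK_STS_IOERR : BLK_STS_OK;
	bio_endio(req->bio);
	kfree(req);
}

static void example_submit_flush(struct example_pmem_dev *dev, struct bio *bio)
{
	struct example_flush_req *req = kmalloc(sizeof(*req), GFP_ATOMIC);

	if (!req) {
		bio->bi_status = BLK_STS_RESOURCE;
		bio_endio(bio);
		return;
	}
	req->dev = dev;
	req->bio = bio;
	INIT_WORK(&req->work, example_flush_worker);
	queue_work(dev->flush_wq, &req->work);	/* returns without waiting */
}

The preflush ordering part (not reporting the flush complete before earlier
writes are persistent) still has to be handled on top of this; the sketch only
covers completing the bio asynchronously from the submitting thread.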


Re: [PATCH RFC 1/9] memremap: add ZONE_DEVICE support for compound pages

2021-03-11 Thread Dan Williams
On Wed, Mar 10, 2021 at 10:13 AM Joao Martins  wrote:
>
> On 2/22/21 8:37 PM, Dan Williams wrote:
> > On Mon, Feb 22, 2021 at 3:24 AM Joao Martins wrote:
> >> On 2/20/21 1:43 AM, Dan Williams wrote:
> >>> On Tue, Dec 8, 2020 at 9:59 PM John Hubbard  wrote:
>  On 12/8/20 9:28 AM, Joao Martins wrote:
> >> One thing to point out about altmap is that the degradation (in pinning
> >> and unpinning) we observed with struct pages in device memory is no longer
> >> observed once 1) we batch refcount updates as we move to compound pages and
> >> 2) we reuse tail pages, which seems to keep these struct pages in cache
> >> more often and perhaps contributes to dirtying a lot fewer cachelines.
> >
> > True, it makes it more palatable to survive 'struct page' in PMEM,
>
> I want to retract for now what I said above wrt the "no degradation with
> struct pages in device memory" comment. I was fooled by a bug in a patch
> later in this series. In particular, I accidentally cleared
> PGMAP_ALTMAP_VALID when unilaterally setting PGMAP_COMPOUND, which
> consequently led to always allocating struct pages from regular memory. No
> wonder the numbers were just as fast. I am still confident that it is going
> to be faster and show less degradation in pinning/init; init is, worst case,
> 2x faster for now. But it may still be too early to say it is *as fast as*
> struct pages in regular memory.
>
> The broken masking of the PGMAP_ALTMAP_VALID bit did hide one flaw: we don't
> support altmap for basepages on x86/mm, and it apparently depends on each
> architecture to implement it (plus a couple of other issues). The vmemmap
> allocation isn't the problem, so the previous comment in this thread that
> altmap doesn't change much in vmemmap_populate_compound_pages() is still
> accurate.
>
> The problem, though, resides in the freeing of vmemmap page tables for
> basepages *with altmap* (e.g. at dax-device teardown), which requires arch
> support. Doing it properly would mean making the altmap reserve smaller
> (given fewer pages are allocated) and giving the altmap pfn allocator the
> ability to track references per pfn. But I think it deserves its own
> separate patch series (probably almost just as big).
>
> Perhaps for this set I can stick with no altmap, as you suggested, and use
> hugepage vmemmap population (which wouldn't lead to device memory savings)
> instead of reusing base pages. I would still leave the compound-page support
> logic as the metadata representation for > 4K @align, as I think that's the
> right thing to do, and then do a separate series on improving altmap to
> leverage the metadata reduction, as done with non-device struct pages.
>
> Thoughts?

The space savings is the whole point. So I agree with moving altmap
support to a follow-on enhancement, but land the non-altmap basepage
support in the first round.
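
For readers following the PGMAP_ALTMAP_VALID mix-up above, the decision of
where the vmemmap struct pages come from boils down to roughly the following
check in the memremap code; this is a simplified mirror of the in-tree
pgmap_altmap() helper, not part of the patch series:

/* Simplified sketch of the altmap check: when PGMAP_ALTMAP_VALID is
 * cleared, callers see a NULL altmap and the struct pages backing the
 * vmemmap are allocated from regular RAM instead of being carved out of
 * the device, which is why the accidental clearing above made the
 * numbers look "just as fast".
 */
#include <linux/memremap.h>

static struct vmem_altmap *example_pgmap_altmap(struct dev_pagemap *pgmap)
{
	if (pgmap->flags & PGMAP_ALTMAP_VALID)
		return &pgmap->altmap;	/* vmemmap pages come from device capacity */
	return NULL;			/* vmemmap pages come from system RAM */
}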


Re: [RFC 0/2] virtio-pmem: Asynchronous flush

2021-03-11 Thread Pankaj Gupta
Hi David,

> >   Jeff reported a preflush ordering issue with the existing implementation
> >   of the virtio-pmem preflush. Dan suggested[1] implementing an asynchronous
> >   flush for virtio-pmem using a work queue, as done in md/RAID. This patch
> >   series intends to solve the preflush ordering issue and also makes the
> >   flush asynchronous from the submitting thread's point of view.
> >
> >   I am submitting this patch series for feedback; it is a work in progress.
> >   I have done basic testing and am currently doing more testing.
> >
> > Pankaj Gupta (2):
> >pmem: make nvdimm_flush asynchronous
> >virtio_pmem: Async virtio-pmem flush
> >
> >   drivers/nvdimm/nd_virtio.c   | 66 ++--
> >   drivers/nvdimm/pmem.c        | 15 
> >   drivers/nvdimm/region_devs.c |  3 +-
> >   drivers/nvdimm/virtio_pmem.c |  9 +
> >   drivers/nvdimm/virtio_pmem.h | 12 +++
> >   5 files changed, 78 insertions(+), 27 deletions(-)
> >
> > [1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2
> >
>
> Just wondering, was there any follow-up on this, or are we still waiting
> for feedback? :)

Thank you for bringing this up.

My apologies, I could not follow up on this. I have another version in my
local tree but could not post it, as I was not sure I had solved the problem
correctly. I will clean it up and post it for feedback as soon as I can.

P.S.: Due to serious personal/family health issues, alongside other
professional commitments, I am not able to devote much time to this. I feel
bad that I have this unfinished task. In the last year things have not been
stable for me and my family, and they are still not improving :(

Best regards,
Pankaj




Re: [RFC 0/2] virtio-pmem: Asynchronous flush

2021-03-11 Thread David Hildenbrand

On 20.04.20 15:19, Pankaj Gupta wrote:

  Jeff reported a preflush ordering issue with the existing implementation
  of the virtio-pmem preflush. Dan suggested[1] implementing an asynchronous
  flush for virtio-pmem using a work queue, as done in md/RAID. This patch
  series intends to solve the preflush ordering issue and also makes the
  flush asynchronous from the submitting thread's point of view.

  I am submitting this patch series for feedback; it is a work in progress.
  I have done basic testing and am currently doing more testing.

Pankaj Gupta (2):
   pmem: make nvdimm_flush asynchronous
   virtio_pmem: Async virtio-pmem flush

  drivers/nvdimm/nd_virtio.c   | 66 ++--
  drivers/nvdimm/pmem.c        | 15 
  drivers/nvdimm/region_devs.c |  3 +-
  drivers/nvdimm/virtio_pmem.c |  9 +
  drivers/nvdimm/virtio_pmem.h | 12 +++
  5 files changed, 78 insertions(+), 27 deletions(-)

[1] https://marc.info/?l=linux-kernel&m=157446316409937&w=2



Just wondering, was there any follow-up on this, or are we still waiting
for feedback? :)


--
Thanks,

David / dhildenb


Can you please help with the failure to create namespace via Ndctl issue?

2021-03-11 Thread Xu, Chunye
Hi Vishal Verma,

Now there is an urgent issue from Alibaba - an AEP namespace disable issue:
https://hsdes.intel.com/appstore/article/#/22012576039.

Attached is the log of the ndctl commands; you can find several error
messages in it.

You can see that it fails to recreate the namespace after destroying the
namespace that was in devdax mode:

root@cloud-dev-benz:~# ndctl create-namespace -r region0 --mode=devdax
libndctl: ndctl_dax_enable: dax0.1: failed to enable
  Error: namespace0.0: failed to enable

Can you please check this with priority, as it blocks Alibaba's deployment?

Many thanks,
Chunye
Script started on 2021-03-10 16:18:12+00:00 [TERM="xterm" TTY="/dev/pts/0" COLUMNS="149" LINES="43"]
root@cloud-dev-benz:~# ipmctl show -memoryresources
 MemoryType   | DDR         | DCPMM       | Total
 Volatile     | 192.000 GiB | 0.000 GiB   | 192.000 GiB
 AppDirect    | -           | 504.000 GiB | 504.000 GiB
 Cache        | 0.000 GiB   | -           | 0.000 GiB
 Inaccessible | -           | 1.689 GiB   | 1.689 GiB
 Physical     | 192.000 GiB | 505.689 GiB | 697.689 GiB
root@cloud-dev-benz:~# ndctl create-namespace -r region0 --mode=devdax
{
  "dev":"namespace0.0",
  "mode":"devdax",
  "map":"dev",
  "size":"248.06 GiB (266.35 GB)",
  "uuid":"9ad75b6c-52f1-4c58-94e7-a59b937a0a3e",
  "daxregion":{
    "id":0,
    "size":"248.06 GiB (266.35 GB)",
    "align":2097152,
    "devices":[
      {
        "chardev":"dax0.0",
        "size":"248.06 GiB (266.35 GB)",
        "target_node":2,
        "mode":"devdax"
      }
    ]
  },
  "align":2097152
}
root@cloud-dev-benz:~# ndctl list --regions --namespaces --human --buses
{
  "provider":"ACPI.NFIT",
  "dev":"ndbus0",
  "scrub_state":"idle",
  "regions":[
    {
      "dev":"region1",
      "size":"252.00 GiB (270.58 GB)",
      "available_size":"252.00 GiB (270.58 GB)",
      "max_available_extent":"252.00 GiB (270.58 GB)",
      "type":"pmem",
      "iset_id":"0xbcd6eeb86a8f2444",
      "persistence_domain":"memory_controller"
    },
    {
      "dev":"region0",
      "size":"252.00 GiB (270.58 GB)",
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "iset_id":"0x1c72eeb8e48b2444",
      "persistence_domain":"memory_controller",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"devdax",
          "map":"dev",
          "size":"248.06 GiB (266.35 GB)",
          "uuid":"9ad75b6c-52f1-4c58-94e7-a59b937a0a3e",
          "chardev":"dax0.0",
          "align":2097152
        }
      ]
    }
  ]
}
root@cloud-dev-benz:~# daxctl reconfigure-device dax0.0 --mode=system-ram
dax0.0:
  WARNING: detected a race while onlining memory
  Some memory may not be in the expected zone. It is
  recommended to disable any other onlining mechanisms,
  and retry. If onlining is to be left to other agents,
  use the --no-online option to suppress this warning
dax0.0: all memory sections (248) already online
[
  {
    "chardev":"dax0.0",
    "size":266352984064,
    "target_node":2,
    "mode":"system-ram",
    "movable":false
  }
]
reconfigured 1 device
root@cloud-dev-benz:~# mount -t tmpfs -o size=4g,mpol=bind:2 tmpfs /mnt/pmem0
root@cloud-dev-benz:~# ndctl list --regions --namespaces --human --buses
{
  "provider":"ACPI.NFIT",
  "dev":"ndbus0",
  "scrub_state":"idle",
  "regions":[
    {
      "dev":"region1",
      "size":"252.00 GiB (270.58 GB)",
      "available_size":"252.00 GiB (270.58 GB)",
      "max_available_extent":"252.00 GiB (270.58 GB)",
      "type":"pmem",
      "iset_id":"0xbcd6eeb86a8f2444",
      "persistence_domain":"memory_controller"
    },
    {
      "dev":"region0",
      "size":"252.00 GiB (270.58 GB)",
      "available_size":0,
      "max_available_extent":0,
      "type":"pmem",
      "iset_id":"0x1c72eeb8e48b2444",
      "persistence_domain":"memory_controller",
      "namespaces":[
        {
          "dev":"namespace0.0",
          "mode":"devdax",
          "map":"dev",
          "size":"248.06 GiB (266.35 GB)",
          "uuid":"9ad75b6c-52f1-4c58-94e7-a59b937a0a3e",
          "chardev":"dax0.0",
          "align":2097152
        }
      ]
    }
  ]
}
root@cloud-dev-benz:~# ndctl disable-namespace namespace0.0
disabled 1 namespace
root@cloud-dev-benz:~# ndctl destroy-namespace -f namespace0.0
destroyed 1 namespace
root@cloud-dev-benz:~# ndctl list -NR
[
  {
    "dev":"region1",
    "size":270582939648,
    "available_size":270582939648,
    "max_available_extent":270582939648,
    "type":"pmem",

Re: [RESEND PATCH v2.1 07/10] iomap: Introduce iomap_apply2() for operations on two files

2021-03-11 Thread Christoph Hellwig
On Thu, Mar 04, 2021 at 01:41:42PM +0800, Shiyang Ruan wrote:
> Some operations, such as comparing a range of data in two files under
> fsdax mode, require nested iomap_open()/iomap_end() calls on two files.
> Thus, we introduce iomap_apply2() to accept arguments from two files and
> iomap_actor2_t for actions on the two files.

I still wonder if adding the iter based iomap API that willy proposed
would be a better fit here.  In that case we might not even need
a special API for the double iteration.
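
For comparison, a rough sketch of what the two-file compare could look like
with an iter-style API is below. The iter API is only a proposal at this
point, so the struct iomap_iter shape, its processed field, the iomap_iter()
call, and the example_compare_extents() helper are all assumptions, not an
existing interface:

/*
 * Hypothetical sketch: comparing a byte range of two fsdax files by
 * stepping an iter-style iomap loop over each file in lockstep, instead
 * of adding a dedicated iomap_apply2()/iomap_actor2_t pair.
 */
#include <linux/iomap.h>

/* hypothetical helper: compare the mapped extents, return bytes processed */
loff_t example_compare_extents(const struct iomap_iter *src,
			       const struct iomap_iter *dst, bool *same);

static int example_compare_ranges(struct inode *src, loff_t src_off,
				  struct inode *dst, loff_t dst_off,
				  loff_t len, const struct iomap_ops *ops,
				  bool *same)
{
	struct iomap_iter src_iter = { .inode = src, .pos = src_off, .len = len };
	struct iomap_iter dst_iter = { .inode = dst, .pos = dst_off, .len = len };
	int ret;

	while ((ret = iomap_iter(&src_iter, ops)) > 0 &&
	       (ret = iomap_iter(&dst_iter, ops)) > 0) {
		/* Compare the overlapping mapped extents and advance both
		 * iterators by the amount actually processed. */
		loff_t done = example_compare_extents(&src_iter, &dst_iter, same);

		src_iter.processed = done;
		dst_iter.processed = done;
	}
	return ret;
}

The appeal is that the two-file case becomes ordinary control flow in the
caller instead of requiring a new iomap_actor2_t callback type.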


Re: [PATCH v2 00/10] fsdax,xfs: Add reflink support for fsdax

2021-03-11 Thread Neal Gompa
On Wed, Mar 10, 2021 at 7:53 PM Dan Williams  wrote:
>
> On Wed, Mar 10, 2021 at 6:27 AM Matthew Wilcox  wrote:
> >
> > On Wed, Mar 10, 2021 at 08:21:59AM -0600, Goldwyn Rodrigues wrote:
> > > On 13:02 10/03, Matthew Wilcox wrote:
> > > > On Wed, Mar 10, 2021 at 07:30:41AM -0500, Neal Gompa wrote:
> > > > > Forgive my ignorance, but is there a reason why this isn't wired up to
> > > > > Btrfs at the same time? It seems weird to me that adding a feature
> > > >
> > > > btrfs doesn't support DAX.  Only ext2, ext4, XFS and FUSE have DAX
> > > > support.
> > > >
> > > > If you think about it, btrfs and DAX are diametrically opposite things.
> > > > DAX is about giving raw access to the hardware.  btrfs is about offering
> > > > extra value (RAID, checksums, ...), none of which can be done if the
> > > > filesystem isn't in the read/write path.
> > > >
> > > > That's why there's no DAX support in btrfs.  If you want DAX, you have
> > > > to give up all the features you like in btrfs.  So you may as well use
> > > > a different filesystem.
> > >
> > > DAX on btrfs has been attempted[1]. Of course, we could not
> >
> > But why?  A completeness fetish?  I don't understand why you decided
> > to do this work.
>
> Isn't DAX useful for pagecache minimization on read even if it is
> awkward for a copy-on-write fs?
>
> Seems it would be a useful case to have COW'd VM images on BTRFS that
> don't need superfluous page cache allocations.

I could also see this being useful for databases (and maybe even swap
files!) on Btrfs, if I'm understanding this feature correctly.


-- 
真実はいつも一つ!/ Always, there's only one truth!