Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-17 Thread Andrzej Jakowski
On 3/17/20 4:23 AM, Stefan Hajnoczi wrote:
>> Code is posted here
>> https://github.com/AndrzejJakowski/qemu/commit/3a7762a1d13ff1543d1da430748eb24e38faab6f
>>
>> QEMU command line:
>>
>> # below are just relevant pieces of configuration, other stuff omitted
>> # tried different setting (e.g. pmem=on and pmem=off)
>>
>> ./x86_64-softmmu/qemu-system-x86_64 ... \
>> -object memory-backend-file,id=mem1,share=off,pmem=on,mem-path=../nvme_pmr.bin,size=$((1*1024*1024)) \
> share=off is MAP_PRIVATE.  If persistence is desired then share=on
> should be used.
> 
> However, this shouldn't affect "system_reset" behavior since the QEMU
> process still has the same mapped file open.
> 

Hi Stefan,

Thx!! The share=off setting was the problem. I confirmed with my simple test
that persistence is achieved.
I didn't find a QEMU API to perform the flush (msync). Any suggestion on what
function to use?
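
For reference, a minimal sketch of what I mean, using the region's host
pointer directly (msync on the mapped range; whether QEMU has a proper
wrapper for this is exactly my question):

#include "qemu/osdep.h"
#include "exec/memory.h"
#include <sys/mman.h>

/* Flush a file-backed memory region to stable storage.  Assumes the
 * backend was created with share=on so the mapping is MAP_SHARED. */
static int pmr_flush(MemoryRegion *mr)
{
    void *host = memory_region_get_ram_ptr(mr);

    return msync(host, memory_region_size(mr), MS_SYNC);
}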

Given that the host memory backend is working, I think my patch is almost ready
for resubmission -- let me know if there are any other comments.

Andrzej

>> -drive file=../nvme.bin,format=raw,if=none,id=nvme_emulated \
>> -device nvme,drive=nvme_emulated,serial="test serial",pmrdev=mem1
>>
>> In VM:
>> My persistent memory region is exposed as a PCI BAR:
>> Region 2: Memory at fe000000 (64-bit, prefetchable) [size=1M]
>>
>> So I perform reads/writes from/to the following address 0xfe000000 (decimal
>> 4261412864):
>>
>> dd if=test.bin of=/dev/mem bs=1 count=30 seek=4261412864
>> dd if=/dev/mem of=test1.bin bs=1 count=30 skip=4261412864
> Did you verify that the guest kernel is really accessing the BAR?  I
> remember that distro kernels often ship with options that make
> /dev/mem of limited use because it's considered insecure.
> 
>> On the VMM I didn't observe that the backing file had been updated, and
>> after power cycling the VM I see old junk when reading the PMR region.
> Did you check that the pmrdev mmap region contains the data the guest
> wrote before power cycling?
> 
>> Also from include/qemu/pmem.h it looks like pmem_persist() will cause qemu
>> to exit if libpmem is not installed:
> The libpmem support only needs to be used when the pmem=on option was
> given.  If there isn't a physical pmem device then it doesn't need to
> be used.
> 
> Stefan




Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-17 Thread Stefan Hajnoczi
On Mon, Mar 16, 2020 at 5:10 PM Andrzej Jakowski wrote:
> On 3/16/20 4:32 AM, Stefan Hajnoczi wrote:
> > On Wed, Mar 11, 2020 at 11:08:27PM -0700, Klaus Birkelund Jensen wrote:
> >> On Mar 11 15:54, Andrzej Jakowski wrote:
> >>> On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
>  Please try:
> 
>    $ git grep pmem
> 
>  backends/hostmem-file.c is the backend that can be used and the
>  pmem_persist() API can be used to flush writes.
> >>> I've reworked this patch into hostmem-file type of backend.
> >>> From simple tests in virtual machine: writing to PMR region
> >>> and then reading from it after VM power cycle I have observed that
> >>> there is no persistency.
> > Sounds like an integration bug.  QEMU's NVDIMM emulation uses
> > HostMemoryBackend and file contents survive guest reboot.
> >
> > If you would like help debugging this, please post a link to the code
> > and the command-line that you are using.
> >
>
> Code is posted here
> https://github.com/AndrzejJakowski/qemu/commit/3a7762a1d13ff1543d1da430748eb24e38faab6f
>
> QEMU command line:
>
> # below are just relevant pieces of configuration, other stuff omitted
> # tried different setting (e.g. pmem=on and pmem=off)
>
> ./x86_64-softmmu/qemu-system-x86_64 ... \
> -object memory-backend-file,id=mem1,share=off,pmem=on,mem-path=../nvme_pmr.bin,size=$((1*1024*1024)) \

share=off is MAP_PRIVATE.  If persistence is desired then share=on
should be used.

However, this shouldn't affect "system_reset" behavior since the QEMU
process still has the same mapped file open.

> -drive file=../nvme.bin,format=raw,if=none,id=nvme_emulated \
> -device nvme,drive=nvme_emulated,serial="test serial",pmrdev=mem1
>
> In VM:
> My persistent memory region is exposed as a PCI BAR:
> Region 2: Memory at fe000000 (64-bit, prefetchable) [size=1M]
>
> So I perform reads/writes from/to the following address 0xfe000000 (decimal
> 4261412864):
>
> dd if=test.bin of=/dev/mem bs=1 count=30 seek=4261412864
> dd if=/dev/mem of=test1.bin bs=1 count=30 skip=4261412864

Did you verify that the guest kernel is really accessing the BAR?  I
remember that distro kernels often ship with options that make
/dev/mem of limited use because it's considered insecure.
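
One way to take /dev/mem out of the picture is to map the BAR through its
sysfs resource file instead.  A rough sketch (the PCI address is made up;
substitute the one from lspci -D, and resource2 corresponds to BAR2):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/bus/pci/devices/0000:00:04.0/resource2", O_RDWR);
    unsigned char *p;

    if (fd < 0) {
        return 1;
    }
    /* Map the 1 MiB BAR and access it directly. */
    p = mmap(NULL, 1024 * 1024, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return 1;
    }
    p[0] = 0xab;                         /* write through the BAR */
    printf("first byte: 0x%02x\n", p[0]);
    munmap(p, 1024 * 1024);
    close(fd);
    return 0;
}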

> On the VMM I didn't observe that the backing file had been updated, and
> after power cycling the VM I see old junk when reading the PMR region.

Did you check that the pmrdev mmap region contains the data the guest
wrote before power cycling?

> Also from include/qemu/pmem.h it looks like pmem_persist() will cause qemu
> to exit if libpmem is not installed:

The libpmem support only needs to be used when the pmem=on option was
given.  If there isn't a physical pmem device then it doesn't need to
be used.

Stefan



Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-16 Thread Andrzej Jakowski
On 3/16/20 4:32 AM, Stefan Hajnoczi wrote:
> On Wed, Mar 11, 2020 at 11:08:27PM -0700, Klaus Birkelund Jensen wrote:
>> On Mar 11 15:54, Andrzej Jakowski wrote:
>>> On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
 Please try:

   $ git grep pmem

 backends/hostmem-file.c is the backend that can be used and the
 pmem_persist() API can be used to flush writes.
>>> I've reworked this patch into hostmem-file type of backend.
>>> From simple tests in virtual machine: writing to PMR region
>>> and then reading from it after VM power cycle I have observed that
>>> there is no persistency.
> Sounds like an integration bug.  QEMU's NVDIMM emulation uses
> HostMemoryBackend and file contents survive guest reboot.
> 
> If you would like help debugging this, please post a link to the code
> and the command-line that you are using.
> 

Code is posted here
https://github.com/AndrzejJakowski/qemu/commit/3a7762a1d13ff1543d1da430748eb24e38faab6f

QEMU command line:

# below are just relevant pieces of configuration, other stuff omitted
# tried different setting (e.g. pmem=on and pmem=off)

./x86_64-softmmu/qemu-system-x86_64 ... \
-object memory-backend-file,id=mem1,share=off,pmem=on,mem-path=../nvme_pmr.bin,size=$((1*1024*1024)) \
-drive file=../nvme.bin,format=raw,if=none,id=nvme_emulated \
-device nvme,drive=nvme_emulated,serial="test serial",pmrdev=mem1 

In VM:
My persistent memory region is exposed as a PCI BAR:
Region 2: Memory at fe000000 (64-bit, prefetchable) [size=1M]

So I perform reads/writes from/to the following address 0xfe000000 (decimal
4261412864):

dd if=test.bin of=/dev/mem bs=1 count=30 seek=4261412864
dd if=/dev/mem of=test1.bin bs=1 count=30 skip=4261412864

On the VMM I didn't observe that the backing file had been updated, and after
power cycling the VM I see old junk when reading the PMR region.

Also from include/qemu/pmem.h it looks like pmem_persist() will cause qemu
to exit if libpmem is not installed:

#ifndef QEMU_PMEM_H
#define QEMU_PMEM_H

#ifdef CONFIG_LIBPMEM
#include <libpmem.h>
#else  /* !CONFIG_LIBPMEM */

static inline void *
pmem_memcpy_persist(void *pmemdest, const void *src, size_t len)
{
    /* If the 'pmem' option is 'on', we should always have libpmem support,
       or qemu will report an error and exit, never come here. */
    g_assert_not_reached();
    return NULL;
}

static inline void
pmem_persist(const void *addr, size_t len)
{
    g_assert_not_reached();
}

#endif /* !CONFIG_LIBPMEM */

#endif /* QEMU_PMEM_H */



Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-16 Thread Stefan Hajnoczi
On Wed, Mar 11, 2020 at 11:08:27PM -0700, Klaus Birkelund Jensen wrote:
> On Mar 11 15:54, Andrzej Jakowski wrote:
> > On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
> > > Please try:
> > > 
> > >   $ git grep pmem
> > > 
> > > backends/hostmem-file.c is the backend that can be used and the
> > > pmem_persist() API can be used to flush writes.
> > 
> > I've reworked this patch into hostmem-file type of backend.
> > From simple tests in virtual machine: writing to PMR region
> > and then reading from it after VM power cycle I have observed that
> > there is no persistency.

Sounds like an integration bug.  QEMU's NVDIMM emulation uses
HostMemoryBackend and file contents survive guest reboot.
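
For reference, an NVDIMM configuration that is known to keep its backing
file contents across reboot looks roughly like this (per docs/nvdimm.txt;
sizes and paths here are illustrative):

qemu-system-x86_64 -machine pc,nvdimm=on -m 2G,slots=2,maxmem=4G \
-object memory-backend-file,id=mem2,share=on,mem-path=/tmp/nvdimm.img,size=128M \
-device nvdimm,memdev=mem2,id=nv1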

If you would like help debugging this, please post a link to the code
and the command-line that you are using.

> > I guess that persistent behavior can be achieved if the memory backend file
> > resides on actual persistent memory in the VMM. I haven't found a mechanism
> > to persist the memory backend file when it resides in a file system on block
> > storage. My original mmap + msync based solution worked well there.
> > I believe the main problem with mmap was the "#ifdef _WIN32" that made it
> > platform-specific, and without it the patchew CI complained.
> > Is there a way that I could rework the mmap + msync solution so it would fit
> > into the qemu design?
> > 
> 
> Hi Andrzej,
> 
> Thanks for working on this!
> 
> FWIW, I have implemented other stuff for the NVMe device that requires
> persistent storage (e.g. LBA allocation tracking for DULBE support). I
> used the approach of adding an additional blockdev and simply use the
> qemu block layer. This would also make it work on WIN32. And if we just
> set bit 0 in PMRWBM and disable the write cache on the blockdev we
> should be good on the durability requirements.
>
> Unfortunately, I do not see (or know, maybe Stefan has an idea?) an easy
> way of using the MemoryRegionOps nicely with async block backend I/O, so
> we either have to use blocking I/O or fire-and-forget AIO. Or, we can
> maybe keep bit 1 set in PMRWBM and force a blocking blk_flush on PMRSTS
> read.

QEMU's block layer does not support persistent memory semantics and
doesn't support mmap.  It's fine for storing state from device emulation
code, but if the guest itself requires memory load/store access to the
data then the QEMU block layer does not provide that.

For PMR I think HostMemoryBackend is the best fit.
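
A rough sketch of what that integration could look like on the device side
(property and field names here are illustrative, not from the posted patch):

#include "qemu/osdep.h"
#include "hw/pci/pci.h"
#include "hw/qdev-properties.h"
#include "sysemu/hostmem.h"

typedef struct NvmePmrState {
    PCIDevice parent_obj;
    HostMemoryBackend *pmrdev;   /* set via -device nvme,...,pmrdev=<id> */
} NvmePmrState;

static Property nvme_pmr_props[] = {
    /* Ties the device to a -object memory-backend-file,id=... object. */
    DEFINE_PROP_LINK("pmrdev", NvmePmrState, pmrdev,
                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
    DEFINE_PROP_END_OF_LIST(),
};

static void nvme_pmr_init_bar(NvmePmrState *s)
{
    /* The backend owns the file-backed RAM; expose it directly as BAR2
     * so guest loads/stores go straight to the mapping. */
    MemoryRegion *mr = host_memory_backend_get_memory(s->pmrdev);

    pci_register_bar(&s->parent_obj, 2,
                     PCI_BASE_ADDRESS_SPACE_MEMORY |
                     PCI_BASE_ADDRESS_MEM_TYPE_64 |
                     PCI_BASE_ADDRESS_MEM_PREFETCH, mr);
}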

Stefan




Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-12 Thread Klaus Birkelund Jensen
On Mar 11 15:54, Andrzej Jakowski wrote:
> On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
> > Please try:
> > 
> >   $ git grep pmem
> > 
> > backends/hostmem-file.c is the backend that can be used and the
> > pmem_persist() API can be used to flush writes.
> 
> I've reworked this patch into hostmem-file type of backend.
> From simple tests in virtual machine: writing to PMR region
> and then reading from it after VM power cycle I have observed that
> there is no persistency.
> 
> I guess that persistent behavior can be achieved if the memory backend file
> resides on actual persistent memory in the VMM. I haven't found a mechanism to
> persist the memory backend file when it resides in a file system on block
> storage. My original mmap + msync based solution worked well there.
> I believe the main problem with mmap was the "#ifdef _WIN32" that made it
> platform-specific, and without it the patchew CI complained.
> Is there a way that I could rework the mmap + msync solution so it would fit
> into the qemu design?
> 

Hi Andrzej,

Thanks for working on this!

FWIW, I have implemented other stuff for the NVMe device that requires
persistent storage (e.g. LBA allocation tracking for DULBE support). I
used the approach of adding an additional blockdev and simply using the
qemu block layer. This would also make it work on WIN32. And if we just
set bit 0 in PMRWBM and disable the write cache on the blockdev we
should be good on the durability requirements.

Unfortunately, I do not see (or know, maybe Stefan has an idea?) an easy
way of using the MemoryRegionOps nicely with async block backend I/O, so
we either have to use blocking I/O or fire-and-forget AIO. Or, we can
maybe keep bit 1 set in PMRWBM and force a blocking blk_flush on PMRSTS
read.
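
Concretely, the PMRSTS read path I have in mind would look something like
this (a sketch only; 'pmr_blk' is an assumed BlockBackend backing the PMR):

#include "qemu/osdep.h"
#include "sysemu/block-backend.h"

/* PMRWBM bit 1: a read of PMRSTS acts as a write barrier, so flush the
 * backing blockdev synchronously before completing the MMIO read. */
static void nvme_pmr_read_barrier(BlockBackend *pmr_blk)
{
    int ret = blk_flush(pmr_blk);   /* blocking: simple, but stalls the vCPU */

    if (ret < 0) {
        /* a real device would latch an error bit in PMRSTS here */
    }
}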

Finally, a thing to consider is that this is adding an optional NVMe 1.4
feature to an already frankenstein device that doesn't even implement
mandatory v1.2. I think that bumping the NVMe version to 1.4 is out of
the question until we actually implement it fully wrt. mandatory
features. My patchset brings the device up to v1.3 and I have v1.4 ready
for posting, so I think we can get there.


Klaus



Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-11 Thread Andrzej Jakowski
On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
> Please try:
> 
>   $ git grep pmem
> 
> backends/hostmem-file.c is the backend that can be used and the
> pmem_persist() API can be used to flush writes.

I've reworked this patch into a hostmem-file type of backend.
From simple tests in a virtual machine (writing to the PMR region
and then reading from it after a VM power cycle) I have observed that
there is no persistence.

I guess that persistent behavior can be achieved if the memory backend file
resides on actual persistent memory in the VMM. I haven't found a mechanism to
persist the memory backend file when it resides in a file system on block
storage. My original mmap + msync based solution worked well there.
I believe the main problem with mmap was the "#ifdef _WIN32" that made it
platform-specific, and without it the patchew CI complained.
Is there a way that I could rework the mmap + msync solution so it would fit
into the qemu design?
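
For context, the portable shape I would need is roughly the following --
the Win32 half is exactly what the "#ifdef _WIN32" was guarding (a sketch,
not the posted code):

#include <stddef.h>

#ifndef _WIN32
#include <sys/mman.h>

/* POSIX: flush a shared file mapping to stable storage. */
static int pmr_sync(void *buf, size_t len)
{
    return msync(buf, len, MS_SYNC);
}
#else
#include <windows.h>

/* Win32: flush the mapped view, then the underlying file handle. */
static int pmr_sync(void *buf, size_t len, HANDLE file)
{
    if (!FlushViewOfFile(buf, len) || !FlushFileBuffers(file)) {
        return -1;
    }
    return 0;
}
#endif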




Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-11 Thread Andrzej Jakowski
On 3/11/20 2:20 AM, Stefan Hajnoczi wrote:
> Oh, I think I see what you mean.  That is not how the term
> "preallocated" is usually used in POSIX file systems.  File systems
> have sparse files by default and the term preallocation is used in the
> context of fadvise(2) for reserving space.
> 
> In this case I think you're saying the file cannot grow.  That is
> implicit since the BAR can't grow either so you could drop the comment
> about preallocation.

Yes, there is no need to have the file preallocated in the POSIX meaning. The
actual requirement is a file that is a multiple of MiB and a power of two in
size. The user may (but need not) use fallocate/fadvise to fulfill this
requirement.
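
The size check itself is cheap; something along these lines would do
(is_power_of_2() is from qemu/host-utils.h, MiB from qemu/units.h):

#include "qemu/osdep.h"
#include "qemu/host-utils.h"
#include "qemu/units.h"

/* A power of two that is at least 1 MiB is automatically a whole
 * multiple of MiB, so two conditions cover the requirement. */
static bool pmr_file_size_is_valid(uint64_t size)
{
    return size >= MiB && is_power_of_2(size);
}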



Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-11 Thread Stefan Hajnoczi
On Tue, Mar 10, 2020 at 8:09 PM Andrzej Jakowski wrote:
> On 3/10/20 2:51 AM, Stefan Hajnoczi wrote:
> > On Fri, Mar 06, 2020 at 03:38:53PM -0700, Andrzej Jakowski wrote:
> >> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> >> index d28335cbf3..ff7e74d765 100644
> >> --- a/hw/block/nvme.c
> >> +++ b/hw/block/nvme.c
> >> @@ -19,10 +19,14 @@
> >>   *  -drive file=<file>,if=none,id=<drive_id>
> >>   *  -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
> >>   *  cmb_size_mb=<cmb_size_mb[optional]>, \
> >> + *  [pmr_file=<pmr_file>,] \
> >>   *  num_queues=<N[optional]>
> >>   *
> >>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
> >>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> >> + *
> >> + * Either cmb or pmr - due to limitation in avaialbe BAR indexes.
> >
> > s/avaialbe/available/
> >
> >> + * pmr_file file needs to be preallocated and power of two in size.
> >
> > Why does it need to be preallocated?
>
> The PMR file is mmapped into the address space. If memory accesses are made
> outside of the file then a SIGBUS signal is raised. The preallocation
> requirement was introduced to prevent this situation.

Oh, I think I see what you mean.  That is not how the term
"preallocated" is usually used in POSIX file systems.  File systems
have sparse files by default and the term preallocation is used in the
context of fadvise(2) for reserving space.

In this case I think you're saying the file cannot grow.  That is
implicit since the BAR can't grow either so you could drop the comment
about preallocation.

> >
> >>   */
> >>
> >>  #include "qemu/osdep.h"
> >> @@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
> >>              NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
> >>                             "invalid write to read only CMBSZ, ignored");
> >>              return;
> >> +#ifndef _WIN32
> >
> > This ifdef is a hint that the layering is not right.  QEMU device models
> > usually only implement the "frontend" device registers, interrupts, and
> > request processing logic.  The platform-specific host "backend"
> > (mmapping files, sending network packets, audio/graphics APIs, etc) is
> > implemented separately.
>
> Agree. I couldn't find QEMU backend ensuring persistence - thus decided to
> go with mmap.

Please try:

  $ git grep pmem

backends/hostmem-file.c is the backend that can be used and the
pmem_persist() API can be used to flush writes.
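
Usage is essentially one call after the data lands in the mapping; a
sketch (pmrbuf/addr are illustrative names for the mapped buffer and the
guest-written offset):

#include "qemu/osdep.h"
#include "qemu/pmem.h"

/* Copy guest data into the persistent mapping, then flush exactly the
 * dirty range (CPU cache flush + drain on real pmem). */
static void pmr_store(void *pmrbuf, uint64_t addr, const void *src,
                      size_t len)
{
    memcpy((char *)pmrbuf + addr, src, len);
    pmem_persist((char *)pmrbuf + addr, len);
}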



Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-10 Thread Andrzej Jakowski
On 3/10/20 2:51 AM, Stefan Hajnoczi wrote:
> On Fri, Mar 06, 2020 at 03:38:53PM -0700, Andrzej Jakowski wrote:
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index d28335cbf3..ff7e74d765 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -19,10 +19,14 @@
>>   *  -drive file=<file>,if=none,id=<drive_id>
>>   *  -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
>>   *  cmb_size_mb=<cmb_size_mb[optional]>, \
>> + *  [pmr_file=<pmr_file>,] \
>>   *  num_queues=<N[optional]>
>>   *
>>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
>> + *
>> + * Either cmb or pmr - due to limitation in avaialbe BAR indexes.
> 
> s/avaialbe/available/
> 
>> + * pmr_file file needs to be preallocated and power of two in size.
> 
> Why does it need to be preallocated?

The PMR file is mmapped into the address space. If memory accesses are made
outside of the file then a SIGBUS signal is raised. The preallocation
requirement was introduced to prevent this situation.
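
In other words, the requirement exists to make the whole BAR range safely
accessible. One way to guarantee that before mapping (a sketch;
posix_fallocate() allocates real blocks, unlike a bare ftruncate()):

#include <fcntl.h>
#include <sys/mman.h>

/* Ensure backing blocks exist for [0, size) before mmap so guest
 * accesses within the BAR can never raise SIGBUS.  The caller must
 * check for MAP_FAILED. */
static void *pmr_map(int fd, size_t size)
{
    if (posix_fallocate(fd, 0, size) != 0) {
        return MAP_FAILED;
    }
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}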

> 
>>   */
>>  
>>  #include "qemu/osdep.h"
>> @@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>>              NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
>>                             "invalid write to read only CMBSZ, ignored");
>>              return;
>> +#ifndef _WIN32
> 
> This ifdef is a hint that the layering is not right.  QEMU device models
> usually only implement the "frontend" device registers, interrupts, and
> request processing logic.  The platform-specific host "backend"
> (mmapping files, sending network packets, audio/graphics APIs, etc) is
> implemented separately.

Agree. I couldn't find a QEMU backend ensuring persistence - thus I decided to
go with mmap.

> 
> In the previous version I asked NVDIMM folks to review this patch and
> suggest how to use the same HostMemoryBackend (see
> include/sysemu/hostmem.h) that is already used for NVDIMM emulation.
> 
> That seems cleaner than baking platform-specific memory mapped file I/O
> into hw/block/nvme.c, and it will also add a few features that this
> patch does not have.
> 
> If NVDIMM folks don't respond to this email, would you be able to
> research backends/hostmem*.c and try to integrate it?  If you feel lost
> I can help but it will require me to spend time investigating how that
> stuff works again :).
> 

Yes I can research this topic. Does HostMemoryBackend provide persistence?




Re: [PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-10 Thread Stefan Hajnoczi
On Fri, Mar 06, 2020 at 03:38:53PM -0700, Andrzej Jakowski wrote:
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index d28335cbf3..ff7e74d765 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,10 +19,14 @@
>   *  -drive file=<file>,if=none,id=<drive_id>
>   *  -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
>   *  cmb_size_mb=<cmb_size_mb[optional]>, \
> + *  [pmr_file=<pmr_file>,] \
>   *  num_queues=<N[optional]>
>   *
>   * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> + *
> + * Either cmb or pmr - due to limitation in avaialbe BAR indexes.

s/avaialbe/available/

> + * pmr_file file needs to be preallocated and power of two in size.

Why does it need to be preallocated?

>   */
>  
>  #include "qemu/osdep.h"
> @@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>              NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
>                             "invalid write to read only CMBSZ, ignored");
>              return;
> +#ifndef _WIN32

This ifdef is a hint that the layering is not right.  QEMU device models
usually only implement the "frontend" device registers, interrupts, and
request processing logic.  The platform-specific host "backend"
(mmapping files, sending network packets, audio/graphics APIs, etc) is
implemented separately.

In the previous version I asked NVDIMM folks to review this patch and
suggest how to use the same HostMemoryBackend (see
include/sysemu/hostmem.h) that is already used for NVDIMM emulation.

That seems cleaner than baking platform-specific memory mapped file I/O
into hw/block/nvme.c, and it will also add a few features that this
patch does not have.

If NVDIMM folks don't respond to this email, would you be able to
research backends/hostmem*.c and try to integrate it?  If you feel lost
I can help but it will require me to spend time investigating how that
stuff works again :).




[PATCH RESEND v2] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-03-06 Thread Andrzej Jakowski
This patch introduces support for the PMR that has been defined as part of the
NVMe 1.4 spec. The user can now specify a pmr_file which will be mmap'ed into
the qemu address space and exposed as PCI BAR 2. The guest OS can perform MMIO
reads and writes to the PMR region, and they will stay persistent across
system reboots.

Signed-off-by: Andrzej Jakowski 
---
Changes since v1:
 - provided support for Bit 1 from the PMRWBM register instead of Bit 0 to
   ensure improved performance in a virtualized environment [1] (Stefan)

 - added a check that the pmr file size is a power of two (David)

 - addressed cross-compilation build problems reported by the CI environment

[1]: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
[2]: https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakow...@linux.intel.com/
---

Persistent Memory Region (PMR) is a new optional feature provided in the NVMe
1.4 specification. This patch implements initial support for it in the
emulated NVMe device.

 hw/block/nvme.c   | 165 +++-
 hw/block/nvme.h   |   5 ++
 hw/block/trace-events |   5 ++
 include/block/nvme.h  | 172 ++
 4 files changed, 346 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..ff7e74d765 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -19,10 +19,14 @@
  *  -drive file=<file>,if=none,id=<drive_id>
  *  -device nvme,drive=<drive_id>,serial=<serial>,id=<id[optional]>, \
  *  cmb_size_mb=<cmb_size_mb[optional]>, \
+ *  [pmr_file=<pmr_file>,] \
  *  num_queues=<N[optional]>
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
  * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
+ *
+ * Either cmb or pmr - due to limitation in avaialbe BAR indexes.
+ * pmr_file file needs to be preallocated and power of two in size.
  */
 
 #include "qemu/osdep.h"
@@ -1141,6 +1145,28 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
             NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
                            "invalid write to read only CMBSZ, ignored");
             return;
+#ifndef _WIN32
+    case 0xE00: /* PMRCAP */
+        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
+                       "invalid write to PMRCAP register, ignored");
+        return;
+    case 0xE04: /* TODO PMRCTL */
+        break;
+    case 0xE08: /* PMRSTS */
+        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
+                       "invalid write to PMRSTS register, ignored");
+        return;
+    case 0xE0C: /* PMREBS */
+        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
+                       "invalid write to PMREBS register, ignored");
+        return;
+    case 0xE10: /* PMRSWTP */
+        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
+                       "invalid write to PMRSWTP register, ignored");
+        return;
+    case 0xE14: /* TODO PMRMSC */
+        break;
+#endif /* !_WIN32 */
     default:
         NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
                        "invalid MMIO write,"
@@ -1169,6 +1195,22 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
     }
 
     if (addr < sizeof(n->bar)) {
+#ifndef _WIN32
+        /*
+         * When PMRWBM bit 1 is set then a read
+         * from PMRSTS should ensure prior writes
+         * made it to persistent media
+         */
+        if (addr == 0xE08 &&
+            (NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) {
+            int ret;
+            ret = msync(n->pmrbuf, n->f_pmr_size, MS_SYNC);
+            if (ret < 0) {
+                NVME_GUEST_ERR(nvme_ub_mmiord_pmrread_barrier,
+                               "error while persisting data");
+            }
+        }
+#endif /* !_WIN32 */
         memcpy(&val, ptr + addr, size);
     } else {
         NVME_GUEST_ERR(nvme_ub_mmiord_invalid_ofs,
@@ -1303,6 +1345,31 @@ static const MemoryRegionOps nvme_cmb_ops = {
 },
 };
 
+#ifndef _WIN32
+static void nvme_pmr_write(void *opaque, hwaddr addr, uint64_t data,
+                           unsigned size)
+{
+    NvmeCtrl *n = (NvmeCtrl *)opaque;
+    stn_le_p(&n->pmrbuf[addr], size, data);
+}
+
+static uint64_t nvme_pmr_read(void *opaque, hwaddr addr, unsigned size)
+{
+    NvmeCtrl *n = (NvmeCtrl *)opaque;
+    return ldn_le_p(&n->pmrbuf[addr], size);
+}
+
+static const MemoryRegionOps nvme_pmr_ops = {
+    .read = nvme_pmr_read,
+    .write = nvme_pmr_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .impl = {
+        .min_access_size = 1,
+        .max_access_size = 8,
+    },
+};
+#endif /* !_WIN32 */
+
 static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 {
     NvmeCtrl *n = NVME(pci_dev);
@@ -1332,6 +1399,39 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
         error_setg(errp, "serial property not set");
         return;
     }
+
+#ifndef _WIN32
+    if (!n->cmb_size_mb && n->pmr_file) {
+        int fd;
+
+        n->f_pmr = fopen(n->pmr_file, "r+b");
+        if (!n->f_pmr) {
+            error_setg(errp, "pmr