Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-19 Thread Michael S. Tsirkin
On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:
> Changelog in v3:
> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> Michael for their valuable comments, the patchset finally gets better shape.

Thanks!
This needs some changes in coding style, and more comments, to
make it easier to maintain going forward.

High level comments - I didn't point out all instances,
please go over code and locate them yourself.
I focused on acpi code in this review.

- fix coding style violations, prefix eveything with nvdimm_ etc
- in apci code, avoid manual memory management/complex pointer math
- comments are needed to document apis & explain what's going on
- constants need comments too, refer to text that
  can be looked up in acpi spec verbatim


> - changes from Igor's comments:
>   1) abstract dimm device type from pc-dimm and create nvdimm device based on
>  dimm, then it uses memory backend device as nvdimm's memory and NUMA has
>  easily been implemented.
>   2) let file-backend device support any kind of filesystem not only for
>  hugetlbfs and let it work on file not only for directory which is
>  achieved by extending 'mem-path' - if it's a directory then it works as
>  current behavior, otherwise if it's file then directly allocates memory
>  from it.
>   3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 
>  0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
>  ACPI SSDT/DSDT table will break windows XP.
>  BTW, only make SSDT.rev = 2 can not work since the width is only depended
>  on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>  in ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32 
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT 
> sets 
> | the global integer width for all integers, including integers in SSDTs.
>   4) use the lowest ACPI spec version to document AML terms.
>   5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"
> 
> - changes from Stefan's comments:
>   1) do not do endian adjustment in-place since _DSM memory is visible to 
> guest
>   2) use target platform's target page size instead of fixed PAGE_SIZE
>  definition
>   3) lots of code style improvement and typo fixes.
>   4) live migration fix
> - changes from Paolo's comments:
>   1) improve the name of memory region
>   
> - other changes:
>   1) return exact buffer size for _DSM method instead of the page size.
>   2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
>  devices.
>   3) NUMA support
>   4) implement _FIT method
>   5) rename "configdata" to "reserve-label-data"
>   6) simplify _DSM arg3 determination
>   7) main changelog update to let it reflect v3.
> 
> Changlog in v2:
> - Use litten endian for DSM method, thanks for Stefan's suggestion
> 
> - introduce a new parameter, @configdata, if it's false, Qemu will
>   build a static and readonly namespace in memory and use it serveing
>   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, it is good for
>   the user who want to pass whole nvdimm device and make its data
>   completely be visible to guest
> 
> - divide the source code into separated files and add maintain info
> 
> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
> be posted on next week
> 
> == Background ==
> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. They are discovered via ACPI and configured by _DSM
> method of NVDIMM device in ACPI. There has some supporting documents which
> can be found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: 
> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: 
> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> 
> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> this patchset tries to enable it in virtualization field
> 
> == Design ==
> NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
> address space then CPU can directly access it as normal memory, another is
> BLK which is used as block device to reduce the occupying of CPU address
> space
> 
> BLK mode accesses NVDIMM via Command Register window and Data Register window.
> BLK virtualization has high workload since each sector access will cause at
> least two VM-EXIT. So we currently only imperilment vPMEM in this patchset
> 
> --- vPMEM design ---
> We introduce a new device named "nvdimm", it uses memory 

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-19 Thread Michael S. Tsirkin
On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote:
> >On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:
> >>Changelog in v3:
> >>There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> >>Michael for their valuable comments, the patchset finally gets better shape.
> >
> >Thanks!
> >This needs some changes in coding style, and more comments, to
> >make it easier to maintain going forward.
> 
> Thanks for your review, Michael. I have learned lots of thing from
> your comments.
> 
> >
> >High level comments - I didn't point out all instances,
> >please go over code and locate them yourself.
> >I focused on acpi code in this review.
> 
> Okay, will do.
> 
> >
> > - fix coding style violations, prefix eveything with nvdimm_ etc
> 
> Actually i did not pay attention on naming the stuff which is only internally
> used. Thank you for pointing it out and will fix it in next version.
> 
> > - in apci code, avoid manual memory management/complex pointer math
> 
> I am not very good at ACPI ASL/AML, could you please more detail?

It's about C.

For example:
Foo *foo = acpi_data_push(table, sizeof *foo);
Bar *foo = acpi_data_push(table, sizeof *bar);
is pretty obviously safe, and it doesn't require you to do any
calculations.
char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar);
is worse, now you need:
Bar *bar = (Bar *)(buf + sizeof *foo);
which will corrupt memory if you get the size wrong in push.

> > - comments are needed to document apis & explain what's going on
> > - constants need comments too, refer to text that
> >   can be looked up in acpi spec verbatim
> 
> Indeed, will document carefully.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-14 Thread Dan Williams
On Tue, Oct 13, 2015 at 9:03 PM, Xiao Guangrong
 wrote:
>> Label-less DIMMs are tested as part of the unit test [1] and the
>> "memmap=nn!ss" kernel parameter that registers a persistent-memory
>> address range without a DIMM.  What error do you see when label
>> support is disabled?
>>
>> [1]: https://github.com/pmem/ndctl/blob/master/README.md
>>
>
> After revert my commits on NVDIMM driver, yeah, it works.
>
> Okay, i will drop the namespace part and make it as label-less
> instead.
>
> Thank you, Dan!
>

Good to hear.  There are still cases where a guest would likely want
to submit a _DSM, like retrieving address range scrub results from the
ACPI0012 root device, so the ASL work is still needed.  However, I
think the bulk of the storage functionality can be had without
storing/retrieving labels in the guest.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-13 Thread Xiao Guangrong



On 10/13/2015 02:36 PM, Dan Williams wrote:

On Mon, Oct 12, 2015 at 10:49 PM, Xiao Guangrong
 wrote:



On 10/13/2015 11:38 AM, Dan Williams wrote:


On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong
 wrote:


On 10/13/2015 12:36 AM, Dan Williams wrote:


Static namespaces can be emitted without a label.  Linux needs this to
support existing "label-less" bare metal NVDIMMs.




This is Linux specific? As i did not see it has been documented in the
spec...



I expect most NVDIMMs, especially existing ones available today, do
not have a label area.  This is not Linux specific and ACPI 6 does not
specify a label area, only the Intel DSM Interface Example.



Yup, label data is accessed via DSM interface, the spec I mentioned
is Intel DSM Interface Example.

However, IIRC Linux NVDIMM driver refused to use the device if no
DSM GET_LABEL support, are you going to update it?


Label-less DIMMs are tested as part of the unit test [1] and the
"memmap=nn!ss" kernel parameter that registers a persistent-memory
address range without a DIMM.  What error do you see when label
support is disabled?

[1]: https://github.com/pmem/ndctl/blob/master/README.md



After revert my commits on NVDIMM driver, yeah, it works.

Okay, i will drop the namespace part and make it as label-less
instead.

Thank you, Dan!

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-13 Thread Dan Williams
On Mon, Oct 12, 2015 at 10:49 PM, Xiao Guangrong
 wrote:
>
>
> On 10/13/2015 11:38 AM, Dan Williams wrote:
>>
>> On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong
>>  wrote:
>>>
>>> On 10/13/2015 12:36 AM, Dan Williams wrote:

 Static namespaces can be emitted without a label.  Linux needs this to
 support existing "label-less" bare metal NVDIMMs.
>>>
>>>
>>>
>>> This is Linux specific? As i did not see it has been documented in the
>>> spec...
>>
>>
>> I expect most NVDIMMs, especially existing ones available today, do
>> not have a label area.  This is not Linux specific and ACPI 6 does not
>> specify a label area, only the Intel DSM Interface Example.
>>
>
> Yup, label data is accessed via DSM interface, the spec I mentioned
> is Intel DSM Interface Example.
>
> However, IIRC Linux NVDIMM driver refused to use the device if no
> DSM GET_LABEL support, are you going to update it?

Label-less DIMMs are tested as part of the unit test [1] and the
"memmap=nn!ss" kernel parameter that registers a persistent-memory
address range without a DIMM.  What error do you see when label
support is disabled?

[1]: https://github.com/pmem/ndctl/blob/master/README.md
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Igor Mammedov
On Mon, 12 Oct 2015 11:06:20 +0800
Xiao Guangrong  wrote:

> 
> 
> On 10/12/2015 10:59 AM, Bharata B Rao wrote:
> > Xiao,
> >
> > Are these patches present in any git tree so that they can be easily tried 
> > out.
> >
> 
> Sorry, currently no git tree out of my workspace is available :(
Is it possible for you to put working tree on github?

> 
> BTW, this patchset is based on top of the commit b37686f7e on qemu tree:
> commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027
> Merge: 8be6e62 98cf48f
> Author: Peter Maydell 
> Date:   Fri Oct 9 12:18:13 2015 +0100
> 
>  Merge remote-tracking branch 
> 'remotes/stefanha/tags/tracing-pull-request' into staging
> 
> Thanks.
> 
> > Regards,
> > Bharata.
> >
> > On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong
> >  wrote:
> >> Changelog in v3:
> >> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> >> Michael for their valuable comments, the patchset finally gets better 
> >> shape.
> >> - changes from Igor's comments:
> >>1) abstract dimm device type from pc-dimm and create nvdimm device 
> >> based on
> >>   dimm, then it uses memory backend device as nvdimm's memory and NUMA 
> >> has
> >>   easily been implemented.
> >>2) let file-backend device support any kind of filesystem not only for
> >>   hugetlbfs and let it work on file not only for directory which is
> >>   achieved by extending 'mem-path' - if it's a directory then it works 
> >> as
> >>   current behavior, otherwise if it's file then directly allocates 
> >> memory
> >>   from it.
> >>3) we figure out a unused memory hole below 4G that is 0xFF0 ~
> >>   0xFFF0, this range is large enough for NVDIMM ACPI as build 
> >> 64-bit
> >>   ACPI SSDT/DSDT table will break windows XP.
> >>   BTW, only make SSDT.rev = 2 can not work since the width is only 
> >> depended
> >>   on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition 
> >> Block)
> >>   in ACPI spec:
> >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> >> | width of Integer objects is dependent on the ComplianceRevision of the 
> >> DSDT.
> >> | If the ComplianceRevision is less than 2, all integers are restricted to 
> >> 32
> >> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT 
> >> sets
> >> | the global integer width for all integers, including integers in SSDTs.
> >>4) use the lowest ACPI spec version to document AML terms.
> >>5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"
> >>
> >> - changes from Stefan's comments:
> >>1) do not do endian adjustment in-place since _DSM memory is visible to 
> >> guest
> >>2) use target platform's target page size instead of fixed PAGE_SIZE
> >>   definition
> >>3) lots of code style improvement and typo fixes.
> >>4) live migration fix
> >> - changes from Paolo's comments:
> >>1) improve the name of memory region
> >>
> >> - other changes:
> >>1) return exact buffer size for _DSM method instead of the page size.
> >>2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all 
> >> nvdimm
> >>   devices.
> >>3) NUMA support
> >>4) implement _FIT method
> >>5) rename "configdata" to "reserve-label-data"
> >>6) simplify _DSM arg3 determination
> >>7) main changelog update to let it reflect v3.
> >>
> >> Changlog in v2:
> >> - Use litten endian for DSM method, thanks for Stefan's suggestion
> >>
> >> - introduce a new parameter, @configdata, if it's false, Qemu will
> >>build a static and readonly namespace in memory and use it serveing
> >>for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
> >>reserved region is needed at the end of the @file, it is good for
> >>the user who want to pass whole nvdimm device and make its data
> >>completely be visible to guest
> >>
> >> - divide the source code into separated files and add maintain info
> >>
> >> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
> >> be posted on next week
> >>
> >> == Background ==
> >> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> >> on Intel's platform. They are discovered via ACPI and configured by _DSM
> >> method of NVDIMM device in ACPI. There has some supporting documents which
> >> can be found at:
> >> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> >> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> >> DSM Interface Example: 
> >> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> >> Driver Writer's Guide: 
> >> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> >>
> >> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> >> this patchset tries to enable it in virtualization field
> >>
> >> == Design ==
> >> NVDIMM 

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Xiao Guangrong



On 10/12/2015 04:20 PM, Igor Mammedov wrote:

On Mon, 12 Oct 2015 11:06:20 +0800
Xiao Guangrong  wrote:




On 10/12/2015 10:59 AM, Bharata B Rao wrote:

Xiao,

Are these patches present in any git tree so that they can be easily tried out.



Sorry, currently no git tree out of my workspace is available :(

Is it possible for you to put working tree on github?


Yes. I am planning do this. Hopefully, i will publish the git repo alone with 
the
new version patchset.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Dan Williams
On Sun, Oct 11, 2015 at 9:33 PM, Xiao Guangrong
 wrote:
>
>
> On 10/11/2015 05:17 AM, Dan Williams wrote:
>>
>> On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong
>>  wrote:
>> [..]
>>>
>>> == Test ==
>>> In host
>>> 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G
>>> count=10
>>> 2) append "-object memory-backend-file,share,id=mem1,
>>> mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>>> id=nv1" in QEMU command line
>>>
>>> In guest, download the latest upsteam kernel (4.2 merge window) and
>>> enable
>>> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
>>> 1) insmod drivers/nvdimm/libnvdimm.ko
>>> 2) insmod drivers/acpi/nfit.ko
>>> 3) insmod drivers/nvdimm/nd_btt.ko
>>> 4) insmod drivers/nvdimm/nd_pmem.ko
>>> You can see the whole nvdimm device used as a single namespace and
>>> /dev/pmem0
>>> appears. You can do whatever on /dev/pmem0 including DAX access.
>>>
>>> Currently Linux NVDIMM driver does not support namespace operation on
>>> this
>>> kind of PMEM, apply below changes to support dynamical namespace:
>>>
>>> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct
>>> acpi_nfit_desc *a
>>>  continue;
>>>  }
>>>
>>> -   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>>> +   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
>>> +   if (nfit_mem->memdev_pmem)
>>>  flags |= NDD_ALIASING;
>>
>>
>> This is just for testing purposes, right?  I expect guests can
>
>
> It's used to validate NVDIMM _DSM method and static namespace following
> NVDIMM specs...

Static namespaces can be emitted without a label.  Linux needs this to
support existing "label-less" bare metal NVDIMMs.

>> sub-divide persistent memory capacity by partitioning the resulting
>> block device(s).
>
>
> I understand that it's a Linux design... Hmm, can the same expectation
> apply to PBLK?

BLK-mode is a bit different as those namespaces have both configurable
sector-size and an optional BTT.  It is possible to expect multiple
BLK namespaces per a given region with different settings.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Michael S. Tsirkin
On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:
> Changelog in v3:
> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> Michael for their valuable comments, the patchset finally gets better shape.

Thanks!
This needs some changes in coding style, and more comments, to
make it easier to maintain going forward.

High level comments - I didn't point out all instances,
please go over code and locate them yourself.
I focused on acpi code in this review.

- fix coding style violations, prefix eveything with nvdimm_ etc
- in apci code, avoid manual memory management/complex pointer math
- comments are needed to document apis & explain what's going on
- constants need comments too, refer to text that
  can be looked up in acpi spec verbatim


> - changes from Igor's comments:
>   1) abstract dimm device type from pc-dimm and create nvdimm device based on
>  dimm, then it uses memory backend device as nvdimm's memory and NUMA has
>  easily been implemented.
>   2) let file-backend device support any kind of filesystem not only for
>  hugetlbfs and let it work on file not only for directory which is
>  achieved by extending 'mem-path' - if it's a directory then it works as
>  current behavior, otherwise if it's file then directly allocates memory
>  from it.
>   3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 
>  0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
>  ACPI SSDT/DSDT table will break windows XP.
>  BTW, only make SSDT.rev = 2 can not work since the width is only depended
>  on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>  in ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32 
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT 
> sets 
> | the global integer width for all integers, including integers in SSDTs.
>   4) use the lowest ACPI spec version to document AML terms.
>   5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"
> 
> - changes from Stefan's comments:
>   1) do not do endian adjustment in-place since _DSM memory is visible to 
> guest
>   2) use target platform's target page size instead of fixed PAGE_SIZE
>  definition
>   3) lots of code style improvement and typo fixes.
>   4) live migration fix
> - changes from Paolo's comments:
>   1) improve the name of memory region
>   
> - other changes:
>   1) return exact buffer size for _DSM method instead of the page size.
>   2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
>  devices.
>   3) NUMA support
>   4) implement _FIT method
>   5) rename "configdata" to "reserve-label-data"
>   6) simplify _DSM arg3 determination
>   7) main changelog update to let it reflect v3.
> 
> Changlog in v2:
> - Use litten endian for DSM method, thanks for Stefan's suggestion
> 
> - introduce a new parameter, @configdata, if it's false, Qemu will
>   build a static and readonly namespace in memory and use it serveing
>   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, it is good for
>   the user who want to pass whole nvdimm device and make its data
>   completely be visible to guest
> 
> - divide the source code into separated files and add maintain info
> 
> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
> be posted on next week
> 
> == Background ==
> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. They are discovered via ACPI and configured by _DSM
> method of NVDIMM device in ACPI. There has some supporting documents which
> can be found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: 
> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: 
> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> 
> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> this patchset tries to enable it in virtualization field
> 
> == Design ==
> NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
> address space then CPU can directly access it as normal memory, another is
> BLK which is used as block device to reduce the occupying of CPU address
> space
> 
> BLK mode accesses NVDIMM via Command Register window and Data Register window.
> BLK virtualization has high workload since each sector access will cause at
> least two VM-EXIT. So we currently only imperilment vPMEM in this patchset
> 
> --- vPMEM design ---
> We introduce a new device named "nvdimm", it uses memory 

Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Xiao Guangrong



On 10/13/2015 11:38 AM, Dan Williams wrote:

On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong
 wrote:

On 10/13/2015 12:36 AM, Dan Williams wrote:

Static namespaces can be emitted without a label.  Linux needs this to
support existing "label-less" bare metal NVDIMMs.



This is Linux specific? As i did not see it has been documented in the
spec...


I expect most NVDIMMs, especially existing ones available today, do
not have a label area.  This is not Linux specific and ACPI 6 does not
specify a label area, only the Intel DSM Interface Example.



Yup, label data is accessed via DSM interface, the spec I mentioned
is Intel DSM Interface Example.

However, IIRC Linux NVDIMM driver refused to use the device if no
DSM GET_LABEL support, are you going to update it?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Xiao Guangrong



On 10/13/2015 12:36 AM, Dan Williams wrote:

On Sun, Oct 11, 2015 at 9:33 PM, Xiao Guangrong
 wrote:



On 10/11/2015 05:17 AM, Dan Williams wrote:


On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong
 wrote:
[..]


== Test ==
In host
1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G
count=10
2) append "-object memory-backend-file,share,id=mem1,
 mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
 id=nv1" in QEMU command line

In guest, download the latest upsteam kernel (4.2 merge window) and
enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace and
/dev/pmem0
appears. You can do whatever on /dev/pmem0 including DAX access.

Currently Linux NVDIMM driver does not support namespace operation on
this
kind of PMEM, apply below changes to support dynamical namespace:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct
acpi_nfit_desc *a
  continue;
  }

-   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   if (nfit_mem->memdev_pmem)
  flags |= NDD_ALIASING;



This is just for testing purposes, right?  I expect guests can



It's used to validate NVDIMM _DSM method and static namespace following
NVDIMM specs...


Static namespaces can be emitted without a label.  Linux needs this to
support existing "label-less" bare metal NVDIMMs.


This is Linux specific? As i did not see it has been documented in the
spec...




sub-divide persistent memory capacity by partitioning the resulting
block device(s).



I understand that it's a Linux design... Hmm, can the same expectation
apply to PBLK?


BLK-mode is a bit different as those namespaces have both configurable
sector-size and an optional BTT.  It is possible to expect multiple
BLK namespaces per a given region with different settings.


Okay, thanks for your nice explanation, Dan!

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Xiao Guangrong



On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote:

On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:

Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.


Thanks!
This needs some changes in coding style, and more comments, to
make it easier to maintain going forward.


Thanks for your review, Michael. I have learned lots of thing from
your comments.



High level comments - I didn't point out all instances,
please go over code and locate them yourself.
I focused on acpi code in this review.


Okay, will do.



 - fix coding style violations, prefix eveything with nvdimm_ etc


Actually i did not pay attention on naming the stuff which is only internally
used. Thank you for pointing it out and will fix it in next version.


 - in apci code, avoid manual memory management/complex pointer math


I am not very good at ACPI ASL/AML, could you please more detail?


 - comments are needed to document apis & explain what's going on
 - constants need comments too, refer to text that
   can be looked up in acpi spec verbatim


Indeed, will document carefully.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Michael S. Tsirkin
On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote:
> 
> 
> On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote:
> >On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:
> >>Changelog in v3:
> >>There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> >>Michael for their valuable comments, the patchset finally gets better shape.
> >
> >Thanks!
> >This needs some changes in coding style, and more comments, to
> >make it easier to maintain going forward.
> 
> Thanks for your review, Michael. I have learned lots of thing from
> your comments.
> 
> >
> >High level comments - I didn't point out all instances,
> >please go over code and locate them yourself.
> >I focused on acpi code in this review.
> 
> Okay, will do.
> 
> >
> > - fix coding style violations, prefix eveything with nvdimm_ etc
> 
> Actually i did not pay attention on naming the stuff which is only internally
> used. Thank you for pointing it out and will fix it in next version.
> 
> > - in apci code, avoid manual memory management/complex pointer math
> 
> I am not very good at ACPI ASL/AML, could you please more detail?

It's about C.

For example:
Foo *foo = acpi_data_push(table, sizeof *foo);
Bar *foo = acpi_data_push(table, sizeof *bar);
is pretty obviously safe, and it doesn't require you to do any
calculations.
char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar);
is worse, now you need:
Bar *bar = (Bar *)(buf + sizeof *foo);
which will corrupt memory if you get the size wrong in push.

> > - comments are needed to document apis & explain what's going on
> > - constants need comments too, refer to text that
> >   can be looked up in acpi spec verbatim
> 
> Indeed, will document carefully.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Xiao Guangrong



On 10/13/2015 01:57 PM, Michael S. Tsirkin wrote:

On Tue, Oct 13, 2015 at 01:29:48PM +0800, Xiao Guangrong wrote:



On 10/12/2015 07:55 PM, Michael S. Tsirkin wrote:

On Sun, Oct 11, 2015 at 11:52:32AM +0800, Xiao Guangrong wrote:

Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.


Thanks!
This needs some changes in coding style, and more comments, to
make it easier to maintain going forward.


Thanks for your review, Michael. I have learned lots of thing from
your comments.



High level comments - I didn't point out all instances,
please go over code and locate them yourself.
I focused on acpi code in this review.


Okay, will do.



 - fix coding style violations, prefix eveything with nvdimm_ etc


Actually i did not pay attention on naming the stuff which is only internally
used. Thank you for pointing it out and will fix it in next version.


 - in apci code, avoid manual memory management/complex pointer math


I am not very good at ACPI ASL/AML, could you please more detail?


It's about C.

For example:
Foo *foo = acpi_data_push(table, sizeof *foo);
Bar *foo = acpi_data_push(table, sizeof *bar);
is pretty obviously safe, and it doesn't require you to do any
calculations.
char *buf = acpi_data_push(table, sizeof *foo + sizeof *bar);
is worse, now you need:
Bar *bar = (Bar *)(buf + sizeof *foo);
which will corrupt memory if you get the size wrong in push.


Ah, got it. :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v3 00/32] implement vNVDIMM

2015-10-12 Thread Dan Williams
On Mon, Oct 12, 2015 at 8:14 PM, Xiao Guangrong
 wrote:
> On 10/13/2015 12:36 AM, Dan Williams wrote:
>> Static namespaces can be emitted without a label.  Linux needs this to
>> support existing "label-less" bare metal NVDIMMs.
>
>
> This is Linux specific? As i did not see it has been documented in the
> spec...

I expect most NVDIMMs, especially existing ones available today, do
not have a label area.  This is not Linux specific and ACPI 6 does not
specify a label area, only the Intel DSM Interface Example.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-11 Thread Bharata B Rao
Xiao,

Are these patches present in any git tree so that they can be easily tried out.

Regards,
Bharata.

On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong
 wrote:
> Changelog in v3:
> There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
> Michael for their valuable comments, the patchset finally gets better shape.
> - changes from Igor's comments:
>   1) abstract dimm device type from pc-dimm and create nvdimm device based on
>  dimm, then it uses memory backend device as nvdimm's memory and NUMA has
>  easily been implemented.
>   2) let file-backend device support any kind of filesystem not only for
>  hugetlbfs and let it work on file not only for directory which is
>  achieved by extending 'mem-path' - if it's a directory then it works as
>  current behavior, otherwise if it's file then directly allocates memory
>  from it.
>   3) we figure out a unused memory hole below 4G that is 0xFF0 ~
>  0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
>  ACPI SSDT/DSDT table will break windows XP.
>  BTW, only make SSDT.rev = 2 can not work since the width is only depended
>  on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>  in ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
> | the global integer width for all integers, including integers in SSDTs.
>   4) use the lowest ACPI spec version to document AML terms.
>   5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"
>
> - changes from Stefan's comments:
>   1) do not do endian adjustment in-place since _DSM memory is visible to 
> guest
>   2) use target platform's target page size instead of fixed PAGE_SIZE
>  definition
>   3) lots of code style improvement and typo fixes.
>   4) live migration fix
> - changes from Paolo's comments:
>   1) improve the name of memory region
>
> - other changes:
>   1) return exact buffer size for _DSM method instead of the page size.
>   2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
>  devices.
>   3) NUMA support
>   4) implement _FIT method
>   5) rename "configdata" to "reserve-label-data"
>   6) simplify _DSM arg3 determination
>   7) main changelog update to let it reflect v3.
>
> Changlog in v2:
> - Use litten endian for DSM method, thanks for Stefan's suggestion
>
> - introduce a new parameter, @configdata, if it's false, Qemu will
>   build a static and readonly namespace in memory and use it serveing
>   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, it is good for
>   the user who want to pass whole nvdimm device and make its data
>   completely be visible to guest
>
> - divide the source code into separated files and add maintain info
>
> BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
> be posted on next week
>
> == Background ==
> NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. They are discovered via ACPI and configured by _DSM
> method of NVDIMM device in ACPI. There has some supporting documents which
> can be found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: 
> http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: 
> http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
>
> Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
> this patchset tries to enable it in virtualization field
>
> == Design ==
> NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
> address space then CPU can directly access it as normal memory, another is
> BLK which is used as block device to reduce the occupying of CPU address
> space
>
> BLK mode accesses NVDIMM via Command Register window and Data Register window.
> BLK virtualization has high workload since each sector access will cause at
> least two VM-EXIT. So we currently only imperilment vPMEM in this patchset
>
> --- vPMEM design ---
> We introduce a new device named "nvdimm", it uses memory backend device as
> NVDIMM memory. The file in file-backend device can be a regular file and block
> device. We can use any file when we do test or emulation, however,
> in the real word, the files passed to guest are:
> - the regular file in the filesystem with DAX enabled created on NVDIMM device
>   on host
> - the raw PMEM device on host, e,g /dev/pmem0
> Memory access on the address created by mmap on these kinds of files can
> directly reach NVDIMM 

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-11 Thread Xiao Guangrong



On 10/12/2015 10:59 AM, Bharata B Rao wrote:

Xiao,

Are these patches present in any git tree so that they can be easily tried out.



Sorry, currently no git tree out of my workspace is available :(

BTW, this patchset is based on top of the commit b37686f7e on qemu tree:
commit b37686f7e84b22cfaf7fd01ac5133f2617cc3027
Merge: 8be6e62 98cf48f
Author: Peter Maydell 
Date:   Fri Oct 9 12:18:13 2015 +0100

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' 
into staging

Thanks.


Regards,
Bharata.

On Sun, Oct 11, 2015 at 9:22 AM, Xiao Guangrong
 wrote:

Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.
- changes from Igor's comments:
   1) abstract dimm device type from pc-dimm and create nvdimm device based on
  dimm, then it uses memory backend device as nvdimm's memory and NUMA has
  easily been implemented.
   2) let file-backend device support any kind of filesystem not only for
  hugetlbfs and let it work on file not only for directory which is
  achieved by extending 'mem-path' - if it's a directory then it works as
  current behavior, otherwise if it's file then directly allocates memory
  from it.
   3) we figure out a unused memory hole below 4G that is 0xFF0 ~
  0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
  ACPI SSDT/DSDT table will break windows XP.
  BTW, only make SSDT.rev = 2 can not work since the width is only depended
  on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
  in ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
| the global integer width for all integers, including integers in SSDTs.
   4) use the lowest ACPI spec version to document AML terms.
   5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"

- changes from Stefan's comments:
   1) do not do endian adjustment in-place since _DSM memory is visible to guest
   2) use target platform's target page size instead of fixed PAGE_SIZE
  definition
   3) lots of code style improvement and typo fixes.
   4) live migration fix
- changes from Paolo's comments:
   1) improve the name of memory region

- other changes:
   1) return exact buffer size for _DSM method instead of the page size.
   2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
  devices.
   3) NUMA support
   4) implement _FIT method
   5) rename "configdata" to "reserve-label-data"
   6) simplify _DSM arg3 determination
   7) main changelog update to let it reflect v3.

Changlog in v2:
- Use litten endian for DSM method, thanks for Stefan's suggestion

- introduce a new parameter, @configdata, if it's false, Qemu will
   build a static and readonly namespace in memory and use it serveing
   for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
   reserved region is needed at the end of the @file, it is good for
   the user who want to pass whole nvdimm device and make its data
   completely be visible to guest

- divide the source code into separated files and add maintain info

BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
be posted on next week

== Background ==
NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. They are discovered via ACPI and configured by _DSM
method of NVDIMM device in ACPI. There has some supporting documents which
can be found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
this patchset tries to enable it in virtualization field

== Design ==
NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
address space then CPU can directly access it as normal memory, another is
BLK which is used as block device to reduce the occupying of CPU address
space

BLK mode accesses NVDIMM via Command Register window and Data Register window.
BLK virtualization has high workload since each sector access will cause at
least two VM-EXIT. So we currently only imperilment vPMEM in this patchset

--- vPMEM design ---
We introduce a new device named "nvdimm", it uses memory backend device as
NVDIMM memory. The file in file-backend device can be a regular file and block
device. We can use any file when we do 

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-11 Thread Xiao Guangrong



On 10/11/2015 05:17 AM, Dan Williams wrote:

On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong
 wrote:
[..]

== Test ==
In host
1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10
2) append "-object memory-backend-file,share,id=mem1,
mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
id=nv1" in QEMU command line

In guest, download the latest upsteam kernel (4.2 merge window) and enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace and /dev/pmem0
appears. You can do whatever on /dev/pmem0 including DAX access.

Currently Linux NVDIMM driver does not support namespace operation on this
kind of PMEM, apply below changes to support dynamical namespace:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
 continue;
 }

-   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+   if (nfit_mem->memdev_pmem)
 flags |= NDD_ALIASING;


This is just for testing purposes, right?  I expect guests can


It's used to validate NVDIMM _DSM method and static namespace following
NVDIMM specs...


sub-divide persistent memory capacity by partitioning the resulting
block device(s).


I understand that it's a Linux design... Hmm, can the same expectation
apply to PBLK?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 00/32] implement vNVDIMM

2015-10-10 Thread Xiao Guangrong
Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.
- changes from Igor's comments:
  1) abstract dimm device type from pc-dimm and create nvdimm device based on
 dimm, then it uses memory backend device as nvdimm's memory and NUMA has
 easily been implemented.
  2) let file-backend device support any kind of filesystem not only for
 hugetlbfs and let it work on file not only for directory which is
 achieved by extending 'mem-path' - if it's a directory then it works as
 current behavior, otherwise if it's file then directly allocates memory
 from it.
  3) we figure out a unused memory hole below 4G that is 0xFF0 ~ 
 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
 ACPI SSDT/DSDT table will break windows XP.
 BTW, only make SSDT.rev = 2 can not work since the width is only depended
 on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
 in ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32 
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets 
| the global integer width for all integers, including integers in SSDTs.
  4) use the lowest ACPI spec version to document AML terms.
  5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"

- changes from Stefan's comments:
  1) do not do endian adjustment in-place since _DSM memory is visible to guest
  2) use target platform's target page size instead of fixed PAGE_SIZE
 definition
  3) lots of code style improvement and typo fixes.
  4) live migration fix
- changes from Paolo's comments:
  1) improve the name of memory region
  
- other changes:
  1) return exact buffer size for _DSM method instead of the page size.
  2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
 devices.
  3) NUMA support
  4) implement _FIT method
  5) rename "configdata" to "reserve-label-data"
  6) simplify _DSM arg3 determination
  7) main changelog update to let it reflect v3.

Changlog in v2:
- Use litten endian for DSM method, thanks for Stefan's suggestion

- introduce a new parameter, @configdata, if it's false, Qemu will
  build a static and readonly namespace in memory and use it serveing
  for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
  reserved region is needed at the end of the @file, it is good for
  the user who want to pass whole nvdimm device and make its data
  completely be visible to guest

- divide the source code into separated files and add maintain info

BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
be posted on next week

== Background ==
NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. They are discovered via ACPI and configured by _DSM
method of NVDIMM device in ACPI. There has some supporting documents which
can be found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
this patchset tries to enable it in virtualization field

== Design ==
NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
address space then CPU can directly access it as normal memory, another is
BLK which is used as block device to reduce the occupying of CPU address
space

BLK mode accesses NVDIMM via Command Register window and Data Register window.
BLK virtualization has high workload since each sector access will cause at
least two VM-EXIT. So we currently only imperilment vPMEM in this patchset

--- vPMEM design ---
We introduce a new device named "nvdimm", it uses memory backend device as
NVDIMM memory. The file in file-backend device can be a regular file and block 
device. We can use any file when we do test or emulation, however,
in the real word, the files passed to guest are:
- the regular file in the filesystem with DAX enabled created on NVDIMM device
  on host
- the raw PMEM device on host, e,g /dev/pmem0
Memory access on the address created by mmap on these kinds of files can
directly reach NVDIMM device on host.

--- vConfigure data area design ---
Each NVDIMM device has a configure data area which is used to store label
namespace data. In order to emulating this area, we divide the file into two
parts:
- first parts is (0, size - 128K], which is used as PMEM
- 128K at the end of the file, which is used as Label Data Area
So that the label namespace data can be 

Re: [PATCH v3 00/32] implement vNVDIMM

2015-10-10 Thread Dan Williams
On Sat, Oct 10, 2015 at 8:52 PM, Xiao Guangrong
 wrote:
[..]
> == Test ==
> In host
> 1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10
> 2) append "-object memory-backend-file,share,id=mem1,
>mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>id=nv1" in QEMU command line
>
> In guest, download the latest upsteam kernel (4.2 merge window) and enable
> ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
> 1) insmod drivers/nvdimm/libnvdimm.ko
> 2) insmod drivers/acpi/nfit.ko
> 3) insmod drivers/nvdimm/nd_btt.ko
> 4) insmod drivers/nvdimm/nd_pmem.ko
> You can see the whole nvdimm device used as a single namespace and /dev/pmem0
> appears. You can do whatever on /dev/pmem0 including DAX access.
>
> Currently Linux NVDIMM driver does not support namespace operation on this
> kind of PMEM, apply below changes to support dynamical namespace:
>
> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc 
> *a
> continue;
> }
>
> -   if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +   if (nfit_mem->memdev_pmem)
> flags |= NDD_ALIASING;

This is just for testing purposes, right?  I expect guests can
sub-divide persistent memory capacity by partitioning the resulting
block device(s).
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html