Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-22 Thread Mikulas Patocka



On Fri, 22 May 2020, Aneesh Kumar K.V wrote:

> On 5/22/20 3:01 PM, Michal Suchánek wrote:
> > On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> > > 
> > > 
> > > On Thu, 21 May 2020, Dan Williams wrote:
> > > 
> > > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > > >  wrote:
> > > > > 
> > > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > > persistent
> > > > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c
> > > > > > does
> > > > > > this:
> > > > > > 
> > > > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool
> > > > > > wait_for_ios)
> > > > > > {
> > > > > >    if (WC_MODE_PMEM(wc))
> > > > > >            wmb(); <==
> > > > > >    else
> > > > > >            ssd_commit_flushed(wc, wait_for_ios);
> > > > > > }
> > > > > > 
> > > > > > I believe you'll need to make modifications there.
> > > > > > 
> > > > > 
> > > > > Correct. Thanks for catching that.
> > > > > 
> > > > > 
> > > > > I don't understand dm much; I'm wondering how this will work with a
> > > > > non-synchronous DAX device?
> > > > 
> > > > That's a good point. DM-writecache needs to be cognizant of things
> > > > like virtio-pmem that violate the rule that persistent memory writes
> > > > can be flushed by CPU functions rather than calling back into the
> > > > driver. It seems we need to always make the flush case a dax_operation
> > > > callback to account for this.
> > > 
> > > dm-writecache normally sits on top of dm-linear, so it would
> > > need to pass the wmb() call through the dm core and dm-linear target ...
> > > that would slow it down ... I remember that you already did it this way
> > > some time ago and then removed it.
> > > 
> > > What's the exact problem with POWER? Could the POWER system have two types
> > > of persistent memory that need two different ways of flushing?
> > 
> > As far as I understand the discussion so far
> > 
> >   - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
> >   - on POWER $newhardware uses $newinstruction to ensure pmem consistency
> > (compatible with $oldinstruction on $oldhardware)
> 
> Correct.
> 
> >   - on some platforms, instead of a barrier instruction, a callback into
> > the driver is issued to ensure consistency
> 
> This is virtio-pmem only at this point IIUC.
> 
> -aneesh

And does the virtio-pmem driver track which pages are dirty? Or does it 
need to specify the range of pages to flush in the flush function?

> > None of this is reflected by the dm driver.

We could make a new dax method:

void (*dax_get_flush_function(void))(void);

This would return a pointer to "wmb()" on x86 and something else on POWER.

The method "dax_get_flush_function" would be called only once when 
initializing the writecache driver (because the call would be slow because 
it would have to go through the DM stack) and then, the returned function 
would be called each time we need write ordering. The returned function 
would do just "sfence; ret".

Mikulas
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment

2020-05-22 Thread Greg KH
On Fri, May 22, 2020 at 01:58:00PM +0200, Greg KH wrote:
> On Thu, May 21, 2020 at 04:37:43PM -0700, Dan Williams wrote:
> > Hello stable team,
> > 
> > These patches have been shipping in mainline since v5.7-rc1 with no
> > reported issues. They address long-standing problems in libnvdimm's
> > handling of namespace provisioning relative to alignment constraints
> > including crashes trying to even load the driver on some PowerPC
> > configurations.
> > 
> > I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
> > attribute" so as to not convey the bisection breakage to -stable.
> > 
> > Please consider them for v5.4-stable. They do pass the latest
> > version of the ndctl unit tests.
> 
> What about 5.6.y?  Any user upgrading from 5.4-stable to 5.6-stable
> would hit a regression, right?
> 
> So can we get a series backported to 5.6.y as well?  I need that before
> I can take this series.

Also, I really don't see the "bug" that this is fixing here.  If this
didn't work on PowerPC before, it can continue to just "not work" until
5.7, right?  What problems with 5.4.y and 5.6.y is this series fixing
that used to work before?

thanks,

greg k-h


Re: [5.4-stable PATCH 0/7] libnvdimm: Cross-arch compatible namespace alignment

2020-05-22 Thread Greg KH
On Thu, May 21, 2020 at 04:37:43PM -0700, Dan Williams wrote:
> Hello stable team,
> 
> These patches have been shipping in mainline since v5.7-rc1 with no
> reported issues. They address long-standing problems in libnvdimm's
> handling of namespace provisioning relative to alignment constraints
> including crashes trying to even load the driver on some PowerPC
> configurations.
> 
> I did fold one build fix [1] into "libnvdimm/region: Introduce an 'align'
> attribute" so as to not convey the bisection breakage to -stable.
> 
> Please consider them for v5.4-stable. They do pass the latest
> version of the ndctl unit tests.

What about 5.6.y?  Any user upgrading from 5.4-stable to 5.6-stable
would hit a regression, right?

So can we get a series backported to 5.6.y as well?  I need that before
I can take this series.

thanks,

greg k-h


Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-22 Thread Aneesh Kumar K.V

On 5/22/20 3:01 PM, Michal Suchánek wrote:
> On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> >
> >
> > On Thu, 21 May 2020, Dan Williams wrote:
> >
> > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > >  wrote:
> > > >
> > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > persistent memory users in the kernel?  For example,
> > > > > drivers/md/dm-writecache.c does this:
> > > > >
> > > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool
> > > > > wait_for_ios)
> > > > > {
> > > > >    if (WC_MODE_PMEM(wc))
> > > > >            wmb(); <==
> > > > >    else
> > > > >            ssd_commit_flushed(wc, wait_for_ios);
> > > > > }
> > > > >
> > > > > I believe you'll need to make modifications there.
> > > >
> > > > Correct. Thanks for catching that.
> > > >
> > > > I don't understand dm much; I'm wondering how this will work with a
> > > > non-synchronous DAX device?
> > >
> > > That's a good point. DM-writecache needs to be cognizant of things
> > > like virtio-pmem that violate the rule that persistent memory writes
> > > can be flushed by CPU functions rather than calling back into the
> > > driver. It seems we need to always make the flush case a dax_operation
> > > callback to account for this.
> >
> > dm-writecache normally sits on top of dm-linear, so it would
> > need to pass the wmb() call through the dm core and dm-linear target ...
> > that would slow it down ... I remember that you already did it this way
> > some time ago and then removed it.
> >
> > What's the exact problem with POWER? Could the POWER system have two types
> > of persistent memory that need two different ways of flushing?
>
> As far as I understand the discussion so far
>
>  - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
>  - on POWER $newhardware uses $newinstruction to ensure pmem consistency
>    (compatible with $oldinstruction on $oldhardware)

Correct.

>  - on some platforms, instead of a barrier instruction, a callback into the
>    driver is issued to ensure consistency

This is virtio-pmem only at this point IIUC.

> None of this is reflected by the dm driver.



-aneesh


Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier

2020-05-22 Thread Michal Suchánek
On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> 
> 
> On Thu, 21 May 2020, Dan Williams wrote:
> 
> > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> >  wrote:
> > >
> > > > Moving on to the patch itself--Aneesh, have you audited other persistent
> > > > memory users in the kernel?  For example, drivers/md/dm-writecache.c 
> > > > does
> > > > this:
> > > >
> > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool 
> > > > wait_for_ios)
> > > > {
> > > >    if (WC_MODE_PMEM(wc))
> > > >            wmb(); <==
> > > >    else
> > > >            ssd_commit_flushed(wc, wait_for_ios);
> > > > }
> > > >
> > > > I believe you'll need to make modifications there.
> > > >
> > >
> > > Correct. Thanks for catching that.
> > >
> > >
> > > I don't understand dm much; I'm wondering how this will work with a
> > > non-synchronous DAX device?
> > 
> > That's a good point. DM-writecache needs to be cognizant of things
> > like virtio-pmem that violate the rule that persistent memory writes
> > can be flushed by CPU functions rather than calling back into the
> > driver. It seems we need to always make the flush case a dax_operation
> > callback to account for this.
> 
> dm-writecache normally sits on top of dm-linear, so it would
> need to pass the wmb() call through the dm core and dm-linear target ...
> that would slow it down ... I remember that you already did it this way
> some time ago and then removed it.
> 
> What's the exact problem with POWER? Could the POWER system have two types 
> of persistent memory that need two different ways of flushing?

As far as I understand the discussion so far

 - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
 - on POWER $newhardware uses $newinstruction to ensure pmem consistency
   (compatible with $oldinstruction on $oldhardware)
 - on some platforms, instead of a barrier instruction, a callback into the
   driver is issued to ensure consistency

None of this is reflected by the dm driver.

Thanks

Michal


Re: [RESEND PATCH v7 4/5] ndctl/papr_scm, uapi: Add support for PAPR nvdimm specific methods

2020-05-22 Thread Vaibhav Jain
Michael Ellerman  writes:

> Ira Weiny  writes:
>> On Wed, May 20, 2020 at 12:30:57AM +0530, Vaibhav Jain wrote:
>>> Introduce support for PAPR NVDIMM Specific Methods (PDSM) in papr_scm
>>> modules and add the command family to the whitelist of NVDIMM command
>>> sets. Also advertise support for ND_CMD_CALL for the dimm
>>> command mask and implement necessary scaffolding in the module to
>>> handle ND_CMD_CALL ioctl and PDSM requests that we receive.
> ...
>>> + *
>>> + * Payload Version:
>>> + *
>>> + * A 'payload_version' field is present in the PDSM header that indicates
>>> + * a specific version of the structure present in the PDSM Payload for a
>>> + * given PDSM command. This provides backward compatibility in case the
>>> + * PDSM Payload structure evolves and different structures are supported
>>> + * by 'papr_scm' and 'libndctl'.
>>> + *
>>> + * When sending a PDSM Payload to 'papr_scm', 'libndctl' should send the
>>> + * version of the payload struct it supports via the 'payload_version'
>>> + * field. The 'papr_scm' module, when servicing the PDSM envelope, checks
>>> + * the 'payload_version' and then uses 'payload struct version' ==
>>> + * MIN('payload_version field', 'max payload-struct-version supported by
>>> + * papr_scm') to service the PDSM. After servicing the PDSM, 'papr_scm'
>>> + * puts the negotiated version of the payload struct in the returned
>>> + * 'payload_version' field.
>>
>> FWIW many people believe using a size rather than a version is more
>> sustainable. It is expected that new payload structures are larger (more
>> features) than the previous payload structure.
>>
>> I can't find references at the moment though.
>
> I think clone_args is a good modern example:
>
>   
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/sched.h#n88
>
> cheers

Thanks Ira and Mpe for pointing this out. I looked into how the clone3
syscall handles clone_args and a few differences came out:

* Unlike clone_args, which are always transferred in one direction from
  user-space to kernel, payload contents of PDSMs are transferred in both
  directions. Having a single version number makes it easier for
  user-space and the kernel to determine what data will be exchanged.

* For PDSMs, the version number is negotiated between libndctl and the
  kernel. For example, in case the kernel only supports an older version
  of a structure, it's free to send a lower version number back to
  libndctl. Such negotiation doesn't happen with the clone3 syscall.
  
-- 
Cheers
~ Vaibhav