Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

2023-08-25 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>> "Huang, Ying"  writes:
>>>>
>>>>> Alistair Popple  writes:
>>>>>
>>>>>> Huang Ying  writes:
>>>>>>
>>>>>>> Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is
>>>>>>> used for the slow memory type in the kmem driver.  This limits the
>>>>>>> usage of the kmem driver; for example, it cannot be used for HBM
>>>>>>> (high bandwidth memory).
>>>>>>>
>>>>>>> So, we use the general abstract distance calculation mechanism in the
>>>>>>> kmem driver to get a more accurate abstract distance on systems with proper
>>>>>>> support.  The original MEMTIER_DEFAULT_DAX_ADISTANCE is used as
>>>>>>> fallback only.
>>>>>>>
>>>>>>> Now, multiple memory types may be managed by kmem.  These memory types
>>>>>>> are put into the "kmem_memory_types" list and protected by
>>>>>>> kmem_memory_type_lock.
>>>>>>
>>>>>> See below but I wonder if kmem_memory_types could be a common helper
>>>>>> rather than kdax specific?
>>>>>>
>>>>>>> Signed-off-by: "Huang, Ying" 
>>>>>>> Cc: Aneesh Kumar K.V 
>>>>>>> Cc: Wei Xu 
>>>>>>> Cc: Alistair Popple 
>>>>>>> Cc: Dan Williams 
>>>>>>> Cc: Dave Hansen 
>>>>>>> Cc: Davidlohr Bueso 
>>>>>>> Cc: Johannes Weiner 
>>>>>>> Cc: Jonathan Cameron 
>>>>>>> Cc: Michal Hocko 
>>>>>>> Cc: Yang Shi 
>>>>>>> Cc: Rafael J Wysocki 
>>>>>>> ---
>>>>>>>  drivers/dax/kmem.c           | 54 +++-
>>>>>>>  include/linux/memory-tiers.h |  2 ++
>>>>>>>  mm/memory-tiers.c            |  2 +-
>>>>>>>  3 files changed, 44 insertions(+), 14 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>>>>>> index 898ca9505754..837165037231 100644
>>>>>>> --- a/drivers/dax/kmem.c
>>>>>>> +++ b/drivers/dax/kmem.c
>>>>>>> @@ -49,14 +49,40 @@ struct dax_kmem_data {
>>>>>>> struct resource *res[];
>>>>>>>  };
>>>>>>>  
>>>>>>> -static struct memory_dev_type *dax_slowmem_type;
>>>>>>> +static DEFINE_MUTEX(kmem_memory_type_lock);
>>>>>>> +static LIST_HEAD(kmem_memory_types);
>>>>>>> +
>>>>>>> +static struct memory_dev_type *kmem_find_alloc_memorty_type(int adist)
>>>>>>> +{
>>>>>>> +   bool found = false;
>>>>>>> +   struct memory_dev_type *mtype;
>>>>>>> +
>>>>>>> +   mutex_lock(&kmem_memory_type_lock);
>>>>>>> +   list_for_each_entry(mtype, &kmem_memory_types, list) {
>>>>>>> +   if (mtype->adistance == adist) {
>>>>>>> +   found = true;
>>>>>>> +   break;
>>>>>>> +   }
>>>>>>> +   }
>>>>>>> +   if (!found) {
>>>>>>> +   mtype = alloc_memory_type(adist);
>>>>>>> +   if (!IS_ERR(mtype))
>>>>>>> +   list_add(&mtype->list, &kmem_memory_types);
>>>>>>> +   }
>>>>>>> +   mutex_unlock(&kmem_memory_type_lock);
>>>>>>> +
>>>>>>> +   return mtype;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>>>>>  {
>>>>>>> struct device *dev = &dev_dax->dev;
>>>>>>> unsigned long total_len = 0;
>>>>>>> struct dax_kmem_data *data;
>>>>>>> +   struct memory_dev_type *mtype;
>>>>>>> int i, rc, mapped = 0;
>>>>>>> int num

Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-08-24 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>> "Huang, Ying"  writes:
>>>>
>>>>> Alistair Popple  writes:
>>>>>
>>>>>> "Huang, Ying"  writes:
>>>>>>
>>>>>>> Hi, Alistair,
>>>>>>>
>>>>>>> Sorry for the late response.  Just came back from vacation.
>>>>>>
>>>>>> Ditto for this response :-)
>>>>>>
>>>>>> I see Andrew has taken this into mm-unstable though, so my bad for not
>>>>>> getting around to following all this up sooner.
>>>>>>
>>>>>>> Alistair Popple  writes:
>>>>>>>
>>>>>>>> "Huang, Ying"  writes:
>>>>>>>>
>>>>>>>>> Alistair Popple  writes:
>>>>>>>>>
>>>>>>>>>> "Huang, Ying"  writes:
>>>>>>>>>>
>>>>>>>>>>> Alistair Popple  writes:
>>>>>>>>>>>
>>>>>>>>>>>>>>> While other memory device drivers can use the general notifier
>>>>>>>>>>>>>>> chain interface at the same time.
>>>>>>>>>>>>
>>>>>>>>>>>> How would that work in practice though? The abstract distance as
>>>>>>>>>>>> far as I can tell doesn't have any meaning other than establishing
>>>>>>>>>>>> preferences for memory demotion order. Therefore all calculations
>>>>>>>>>>>> are relative to the rest of the calculations on the system. So if a
>>>>>>>>>>>> driver does its own thing how does it choose a sensible distance?
>>>>>>>>>>>> IMHO the value here is in coordinating all that through a standard
>>>>>>>>>>>> interface, whether that is HMAT or something else.
>>>>>>>>>>>
>>>>>>>>>>> Only if different algorithms follow the same basic principle.  For
>>>>>>>>>>> example, the abstract distance of default DRAM nodes is fixed
>>>>>>>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory
>>>>>>>>>>> device is in linear direct proportion to the memory latency and
>>>>>>>>>>> inversely proportional to the memory bandwidth.  Use the memory
>>>>>>>>>>> latency and bandwidth of default DRAM nodes as base.
>>>>>>>>>>>
>>>>>>>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If
>>>>>>>>>>> there are some other methods to report the raw memory latency and
>>>>>>>>>>> bandwidth, we can use them too.
>>>>>>>>>>
>>>>>>>>>> Argh! So we could address my concerns by having drivers feed
>>>>>>>>>> latency/bandwidth numbers into a standard calculation algorithm
>>>>>>>>>> right? Ie. Rather than having drivers calculate abstract distance
>>>>>>>>>> themselves we have the notifier chains return the raw performance
>>>>>>>>>> data from which the abstract distance is derived.
>>>>>>>>>
>>>>>>>>> Now, memory device drivers only need a general interface to get the
>>>>>>>>> abstract distance from the NUMA node ID.  In the future, if they need
>>>>>>>>> more interfaces, we can add them.  For example, the interface you
>>>>

Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

2023-08-22 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>> Huang Ying  writes:
>>>>
>>>>> Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is
>>>>> used for the slow memory type in the kmem driver.  This limits the
>>>>> usage of the kmem driver; for example, it cannot be used for HBM
>>>>> (high bandwidth memory).
>>>>>
>>>>> So, we use the general abstract distance calculation mechanism in the
>>>>> kmem driver to get a more accurate abstract distance on systems with proper
>>>>> support.  The original MEMTIER_DEFAULT_DAX_ADISTANCE is used as
>>>>> fallback only.
>>>>>
>>>>> Now, multiple memory types may be managed by kmem.  These memory types
>>>>> are put into the "kmem_memory_types" list and protected by
>>>>> kmem_memory_type_lock.
>>>>
>>>> See below but I wonder if kmem_memory_types could be a common helper
>>>> rather than kdax specific?
>>>>
>>>>> Signed-off-by: "Huang, Ying" 
>>>>> Cc: Aneesh Kumar K.V 
>>>>> Cc: Wei Xu 
>>>>> Cc: Alistair Popple 
>>>>> Cc: Dan Williams 
>>>>> Cc: Dave Hansen 
>>>>> Cc: Davidlohr Bueso 
>>>>> Cc: Johannes Weiner 
>>>>> Cc: Jonathan Cameron 
>>>>> Cc: Michal Hocko 
>>>>> Cc: Yang Shi 
>>>>> Cc: Rafael J Wysocki 
>>>>> ---
>>>>>  drivers/dax/kmem.c           | 54 +++-
>>>>>  include/linux/memory-tiers.h |  2 ++
>>>>>  mm/memory-tiers.c            |  2 +-
>>>>>  3 files changed, 44 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>>>> index 898ca9505754..837165037231 100644
>>>>> --- a/drivers/dax/kmem.c
>>>>> +++ b/drivers/dax/kmem.c
>>>>> @@ -49,14 +49,40 @@ struct dax_kmem_data {
>>>>>   struct resource *res[];
>>>>>  };
>>>>>  
>>>>> -static struct memory_dev_type *dax_slowmem_type;
>>>>> +static DEFINE_MUTEX(kmem_memory_type_lock);
>>>>> +static LIST_HEAD(kmem_memory_types);
>>>>> +
>>>>> +static struct memory_dev_type *kmem_find_alloc_memorty_type(int adist)
>>>>> +{
>>>>> + bool found = false;
>>>>> + struct memory_dev_type *mtype;
>>>>> +
>>>>> + mutex_lock(&kmem_memory_type_lock);
>>>>> + list_for_each_entry(mtype, &kmem_memory_types, list) {
>>>>> + if (mtype->adistance == adist) {
>>>>> + found = true;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> + if (!found) {
>>>>> + mtype = alloc_memory_type(adist);
>>>>> + if (!IS_ERR(mtype))
>>>>> + list_add(&mtype->list, &kmem_memory_types);
>>>>> + }
>>>>> + mutex_unlock(&kmem_memory_type_lock);
>>>>> +
>>>>> + return mtype;
>>>>> +}
>>>>> +
>>>>>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>>>  {
>>>>>   struct device *dev = &dev_dax->dev;
>>>>>   unsigned long total_len = 0;
>>>>>   struct dax_kmem_data *data;
>>>>> + struct memory_dev_type *mtype;
>>>>>   int i, rc, mapped = 0;
>>>>>   int numa_node;
>>>>> + int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
>>>>>  
>>>>>   /*
>>>>>* Ensure good NUMA information for the persistent memory.
>>>>> @@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>>>   return -EINVAL;
>>>>>   }
>>>>>  
>>>>> + mt_calc_adistance(numa_node, &adist);
>>>>> + mtype = kmem_find_alloc_memorty_type(adist);
>>>>> + if (IS_ERR(mtype))
>>>>> + return PTR_ERR(mtype);
>>>>> +
>>>>
>>>> I wrote my own quick and dirty module to test this and wrote basically
>>>> the same code sequence.
>>>>
>>>> I notice you're using a list of memory types here though. I think it would

Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-08-22 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>> "Huang, Ying"  writes:
>>>>
>>>>> Hi, Alistair,
>>>>>
>>>>> Sorry for the late response.  Just came back from vacation.
>>>>
>>>> Ditto for this response :-)
>>>>
>>>> I see Andrew has taken this into mm-unstable though, so my bad for not
>>>> getting around to following all this up sooner.
>>>>
>>>>> Alistair Popple  writes:
>>>>>
>>>>>> "Huang, Ying"  writes:
>>>>>>
>>>>>>> Alistair Popple  writes:
>>>>>>>
>>>>>>>> "Huang, Ying"  writes:
>>>>>>>>
>>>>>>>>> Alistair Popple  writes:
>>>>>>>>>
>>>>>>>>>>>>> While other memory device drivers can use the general notifier
>>>>>>>>>>>>> chain interface at the same time.
>>>>>>>>>>
>>>>>>>>>> How would that work in practice though? The abstract distance as
>>>>>>>>>> far as I can tell doesn't have any meaning other than establishing
>>>>>>>>>> preferences for memory demotion order. Therefore all calculations
>>>>>>>>>> are relative to the rest of the calculations on the system. So if a
>>>>>>>>>> driver does its own thing how does it choose a sensible distance?
>>>>>>>>>> IMHO the value here is in coordinating all that through a standard
>>>>>>>>>> interface, whether that is HMAT or something else.
>>>>>>>>>
>>>>>>>>> Only if different algorithms follow the same basic principle.  For
>>>>>>>>> example, the abstract distance of default DRAM nodes is fixed
>>>>>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device
>>>>>>>>> is in linear direct proportion to the memory latency and inversely
>>>>>>>>> proportional to the memory bandwidth.  Use the memory latency and
>>>>>>>>> bandwidth of default DRAM nodes as base.
>>>>>>>>>
>>>>>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there
>>>>>>>>> are some other methods to report the raw memory latency and
>>>>>>>>> bandwidth, we can use them too.
>>>>>>>>
>>>>>>>> Argh! So we could address my concerns by having drivers feed
>>>>>>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>>>>>>> Ie. Rather than having drivers calculate abstract distance themselves
>>>>>>>> we have the notifier chains return the raw performance data from which
>>>>>>>> the abstract distance is derived.
>>>>>>>
>>>>>>> Now, memory device drivers only need a general interface to get the
>>>>>>> abstract distance from the NUMA node ID.  In the future, if they need
>>>>>>> more interfaces, we can add them.  For example, the interface you
>>>>>>> suggested above.
>>>>>>
>>>>>> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
>>>>>> distance, it's a meaningless number. The only reason they care about it
>>>>>> is so they can pass it to alloc_memory_type():
>>>>>>
>>>>>> struct memory_dev_type *alloc_memory_type(int adistance)
>>>>>>
>>>>>> Instead alloc_memory_type() should be taking bandwidth/latency numbers
>>>>>> and the calculation of abstract distance should be done there. That
>>>>>> resolves the issues about how drivers are supposed to divine adistance
>>>>>> and also means that when CDAT is added

Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-08-21 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Hi, Alistair,
>>>
>>> Sorry for the late response.  Just came back from vacation.
>>
>> Ditto for this response :-)
>>
>> I see Andrew has taken this into mm-unstable though, so my bad for not
>> getting around to following all this up sooner.
>>
>>> Alistair Popple  writes:
>>>
>>>> "Huang, Ying"  writes:
>>>>
>>>>> Alistair Popple  writes:
>>>>>
>>>>>> "Huang, Ying"  writes:
>>>>>>
>>>>>>> Alistair Popple  writes:
>>>>>>>
>>>>>>>>>>> While other memory device drivers can use the general notifier chain
>>>>>>>>>>> interface at the same time.
>>>>>>>>
>>>>>>>> How would that work in practice though? The abstract distance as far as
>>>>>>>> I can tell doesn't have any meaning other than establishing preferences
>>>>>>>> for memory demotion order. Therefore all calculations are relative to
>>>>>>>> the rest of the calculations on the system. So if a driver does its
>>>>>>>> own thing how does it choose a sensible distance? IMHO the value here
>>>>>>>> is in coordinating all that through a standard interface, whether
>>>>>>>> that is HMAT or something else.
>>>>>>>
>>>>>>> Only if different algorithms follow the same basic principle.  For
>>>>>>> example, the abstract distance of default DRAM nodes is fixed
>>>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
>>>>>>> in linear direct proportion to the memory latency and inversely
>>>>>>> proportional to the memory bandwidth.  Use the memory latency and
>>>>>>> bandwidth of default DRAM nodes as base.
>>>>>>>
>>>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>>>>>> some other methods to report the raw memory latency and bandwidth, we
>>>>>>> can use them too.
>>>>>>
>>>>>> Argh! So we could address my concerns by having drivers feed
>>>>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>>>>> Ie. Rather than having drivers calculate abstract distance themselves we
>>>>>> have the notifier chains return the raw performance data from which the
>>>>>> abstract distance is derived.
>>>>>
>>>>> Now, memory device drivers only need a general interface to get the
>>>>> abstract distance from the NUMA node ID.  In the future, if they need
>>>>> more interfaces, we can add them.  For example, the interface you
>>>>> suggested above.
>>>>
>>>> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
>>>> distance, it's a meaningless number. The only reason they care about it
>>>> is so they can pass it to alloc_memory_type():
>>>>
>>>> struct memory_dev_type *alloc_memory_type(int adistance)
>>>>
>>>> Instead alloc_memory_type() should be taking bandwidth/latency numbers
>>>> and the calculation of abstract distance should be done there. That
>>>> resolves the issues about how drivers are supposed to divine adistance
>>>> and also means that when CDAT is added we don't have to duplicate the
>>>> calculation code.
>>>
>>> In the current design, the abstract distance is the key concept of
>>> memory types and memory tiers.  And it is used as interface to allocate
>>> memory types.  This provides more flexibility than some other interfaces
>>> (e.g. read/write bandwidth/latency).  For example, in current
>>> dax/kmem.c, if HMAT isn't available in the system, the default abstract
>>> distance: MEMTIER_DEFAULT_DAX_ADISTANCE is used.  This is still useful
>>> to support some systems now.  On a system without HMAT/CDAT, it's
>>> possible to calculate abstract distance from ACPI SLIT, although this is
>>> quite limited.  I'm not sure whether all systems will provide read/write
>>> bandwidth/latency data for all memory devices.
>>>
>>> HMAT and CDAT o

Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

2023-08-21 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> Huang Ying  writes:
>>
>>> Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is
>>> used for the slow memory type in the kmem driver.  This limits the
>>> usage of the kmem driver; for example, it cannot be used for HBM
>>> (high bandwidth memory).
>>>
>>> So, we use the general abstract distance calculation mechanism in the
>>> kmem driver to get a more accurate abstract distance on systems with proper
>>> support.  The original MEMTIER_DEFAULT_DAX_ADISTANCE is used as
>>> fallback only.
>>>
>>> Now, multiple memory types may be managed by kmem.  These memory types
>>> are put into the "kmem_memory_types" list and protected by
>>> kmem_memory_type_lock.
>>
>> See below but I wonder if kmem_memory_types could be a common helper
>> rather than kdax specific?
>>
>>> Signed-off-by: "Huang, Ying" 
>>> Cc: Aneesh Kumar K.V 
>>> Cc: Wei Xu 
>>> Cc: Alistair Popple 
>>> Cc: Dan Williams 
>>> Cc: Dave Hansen 
>>> Cc: Davidlohr Bueso 
>>> Cc: Johannes Weiner 
>>> Cc: Jonathan Cameron 
>>> Cc: Michal Hocko 
>>> Cc: Yang Shi 
>>> Cc: Rafael J Wysocki 
>>> ---
>>>  drivers/dax/kmem.c           | 54 +++-
>>>  include/linux/memory-tiers.h |  2 ++
>>>  mm/memory-tiers.c            |  2 +-
>>>  3 files changed, 44 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
>>> index 898ca9505754..837165037231 100644
>>> --- a/drivers/dax/kmem.c
>>> +++ b/drivers/dax/kmem.c
>>> @@ -49,14 +49,40 @@ struct dax_kmem_data {
>>> struct resource *res[];
>>>  };
>>>  
>>> -static struct memory_dev_type *dax_slowmem_type;
>>> +static DEFINE_MUTEX(kmem_memory_type_lock);
>>> +static LIST_HEAD(kmem_memory_types);
>>> +
>>> +static struct memory_dev_type *kmem_find_alloc_memorty_type(int adist)
>>> +{
>>> +   bool found = false;
>>> +   struct memory_dev_type *mtype;
>>> +
>>> +   mutex_lock(&kmem_memory_type_lock);
>>> +   list_for_each_entry(mtype, &kmem_memory_types, list) {
>>> +   if (mtype->adistance == adist) {
>>> +   found = true;
>>> +   break;
>>> +   }
>>> +   }
>>> +   if (!found) {
>>> +   mtype = alloc_memory_type(adist);
>>> +   if (!IS_ERR(mtype))
>>> +   list_add(&mtype->list, &kmem_memory_types);
>>> +   }
>>> +   mutex_unlock(&kmem_memory_type_lock);
>>> +
>>> +   return mtype;
>>> +}
>>> +
>>>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>>  {
>>> struct device *dev = &dev_dax->dev;
>>> unsigned long total_len = 0;
>>> struct dax_kmem_data *data;
>>> +   struct memory_dev_type *mtype;
>>> int i, rc, mapped = 0;
>>> int numa_node;
>>> +   int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
>>>  
>>> /*
>>>  * Ensure good NUMA information for the persistent memory.
>>> @@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>>> return -EINVAL;
>>> }
>>>  
>>> +   mt_calc_adistance(numa_node, &adist);
>>> +   mtype = kmem_find_alloc_memorty_type(adist);
>>> +   if (IS_ERR(mtype))
>>> +   return PTR_ERR(mtype);
>>> +
>>
>> I wrote my own quick and dirty module to test this and wrote basically
>> the same code sequence.
>>
>> I notice you're using a list of memory types here though. I think it would
>> be nice to have a common helper that other users could call to do the
>> mt_calc_adistance() / kmem_find_alloc_memory_type() /
>> init_node_memory_type() sequence and cleanup as my naive approach would
>> result in a new memory_dev_type per device even though adist might be
>> the same. A common helper would make it easy to de-dup those.
>
> If it's useful, we can move kmem_find_alloc_memory_type() to
> memory-tier.c after some revision.  But I would prefer to move it after
> we have a second user.  What do you think about that?

Usually I would agree, but this series already introduces a general
interface for calculating adist even though there's only one user and
implementation. So if we're going to add a general interface I think it
would be better to make it more usable now rather than after variations
of it have been cut and pasted into other drivers.
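
For concreteness, a shared helper along those lines might look like the
sketch below, assuming the list and lock move into memory-tier.c with it
(mt_find_alloc_memory_type() is the name used above; having the caller
hold the lock is an assumption of this sketch):

    /*
     * Sketch only: return the memory type matching @adist, or allocate
     * one and add it to @memory_types.  Assumes the caller serializes
     * access to @memory_types (e.g. with the kmem mutex above).
     */
    struct memory_dev_type *mt_find_alloc_memory_type(int adist,
                            struct list_head *memory_types)
    {
        struct memory_dev_type *mtype;

        list_for_each_entry(mtype, memory_types, list)
            if (mtype->adistance == adist)
                return mtype;

        mtype = alloc_memory_type(adist);
        if (!IS_ERR(mtype))
            list_add(&mtype->list, memory_types);

        return mtype;
    }

dax/kmem would then keep only its private list head and lock, and any
other driver could reuse the same de-duplication logic.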



Re: [PATCH RESEND 3/4] acpi, hmat: calculate abstract distance with HMAT

2023-08-21 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> Huang Ying  writes:
>>
>>> A memory tiering abstract distance calculation algorithm based on ACPI
>>> HMAT is implemented.  The basic idea is as follows.
>>>
>>> The performance attributes of system default DRAM nodes are recorded
>>> as the baseline, whose abstract distance is MEMTIER_ADISTANCE_DRAM.
>>> Then, the ratio of the abstract distance of a memory node (target) to
>>> MEMTIER_ADISTANCE_DRAM is scaled based on the ratio of the performance
>>> attributes of the node to that of the default DRAM nodes.
>>
>> The problem I encountered here with the calculations is that HBM memory
>> ended up in a lower-tiered node which isn't what I wanted (at least when
>> that HBM is attached to a GPU say).
>
> I have tested the series on a server machine with HBM (pure HBM, not
> attached to a GPU).  There, HBM is placed in a higher tier than DRAM.

Good to know.

>> I suspect this is because the calculations are based on the CPU
>> point-of-view (access1) which still sees lower bandwidth to remote HBM
>> than local DRAM, even though the remote GPU has higher bandwidth access
>> to that memory. Perhaps we need to be considering access0 as well?
>> Ie. HBM directly attached to a generic initiator should be in a higher
>> tier regardless of CPU access characteristics?
>
> What are your requirements for memory tiers on the machine?  I guess you
> want to put GPU-attached HBM in a higher tier and put DRAM in a lower
> tier, so cold HBM pages can be demoted to DRAM when there is memory
> pressure on HBM?  This sounds reasonable from the GPU point of view.

Yes, that is what I would like to implement.

> The above requirements may be satisfied via calculating abstract
> distance based on access0 (or combined with access1).  But I doubt
> this will be a general solution.  I guess that any memory device that
> is used mainly by memory initiators other than CPUs wants to put
> itself in a higher memory tier than DRAM, regardless of its
> access0.

Right. I'm still figuring out how ACPI HMAT fits together but that
sounds reasonable.

> One solution is to put GPU HBM in the highest memory tier (with smallest
> abstract distance) always in the GPU device driver, regardless of its HMAT
> performance attributes.  Is it possible?

It's certainly possible and easy enough to do, although I think it would
be good to provide upper and lower bounds for HMAT-derived adistances to
make that easier. It does make me wonder what the point of HMAT is if we
have to ignore it in some scenarios though. But perhaps I need to dig
deeper into the GPU values to figure out how it can be applied correctly
there.
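
As a rough sketch of that driver-side override, a GPU driver could simply
allocate its memory type with an abstract distance below
MEMTIER_ADISTANCE_DRAM (smaller means a higher tier).  The offset chosen
here is purely illustrative, and gpu_nid is a hypothetical node ID:

    /* Pin GPU-attached HBM one tier above DRAM, ignoring HMAT. */
    mtype = alloc_memory_type(MEMTIER_ADISTANCE_DRAM - MEMTIER_CHUNK_SIZE);
    if (!IS_ERR(mtype))
        init_node_memory_type(gpu_nid, mtype);

This places the HBM node above all default DRAM nodes regardless of what
HMAT reports for CPU-initiated accesses.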

>> That said I'm not entirely convinced the HMAT tables I'm testing against
>> are accurate/complete.




Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-08-21 Thread Alistair Popple


"Huang, Ying"  writes:

> Hi, Alistair,
>
> Sorry for the late response.  Just came back from vacation.

Ditto for this response :-)

I see Andrew has taken this into mm-unstable though, so my bad for not
getting around to following all this up sooner.

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>> "Huang, Ying"  writes:
>>>>
>>>>> Alistair Popple  writes:
>>>>>
>>>>>>>>> While other memory device drivers can use the general notifier chain
>>>>>>>>> interface at the same time.
>>>>>>
>>>>>> How would that work in practice though? The abstract distance as far as
>>>>>> I can tell doesn't have any meaning other than establishing preferences
>>>>>> for memory demotion order. Therefore all calculations are relative to
>>>>>> the rest of the calculations on the system. So if a driver does its own
>>>>>> thing how does it choose a sensible distance? IMHO the value here is in
>>>>>> coordinating all that through a standard interface, whether that is HMAT
>>>>>> or something else.
>>>>>
>>>>> Only if different algorithms follow the same basic principle.  For
>>>>> example, the abstract distance of default DRAM nodes is fixed
>>>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
>>>>> in linear direct proportion to the memory latency and inversely
>>>>> proportional to the memory bandwidth.  Use the memory latency and
>>>>> bandwidth of default DRAM nodes as base.
>>>>>
>>>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>>>> some other methods to report the raw memory latency and bandwidth, we
>>>>> can use them too.
>>>>
>>>> Argh! So we could address my concerns by having drivers feed
>>>> latency/bandwidth numbers into a standard calculation algorithm right?
>>>> Ie. Rather than having drivers calculate abstract distance themselves we
>>>> have the notifier chains return the raw performance data from which the
>>>> abstract distance is derived.
>>>
>>> Now, memory device drivers only need a general interface to get the
>>> abstract distance from the NUMA node ID.  In the future, if they need
>>> more interfaces, we can add them.  For example, the interface you
>>> suggested above.
>>
>> Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
>> distance, it's a meaningless number. The only reason they care about it
>> is so they can pass it to alloc_memory_type():
>>
>> struct memory_dev_type *alloc_memory_type(int adistance)
>>
>> Instead alloc_memory_type() should be taking bandwidth/latency numbers
>> and the calculation of abstract distance should be done there. That
>> resolves the issues about how drivers are supposed to divine adistance
>> and also means that when CDAT is added we don't have to duplicate the
>> calculation code.
>
> In the current design, the abstract distance is the key concept of
> memory types and memory tiers.  And it is used as interface to allocate
> memory types.  This provides more flexibility than some other interfaces
> (e.g. read/write bandwidth/latency).  For example, in current
> dax/kmem.c, if HMAT isn't available in the system, the default abstract
> distance: MEMTIER_DEFAULT_DAX_ADISTANCE is used.  This is still useful
> to support some systems now.  On a system without HMAT/CDAT, it's
> possible to calculate abstract distance from ACPI SLIT, although this is
> quite limited.  I'm not sure whether all systems will provide read/write
> bandwidth/latency data for all memory devices.
>
> HMAT and CDAT or some other mechanisms may provide the read/write
> bandwidth/latency data to be used to calculate abstract distance.  For
> them, we can provide a shared implementation in mm/memory-tiers.c to map
> from read/write bandwidth/latency to the abstract distance.  Can this
> solve your concerns about the consistency among algorithms?  If so, we
> can do that when we add the second algorithm that needs that.

I guess it would address my concerns if we did that now. I don't see why
we need to wait for a second implementation for that though - the whole
series seems to be built around adding a framework for supporting
multiple algorithms even though only one exists. So I think we should
support that fully, or simplify the whole thing and just assume the only
thing that exists is HMAT and get rid of the general interface until a
second algorithm comes along.
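
For reference, assuming the linear scaling described earlier in the
thread (latency up, bandwidth down, default DRAM as the base), a device
with twice the DRAM latency and half the DRAM bandwidth would get:

    adist = MEMTIER_ADISTANCE_DRAM * (lat / lat_dram) * (bw_dram / bw)
          = MEMTIER_ADISTANCE_DRAM * 2 * 2
          = 4 * MEMTIER_ADISTANCE_DRAM

i.e. well below DRAM in the tier hierarchy, since each tier covers a
MEMTIER_CHUNK_SIZE chunk of abstract distance.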



Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-07-27 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>> Alistair Popple  writes:
>>>
>>>>>>> While other memory device drivers can use the general notifier chain
>>>>>>> interface at the same time.
>>>>
>>>> How would that work in practice though? The abstract distance as far as
>>>> I can tell doesn't have any meaning other than establishing preferences
>>>> for memory demotion order. Therefore all calculations are relative to
>>>> the rest of the calculations on the system. So if a driver does its own
>>>> thing how does it choose a sensible distance? IMHO the value here is in
>>>> coordinating all that through a standard interface, whether that is HMAT
>>>> or something else.
>>>
>>> Only if different algorithms follow the same basic principle.  For
>>> example, the abstract distance of default DRAM nodes is fixed
>>> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
>>> in linear direct proportion to the memory latency and inversely
>>> proportional to the memory bandwidth.  Use the memory latency and
>>> bandwidth of default DRAM nodes as base.
>>>
>>> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
>>> some other methods to report the raw memory latency and bandwidth, we
>>> can use them too.
>>
>> Argh! So we could address my concerns by having drivers feed
>> latency/bandwidth numbers into a standard calculation algorithm right?
>> Ie. Rather than having drivers calculate abstract distance themselves we
>> have the notifier chains return the raw performance data from which the
>> abstract distance is derived.
>
> Now, memory device drivers only need a general interface to get the
> abstract distance from the NUMA node ID.  In the future, if they need
> more interfaces, we can add them.  For example, the interface you
> suggested above.

Huh? Memory device drivers (ie. dax/kmem.c) don't care about abstract
distance, it's a meaningless number. The only reason they care about it
is so they can pass it to alloc_memory_type():

struct memory_dev_type *alloc_memory_type(int adistance)

Instead alloc_memory_type() should be taking bandwidth/latency numbers
and the calculation of abstract distance should be done there. That
resolves the issues about how drivers are supposed to divine adistance
and also means that when CDAT is added we don't have to duplicate the
calculation code.
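
A sketch of that alternative, assuming node_hmem_attrs carries the raw
numbers and reusing the latency/bandwidth proportionality discussed
earlier (the function name and the default_dram_attrs baseline are
assumptions, and zero-value checks are omitted for brevity):

    /* Hypothetical: derive the abstract distance inside the allocator. */
    struct memory_dev_type *alloc_memory_type_perf(struct node_hmem_attrs *perf)
    {
        int adist = MEMTIER_ADISTANCE_DRAM *
            (perf->read_latency + perf->write_latency) /
            (default_dram_attrs.read_latency +
             default_dram_attrs.write_latency) *
            (default_dram_attrs.read_bandwidth +
             default_dram_attrs.write_bandwidth) /
            (perf->read_bandwidth + perf->write_bandwidth);

        return alloc_memory_type(adist);
    }

With this, a driver would pass whatever latency/bandwidth data it has and
would never need to see an adistance value at all.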



Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-07-26 Thread Alistair Popple


"Huang, Ying"  writes:

> Alistair Popple  writes:
>
>> "Huang, Ying"  writes:
>>
>>>>> And, I don't think that we are forced to use the general notifier
>>>>> chain interface in all memory device drivers.  If the memory device
>>>>> driver has a better understanding of the memory device, it can use
>>>>> another way to determine abstract distance.  For example, a CXL memory device
>>>>> driver can identify abstract distance by itself.  While other memory
>>>>> device drivers can use the general notifier chain interface at the
>>>>> same time.
>>>>
>>>> Whilst I think personally I would find that flexibility useful I am
>>>> concerned it means every driver will just end up divining its own
>>>> distance rather than ensuring data in HMAT/CDAT/etc. is correct. That
>>>> would kind of defeat the purpose of it all then.
>>>
>>> But we have no way to enforce that, either.
>>
>> Enforce that HMAT/CDAT/etc. is correct? Agree we can't enforce it, but
>> we can influence it. If drivers can easily ignore the notifier chain and
>> do their own thing that's what will happen.
>
> IMHO, that applies both to enforcing that HMAT/CDAT/etc. is correct and
> to enforcing drivers to use the general interface we provide.  Anyway,
> we should try to make HMAT/CDAT work well, so drivers will want to use
> them :-)

Exactly :-)

>>>>> While other memory device drivers can use the general notifier chain
>>>>> interface at the same time.
>>
>> How would that work in practice though? The abstract distance as far as
>> I can tell doesn't have any meaning other than establishing preferences
>> for memory demotion order. Therefore all calculations are relative to
>> the rest of the calculations on the system. So if a driver does its own
>> thing how does it choose a sensible distance? IMHO the value here is in
>> coordinating all that through a standard interface, whether that is HMAT
>> or something else.
>
> Only if different algorithms follow the same basic principle.  For
> example, the abstract distance of default DRAM nodes is fixed
> (MEMTIER_ADISTANCE_DRAM).  The abstract distance of the memory device is
> in linear direct proportion to the memory latency and inversely
> proportional to the memory bandwidth.  Use the memory latency and
> bandwidth of default DRAM nodes as base.
>
> HMAT and CDAT report the raw memory latency and bandwidth.  If there are
> some other methods to report the raw memory latency and bandwidth, we
> can use them too.

Argh! So we could address my concerns by having drivers feed
latency/bandwidth numbers into a standard calculation algorithm right?
Ie. Rather than having drivers calculate abstract distance themselves we
have the notifier chains return the raw performance data from which the
abstract distance is derived.
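
Concretely, the only change would be what travels through the notifier
chain; a hedged sketch (the perf-carrying variant and
mt_perf_to_adistance() are illustrative, not part of this series):

    /* Hypothetical: providers fill in raw performance data, and the
     * core converts it to an abstract distance in one shared place. */
    static int mt_calc_adistance(int node, int *adist)
    {
        struct node_hmem_attrs perf = {};
        int ret;

        ret = blocking_notifier_call_chain(&mt_adistance_algorithms,
                           node, &perf);
        if (ret & NOTIFY_STOP_MASK)
            *adist = mt_perf_to_adistance(&perf);
        return ret;
    }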



Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-07-26 Thread Alistair Popple


"Huang, Ying"  writes:

>>> The other way (suggested by this series) is to make dax/kmem call a
>>> notifier chain, then CXL CDAT or ACPI HMAT can identify the type of
>>> device and calculate the distance if the type is correct for them.  I
>>> don't think that it's good to make dax/kmem know every possible
>>> type of memory device.
>>
>> Do we expect there to be lots of different types of memory devices
>> sharing a common dax/kmem driver though? Must admit I'm coming from a
>> GPU background where we'd expect each type of device to have its own
>> driver anyway so wasn't expecting different types of memory devices to
>> be handled by the same driver.
>
> Now, dax/kmem.c is used for
>
> - PMEM (Optane DCPMM, or AEP)
> - CXL.mem
> - HBM (attached to CPU)

Thanks a lot for the background! I will admit to having a fairly narrow
focus here.

>>> And, I don't think that we are forced to use the general notifier
>>> chain interface in all memory device drivers.  If the memory device
>>> driver has a better understanding of the memory device, it can use
>>> another way to determine abstract distance.  For example, a CXL memory device
>>> driver can identify abstract distance by itself.  While other memory
>>> device drivers can use the general notifier chain interface at the
>>> same time.
>>
>> Whilst I think personally I would find that flexibility useful I am
>> concerned it means every driver will just end up divining its own
>> distance rather than ensuring data in HMAT/CDAT/etc. is correct. That
>> would kind of defeat the purpose of it all then.
>
> But we have no way to enforce that, either.

Enforce that HMAT/CDAT/etc. is correct? Agree we can't enforce it, but
we can influence it. If drivers can easily ignore the notifier chain and
do their own thing that's what will happen.

>>> While other memory device drivers can use the general notifier chain
>>> interface at the same time.

How would that work in practice though? The abstract distance as far as
I can tell doesn't have any meaning other than establishing preferences
for memory demotion order. Therefore all calculations are relative to
the rest of the calculations on the system. So if a driver does its own
thing how does it choose a sensible distance? IMHO the value here is in
coordinating all that through a standard interface, whether that is HMAT
or something else.

 - Alistair



Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-07-25 Thread Alistair Popple


"Huang, Ying"  writes:

> Hi, Alistair,
>
> Thanks a lot for comments!
>
> Alistair Popple  writes:
>
>> Huang Ying  writes:
>>
>>> The abstract distance may be calculated by various drivers, such as
>>> ACPI HMAT, CXL CDAT, etc., while it may be used by various code which
>>> hot-adds memory nodes, such as dax/kmem etc.  To decouple the
>>> algorithm users and the providers, an abstract distance calculation
>>> algorithm management mechanism is implemented in this patch.  It
>>> provides an interface for the providers to register implementations,
>>> and an interface for the users.
>>
>> I wonder if we need this level of decoupling though? It seems to me like
>> it would be simpler and better for drivers to calculate the abstract
>> distance directly themselves by calling the desired algorithm (eg. ACPI
>> HMAT) and pass this when creating the nodes rather than having a
>> notifier chain.
>
> Per my understanding, ACPI HMAT and memory device drivers (such as
> dax/kmem) may belong to different subsystems (ACPI vs. dax).  It's not
> good to call functions across subsystems directly.  So, I think it's
> better to use a general subsystem: memory-tier.c to decouple them.  If
> it turns out that a notifier chain is unnecessary, we can use some
> function pointers instead.
>
>> At the moment it seems we've only identified two possible algorithms
>> (ACPI HMAT and CXL CDAT) and I don't think it would make sense for one
>> of those to fall back to the other based on priority, so why not just
>> have drivers call the correct algorithm directly?
>
> For example, we have a system with PMEM (persistent memory, Optane
> DCPMM, or AEP, or something else) in DIMM slots and CXL.mem connected
> via CXL link to a remote memory pool.  We will need ACPI HMAT for PMEM
> and CXL CDAT for CXL.mem.  One way is to make dax/kmem identify the
> type of the device and call the corresponding algorithm.

Yes, that is what I was thinking.

> The other way (suggested by this series) is to make dax/kmem call a
> notifier chain, then CXL CDAT or ACPI HMAT can identify the type of
> device and calculate the distance if the type is correct for them.  I
> don't think that it's good to make dax/kmem know every possible
> type of memory device.

Do we expect there to be lots of different types of memory devices
sharing a common dax/kmem driver though? Must admit I'm coming from a
GPU background where we'd expect each type of device to have its own
driver anyway so wasn't expecting different types of memory devices to
be handled by the same driver.

>>> Multiple algorithm implementations can cooperate by calculating
>>> abstract distances for different memory nodes.  The preference of
>>> algorithm implementations can be specified via
>>> priority (notifier_block.priority).
>>
>> How/what decides the priority though? That seems like something better
>> decided by a device driver than the algorithm driver IMHO.
>
> Do we need a memory device driver specific priority?  Or do we just share
> a common priority?  For example, the priority of CXL CDAT is always
> higher than that of ACPI HMAT?  Or architecture specific?

Ok, thanks. Having read the above I think the priority is
unimportant. Algorithms can either decide to return a distance and
NOTIFY_STOP_MASK if they can calculate a distance or NOTIFY_DONE if they
can't for a specific device.
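
In other words, a provider's callback would reduce to something like the
following sketch (both hmat_* helpers here are made up for illustration):

    static int hmat_calc_adistance(struct notifier_block *self,
                       unsigned long nid, void *data)
    {
        int *adist = data;

        if (!hmat_node_has_perf_data(nid))  /* hypothetical check */
            return NOTIFY_DONE;  /* let another algorithm try */

        *adist = hmat_scale_adistance(nid);  /* hypothetical calculation */
        return NOTIFY_STOP_MASK;  /* distance provided; stop the chain */
    }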

> And, I don't think that we are forced to use the general notifier
> chain interface in all memory device drivers.  If the memory device
> driver has a better understanding of the memory device, it can use
> another way to determine abstract distance.  For example, a CXL memory device
> driver can identify abstract distance by itself.  While other memory
> device drivers can use the general notifier chain interface at the
> same time.

Whilst I think personally I would find that flexibility useful I am
concerned it means every driver will just end up divining its own
distance rather than ensuring data in HMAT/CDAT/etc. is correct. That
would kind of defeat the purpose of it all then.



Re: [PATCH RESEND 4/4] dax, kmem: calculate abstract distance with general interface

2023-07-24 Thread Alistair Popple


Huang Ying  writes:

> Previously, a fixed abstract distance MEMTIER_DEFAULT_DAX_ADISTANCE is
> used for the slow memory type in the kmem driver.  This limits the
> usage of the kmem driver; for example, it cannot be used for HBM
> (high bandwidth memory).
>
> So, we use the general abstract distance calculation mechanism in the
> kmem driver to get a more accurate abstract distance on systems with proper
> support.  The original MEMTIER_DEFAULT_DAX_ADISTANCE is used as
> fallback only.
>
> Now, multiple memory types may be managed by kmem.  These memory types
> are put into the "kmem_memory_types" list and protected by
> kmem_memory_type_lock.

See below but I wonder if kmem_memory_types could be a common helper
rather than kdax specific?

> Signed-off-by: "Huang, Ying" 
> Cc: Aneesh Kumar K.V 
> Cc: Wei Xu 
> Cc: Alistair Popple 
> Cc: Dan Williams 
> Cc: Dave Hansen 
> Cc: Davidlohr Bueso 
> Cc: Johannes Weiner 
> Cc: Jonathan Cameron 
> Cc: Michal Hocko 
> Cc: Yang Shi 
> Cc: Rafael J Wysocki 
> ---
>  drivers/dax/kmem.c           | 54 +++-
>  include/linux/memory-tiers.h |  2 ++
>  mm/memory-tiers.c            |  2 +-
>  3 files changed, 44 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 898ca9505754..837165037231 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -49,14 +49,40 @@ struct dax_kmem_data {
>   struct resource *res[];
>  };
>  
> -static struct memory_dev_type *dax_slowmem_type;
> +static DEFINE_MUTEX(kmem_memory_type_lock);
> +static LIST_HEAD(kmem_memory_types);
> +
> +static struct memory_dev_type *kmem_find_alloc_memorty_type(int adist)
> +{
> + bool found = false;
> + struct memory_dev_type *mtype;
> +
> + mutex_lock(&kmem_memory_type_lock);
> + list_for_each_entry(mtype, &kmem_memory_types, list) {
> + if (mtype->adistance == adist) {
> + found = true;
> + break;
> + }
> + }
> + if (!found) {
> + mtype = alloc_memory_type(adist);
> + if (!IS_ERR(mtype))
> + list_add(&mtype->list, &kmem_memory_types);
> + }
> + mutex_unlock(&kmem_memory_type_lock);
> +
> + return mtype;
> +}
> +
>  static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  {
>   struct device *dev = &dev_dax->dev;
>   unsigned long total_len = 0;
>   struct dax_kmem_data *data;
> + struct memory_dev_type *mtype;
>   int i, rc, mapped = 0;
>   int numa_node;
> + int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;
>  
>   /*
>* Ensure good NUMA information for the persistent memory.
> @@ -71,6 +97,11 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>   return -EINVAL;
>   }
>  
> + mt_calc_adistance(numa_node, &adist);
> + mtype = kmem_find_alloc_memorty_type(adist);
> + if (IS_ERR(mtype))
> + return PTR_ERR(mtype);
> +

I wrote my own quick and dirty module to test this and wrote basically
the same code sequence.

I notice you're using a list of memory types here though. I think it would
be nice to have a common helper that other users could call to do the
mt_calc_adistance() / kmem_find_alloc_memory_type() /
init_node_memory_type() sequence and cleanup as my naive approach would
result in a new memory_dev_type per device even though adist might be
the same. A common helper would make it easy to de-dup those.

>   for (i = 0; i < dev_dax->nr_range; i++) {
>   struct range range;
>  
> @@ -88,7 +119,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>   return -EINVAL;
>   }
>  
> - init_node_memory_type(numa_node, dax_slowmem_type);
> + init_node_memory_type(numa_node, mtype);
>  
>   rc = -ENOMEM;
>   data = kzalloc(struct_size(data, res, dev_dax->nr_range), GFP_KERNEL);
> @@ -167,7 +198,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  err_res_name:
>   kfree(data);
>  err_dax_kmem_data:
> - clear_node_memory_type(numa_node, dax_slowmem_type);
> + clear_node_memory_type(numa_node, mtype);
>   return rc;
>  }
>  
> @@ -219,7 +250,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
>* for that. This implies this reference will be around
>* till next reboot.
>*/
> - clear_node_memory_type(node, dax_slowmem_type);
> + clear_node_memory_type(node, NULL);
>   }
>  }
>  #else
> @@ -251,12 +282,6 @@ static int __init dax_kmem_init(void)
>   if (!kmem_name)
>   return -ENOMEM

Re: [PATCH RESEND 3/4] acpi, hmat: calculate abstract distance with HMAT

2023-07-24 Thread Alistair Popple


Huang Ying  writes:

> A memory tiering abstract distance calculation algorithm based on ACPI
> HMAT is implemented.  The basic idea is as follows.
>
> The performance attributes of system default DRAM nodes are recorded
> as the baseline, whose abstract distance is MEMTIER_ADISTANCE_DRAM.
> Then, the ratio of the abstract distance of a memory node (target) to
> MEMTIER_ADISTANCE_DRAM is scaled based on the ratio of the performance
> attributes of the node to that of the default DRAM nodes.

The problem I encountered here with the calculations is that HBM memory
ended up in a lower-tiered node which isn't what I wanted (at least when
that HBM is attached to a GPU say).

I suspect this is because the calculations are based on the CPU
point-of-view (access1) which still sees lower bandwidth to remote HBM
than local DRAM, even though the remote GPU has higher bandwidth access
to that memory. Perhaps we need to be considering access0 as well?
Ie. HBM directly attached to a generic initiator should be in a higher
tier regardless of CPU access characteristics?

That said I'm not entirely convinced the HMAT tables I'm testing against
are accurate/complete.

> Signed-off-by: "Huang, Ying" 
> Cc: Aneesh Kumar K.V 
> Cc: Wei Xu 
> Cc: Alistair Popple 
> Cc: Dan Williams 
> Cc: Dave Hansen 
> Cc: Davidlohr Bueso 
> Cc: Johannes Weiner 
> Cc: Jonathan Cameron 
> Cc: Michal Hocko 
> Cc: Yang Shi 
> Cc: Rafael J Wysocki 
> ---
>  drivers/acpi/numa/hmat.c     | 138 ++-
>  include/linux/memory-tiers.h |   2 +
>  mm/memory-tiers.c            |   2 +-
>  3 files changed, 140 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index 2dee0098f1a9..306a912090f0 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/memory-tiers.h>
>  
>  static u8 hmat_revision;
>  static int hmat_disable __initdata;
> @@ -759,6 +760,137 @@ static int hmat_callback(struct notifier_block *self,
>   return NOTIFY_OK;
>  }
>  
> +static int hmat_adistance_disabled;
> +static struct node_hmem_attrs default_dram_attrs;
> +
> +static void dump_hmem_attrs(struct node_hmem_attrs *attrs)
> +{
> + pr_cont("read_latency: %u, write_latency: %u, read_bandwidth: %u, 
> write_bandwidth: %u\n",
> + attrs->read_latency, attrs->write_latency,
> + attrs->read_bandwidth, attrs->write_bandwidth);
> +}
> +
> +static void disable_hmat_adistance_algorithm(void)
> +{
> + hmat_adistance_disabled = true;
> +}
> +
> +static int hmat_init_default_dram_attrs(void)
> +{
> + struct memory_target *target;
> + struct node_hmem_attrs *attrs;
> + int nid, pxm;
> + int nid_dram = NUMA_NO_NODE;
> +
> + if (default_dram_attrs.read_latency +
> + default_dram_attrs.write_latency != 0)
> + return 0;
> +
> + if (!default_dram_type)
> + return -EIO;
> +
> + for_each_node_mask(nid, default_dram_type->nodes) {
> + pxm = node_to_pxm(nid);
> + target = find_mem_target(pxm);
> + if (!target)
> + continue;
> + attrs = &target->hmem_attrs[1];
> + if (nid_dram == NUMA_NO_NODE) {
> + if (attrs->read_latency + attrs->write_latency == 0 ||
> + attrs->read_bandwidth + attrs->write_bandwidth == 
> 0) {
> + pr_info("hmat: invalid hmem attrs for default 
> DRAM node: %d,\n",
> + nid);
> + pr_info("  ");
> + dump_hmem_attrs(attrs);
> + pr_info("  disable hmat based abstract distance 
> algorithm.\n");
> + disable_hmat_adistance_algorithm();
> + return -EIO;
> + }
> + nid_dram = nid;
> + default_dram_attrs = *attrs;
> + continue;
> + }
> +
> + /*
> +  * The performance of all default DRAM nodes is expected
> +  * to be same (that is, the variation is less than 10%).
> +  * And it will be used as base to calculate the abstract
> +  * distance of other memory nodes.
> +  */
> + if (abs(attrs->read_latency - default_dram_attrs.read_latency) 
> * 10 >
> + default_dram_attrs.read_latency ||
> + abs(attrs->write_latency - 
> default_dr

Re: [PATCH RESEND 2/4] acpi, hmat: refactor hmat_register_target_initiators()

2023-07-24 Thread Alistair Popple


Huang Ying  writes:

> Previously, in hmat_register_target_initiators(), the performance
> attributes are calculated and the corresponding sysfs links and files
> are created as well.  This is done during memory onlining.
>
> But now, to calculate the abstract distance of a memory target before
> memory onlining, we need to calculate the performance attributes for
> a memory target without creating sysfs links and files.
>
> To do that, hmat_register_target_initiators() is refactored to make it
> possible to calculate performance attributes separately.

The refactor looks good and I have run the whole series on a system with
some hmat data so:

Reviewed-by: Alistair Popple 
Tested-by: Alistair Popple 

> Signed-off-by: "Huang, Ying" 
> Cc: Aneesh Kumar K.V 
> Cc: Wei Xu 
> Cc: Alistair Popple 
> Cc: Dan Williams 
> Cc: Dave Hansen 
> Cc: Davidlohr Bueso 
> Cc: Johannes Weiner 
> Cc: Jonathan Cameron 
> Cc: Michal Hocko 
> Cc: Yang Shi 
> Cc: Rafael J Wysocki 
> ---
>  drivers/acpi/numa/hmat.c | 81 +++-
>  1 file changed, 30 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> index bba268ecd802..2dee0098f1a9 100644
> --- a/drivers/acpi/numa/hmat.c
> +++ b/drivers/acpi/numa/hmat.c
> @@ -582,28 +582,25 @@ static int initiators_to_nodemask(unsigned long 
> *p_nodes)
>   return 0;
>  }
>  
> -static void hmat_register_target_initiators(struct memory_target *target)
> +static void hmat_update_target_attrs(struct memory_target *target,
> +  unsigned long *p_nodes, int access)
>  {
> - static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
>   struct memory_initiator *initiator;
> - unsigned int mem_nid, cpu_nid;
> + unsigned int cpu_nid;
>   struct memory_locality *loc = NULL;
>   u32 best = 0;
> - bool access0done = false;
>   int i;
>  
> - mem_nid = pxm_to_node(target->memory_pxm);
> + bitmap_zero(p_nodes, MAX_NUMNODES);
>   /*
> -  * If the Address Range Structure provides a local processor pxm, link
> +  * If the Address Range Structure provides a local processor pxm, set
>* only that one. Otherwise, find the best performance attributes and
> -  * register all initiators that match.
> +  * collect all initiators that match.
>*/
>   if (target->processor_pxm != PXM_INVAL) {
>   cpu_nid = pxm_to_node(target->processor_pxm);
> - register_memory_node_under_compute_node(mem_nid, cpu_nid, 0);
> - access0done = true;
> - if (node_state(cpu_nid, N_CPU)) {
> - register_memory_node_under_compute_node(mem_nid, 
> cpu_nid, 1);
> + if (access == 0 || node_state(cpu_nid, N_CPU)) {
> + set_bit(target->processor_pxm, p_nodes);
>   return;
>   }
>   }
> @@ -617,47 +614,10 @@ static void hmat_register_target_initiators(struct 
> memory_target *target)
>* We'll also use the sorting to prime the candidate nodes with known
>* initiators.
>*/
> - bitmap_zero(p_nodes, MAX_NUMNODES);
>   list_sort(NULL, &initiators, initiator_cmp);
>   if (initiators_to_nodemask(p_nodes) < 0)
>   return;
>  
> - if (!access0done) {
> - for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
> - loc = localities_types[i];
> - if (!loc)
> - continue;
> -
> - best = 0;
> - list_for_each_entry(initiator, &initiators, node) {
> - u32 value;
> -
> - if (!test_bit(initiator->processor_pxm, 
> p_nodes))
> - continue;
> -
> - value = hmat_initiator_perf(target, initiator,
> - loc->hmat_loc);
> - if (hmat_update_best(loc->hmat_loc->data_type, 
> value, &best))
> - bitmap_clear(p_nodes, 0, 
> initiator->processor_pxm);
> - if (value != best)
> - clear_bit(initiator->processor_pxm, 
> p_nodes);
> - }
> - if (best)
> - hmat_update_target_access(target, 
> loc->hmat_loc->data_type,
> -   best, 0);
> - }
> -
> - for_each_set_bit(i, p_nodes, MAX_NUMNODES) {
> -

Re: [PATCH RESEND 1/4] memory tiering: add abstract distance calculation algorithms management

2023-07-24 Thread Alistair Popple


Huang Ying  writes:

> The abstract distance may be calculated by various drivers, such as
> ACPI HMAT, CXL CDAT, etc., while it may be used by various code which
> hot-adds memory nodes, such as dax/kmem etc.  To decouple the
> algorithm users and the providers, an abstract distance calculation
> algorithm management mechanism is implemented in this patch.  It
> provides an interface for the providers to register implementations,
> and an interface for the users.

I wonder if we need this level of decoupling though? It seems to me like
it would be simpler and better for drivers to calculate the abstract
distance directly themselves by calling the desired algorithm (eg. ACPI
HMAT) and pass this when creating the nodes rather than having a
notifier chain.

At the moment it seems we've only identified two possible algorithms
(ACPI HMAT and CXL CDAT) and I don't think it would make sense for one
of those to fall back to the other based on priority, so why not just
have drivers call the correct algorithm directly?
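
The direct-call alternative would look roughly like this in dax/kmem
(acpi_hmat_calc_adistance() is a hypothetical name; CONFIG_ACPI_HMAT is
the existing Kconfig symbol):

    int adist = MEMTIER_DEFAULT_DAX_ADISTANCE;

    /* Hypothetical direct call, bypassing any notifier chain. */
    if (IS_ENABLED(CONFIG_ACPI_HMAT))
        adist = acpi_hmat_calc_adistance(numa_node);

which is simpler, but couples dax/kmem to every firmware table format it
might ever run on top of.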

> Multiple algorithm implementations can cooperate by calculating
> abstract distances for different memory nodes.  The preference of
> algorithm implementations can be specified via
> priority (notifier_block.priority).

How/what decides the priority though? That seems like something better
decided by a device driver than the algorithm driver IMHO.

> Signed-off-by: "Huang, Ying" 
> Cc: Aneesh Kumar K.V 
> Cc: Wei Xu 
> Cc: Alistair Popple 
> Cc: Dan Williams 
> Cc: Dave Hansen 
> Cc: Davidlohr Bueso 
> Cc: Johannes Weiner 
> Cc: Jonathan Cameron 
> Cc: Michal Hocko 
> Cc: Yang Shi 
> Cc: Rafael J Wysocki 
> ---
>  include/linux/memory-tiers.h | 19 
>  mm/memory-tiers.c            | 59 
>  2 files changed, 78 insertions(+)
>
> diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> index fc9647b1b4f9..c6429e624244 100644
> --- a/include/linux/memory-tiers.h
> +++ b/include/linux/memory-tiers.h
> @@ -6,6 +6,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/notifier.h>
>  /*
>   * Each tier cover a abstrace distance chunk size of 128
>   */
> @@ -36,6 +37,9 @@ struct memory_dev_type *alloc_memory_type(int adistance);
>  void destroy_memory_type(struct memory_dev_type *memtype);
>  void init_node_memory_type(int node, struct memory_dev_type *default_type);
>  void clear_node_memory_type(int node, struct memory_dev_type *memtype);
> +int register_mt_adistance_algorithm(struct notifier_block *nb);
> +int unregister_mt_adistance_algorithm(struct notifier_block *nb);
> +int mt_calc_adistance(int node, int *adist);
>  #ifdef CONFIG_MIGRATION
>  int next_demotion_node(int node);
>  void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
> @@ -97,5 +101,20 @@ static inline bool node_is_toptier(int node)
>  {
>   return true;
>  }
> +
> +static inline int register_mt_adistance_algorithm(struct notifier_block *nb)
> +{
> + return 0;
> +}
> +
> +static inline int unregister_mt_adistance_algorithm(struct notifier_block 
> *nb)
> +{
> + return 0;
> +}
> +
> +static inline int mt_calc_adistance(int node, int *adist)
> +{
> + return NOTIFY_DONE;
> +}
>  #endif   /* CONFIG_NUMA */
>  #endif  /* _LINUX_MEMORY_TIERS_H */
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index a516e303e304..1e55fbe2ad51 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/notifier.h>
>  
>  #include "internal.h"
>  
> @@ -105,6 +106,8 @@ static int top_tier_adistance;
>  static struct demotion_nodes *node_demotion __read_mostly;
>  #endif /* CONFIG_MIGRATION */
>  
> +static BLOCKING_NOTIFIER_HEAD(mt_adistance_algorithms);
> +
>  static inline struct memory_tier *to_memory_tier(struct device *device)
>  {
>   return container_of(device, struct memory_tier, dev);
> @@ -592,6 +595,62 @@ void clear_node_memory_type(int node, struct 
> memory_dev_type *memtype)
>  }
>  EXPORT_SYMBOL_GPL(clear_node_memory_type);
>  
> +/**
> + * register_mt_adistance_algorithm() - Register memory tiering abstract 
> distance algorithm
> + * @nb: The notifier block which describe the algorithm
> + *
> + * Return: 0 on success, errno on error.
> + *
> + * Every memory tiering abstract distance algorithm provider needs to
> + * register the algorithm with register_mt_adistance_algorithm().  To
> + * calculate the abstract distance for a specified memory node, the
> + * notifier function will be called unless some high priority
> + * algorithm has provided result.  The prototype of the notifier
> + * function is as follows,
> + *
> + *   int (*algorithm_notifier)(struct notifier_block *nb,
> + *                             unsigned long nid, void *data);
Re: [PATCH RESEND 0/4] memory tiering: calculate abstract distance based on ACPI HMAT

2023-07-20 Thread Alistair Popple


Thanks for this Huang, I had been hoping to take a look at it this week
but have run out of time. I'm keen to do some testing with it as well.

Hopefully next week...

Huang Ying  writes:

> We have the explicit memory tiers framework to manage systems with
> multiple types of memory, e.g., DRAM in DIMM slots and CXL memory
> devices.  There, the same kind of memory devices are grouped into
> memory types, then put into memory tiers.  To describe the performance
> of a memory type, an abstract distance is defined, which is directly
> proportional to the memory latency and inversely proportional to the
> memory bandwidth.  To keep the code as simple as possible, a fixed
> abstract distance is used in dax/kmem to describe slow memory such as
> Optane DCPMM.
>
> To support more memory types, in this series, we added the abstract
> distance calculation algorithm management mechanism, provided an
> algorithm implementation based on ACPI HMAT, and used the general
> abstract distance calculation interface in the dax/kmem driver.  So,
> dax/kmem can support HBM (high bandwidth memory) in addition to the
> original Optane DCPMM.
>
> Changelog:
>
> V1 (from RFC):
>
> - Added some comments per Aneesh's comments, Thanks!
>
> Best Regards,
> Huang, Ying
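
For what it's worth, my reading of that proportionality, with default
DRAM performance as the baseline, is roughly the following pseudo-code
(illustrative only, not a verbatim excerpt from the series):

	/*
	 * Abstract distance relative to DRAM: grows with latency,
	 * shrinks with bandwidth.
	 */
	adist = MEMTIER_ADISTANCE_DRAM *
		(node_latency / dram_latency) *
		(dram_bandwidth / node_bandwidth);

So a node with twice the latency and half the bandwidth of DRAM would
land at four times the DRAM abstract distance.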




[PATCH 2/2] drm/panel: Add support for E Ink VB3300-KCA

2021-04-19 Thread Alistair Francis
Add support for the 10.3" E Ink panel described at:
https://www.eink.com/product.html?type=productdetail&id=7

Signed-off-by: Alistair Francis 
---
 drivers/gpu/drm/panel/panel-simple.c | 29 
 1 file changed, 29 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-simple.c 
b/drivers/gpu/drm/panel/panel-simple.c
index 4e2dad314c79..f1f6fd2517f6 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -1964,6 +1964,32 @@ static const struct panel_desc edt_etm0700g0bdh6 = {
.bus_flags = DRM_BUS_FLAG_DE_HIGH | DRM_BUS_FLAG_PIXDATA_DRIVE_POSEDGE,
 };
 
+static const struct display_timing eink_vb3300_kca_timing = {
+   .pixelclock = { 40000000, 40000000, 40000000 },
+   .hactive = { 334, 334, 334 },
+   .hfront_porch = { 1, 1, 1 },
+   .hback_porch = { 1, 1, 1 },
+   .hsync_len = { 1, 1, 1 },
+   .vactive = { 1405, 1405, 1405 },
+   .vfront_porch = { 1, 1, 1 },
+   .vback_porch = { 1, 1, 1 },
+   .vsync_len = { 1, 1, 1 },
+   .flags = DISPLAY_FLAGS_HSYNC_LOW | DISPLAY_FLAGS_VSYNC_LOW |
+DISPLAY_FLAGS_DE_HIGH | DISPLAY_FLAGS_PIXDATA_POSEDGE,
+};
+
+static const struct panel_desc eink_vb3300_kca = {
+   .timings = &eink_vb3300_kca_timing,
+   .num_timings = 1,
+   .bpc = 6,
+   .size = {
+   .width = 157,
+   .height = 209,
+   },
+   .bus_format = MEDIA_BUS_FMT_RGB888_1X24,
+   .bus_flags = DRM_BUS_FLAG_DE_HIGH | DRM_BUS_FLAG_PIXDATA_DRIVE_POSEDGE,
+};
+
 static const struct display_timing evervision_vgg804821_timing = {
.pixelclock = { 2760, 3330, 5000 },
.hactive = { 800, 800, 800 },
@@ -4232,6 +4258,9 @@ static const struct of_device_id platform_of_match[] = {
}, {
.compatible = "edt,etm0700g0dh6",
.data = &edt_etm0700g0dh6,
+   }, {
+   .compatible = "eink,vb3300-kca",
+   .data = &eink_vb3300_kca,
}, {
.compatible = "edt,etm0700g0bdh6",
.data = &edt_etm0700g0bdh6,
-- 
2.31.1



[PATCH 1/2] dt-bindings: Add E Ink to vendor bindings

2021-04-19 Thread Alistair Francis
Add the E Ink Corporation to the vendor bindings.

Signed-off-by: Alistair Francis 
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml 
b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 996f4de2fff5..6c9323dc9b78 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -329,6 +329,8 @@ patternProperties:
 description: eGalax_eMPIA Technology Inc
   "^einfochips,.*":
 description: Einfochips
+  "^eink,.*":
+description: E Ink Corporation
   "^elan,.*":
 description: Elan Microelectronic Corp.
   "^element14,.*":
-- 
2.31.1



[PATCH v5 5/5] ARM: dts: imx7d: remarkable2: Enable silergy,sy7636a

2021-04-19 Thread Alistair Francis
Enable the silergy,sy7636a and silergy,sy7636a-regulator on the
reMarkable2.

Signed-off-by: Alistair Francis 
---
 arch/arm/boot/dts/imx7d-remarkable2.dts | 61 +
 1 file changed, 61 insertions(+)

diff --git a/arch/arm/boot/dts/imx7d-remarkable2.dts 
b/arch/arm/boot/dts/imx7d-remarkable2.dts
index ea1dd41023f9..bdfc658d89db 100644
--- a/arch/arm/boot/dts/imx7d-remarkable2.dts
+++ b/arch/arm/boot/dts/imx7d-remarkable2.dts
@@ -22,6 +22,27 @@ memory@80000000 {
reg = <0x80000000 0x40000000>;
};
 
+   thermal-zones {
+   epd-thermal {
+   thermal-sensors = <&epd_pmic>;
+   polling-delay-passive = <30000>;
+   polling-delay = <30000>;
+   trips {
+   trip0 {
+   temperature = <49000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+
+   trip1 {
+   temperature = <50000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+   };
+   };
+
reg_brcm: regulator-brcm {
compatible = "regulator-fixed";
regulator-name = "brcm_reg";
@@ -86,6 +107,32 @@ wacom_digitizer: digitizer@9 {
};
 };
 
+&i2c4 {
+   clock-frequency = <100000>;
+   pinctrl-names = "default", "sleep";
+   pinctrl-0 = <&pinctrl_i2c4>;
+   pinctrl-1 = <&pinctrl_i2c4>;
+   status = "okay";
+
+   epd_pmic: sy7636a@62 {
+   compatible = "silergy,sy7636a";
+   reg = <0x62>;
+   status = "okay";
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_epdpmic>;
+   #thermal-sensor-cells = <0>;
+
+   epd-pwr-good-gpios = <&gpio6 21 GPIO_ACTIVE_HIGH>;
+   regulators {
+   compatible = "silergy,sy7636a-regulator";
+   reg_epdpmic: vcom {
+   regulator-name = "vcom";
+   regulator-boot-on;
+   };
+   };
+   };
+};
+
&snvs_pwrkey {
status = "okay";
 };
@@ -179,6 +226,13 @@ MX7D_PAD_SAI1_TX_BCLK__GPIO6_IO13  0x14
>;
};
 
+   pinctrl_epdpmic: epdpmicgrp {
+   fsl,pins = <
+   MX7D_PAD_SAI2_RX_DATA__GPIO6_IO21 0x0074
+   MX7D_PAD_ENET1_RGMII_TXC__GPIO7_IO11 0x0014
+   >;
+   };
+
pinctrl_i2c1: i2c1grp {
fsl,pins = <
MX7D_PAD_I2C1_SDA__I2C1_SDA 0x407f
@@ -186,6 +240,13 @@ MX7D_PAD_I2C1_SCL__I2C1_SCL	0x407f
>;
};
 
+   pinctrl_i2c4: i2c4grp {
+   fsl,pins = <
+   MX7D_PAD_I2C4_SDA__I2C4_SDA 0x407f
+   MX7D_PAD_I2C4_SCL__I2C4_SCL 0x407f
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX7D_PAD_UART1_TX_DATA__UART1_DCE_TX0x79
-- 
2.31.1



[PATCH v5 4/5] ARM: imx_v6_v7_defconfig: Enable silergy,sy7636a

2021-04-19 Thread Alistair Francis
Enable the silergy,sy7636a and silergy,sy7636a-regulator for the
reMarkable2.

Signed-off-by: Alistair Francis 
---
 arch/arm/configs/imx_v6_v7_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index cd80e85d37cf..bafd1d7b4ad5 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -245,6 +245,7 @@ CONFIG_MFD_MC13XXX_I2C=y
 CONFIG_MFD_RN5T618=y
 CONFIG_MFD_STMPE=y
 CONFIG_REGULATOR=y
+CONFIG_MFD_SY7636A=y
 CONFIG_REGULATOR_FIXED_VOLTAGE=y
 CONFIG_REGULATOR_ANATOP=y
 CONFIG_REGULATOR_DA9052=y
@@ -255,6 +256,7 @@ CONFIG_REGULATOR_MC13783=y
 CONFIG_REGULATOR_MC13892=y
 CONFIG_REGULATOR_PFUZE100=y
 CONFIG_REGULATOR_RN5T618=y
+CONFIG_REGULATOR_SY7636A=y
 CONFIG_RC_CORE=y
 CONFIG_RC_DEVICES=y
 CONFIG_IR_GPIO_CIR=y
-- 
2.31.1



[PATCH v5 3/5] regulator: sy7636a: Initial commit

2021-04-19 Thread Alistair Francis
Initial support for the Silergy SY7636A-regulator Power Management chip.

Signed-off-by: Alistair Francis 
---
v5:
 - Simplify the implementation

 drivers/regulator/Kconfig |   6 ++
 drivers/regulator/Makefile|   1 +
 drivers/regulator/sy7636a-regulator.c | 127 ++
 include/linux/mfd/sy7636a.h   |   1 +
 4 files changed, 135 insertions(+)
 create mode 100644 drivers/regulator/sy7636a-regulator.c
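
The VCOM read-back in sy7636a_get_vcom_voltage_op() below combines a
low byte and a high bit into one value and scales it.  As a worked
example (editorial; it assumes the shift/mask/scale values from later
revisions of this driver -- shift 8, mask 0x01ff, 10000 uV per LSB --
which are not visible in this mail):

	/* val = 0x2c, val_h bit 0 set: raw = (1 << 8) | 0x2c = 300 */
	/* reported VCOM magnitude = 300 * 10000 = 3000000 uV, i.e. 3.0 V */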

diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 77c43134bc9e..6d501ce921a8 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -1130,6 +1130,12 @@ config REGULATOR_STW481X_VMMC
  This driver supports the internal VMMC regulator in the STw481x
  PMIC chips.
 
+config REGULATOR_SY7636A
+   tristate "Silergy SY7636A voltage regulator"
+   depends on MFD_SY7636A
+   help
+	  This driver supports the Silergy SY7636A voltage regulator.
+
 config REGULATOR_SY8106A
tristate "Silergy SY8106A regulator"
depends on I2C && (OF || COMPILE_TEST)
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 44d2f8bf4b74..5a981036a9f0 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_REGULATOR_STM32_VREFBUF) += stm32-vrefbuf.o
 obj-$(CONFIG_REGULATOR_STM32_PWR) += stm32-pwr.o
 obj-$(CONFIG_REGULATOR_STPMIC1) += stpmic1_regulator.o
 obj-$(CONFIG_REGULATOR_STW481X_VMMC) += stw481x-vmmc.o
+obj-$(CONFIG_REGULATOR_SY7636A) += sy7636a-regulator.o
 obj-$(CONFIG_REGULATOR_SY8106A) += sy8106a-regulator.o
 obj-$(CONFIG_REGULATOR_SY8824X) += sy8824x.o
 obj-$(CONFIG_REGULATOR_SY8827N) += sy8827n.o
diff --git a/drivers/regulator/sy7636a-regulator.c 
b/drivers/regulator/sy7636a-regulator.c
new file mode 100644
index ..c384c2b6ac46
--- /dev/null
+++ b/drivers/regulator/sy7636a-regulator.c
@@ -0,0 +1,127 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Functions to access SY7636A power management chip voltages
+//
+// Copyright (C) 2019 reMarkable AS - http://www.remarkable.com/
+//
+// Authors: Lars Ivar Miljeteig 
+//  Alistair Francis 
+
+#include <linux/gpio/consumer.h>
+#include <linux/mfd/sy7636a.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/regulator/driver.h>
+
+#define SY7636A_POLL_ENABLED_TIME 500
+
+static int sy7636a_get_vcom_voltage_op(struct regulator_dev *rdev)
+{
+   int ret;
+   unsigned int val, val_h;
+
+   ret = regmap_read(rdev->regmap, SY7636A_REG_VCOM_ADJUST_CTRL_L, &val);
+   if (ret)
+   return ret;
+
+   ret = regmap_read(rdev->regmap, SY7636A_REG_VCOM_ADJUST_CTRL_H, &val_h);
+   if (ret)
+   return ret;
+
+   val |= (val_h << VCOM_ADJUST_CTRL_SHIFT);
+
+   return (val & VCOM_ADJUST_CTRL_MASK) * VCOM_ADJUST_CTRL_SCAL;
+}
+
+static int sy7636a_get_status(struct regulator_dev *rdev)
+{
+   struct sy7636a *sy7636a = dev_get_drvdata(rdev->dev.parent);
+   int ret = 0;
+
+   ret = gpiod_get_value_cansleep(sy7636a->pgood_gpio);
+   if (ret < 0)
+   dev_err(&rdev->dev, "Failed to read pgood gpio: %d\n", ret);
+
+   return ret;
+}
+
+static const struct regulator_ops sy7636a_vcom_volt_ops = {
+   .get_voltage = sy7636a_get_vcom_voltage_op,
+   .enable = regulator_enable_regmap,
+   .disable = regulator_disable_regmap,
+   .is_enabled = regulator_is_enabled_regmap,
+   .get_status = sy7636a_get_status,
+};
+
+struct regulator_desc desc = {
+   .name = "vcom",
+   .id = 0,
+   .ops = &sy7636a_vcom_volt_ops,
+   .type = REGULATOR_VOLTAGE,
+   .owner = THIS_MODULE,
+   .enable_reg = SY7636A_REG_OPERATION_MODE_CRL,
+   .enable_mask = SY7636A_OPERATION_MODE_CRL_ONOFF,
+   .poll_enabled_time  = SY7636A_POLL_ENABLED_TIME,
+   .regulators_node = of_match_ptr("regulators"),
+   .of_match = of_match_ptr("vcom"),
+};
+
+static int sy7636a_regulator_probe(struct platform_device *pdev)
+{
+   struct sy7636a *sy7636a = dev_get_drvdata(pdev->dev.parent);
+   struct regulator_config config = { };
+   struct regulator_dev *rdev;
+   struct gpio_desc *gdp;
+   int ret;
+
+   if (!sy7636a)
+   return -EPROBE_DEFER;
+
+   platform_set_drvdata(pdev, sy7636a);
+
+   gdp = devm_gpiod_get(sy7636a->dev, "epd-pwr-good", GPIOD_IN);
+   if (IS_ERR(gdp)) {
+   dev_err(sy7636a->dev, "Power good GPIO fault %ld\n", 
PTR_ERR(gdp));
+   return PTR_ERR(gdp);
+   }
+
+   sy7636a->pgood_gpio = gdp;
+
+   ret = regmap_write(sy7636a->regmap, SY7636A_REG_POWER_ON_DELAY_TIME, 
0x0);
+   if (ret) {
+   dev_err(sy7636a->dev, "Failed to initialize regulator: %d\n", 
ret);
+   return ret;
+   }
+
+   config.dev = &pdev->dev;
+   config.dev->of_node = sy7636a->dev->of_node;
+   config.driver_data = sy7636a;
+

[PATCH v5 2/5] mfd: sy7636a: Initial commit

2021-04-19 Thread Alistair Francis
Initial support for the Silergy SY7636A Power Management chip.

Signed-off-by: Alistair Francis 
---
v5:
 - Don't use regmap-irq

 drivers/mfd/Kconfig |  9 
 drivers/mfd/Makefile|  1 +
 drivers/mfd/sy7636a.c   | 82 +
 include/linux/mfd/sy7636a.h | 46 +
 4 files changed, 138 insertions(+)
 create mode 100644 drivers/mfd/sy7636a.c
 create mode 100644 include/linux/mfd/sy7636a.h

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index b74efa469e90..9516ba932b5e 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1351,6 +1351,15 @@ config MFD_SYSCON
  Select this option to enable accessing system control registers
  via regmap.
 
+config MFD_SY7636A
+   tristate "Silergy SY7636A Power Management chip"
+   select MFD_CORE
+   select REGMAP_I2C
+   depends on I2C
+   help
+ Select this option to enable support for the Silergy SY7636A
+ Power Management chip.
+
 config MFD_DAVINCI_VOICECODEC
tristate
select MFD_CORE
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 834f5463af28..5bfa0d6e5dc5 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -265,6 +265,7 @@ obj-$(CONFIG_MFD_STMFX) += stmfx.o
 obj-$(CONFIG_MFD_KHADAS_MCU)   += khadas-mcu.o
 obj-$(CONFIG_MFD_ACER_A500_EC) += acer-ec-a500.o
 
+obj-$(CONFIG_MFD_SY7636A)  += sy7636a.o
 obj-$(CONFIG_SGI_MFD_IOC3) += ioc3.o
 obj-$(CONFIG_MFD_SIMPLE_MFD_I2C)   += simple-mfd-i2c.o
 obj-$(CONFIG_MFD_INTEL_M10_BMC)   += intel-m10-bmc.o
diff --git a/drivers/mfd/sy7636a.c b/drivers/mfd/sy7636a.c
new file mode 100644
index ..e08f29ea63f8
--- /dev/null
+++ b/drivers/mfd/sy7636a.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// MFD parent driver for SY7636A chip
+//
+// Copyright (C) 2021 reMarkable AS - http://www.remarkable.com/
+//
+// Authors: Lars Ivar Miljeteig 
+//  Alistair Francis 
+//
+// Based on the lp87565 driver by Keerthy 
+
+#include <linux/i2c.h>
+#include <linux/mfd/core.h>
+#include <linux/module.h>
+#include <linux/regmap.h>
+
+#include <linux/mfd/sy7636a.h>
+
+static const struct regmap_config sy7636a_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+};
+
+static const struct mfd_cell sy7636a_cells[] = {
+   { .name = "sy7636a-regulator", },
+   { .name = "sy7636a-temperature", },
+   { .name = "sy7636a-thermal", },
+};
+
+static const struct of_device_id of_sy7636a_match_table[] = {
+   { .compatible = "silergy,sy7636a", },
+   {}
+};
+MODULE_DEVICE_TABLE(of, of_sy7636a_match_table);
+
+static int sy7636a_probe(struct i2c_client *client,
+const struct i2c_device_id *ids)
+{
+   struct sy7636a *sy7636a;
+   int ret;
+
+   sy7636a = devm_kzalloc(&client->dev, sizeof(*sy7636a), GFP_KERNEL);
+   if (!sy7636a)
+   return -ENOMEM;
+
+   sy7636a->dev = &client->dev;
+
+   sy7636a->regmap = devm_regmap_init_i2c(client, &sy7636a_regmap_config);
+   if (IS_ERR(sy7636a->regmap)) {
+   ret = PTR_ERR(sy7636a->regmap);
+   dev_err(sy7636a->dev,
+   "Failed to initialize register map: %d\n", ret);
+   return ret;
+   }
+
+   i2c_set_clientdata(client, sy7636a);
+
+   ret = devm_mfd_add_devices(sy7636a->dev, PLATFORM_DEVID_AUTO,
+   sy7636a_cells, ARRAY_SIZE(sy7636a_cells),
+   NULL, 0, NULL);
+   return ret;
+}
+
+static const struct i2c_device_id sy7636a_id_table[] = {
+   { "sy7636a", 0 },
+   { },
+};
+MODULE_DEVICE_TABLE(i2c, sy7636a_id_table);
+
+static struct i2c_driver sy7636a_driver = {
+   .driver = {
+   .name   = "sy7636a",
+   .of_match_table = of_sy7636a_match_table,
+   },
+   .probe = sy7636a_probe,
+   .id_table = sy7636a_id_table,
+};
+module_i2c_driver(sy7636a_driver);
+
+MODULE_AUTHOR("Lars Ivar Miljeteig ");
+MODULE_DESCRIPTION("Silergy SY7636A Multi-Function Device Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/mfd/sy7636a.h b/include/linux/mfd/sy7636a.h
new file mode 100644
index ..43b0db0f8e6d
--- /dev/null
+++ b/include/linux/mfd/sy7636a.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Functions to access SY7636A power management chip.
+ *
+ * Copyright (C) 2021 reMarkable AS - http://www.remarkable.com/
+ */
+
+#ifndef __MFD_SY7636A_H
+#define __MFD_SY7636A_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define SY7636A_REG_OPERATION_MODE_CRL 0x00
+#define SY7636A_OPERATION_MODE_CRL_VCOMCTL BIT(6)
+#define SY7636A_OPERATION_MODE_CRL_ONOFF   BIT(7)
+#define SY7636A_REG_VCOM_ADJUST_CTRL_L 0x01
+#define SY7636A_REG_VCOM_ADJUST_CTRL_H 0x02
+#define SY7636A_REG_VCOM_ADJUST_CTRL_MASK  0x01ff
+#define SY

[PATCH v5 1/5] dt-bindings: mfd: Initial commit of silergy,sy7636a.yaml

2021-04-19 Thread Alistair Francis
Initial support for the Silergy SY7636A Power Management chip
and regulator.

Signed-off-by: Alistair Francis 
---
v5:
 - Improve the documentation

 .../bindings/mfd/silergy,sy7636a.yaml | 70 +++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml

diff --git a/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml 
b/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml
new file mode 100644
index ..83050c36acaf
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/silergy,sy7636a.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: silergy sy7636a PMIC
+
+maintainers:
+  - Alistair Francis 
+
+properties:
+  compatible:
+    const: silergy,sy7636a
+
+  reg:
+    maxItems: 1
+
+  '#thermal-sensor-cells':
+    const: 0
+
+  epd-pwr-good-gpios:
+    description:
+      Specifying the power good GPIOs. As defined in bindings/gpio.txt.
+    maxItems: 1
+
+  regulators:
+    type: object
+
+    properties:
+      compatible:
+        const: silergy,sy7636a-regulator
+      $ref: /schemas/regulator/regulator.yaml#
+
+      regulator-name:
+        const: "vcom"
+
+    additionalProperties: false
+
+required:
+  - compatible
+  - reg
+  - '#thermal-sensor-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+    i2c {
+      #address-cells = <1>;
+      #size-cells = <0>;
+
+      pmic@62 {
+        compatible = "silergy,sy7636a";
+        reg = <0x62>;
+        status = "okay";
+        pinctrl-names = "default";
+        pinctrl-0 = <&pinctrl_epdpmic>;
+        #thermal-sensor-cells = <0>;
+
+        regulators {
+          compatible = "silergy,sy7636a-regulator";
+          reg_epdpmic: vcom {
+            regulator-name = "vcom";
+            regulator-boot-on;
+          };
+        };
+      };
+    };
+...
-- 
2.31.1



[PATCH] ARM: imx7d-remarkable2.dts: Add WiFi support

2021-04-19 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 arch/arm/boot/dts/imx7d-remarkable2.dts | 91 +
 1 file changed, 91 insertions(+)

diff --git a/arch/arm/boot/dts/imx7d-remarkable2.dts 
b/arch/arm/boot/dts/imx7d-remarkable2.dts
index 8cbae656395c..c3dda2b92fe6 100644
--- a/arch/arm/boot/dts/imx7d-remarkable2.dts
+++ b/arch/arm/boot/dts/imx7d-remarkable2.dts
@@ -21,6 +21,27 @@ memory@80000000 {
device_type = "memory";
reg = <0x80000000 0x40000000>;
};
+
+   reg_brcm: regulator-brcm {
+   compatible = "regulator-fixed";
+   regulator-name = "brcm_reg";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_brcm_reg>;
+   gpio = <&gpio6 13 GPIO_ACTIVE_HIGH>;
+   enable-active-high;
+   startup-delay-us = <150>;
+   };
+
+   wifi_pwrseq: wifi_pwrseq {
+   compatible = "mmc-pwrseq-simple";
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_wifi>;
+   reset-gpios = <&gpio5 9 GPIO_ACTIVE_LOW>;
+   clocks = <&clks IMX7D_CLKO2_ROOT_DIV>;
+   clock-names = "ext_clock";
+   };
 };
 
&clks {
@@ -56,6 +77,27 @@  {
status = "okay";
 };
 
+&usdhc2 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   pinctrl-names = "default", "state_100mhz", "sleep";
+   pinctrl-0 = <&pinctrl_usdhc2>;
+   pinctrl-1 = <&pinctrl_usdhc2_100mhz>;
+   pinctrl-2 = <&pinctrl_usdhc2>;
+   mmc-pwrseq = <&wifi_pwrseq>;
+   vmmc-supply = <&reg_brcm>;
+   bus-width = <4>;
+   non-removable;
+   keep-power-in-suspend;
+   cap-power-off-card;
+   status = "okay";
+
+   brcmf: bcrmf@1 {
+   reg = <1>;
+   compatible = "brcm,bcm4329-fmac";
+   };
+};
+
&usdhc3 {
pinctrl-names = "default", "state_100mhz", "state_200mhz", "sleep";
pinctrl-0 = <&pinctrl_usdhc3>;
@@ -76,6 +118,13 @@  {
 };
 
&iomuxc {
+   pinctrl_brcm_reg: brcmreggrp {
+   fsl,pins = <
+   /* WIFI_PWR_EN */
+   MX7D_PAD_SAI1_TX_BCLK__GPIO6_IO13   0x14
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX7D_PAD_UART1_TX_DATA__UART1_DCE_TX0x79
@@ -90,6 +139,39 @@ MX7D_PAD_EPDC_DATA08__UART6_DCE_RX  0x79
>;
};
 
+   pinctrl_usdhc2: usdhc2grp {
+   fsl,pins = <
+   MX7D_PAD_SD2_CMD__SD2_CMD   0x59
+   MX7D_PAD_SD2_CLK__SD2_CLK   0x19
+   MX7D_PAD_SD2_DATA0__SD2_DATA0   0x59
+   MX7D_PAD_SD2_DATA1__SD2_DATA1   0x59
+   MX7D_PAD_SD2_DATA2__SD2_DATA2   0x59
+   MX7D_PAD_SD2_DATA3__SD2_DATA3   0x59
+   >;
+   };
+
+   pinctrl_usdhc2_100mhz: usdhc2grp_100mhz {
+   fsl,pins = <
+   MX7D_PAD_SD2_CMD__SD2_CMD   0x5a
+   MX7D_PAD_SD2_CLK__SD2_CLK   0x1a
+   MX7D_PAD_SD2_DATA0__SD2_DATA0   0x5a
+   MX7D_PAD_SD2_DATA1__SD2_DATA1   0x5a
+   MX7D_PAD_SD2_DATA2__SD2_DATA2   0x5a
+   MX7D_PAD_SD2_DATA3__SD2_DATA3   0x5a
+   >;
+   };
+
+   pinctrl_usdhc2_200mhz: usdhc2grp_200mhz {
+   fsl,pins = <
+   MX7D_PAD_SD2_CMD__SD2_CMD   0x5b
+   MX7D_PAD_SD2_CLK__SD2_CLK   0x1b
+   MX7D_PAD_SD2_DATA0__SD2_DATA0   0x5b
+   MX7D_PAD_SD2_DATA1__SD2_DATA1   0x5b
+   MX7D_PAD_SD2_DATA2__SD2_DATA2   0x5b
+   MX7D_PAD_SD2_DATA3__SD2_DATA3   0x5b
+   >;
+   };
+
pinctrl_usdhc3: usdhc3grp {
fsl,pins = <
MX7D_PAD_SD3_CMD__SD3_CMD   0x59
@@ -143,4 +225,13 @@ pinctrl_wdog: wdoggrp {
MX7D_PAD_ENET1_COL__WDOG1_WDOG_ANY  0x74
>;
};
+
+   pinctrl_wifi: wifigrp {
+   fsl,pins = <
+   /* WiFi Reg On */
MX7D_PAD_SD2_CD_B__GPIO5_IO9	0x0014
+   /* WiFi Sleep 32k */
+   MX7D_PAD_SD1_WP__CCM_CLKO2  0x0014
+   >;
+   };
 };
-- 
2.31.1



[PATCH v5 9/9] ARM: dts: imx7d: remarkable2: add wacom digitizer device

2021-04-19 Thread Alistair Francis
Enable the wacom_i2c touchscreen for the reMarkable2.

Signed-off-by: Alistair Francis 
---
 arch/arm/boot/dts/imx7d-remarkable2.dts | 61 +
 1 file changed, 61 insertions(+)

diff --git a/arch/arm/boot/dts/imx7d-remarkable2.dts 
b/arch/arm/boot/dts/imx7d-remarkable2.dts
index c3dda2b92fe6..ea1dd41023f9 100644
--- a/arch/arm/boot/dts/imx7d-remarkable2.dts
+++ b/arch/arm/boot/dts/imx7d-remarkable2.dts
@@ -34,6 +34,19 @@ reg_brcm: regulator-brcm {
startup-delay-us = <150>;
};
 
+   reg_digitizer: regulator-digitizer {
+   compatible = "regulator-fixed";
+   regulator-name = "VDD_3V3_DIGITIZER";
+   regulator-min-microvolt = <330>;
+   regulator-max-microvolt = <330>;
+   pinctrl-names = "default", "sleep";
+   pinctrl-0 = <&pinctrl_digitizer_reg>;
+   pinctrl-1 = <&pinctrl_digitizer_reg>;
+   gpio = <&gpio1 6 GPIO_ACTIVE_HIGH>;
+   enable-active-high;
+   startup-delay-us = <100000>; /* 100 ms */
+   };
+
wifi_pwrseq: wifi_pwrseq {
compatible = "mmc-pwrseq-simple";
pinctrl-names = "default";
@@ -51,6 +64,28 @@  {
assigned-clock-rates = <0>, <32768>;
 };
 
+&i2c1 {
+   clock-frequency = <400000>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_i2c1>;
+   status = "okay";
+
+   wacom_digitizer: digitizer@9 {
+   compatible = "wacom,i2c-30";
+   reg = <0x09>;
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_wacom>;
+   interrupt-parent = <&gpio1>;
+   interrupts = <1 IRQ_TYPE_EDGE_FALLING>;
+   flip-tilt-x;
+   flip-tilt-y;
+   flip-pos-x;
+   flip-pos-y;
+   flip-distance;
+   vdd-supply = <&reg_digitizer>;
+   };
+};
+
&snvs_pwrkey {
status = "okay";
 };
@@ -117,6 +152,25 @@  {
fsl,ext-reset-output;
 };
 
+&iomuxc_lpsr {
+   pinctrl_digitizer_reg: digitizerreggrp {
+   fsl,pins = <
+   /* DIGITIZER_PWR_EN */
+   MX7D_PAD_LPSR_GPIO1_IO06__GPIO1_IO6 0x14
+   >;
+   };
+
+   pinctrl_wacom: wacomgrp {
+   fsl,pins = <
+   /* MX7D_PAD_LPSR_GPIO1_IO05__GPIO1_IO5  0x0014  FWE */
+   MX7D_PAD_LPSR_GPIO1_IO04__GPIO1_IO4 0x0074 /* PDCTB */
+   MX7D_PAD_LPSR_GPIO1_IO01__GPIO1_IO1 0x0034 /* WACOM INT */
+   /* MX7D_PAD_LPSR_GPIO1_IO06__GPIO1_IO6  0x0014  WACOM PWR ENABLE */
+   /* MX7D_PAD_LPSR_GPIO1_IO00__GPIO1_IO0  0x0074  WACOM RESET */
+   >;
+   };
+};
+
  {
pinctrl_brcm_reg: brcmreggrp {
fsl,pins = <
@@ -125,6 +179,13 @@ MX7D_PAD_SAI1_TX_BCLK__GPIO6_IO13  0x14
>;
};
 
+   pinctrl_i2c1: i2c1grp {
+   fsl,pins = <
+   MX7D_PAD_I2C1_SDA__I2C1_SDA 0x407f
+   MX7D_PAD_I2C1_SCL__I2C1_SCL 0x407f
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX7D_PAD_UART1_TX_DATA__UART1_DCE_TX0x79
-- 
2.31.1



[PATCH v5 8/9] ARM: imx_v6_v7_defconfig: Enable Wacom I2C

2021-04-19 Thread Alistair Francis
Enable the Wacom I2C in the imx defconfig as it is used by the
reMarkable2 tablet.

Signed-off-by: Alistair Francis 
---
 arch/arm/configs/imx_v6_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index 70928cc48939..cd80e85d37cf 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -174,6 +174,7 @@ CONFIG_TOUCHSCREEN_DA9052=y
 CONFIG_TOUCHSCREEN_EGALAX=y
 CONFIG_TOUCHSCREEN_GOODIX=y
 CONFIG_TOUCHSCREEN_ILI210X=y
+CONFIG_TOUCHSCREEN_WACOM_I2C=y
 CONFIG_TOUCHSCREEN_MAX11801=y
 CONFIG_TOUCHSCREEN_IMX6UL_TSC=y
 CONFIG_TOUCHSCREEN_EDT_FT5X06=y
-- 
2.31.1



[PATCH v5 5/9] Input: wacom_i2c - Add support for distance and tilt x/y

2021-04-19 Thread Alistair Francis
This is based on the out of tree rM2 driver.

Signed-off-by: Alistair Francis 
---
v5:
 - Check the firmware version

 drivers/input/touchscreen/wacom_i2c.c | 34 +--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 28290724b3da..e0a69e63204d 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -22,12 +22,18 @@
 #define WACOM_CMD_QUERY3   0x02
 #define WACOM_CMD_THROW0   0x05
 #define WACOM_CMD_THROW1   0x00
-#define WACOM_QUERY_SIZE   19
+#define WACOM_QUERY_SIZE   22
+
+#define WACOM_DISTANCE_TILT_VERSION0x30
 
 struct wacom_features {
int x_max;
int y_max;
int pressure_max;
+   int distance_max;
+   int distance_physical_max;
+   int tilt_x_max;
+   int tilt_y_max;
char fw_version;
 };
 
@@ -79,6 +85,17 @@ static int wacom_query_device(struct i2c_client *client,
features->y_max = get_unaligned_le16(&data[5]);
features->pressure_max = get_unaligned_le16(&data[11]);
features->fw_version = get_unaligned_le16(&data[13]);
+   if (features->fw_version >= WACOM_DISTANCE_TILT_VERSION) {
+   features->distance_max = data[15];
+   features->distance_physical_max = data[16];
+   features->tilt_x_max = get_unaligned_le16(&data[17]);
+   features->tilt_y_max = get_unaligned_le16(&data[19]);
+   } else {
+   features->distance_max = -1;
+   features->distance_physical_max = -1;
+   features->tilt_x_max = -1;
+   features->tilt_y_max = -1;
+   }
 
dev_dbg(>dev,
"x_max:%d, y_max:%d, pressure:%d, fw:%d\n",
@@ -95,6 +112,7 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
u8 *data = wac_i2c->data;
unsigned int x, y, pressure;
unsigned char tsw, f1, f2, ers;
+   short tilt_x, tilt_y, distance;
int error;
 
error = i2c_master_recv(wac_i2c->client,
@@ -109,6 +127,11 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
x = le16_to_cpup((__le16 *)&data[4]);
y = le16_to_cpup((__le16 *)&data[6]);
pressure = le16_to_cpup((__le16 *)&data[8]);
+   distance = data[10];
+
+   /* Signed */
+   tilt_x = le16_to_cpup((__le16 *)&data[11]);
+   tilt_y = le16_to_cpup((__le16 *)&data[13]);
 
if (!wac_i2c->prox)
wac_i2c->tool = (data[3] & 0x0c) ?
@@ -123,6 +146,9 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
input_report_key(input, BTN_STYLUS, f1);
input_report_key(input, BTN_STYLUS2, f2);
input_report_abs(input, ABS_PRESSURE, pressure);
+   input_report_abs(input, ABS_DISTANCE, distance);
+   input_report_abs(input, ABS_TILT_X, tilt_x);
+   input_report_abs(input, ABS_TILT_Y, tilt_y);
input_sync(input);
 
 out:
@@ -195,7 +221,11 @@ static int wacom_i2c_probe(struct i2c_client *client,
input_set_abs_params(input, ABS_Y, 0, features.y_max, 0, 0);
input_set_abs_params(input, ABS_PRESSURE,
 0, features.pressure_max, 0, 0);
-
+   input_set_abs_params(input, ABS_DISTANCE, 0, features.distance_max, 0, 
0);
+   input_set_abs_params(input, ABS_TILT_X, -features.tilt_x_max,
+features.tilt_x_max, 0, 0);
+   input_set_abs_params(input, ABS_TILT_Y, -features.tilt_y_max,
+features.tilt_y_max, 0, 0);
input_set_drvdata(input, wac_i2c);
 
error = request_threaded_irq(client->irq, NULL, wacom_i2c_irq,
-- 
2.31.1



[PATCH v5 6/9] Input: wacom_i2c - Clean up the query device fields

2021-04-19 Thread Alistair Francis
Improve the query device fields to be more verbose.

Signed-off-by: Alistair Francis 
---
 drivers/input/touchscreen/wacom_i2c.c | 64 ++-
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index e0a69e63204d..26881149d509 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -13,15 +13,32 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
-#define WACOM_CMD_QUERY0   0x04
-#define WACOM_CMD_QUERY1   0x00
-#define WACOM_CMD_QUERY2   0x33
-#define WACOM_CMD_QUERY3   0x02
-#define WACOM_CMD_THROW0   0x05
-#define WACOM_CMD_THROW1   0x00
+// Registers
+#define WACOM_COMMAND_LSB   0x04
+#define WACOM_COMMAND_MSB   0x00
+
+#define WACOM_DATA_LSB  0x05
+#define WACOM_DATA_MSB  0x00
+
+// Report types
+#define REPORT_FEATURE  0x30
+
+// Requests / operations
+#define OPCODE_GET_REPORT   0x02
+
+// Power settings
+#define POWER_ON0x00
+#define POWER_SLEEP 0x01
+
+// Input report ids
+#define WACOM_PEN_DATA_REPORT   2
+#define WACOM_SHINONOME_REPORT  26
+
+#define WACOM_QUERY_REPORT 3
 #define WACOM_QUERY_SIZE   22
 
 #define WACOM_DISTANCE_TILT_VERSION0x30
@@ -50,27 +67,30 @@ static int wacom_query_device(struct i2c_client *client,
  struct wacom_features *features)
 {
int ret;
-   u8 cmd1[] = { WACOM_CMD_QUERY0, WACOM_CMD_QUERY1,
-   WACOM_CMD_QUERY2, WACOM_CMD_QUERY3 };
-   u8 cmd2[] = { WACOM_CMD_THROW0, WACOM_CMD_THROW1 };
u8 data[WACOM_QUERY_SIZE];
+
+   u8 get_query_data_cmd[] = {
+   WACOM_COMMAND_LSB,
+   WACOM_COMMAND_MSB,
+   REPORT_FEATURE | WACOM_QUERY_REPORT,
+   OPCODE_GET_REPORT,
+   WACOM_DATA_LSB,
+   WACOM_DATA_MSB,
+   };
+
struct i2c_msg msgs[] = {
+   // Request reading of feature ReportID: 3 (Pen Query Data)
{
.addr = client->addr,
.flags = 0,
-   .len = sizeof(cmd1),
-   .buf = cmd1,
-   },
-   {
-   .addr = client->addr,
-   .flags = 0,
-   .len = sizeof(cmd2),
-   .buf = cmd2,
+   .len = sizeof(get_query_data_cmd),
+   .buf = get_query_data_cmd,
},
+   // Read 21 bytes
{
.addr = client->addr,
.flags = I2C_M_RD,
-   .len = sizeof(data),
+   .len = WACOM_QUERY_SIZE - 1,
.buf = data,
},
};
@@ -98,9 +118,13 @@ static int wacom_query_device(struct i2c_client *client,
}
 
dev_dbg(>dev,
-   "x_max:%d, y_max:%d, pressure:%d, fw:%d\n",
+   "x_max:%d, y_max:%d, pressure:%d, fw:%d, "
+   "distance: %d, phys distance: %d, "
+   "tilt_x_max: %d, tilt_y_max: %d\n",
features->x_max, features->y_max,
-   features->pressure_max, features->fw_version);
+   features->pressure_max, features->fw_version,
+   features->distance_max, features->distance_physical_max,
+   features->tilt_x_max, features->tilt_y_max);
 
return 0;
 }
-- 
2.31.1



[PATCH v5 7/9] Input: wacom_i2c - Add support for vdd regulator

2021-04-19 Thread Alistair Francis
Add support for a VDD regulator. This allows the kernel to probe the
Wacom-I2C device on the rM2.

Signed-off-by: Alistair Francis 
---
 drivers/input/touchscreen/wacom_i2c.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 26881149d509..a5963d9e1194 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include <linux/regulator/consumer.h>
 #include 
 #include 
 #include 
@@ -58,6 +59,7 @@ struct wacom_i2c {
struct i2c_client *client;
struct input_dev *input;
struct touchscreen_properties props;
+   struct regulator *vdd;
u8 data[WACOM_QUERY_SIZE];
bool prox;
int tool;
@@ -221,6 +223,20 @@ static int wacom_i2c_probe(struct i2c_client *client,
goto err_free_mem;
}
 
+   wac_i2c->vdd = regulator_get(&client->dev, "vdd");
+   if (IS_ERR(wac_i2c->vdd)) {
+   error = PTR_ERR(wac_i2c->vdd);
+   kfree(wac_i2c);
+   return error;
+   }
+
+   error = regulator_enable(wac_i2c->vdd);
+   if (error) {
+   regulator_put(wac_i2c->vdd);
+   kfree(wac_i2c);
+   return error;
+   }
+
wac_i2c->client = client;
wac_i2c->input = input;
 
@@ -277,6 +293,8 @@ static int wacom_i2c_probe(struct i2c_client *client,
 err_free_irq:
free_irq(client->irq, wac_i2c);
 err_free_mem:
+   regulator_disable(wac_i2c->vdd);
+   regulator_put(wac_i2c->vdd);
input_free_device(input);
kfree(wac_i2c);
 
-- 
2.31.1



[PATCH v5 3/9] Input: wacom_i2c - Add device tree support to wacom_i2c

2021-04-19 Thread Alistair Francis
Allow the wacom-i2c device to be exposed via device tree.

Signed-off-by: Alistair Francis 
---
 drivers/input/touchscreen/wacom_i2c.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 1afc6bde2891..dd3fc54d3825 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <linux/of.h>
 #include 
 
 #define WACOM_CMD_QUERY0   0x04
@@ -262,10 +263,17 @@ static const struct i2c_device_id wacom_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, wacom_i2c_id);
 
+static const struct of_device_id wacom_i2c_of_match_table[] = {
+   { .compatible = "wacom,i2c-30" },
+   {}
+};
+MODULE_DEVICE_TABLE(of, wacom_i2c_of_match_table);
+
 static struct i2c_driver wacom_i2c_driver = {
.driver = {
.name   = "wacom_i2c",
.pm = _i2c_pm,
+   .of_match_table = wacom_i2c_of_match_table,
},
 
.probe  = wacom_i2c_probe,
-- 
2.31.1



[PATCH v5 4/9] Input: wacom_i2c - Add touchscreen properties

2021-04-19 Thread Alistair Francis
Connect touchscreen properties to the wacom_i2c.

Signed-off-by: Alistair Francis 
---
 drivers/input/touchscreen/wacom_i2c.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index dd3fc54d3825..28290724b3da 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/input/touchscreen.h>
 #include 
 #include 
 #include 
@@ -33,6 +34,7 @@ struct wacom_features {
 struct wacom_i2c {
struct i2c_client *client;
struct input_dev *input;
+   struct touchscreen_properties props;
u8 data[WACOM_QUERY_SIZE];
bool prox;
int tool;
@@ -114,12 +116,12 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
 
wac_i2c->prox = data[3] & 0x20;
 
+   touchscreen_report_pos(input, &wac_i2c->props, x, y, true);
input_report_key(input, BTN_TOUCH, tsw || ers);
input_report_key(input, wac_i2c->tool, wac_i2c->prox);
input_report_key(input, BTN_STYLUS, f1);
input_report_key(input, BTN_STYLUS2, f2);
-   input_report_abs(input, ABS_X, x);
-   input_report_abs(input, ABS_Y, y);
input_report_abs(input, ABS_PRESSURE, pressure);
input_sync(input);
 
@@ -188,6 +190,7 @@ static int wacom_i2c_probe(struct i2c_client *client,
__set_bit(BTN_STYLUS2, input->keybit);
__set_bit(BTN_TOUCH, input->keybit);
 
+   touchscreen_parse_properties(input, true, &wac_i2c->props);
input_set_abs_params(input, ABS_X, 0, features.x_max, 0, 0);
input_set_abs_params(input, ABS_Y, 0, features.y_max, 0, 0);
input_set_abs_params(input, ABS_PRESSURE,
-- 
2.31.1



[PATCH v5 1/9] dt-bindings: Add Wacom to vendor bindings

2021-04-19 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml 
b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index a8e1e8d2ef20..996f4de2fff5 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -1216,6 +1216,8 @@ patternProperties:
 description: Vision Optical Technology Co., Ltd.
   "^vxt,.*":
 description: VXT Ltd
+  "^wacom,.*":
+description: Wacom Co., Ltd
   "^wand,.*":
 description: Wandbord (Technexion)
   "^waveshare,.*":
-- 
2.31.1



[PATCH v5 2/9] dt-bindings: touchscreen: Initial commit of wacom,generic

2021-04-19 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 .../input/touchscreen/wacom,generic.yaml  | 48 +++
 1 file changed, 48 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml

diff --git 
a/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml 
b/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml
new file mode 100644
index ..19bbfc55ed76
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/input/touchscreen/wacom,generic.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Wacom I2C Controller
+
+maintainers:
+  - Alistair Francis 
+
+allOf:
+  - $ref: touchscreen.yaml#
+
+properties:
+  compatible:
+    const: wacom,generic
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  vdd-supply:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+    #include "dt-bindings/interrupt-controller/irq.h"
+    i2c {
+        #address-cells = <1>;
+        #size-cells = <0>;
+        digitiser@9 {
+            compatible = "wacom,generic";
+            reg = <0x9>;
+            interrupt-parent = <&gpio1>;
+            interrupts = <9 IRQ_TYPE_LEVEL_LOW>;
+            vdd-supply = <&reg_touch>;
+        };
+    };
-- 
2.31.1



Re: [PATCH v4] kernel/resource: Fix locking in request_free_mem_region

2021-04-19 Thread Alistair Popple
On Friday, 16 April 2021 2:19:18 PM AEST Dan Williams wrote:
> The revoke_iomem() change seems like something that should be moved
> into a leaf helper and not called by __request_free_mem_region()
> directly.

Ok. I have split this up but left the call to revoke_iomem() in 
__request_free_mem_region(). Perhaps I am missing something but I wasn't sure 
how moving it into a helper would be any better/different as it has to be 
called after dropping the lock.

 - Alistair





[PATCH v5 2/3] kernel/resource: Refactor __request_region to allow external locking

2021-04-19 Thread Alistair Popple
Refactor the portion of __request_region() done whilst holding the
resource_lock into a separate function to allow callers to hold the
lock.
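
The resulting calling pattern is roughly the following (a sketch of the
split, not a verbatim excerpt from the diff below):

	res = alloc_resource(GFP_KERNEL);	/* allocate before locking */
	if (!res)
		return NULL;

	write_lock(&resource_lock);
	ret = __request_region_locked(res, parent, start, n, name, flags);
	write_unlock(&resource_lock);

	if (ret) {
		free_resource(res);
		return NULL;
	}

	if (parent == &iomem_resource)
		revoke_iomem(res);	/* may sleep, so runs after unlock */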

Signed-off-by: Alistair Popple 
---
 kernel/resource.c | 52 +--
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 736768587d2d..75f8da722497 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1181,31 +1181,16 @@ struct address_space *iomem_get_mapping(void)
return smp_load_acquire(_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct resource *parent,
+static int __request_region_locked(struct resource *res, struct resource 
*parent,
   resource_size_t start, resource_size_t n,
   const char *name, int flags)
 {
DECLARE_WAITQUEUE(wait, current);
-   struct resource *res = alloc_resource(GFP_KERNEL);
-   struct resource *orig_parent = parent;
-
-   if (!res)
-   return NULL;
 
res->name = name;
res->start = start;
res->end = start + n - 1;
 
-   write_lock(&resource_lock);
-
for (;;) {
struct resource *conflict;
 
@@ -1241,13 +1226,40 @@ struct resource * __request_region(struct resource 
*parent,
continue;
}
/* Uhhuh, that didn't work out.. */
-   free_resource(res);
-   res = NULL;
-   break;
+   return -EBUSY;
}
+
+   return 0;
+}
+
+/**
+ * __request_region - create a new busy resource region
+ * @parent: parent resource descriptor
+ * @start: resource start address
+ * @n: resource region size
+ * @name: reserving caller's ID string
+ * @flags: IO resource flags
+ */
+struct resource *__request_region(struct resource *parent,
+ resource_size_t start, resource_size_t n,
+ const char *name, int flags)
+{
+   struct resource *res = alloc_resource(GFP_KERNEL);
+   int ret;
+
+   if (!res)
+   return NULL;
+
+   write_lock(&resource_lock);
+   ret = __request_region_locked(res, parent, start, n, name, flags);
+   write_unlock(&resource_lock);
 
-   if (res && orig_parent == &iomem_resource)
+   if (ret) {
+   free_resource(res);
+   return NULL;
+   }
+
+   if (parent == &iomem_resource)
revoke_iomem(res);
 
return res;
-- 
2.20.1



[PATCH v5 3/3] kernel/resource: Fix locking in request_free_mem_region

2021-04-19 Thread Alistair Popple
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free before calling request_mem_region() to allocate the
region.

However the resource_lock is dropped between these two calls meaning by the
time request_mem_region() is called in request_free_mem_region() another
thread may have already reserved the requested region. This results in
unexpected failures and a message in the kernel log from hitting this
condition:

/*
 * mm/hmm.c reserves physical addresses which then
 * become unavailable to other users.  Conflicts are
 * not expected.  Warn to aid debugging if encountered.
 */
if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
pr_warn("Unaddressable device %s %pR conflicts with %pR",
conflict->name, conflict, res);

These unexpected failures can be corrected by holding resource_lock across
the two calls. This also requires memory allocation to be performed prior
to taking the lock.
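
Concretely, the window looks something like this (an illustrative
interleaving, not a captured trace):

	CPU A                                CPU B
	-----                                -----
	region_intersects(addr, size)
	  -> REGION_DISJOINT (lock dropped)
	                                     request_mem_region(addr, size)
	                                       -> succeeds
	request_mem_region(addr, size)
	  -> conflict, the pr_warn() above fires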

Signed-off-by: Alistair Popple 
---
 kernel/resource.c | 45 ++---
 1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 75f8da722497..e8468e867495 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1801,25 +1801,56 @@ static struct resource 
*__request_free_mem_region(struct device *dev,
 {
resource_size_t end, addr;
struct resource *res;
+   struct region_devres *dr = NULL;
 
size = ALIGN(size, 1UL << PA_SECTION_SHIFT);
end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
addr = end - size + 1UL;
 
+   res = alloc_resource(GFP_KERNEL);
+   if (!res)
+   return ERR_PTR(-ENOMEM);
+
+   if (dev) {
+   dr = devres_alloc(devm_region_release,
+   sizeof(struct region_devres), GFP_KERNEL);
+   if (!dr) {
+   free_resource(res);
+   return ERR_PTR(-ENOMEM);
+   }
+   }
+
+   write_lock(&resource_lock);
for (; addr > size && addr >= base->start; addr -= size) {
-   if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
+   if (__region_intersects(addr, size, 0, IORES_DESC_NONE) !=
REGION_DISJOINT)
continue;
 
-   if (dev)
-   res = devm_request_mem_region(dev, addr, size, name);
-   else
-   res = request_mem_region(addr, size, name);
-   if (!res)
-   return ERR_PTR(-ENOMEM);
+   if (!__request_region_locked(res, &iomem_resource, addr, size,
+   name, 0))
+   break;
+
+   if (dev) {
+   dr->parent = &iomem_resource;
+   dr->start = addr;
+   dr->n = size;
+   devres_add(dev, dr);
+   }
+
res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
+   write_unlock(&resource_lock);
+
+   /*
+* A driver is claiming this region so revoke any mappings.
+*/
+   revoke_iomem(res);
return res;
}
+   write_unlock(&resource_lock);
+
+   free_resource(res);
+   if (dr)
+   devres_free(dr);
 
return ERR_PTR(-ERANGE);
 }
-- 
2.20.1



[PATCH v5 1/3] kernel/resource: Allow region_intersects users to hold resource_lock

2021-04-19 Thread Alistair Popple
Introduce a version of region_intersects() that can be called with the
resource_lock already held. This is used in a future fix to
__request_free_mem_region().

Signed-off-by: Alistair Popple 
---
 kernel/resource.c | 52 ---
 1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..736768587d2d 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+int __region_intersects(resource_size_t start, size_t size, unsigned long 
flags,
+   unsigned long desc)
+{
+   struct resource res;
+   int type = 0; int other = 0;
+   struct resource *p;
+
+   res.start = start;
+   res.end = start + size - 1;
+
+   for (p = iomem_resource.child; p ; p = p->sibling) {
+   bool is_type = (((p->flags & flags) == flags) &&
+   ((desc == IORES_DESC_NONE) ||
+(desc == p->desc)));
+
+   if (resource_overlaps(p, &res))
+   is_type ? type++ : other++;
+   }
+
+   if (type == 0)
+   return REGION_DISJOINT;
+
+   if (other == 0)
+   return REGION_INTERSECTS;
+
+   return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -546,31 +574,13 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
  unsigned long desc)
 {
-   struct resource res;
-   int type = 0; int other = 0;
-   struct resource *p;
-
-   res.start = start;
-   res.end = start + size - 1;
+   int ret;
 
read_lock(&resource_lock);
-   for (p = iomem_resource.child; p ; p = p->sibling) {
-   bool is_type = (((p->flags & flags) == flags) &&
-   ((desc == IORES_DESC_NONE) ||
-(desc == p->desc)));
-
-   if (resource_overlaps(p, &res))
-   is_type ? type++ : other++;
-   }
+   ret = __region_intersects(start, size, flags, desc);
read_unlock(&resource_lock);
 
-   if (type == 0)
-   return REGION_DISJOINT;
-
-   if (other == 0)
-   return REGION_INTERSECTS;
-
-   return REGION_MIXED;
+   return ret;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
-- 
2.20.1



[PATCH v4] kernel/resource: Fix locking in request_free_mem_region

2021-04-15 Thread Alistair Popple
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

region_intersects() obtains a read lock before walking the resource tree
to protect against concurrent changes. However it drops the lock prior
to returning. This means by the time request_mem_region() is called in
request_free_mem_region() another thread may have already reserved the
requested region resulting in unexpected failures and a message in the
kernel log from hitting this condition:

/*
 * mm/hmm.c reserves physical addresses which then
 * become unavailable to other users.  Conflicts are
 * not expected.  Warn to aid debugging if encountered.
 */
if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
pr_warn("Unaddressable device %s %pR conflicts with %pR",
conflict->name, conflict, res);

To fix this create versions of region_intersects() and
request_mem_region() that allow the caller to take the appropriate lock
such that it may be held over the required calls.

Instead of creating another version of devm_request_mem_region() that
doesn't take the lock open-code it to allow the caller to pre-allocate
the required memory prior to taking the lock.

On some architectures and kernel configurations revoke_iomem() also
calls resource code so cannot be called with the resource lock held.
Therefore call it only after dropping the lock.

Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
Signed-off-by: Alistair Popple 
Acked-by: Balbir Singh 
Reported-by: kernel test robot 

---

Changes for v4:

- Update commit log
- Moved calling revoke_iomem() to before devres_add(). This shouldn't
  change anything but it maintains the original ordering.
- Fixed freeing of devres in case of failure.
- Rebased onto linux-next
---
 kernel/resource.c | 144 ++
 1 file changed, 94 insertions(+), 50 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 7e00239a023a..f1f7fe089fc8 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -502,6 +502,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+static int __region_intersects(resource_size_t start, size_t size,
+  unsigned long flags, unsigned long desc)
+{
+   struct resource res;
+   int type = 0; int other = 0;
+   struct resource *p;
+
+   res.start = start;
+   res.end = start + size - 1;
+
+   for (p = iomem_resource.child; p ; p = p->sibling) {
+   bool is_type = (((p->flags & flags) == flags) &&
+   ((desc == IORES_DESC_NONE) ||
+(desc == p->desc)));
+
+   if (resource_overlaps(p, &res))
+   is_type ? type++ : other++;
+   }
+
+   if (type == 0)
+   return REGION_DISJOINT;
+
+   if (other == 0)
+   return REGION_INTERSECTS;
+
+   return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -525,31 +553,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
  unsigned long desc)
 {
-   struct resource res;
-   int type = 0; int other = 0;
-   struct resource *p;
-
-   res.start = start;
-   res.end = start + size - 1;
+   int rc;
 
read_lock(&resource_lock);
-   for (p = iomem_resource.child; p ; p = p->sibling) {
-   bool is_type = (((p->flags & flags) == flags) &&
-   ((desc == IORES_DESC_NONE) ||
-(desc == p->desc)));
-
-   if (resource_overlaps(p, &res))
-   is_type ? type++ : other++;
-   }
+   rc = __region_intersects(start, size, flags, desc);
read_unlock(&resource_lock);
-
-   if (type == 0)
-   return REGION_DISJOINT;
-
-   if (other == 0)
-   return REGION_INTERSECTS;
-
-   return REGION_MIXED;
+   return rc;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
@@ -1150,31 +1159,16 @@ struct address_space *iomem_get_mapping(void)
return smp_load_acquire(&iomem_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct resource *parent,
-  resource_size_t start, resource_size_t n,
-  const char *name

[PATCH v8 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-04-07 Thread Alistair Popple
The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used
in different combinations.

TTU_MUNLOCK is one such flag. However it is exclusively used by
try_to_munlock() which specifies no other flags. Therefore rather than
overload try_to_unmap_one() with unrelated behaviour, split this out into
its own function and remove the flag.

Signed-off-by: Alistair Popple 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 

---

v8:
* Renamed try_to_munlock to page_mlock to better reflect what the
  function actually does.
* Removed the TODO from the documentation that this patch addresses.

v7:
* Added Christoph's Reviewed-by

v4:
* Removed redundant check for VM_LOCKED
---
 Documentation/vm/unevictable-lru.rst | 33 ---
 include/linux/rmap.h |  3 +-
 mm/mlock.c   | 10 +++---
 mm/rmap.c| 48 +---
 4 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/Documentation/vm/unevictable-lru.rst 
b/Documentation/vm/unevictable-lru.rst
index 0e1490524f53..eae3af17f2d9 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -389,14 +389,14 @@ mlocked, munlock_vma_page() updates that zone statistics 
for the number of
 mlocked pages.  Note, however, that at this point we haven't checked whether
 the page is mapped by other VM_LOCKED VMAs.
 
-We can't call try_to_munlock(), the function that walks the reverse map to
+We can't call page_mlock(), the function that walks the reverse map to
 check for other VM_LOCKED VMAs, without first isolating the page from the LRU.
-try_to_munlock() is a variant of try_to_unmap() and thus requires that the page
+page_mlock() is a variant of try_to_unmap() and thus requires that the page
 not be on an LRU list [more on these below].  However, the call to
-isolate_lru_page() could fail, in which case we couldn't try_to_munlock().  So,
+isolate_lru_page() could fail, in which case we can't call page_mlock().  So,
 we go ahead and clear PG_mlocked up front, as this might be the only chance we
-have.  If we can successfully isolate the page, we go ahead and
-try_to_munlock(), which will restore the PG_mlocked flag and update the zone
+have.  If we can successfully isolate the page, we go ahead and call
+page_mlock(), which will restore the PG_mlocked flag and update the zone
 page statistics if it finds another VMA holding the page mlocked.  If we fail
 to isolate the page, we'll have left a potentially mlocked page on the LRU.
This is fine, because we'll catch it later if and when vmscan tries to reclaim
@@ -545,31 +545,24 @@ munlock or munmap system calls, mm teardown 
(munlock_vma_pages_all), reclaim,
 holepunching, and truncation of file pages and their anonymous COWed pages.
 
 
-try_to_munlock() Reverse Map Scan
+page_mlock() Reverse Map Scan
 ---------------------------------
 
-.. warning::
-   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
-   page_referenced() reverse map walker.
-
 When munlock_vma_page() [see section :ref:`munlock()/munlockall() System Call
 Handling ` above] tries to munlock a
 page, it needs to determine whether or not the page is mapped by any
 VM_LOCKED VMA without actually attempting to unmap all PTEs from the
 page.  For this purpose, the unevictable/mlock infrastructure
-introduced a variant of try_to_unmap() called try_to_munlock().
+introduced a variant of try_to_unmap() called page_mlock().
 
-try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
-mapped file and KSM pages with a flag argument specifying unlock versus unmap
-processing.  Again, these functions walk the respective reverse maps looking
-for VM_LOCKED VMAs.  When such a VMA is found, as in the try_to_unmap() case,
-the functions mlock the page via mlock_vma_page() and return SWAP_MLOCK.  This
-undoes the pre-clearing of the page's PG_mlocked done by munlock_vma_page.
+page_mlock() walks the respective reverse maps looking for VM_LOCKED VMAs. When
+such a VMA is found the page is mlocked via mlock_vma_page(). This undoes the
+pre-clearing of the page's PG_mlocked done by munlock_vma_page.
 
-Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
+Note that page_mlock()'s reverse map walk must visit every VMA in a page's
 reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
 However, the scan can terminate when it encounters a VM_LOCKED VMA.
-Although try_to_munlock() might be called a great many times when munlocking a
+Although page_mlock() might be called a great many times when munlocking a
 large region or tearing down a large address space that has been mlocked via
 mlockall(), overall this is a fairly rare event.
 
@@ -602,7 +595,7 @@ inactive lists to the appropriate node's unevictable list.
 shrink_inactive_list() should only see SHM_LOCK'd pages

[PATCH v8 6/8] mm: Selftests for exclusive device memory

2021-04-07 Thread Alistair Popple
Adds some selftests for exclusive device memory.

Signed-off-by: Alistair Popple 
Acked-by: Jason Gunthorpe 
Tested-by: Ralph Campbell 
Reviewed-by: Ralph Campbell 
---
 lib/test_hmm.c | 124 +++
 lib/test_hmm_uapi.h|   2 +
 tools/testing/selftests/vm/hmm-tests.c | 158 +
 3 files changed, 284 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 5c9f5a020c1d..305a9d9e2b4c 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/rmap.h>
 
 #include "test_hmm_uapi.h"
 
@@ -46,6 +47,7 @@ struct dmirror_bounce {
unsigned long   cpages;
 };
 
+#define DPT_XA_TAG_ATOMIC 1UL
 #define DPT_XA_TAG_WRITE 3UL
 
 /*
@@ -619,6 +621,54 @@ static void dmirror_migrate_alloc_and_copy(struct 
migrate_vma *args,
}
 }
 
+static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
+unsigned long end)
+{
+   unsigned long pfn;
+
+   for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++) {
+   void *entry;
+
+   entry = xa_load(&dmirror->pt, pfn);
+   if (xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC)
+   return -EPERM;
+   }
+
+   return 0;
+}
+
+static int dmirror_atomic_map(unsigned long start, unsigned long end,
+ struct page **pages, struct dmirror *dmirror)
+{
+   unsigned long pfn, mapped = 0;
+   int i;
+
+   /* Map the migrated pages into the device's page tables. */
+   mutex_lock(&dmirror->mutex);
+
+   for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); 
pfn++, i++) {
+   void *entry;
+
+   if (!pages[i])
+   continue;
+
+   entry = pages[i];
+   entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
+   entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
+   if (xa_is_err(entry)) {
+   mutex_unlock(&dmirror->mutex);
+   return xa_err(entry);
+   }
+
+   mapped++;
+   }
+
+   mutex_unlock(&dmirror->mutex);
+   return mapped;
+}
+
 static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
struct dmirror *dmirror)
 {
@@ -661,6 +711,71 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
return 0;
 }
 
+static int dmirror_exclusive(struct dmirror *dmirror,
+struct hmm_dmirror_cmd *cmd)
+{
+   unsigned long start, end, addr;
+   unsigned long size = cmd->npages << PAGE_SHIFT;
+   struct mm_struct *mm = dmirror->notifier.mm;
+   struct page *pages[64];
+   struct dmirror_bounce bounce;
+   unsigned long next;
+   int ret;
+
+   start = cmd->addr;
+   end = start + size;
+   if (end < start)
+   return -EINVAL;
+
+   /* Since the mm is for the mirrored process, get a reference first. */
+   if (!mmget_not_zero(mm))
+   return -EINVAL;
+
+   mmap_read_lock(mm);
+   for (addr = start; addr < end; addr = next) {
+   int i, mapped;
+
+   if (end < addr + (ARRAY_SIZE(pages) << PAGE_SHIFT))
+   next = end;
+   else
+   next = addr + (ARRAY_SIZE(pages) << PAGE_SHIFT);
+
+   ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
+   mapped = dmirror_atomic_map(addr, next, pages, dmirror);
+   for (i = 0; i < ret; i++) {
+   if (pages[i]) {
+   unlock_page(pages[i]);
+   put_page(pages[i]);
+   }
+   }
+
+   if (addr + (mapped << PAGE_SHIFT) < next) {
+   mmap_read_unlock(mm);
+   mmput(mm);
+   return -EBUSY;
+   }
+   }
+   mmap_read_unlock(mm);
+   mmput(mm);
+
+   /* Return the migrated data for verification. */
+   ret = dmirror_bounce_init(&bounce, start, size);
+   if (ret)
+   return ret;
+   mutex_lock(&dmirror->mutex);
+   ret = dmirror_do_read(dmirror, start, end, &bounce);
+   mutex_unlock(&dmirror->mutex);
+   if (ret == 0) {
+   if (copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr,
+bounce.size))
+   ret = -EFAULT;
+   }
+
+   cmd->cpages = bounce.cpages;
+   dmirror_bounce_fini(&bounce);
+   return ret;
+}
+
 static int dmirror_migrate(struct dmirror *dmirror,
   struct hmm_dmirror_cmd *cmd)
 {
@@ -949,6 +1064,15 @@ static long dmi

[PATCH v8 7/8] nouveau/svm: Refactor nouveau_range_fault

2021-04-07 Thread Alistair Popple
Call mmu_interval_notifier_insert() as part of nouveau_range_fault().
This doesn't introduce any functional change but makes it easier for a
subsequent patch to alter the behaviour of nouveau_range_fault() to
support GPU atomic operations.
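
The calling convention changes roughly as follows (a condensed sketch of the
diff below, not a compilable excerpt):

/* Before: the caller owns the notifier's lifetime */
ret = mmu_interval_notifier_insert(&notifier.notifier, mm,
				   args.i.p.addr, args.i.p.size,
				   &nouveau_svm_mni_ops);
if (!ret) {
	ret = nouveau_range_fault(svmm, svm->drm, &args.i,
				  sizeof(args), hmm_flags, &notifier);
	mmu_interval_notifier_remove(&notifier.notifier);
}

/* After: nouveau_range_fault() inserts and removes the notifier itself */
ret = nouveau_range_fault(svmm, svm->drm, &args.i,
			  sizeof(args), hmm_flags, &notifier);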

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 34 ---
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 94f841026c3b..a195e48c9aee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -567,18 +567,27 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
unsigned long hmm_pfns[1];
struct hmm_range range = {
.notifier = &notifier->notifier,
-   .start = notifier->notifier.interval_tree.start,
-   .end = notifier->notifier.interval_tree.last + 1,
.default_flags = hmm_flags,
.hmm_pfns = hmm_pfns,
.dev_private_owner = drm->dev,
};
-   struct mm_struct *mm = notifier->notifier.mm;
+   struct mm_struct *mm = svmm->notifier.mm;
int ret;
 
+   ret = mmu_interval_notifier_insert(&notifier->notifier, mm,
+   args->p.addr, args->p.size,
+   &nouveau_svm_mni_ops);
+   if (ret)
+   return ret;
+
+   range.start = notifier->notifier.interval_tree.start;
+   range.end = notifier->notifier.interval_tree.last + 1;
+
while (true) {
-   if (time_after(jiffies, timeout))
-   return -EBUSY;
+   if (time_after(jiffies, timeout)) {
+   ret = -EBUSY;
+   goto out;
+   }
 
range.notifier_seq = mmu_interval_read_begin(range.notifier);
mmap_read_lock(mm);
@@ -587,7 +596,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
if (ret) {
if (ret == -EBUSY)
continue;
-   return ret;
+   goto out;
}
 
mutex_lock(&svmm->mutex);
@@ -606,6 +615,9 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
svmm->vmm->vmm.object.client->super = false;
mutex_unlock(&svmm->mutex);
 
+out:
+   mmu_interval_notifier_remove(&notifier->notifier);
+
return ret;
 }
 
@@ -727,14 +739,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
}
 
notifier.svmm = svmm;
-   ret = mmu_interval_notifier_insert(&notifier.notifier, mm,
-  args.i.p.addr, args.i.p.size,
-  &nouveau_svm_mni_ops);
-   if (!ret) {
-   ret = nouveau_range_fault(svmm, svm->drm, &args.i,
-   sizeof(args), hmm_flags, &notifier);
-   mmu_interval_notifier_remove(&notifier.notifier);
-   }
+   ret = nouveau_range_fault(svmm, svm->drm, &args.i,
+   sizeof(args), hmm_flags, &notifier);
mmput(mm);
 
limit = args.i.p.addr + args.i.p.size;
-- 
2.20.1



[PATCH v8 4/8] mm/rmap: Split migration into its own function

2021-04-07 Thread Alistair Popple
Migration is currently implemented as a mode of operation for
try_to_unmap_one() generally specified by passing the TTU_MIGRATION flag
or in the case of splitting a huge anonymous page TTU_SPLIT_FREEZE.

However it does not have much in common with the rest of the unmap
functionality of try_to_unmap_one() and thus splitting it into a
separate function reduces the complexity of try_to_unmap_one() making it
more readable.

Several simplifications can also be made in try_to_migrate_one() based
on the following observations:

 - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK.
 - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON.
 - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH.

TTU_SPLIT_FREEZE is a special case of migration used when splitting an
anonymous page. This is most easily dealt with by calling the correct
function from unmap_page() in mm/huge_memory.c  - either
try_to_migrate() for PageAnon or try_to_unmap().
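
At existing call sites the conversion is mechanical, e.g. (a sketch matching
the mm/migrate.c hunks below):

/* Before: migration was a mode of try_to_unmap() */
try_to_unmap(page, TTU_MIGRATION | TTU_IGNORE_MLOCK);

/* After: a dedicated entry point; ignoring mlock is implied */
try_to_migrate(page, 0);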

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Ralph Campbell 

---

v5:
* Added comments about how PMD splitting works for migration vs.
  unmapping
* Tightened up the flag check in try_to_migrate() to be explicit about
  which TTU_XXX flags are supported.
---
 include/linux/rmap.h |   4 +-
 mm/huge_memory.c |  15 +-
 mm/migrate.c |   9 +-
 mm/rmap.c| 358 ---
 4 files changed, 280 insertions(+), 106 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 38a746787c2f..0e25d829f742 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -86,8 +86,6 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-   TTU_MIGRATION   = 0x1,  /* migration mode */
-
TTU_SPLIT_HUGE_PMD  = 0x4,  /* split huge PMD if any */
TTU_IGNORE_MLOCK= 0x8,  /* ignore mlock */
TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
@@ -96,7 +94,6 @@ enum ttu_flags {
 * do a final flush if necessary */
TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock:
 * caller holds it */
-   TTU_SPLIT_FREEZE= 0x100,/* freeze pte under splitting thp */
 };
 
 #ifdef CONFIG_MMU
@@ -193,6 +190,7 @@ static inline void page_dup_rmap(struct page *page, bool compound)
 int page_referenced(struct page *, int is_locked,
struct mem_cgroup *memcg, unsigned long *vm_flags);
 
+bool try_to_migrate(struct page *page, enum ttu_flags flags);
 bool try_to_unmap(struct page *, enum ttu_flags flags);
 
 /* Avoid racy checks */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 89af065cea5b..eab004331b97 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2357,16 +2357,21 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 
 static void unmap_page(struct page *page)
 {
-   enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
-   TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+   enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
bool unmap_success;
 
VM_BUG_ON_PAGE(!PageHead(page), page);
 
if (PageAnon(page))
-   ttu_flags |= TTU_SPLIT_FREEZE;
-
-   unmap_success = try_to_unmap(page, ttu_flags);
+   unmap_success = try_to_migrate(page, ttu_flags);
+   else
+   /*
+* Don't install migration entries for file backed pages. This
+* helps handle cases when i_size is in the middle of the page
+* as there is no need to unmap pages beyond i_size manually.
+*/
+   unmap_success = try_to_unmap(page, ttu_flags |
+   TTU_IGNORE_MLOCK);
VM_BUG_ON_PAGE(!unmap_success, page);
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index b752543adb64..cc4612e2a246 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1130,7 +1130,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
/* Establish migration ptes */
VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
page);
-   try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK);
+   try_to_migrate(page, 0);
page_was_mapped = 1;
}
 
@@ -1332,7 +1332,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
if (page_mapped(hpage)) {
bool mapping_locked = false;
-   enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK;
+   enum ttu_flags ttu = 0;
 
if (!PageAnon(hpage)) {
/*
@@ -1349,7 +1349,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
ttu |= TTU_RMAP_LOCKED;
}
 
-   try_to_unmap(hpage, ttu);
+   try_to_migrate(hpage, ttu);
   

[PATCH v8 0/8] Add support for SVM atomics in Nouveau

2021-04-07 Thread Alistair Popple
This is the eighth version of a series to add support to Nouveau for atomic
memory operations on OpenCL shared virtual memory (SVM) regions.

The main change for this version is a simplification of device exclusive
entry handling. Instead of copying entries for copy-on-write mappings
during fork they are removed instead. This is safer because there could be
unique corner cases when copying, particularly for pinned pages which
should follow the same logic as copy_present_page(). Removing entries
avoids this possibility by treating them as normal ptes.

Exclusive device access is implemented by adding a new swap entry type
(SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
difference is that on fault the original entry is immediately restored by
the fault handler instead of waiting.

Restoring the entry triggers calls to MMU notifiers which allows a device
driver to revoke the atomic access permission from the GPU prior to the CPU
finalising the entry.
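
From a driver's point of view the flow looks roughly like this (a sketch
against the APIs this series adds; drv_map_page_atomic() is a hypothetical
driver helper, not part of the series):

struct page *pages[1];
int ret;

/* Replace the CPU PTEs for the range with device exclusive entries. */
ret = make_device_exclusive_range(mm, addr, addr + PAGE_SIZE, pages, drv);
if (ret <= 0 || !pages[0])
	return -EBUSY;

/* Map the page for atomic access on the device. Exclusive access lasts
 * until a CPU fault restores the original entry, at which point an
 * MMU_NOTIFY_EXCLUSIVE notifier (owner == drv) tells the driver to
 * revoke its mapping. */
drv_map_page_atomic(drv, addr, pages[0]);	/* hypothetical */
unlock_page(pages[0]);
put_page(pages[0]);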

Patches 1 & 2 refactor existing migration and device private entry
functions.

Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
functionality into separate functions - try_to_migrate_one() and
try_to_munlock_one(). These should not change any functionality, but any
help testing would be much appreciated as I have not been able to test
every usage of try_to_unmap_one().

Patch 5 contains the bulk of the implementation for device exclusive
memory.

Patch 6 contains some additions to the HMM selftests to ensure everything
works as expected.

Patch 7 is a cleanup for the Nouveau SVM implementation.

Patch 8 contains the implementation of atomic access for the Nouveau
driver.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
SVM buffer whilst also writing to the same buffer from the CPU.

Alistair Popple (8):
  mm: Remove special swap entry functions
  mm/swapops: Rework swap entry manipulation code
  mm/rmap: Split try_to_munlock from try_to_unmap
  mm/rmap: Split migration into its own function
  mm: Device exclusive memory access
  mm: Selftests for exclusive device memory
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst  |  19 +-
 Documentation/vm/unevictable-lru.rst  |  33 +-
 arch/s390/mm/pgtable.c|   2 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 156 -
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
 fs/proc/task_mmu.c|  23 +-
 include/linux/mmu_notifier.h  |  26 +-
 include/linux/rmap.h  |  11 +-
 include/linux/swap.h  |   8 +-
 include/linux/swapops.h   | 123 ++--
 lib/test_hmm.c| 126 +++-
 lib/test_hmm_uapi.h   |   2 +
 mm/debug_vm_pgtable.c |  12 +-
 mm/hmm.c  |  12 +-
 mm/huge_memory.c  |  45 +-
 mm/hugetlb.c  |  10 +-
 mm/memcontrol.c   |   2 +-
 mm/memory.c   | 196 +-
 mm/migrate.c  |  51 +-
 mm/mlock.c|  10 +-
 mm/mprotect.c |  18 +-
 mm/page_vma_mapped.c  |  15 +-
 mm/rmap.c | 612 +++---
 tools/testing/selftests/vm/hmm-tests.c| 158 +
 26 files changed, 1366 insertions(+), 312 deletions(-)

-- 
2.20.1



[PATCH v8 2/8] mm/swapops: Rework swap entry manipulation code

2021-04-07 Thread Alistair Popple
Both migration and device private pages use special swap entries that
are manipulated by a range of inline functions. The arguments to these
are somewhat inconsistent so rework them to remove flag type arguments
and to make the arguments similar for both read and write entry
creation.
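
For example, for migration entries the helpers change shape like this (a
sketch of the before/after API, matching the hunks below):

/* Before: a flag argument selects the entry type */
entry = make_migration_entry(page, is_write);
make_migration_entry_read(&entry);		/* downgrade in place */

/* After: intent is in the function name and the argument is just a pfn */
if (is_write)
	entry = make_writable_migration_entry(page_to_pfn(page));
else
	entry = make_readable_migration_entry(page_to_pfn(page));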

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Ralph Campbell 
---
 include/linux/swapops.h | 56 ++---
 mm/debug_vm_pgtable.c   | 12 -
 mm/hmm.c|  2 +-
 mm/huge_memory.c| 26 +--
 mm/hugetlb.c| 10 +---
 mm/memory.c | 10 +---
 mm/migrate.c| 26 ++-
 mm/mprotect.c   | 10 +---
 mm/rmap.c   | 10 +---
 9 files changed, 100 insertions(+), 62 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 139be8235ad2..4dfd807ae52a 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -100,35 +100,35 @@ static inline void *swp_to_radix_entry(swp_entry_t entry)
 }
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
-static inline swp_entry_t make_device_private_entry(struct page *page, bool write)
+static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
-   return swp_entry(write ? SWP_DEVICE_WRITE : SWP_DEVICE_READ,
-page_to_pfn(page));
+   return swp_entry(SWP_DEVICE_READ, offset);
 }
 
-static inline bool is_device_private_entry(swp_entry_t entry)
+static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset)
 {
-   int type = swp_type(entry);
-   return type == SWP_DEVICE_READ || type == SWP_DEVICE_WRITE;
+   return swp_entry(SWP_DEVICE_WRITE, offset);
 }
 
-static inline void make_device_private_entry_read(swp_entry_t *entry)
+static inline bool is_device_private_entry(swp_entry_t entry)
 {
-   *entry = swp_entry(SWP_DEVICE_READ, swp_offset(*entry));
+   int type = swp_type(entry);
+   return type == SWP_DEVICE_READ || type == SWP_DEVICE_WRITE;
 }
 
-static inline bool is_write_device_private_entry(swp_entry_t entry)
+static inline bool is_writable_device_private_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
 }
 #else /* CONFIG_DEVICE_PRIVATE */
-static inline swp_entry_t make_device_private_entry(struct page *page, bool write)
+static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
return swp_entry(0, 0);
 }
 
-static inline void make_device_private_entry_read(swp_entry_t *entry)
+static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset)
 {
+   return swp_entry(0, 0);
 }
 
 static inline bool is_device_private_entry(swp_entry_t entry)
@@ -136,35 +136,32 @@ static inline bool is_device_private_entry(swp_entry_t entry)
return false;
 }
 
-static inline bool is_write_device_private_entry(swp_entry_t entry)
+static inline bool is_writable_device_private_entry(swp_entry_t entry)
 {
return false;
 }
 #endif /* CONFIG_DEVICE_PRIVATE */
 
 #ifdef CONFIG_MIGRATION
-static inline swp_entry_t make_migration_entry(struct page *page, int write)
-{
-   BUG_ON(!PageLocked(compound_head(page)));
-
-   return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
-   page_to_pfn(page));
-}
-
 static inline int is_migration_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_MIGRATION_READ ||
swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
-static inline int is_write_migration_entry(swp_entry_t entry)
+static inline int is_writable_migration_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
-static inline void make_migration_entry_read(swp_entry_t *entry)
+static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
 {
-   *entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
+   return swp_entry(SWP_MIGRATION_READ, offset);
+}
+
+static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(SWP_MIGRATION_WRITE, offset);
 }
 
 extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
@@ -174,21 +171,28 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 extern void migration_entry_wait_huge(struct vm_area_struct *vma,
struct mm_struct *mm, pte_t *pte);
 #else
+static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(0, 0);
+}
+
+static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(0, 0);
+}
 
-#define make_migration_entry(page, write) swp_entry(0, 0)
 static inline int is_migration_entry(swp_entry_t swp)
 {
return 0;
 }
 
-static inline void make_migration_entry_read(swp_entry_t *entryp) { }
 static inline void __migration_entry_wait(struct mm_struct *mm

[PATCH v8 8/8] nouveau/svm: Implement atomic SVM access

2021-04-07 Thread Alistair Popple
Some NVIDIA GPUs do not support direct atomic access to system memory
via PCIe. Instead this must be emulated by granting the GPU exclusive
access to the memory. This is achieved by replacing CPU page table
entries with special swap entries that fault on userspace access.

The driver then grants the GPU permission to update the page undergoing
atomic access via the GPU page tables. When CPU access to the page is
required a CPU fault is raised which calls into the device driver via
MMU notifiers to revoke the atomic access. The original page table
entries are then restored allowing CPU access to proceed.

Signed-off-by: Alistair Popple 

---

v7:
* Removed magic values for fault access levels
* Improved readability of fault comparison code

v4:
* Check that page table entries haven't changed before mapping on the
  device
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 126 --
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
 4 files changed, 123 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index d6dd40f21eed..9c7ff56831c5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 {
 #define NVIF_VMM_PFNMAP_V0_APER   0x00000000000000f0ULL
 #define NVIF_VMM_PFNMAP_V0_HOST   0x0000000000000000ULL
 #define NVIF_VMM_PFNMAP_V0_VRAM   0x0000000000000010ULL
+#define NVIF_VMM_PFNMAP_V0_A 0x0000000000000004ULL
 #define NVIF_VMM_PFNMAP_V0_W  0x0000000000000002ULL
 #define NVIF_VMM_PFNMAP_V0_V  0x0000000000000001ULL
 #define NVIF_VMM_PFNMAP_V0_NONE   0x0000000000000000ULL
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a195e48c9aee..81526d65b4e2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -35,6 +35,7 @@
 #include <linux/sched/mm.h>
 #include <linux/sort.h>
 #include <linux/hmm.h>
+#include <linux/rmap.h>
 
 struct nouveau_svm {
struct nouveau_drm *drm;
@@ -67,6 +68,11 @@ struct nouveau_svm {
} buffer[1];
 };
 
+#define FAULT_ACCESS_READ 0
+#define FAULT_ACCESS_WRITE 1
+#define FAULT_ACCESS_ATOMIC 2
+#define FAULT_ACCESS_PREFETCH 3
+
 #define SVM_DBG(s,f,a...) NV_DEBUG((s)->drm, "svm: "f"\n", ##a)
 #define SVM_ERR(s,f,a...) NV_WARN((s)->drm, "svm: "f"\n", ##a)
 
@@ -411,6 +417,24 @@ nouveau_svm_fault_cancel_fault(struct nouveau_svm *svm,
  fault->client);
 }
 
+static int
+nouveau_svm_fault_priority(u8 fault)
+{
+   switch (fault) {
+   case FAULT_ACCESS_PREFETCH:
+   return 0;
+   case FAULT_ACCESS_READ:
+   return 1;
+   case FAULT_ACCESS_WRITE:
+   return 2;
+   case FAULT_ACCESS_ATOMIC:
+   return 3;
+   default:
+   WARN_ON_ONCE(1);
+   return -1;
+   }
+}
+
 static int
 nouveau_svm_fault_cmp(const void *a, const void *b)
 {
@@ -421,9 +445,8 @@ nouveau_svm_fault_cmp(const void *a, const void *b)
return ret;
if ((ret = (s64)fa->addr - fb->addr))
return ret;
-   /*XXX: atomic? */
-   return (fa->access == 0 || fa->access == 3) -
-  (fb->access == 0 || fb->access == 3);
+   return nouveau_svm_fault_priority(fa->access) -
+   nouveau_svm_fault_priority(fb->access);
 }
 
 static void
@@ -487,6 +510,10 @@ static bool nouveau_svm_range_invalidate(struct mmu_interval_notifier *mni,
struct svm_notifier *sn =
container_of(mni, struct svm_notifier, notifier);
 
+   if (range->event == MMU_NOTIFY_EXCLUSIVE &&
+   range->owner == sn->svmm->vmm->cli->drm->dev)
+   return true;
+
/*
 * serializes the update to mni->invalidate_seq done by caller and
 * prevents invalidation of the PTE from progressing while HW is being
@@ -555,6 +582,71 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
+static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
+  struct nouveau_drm *drm,
+  struct nouveau_pfnmap_args *args, u32 size,
+  struct svm_notifier *notifier)
+{
+   unsigned long timeout =
+   jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+   struct mm_struct *mm = svmm->notifier.mm;
+   struct page *page;
+   unsigned long start = args->p.addr;
+

[PATCH v8 1/8] mm: Remove special swap entry functions

2021-04-07 Thread Alistair Popple
Remove multiple similar inline functions for dealing with different
types of special swap entries.

Both migration and device private swap entries use the swap offset to
store a pfn. Instead of multiple inline functions to obtain a struct
page for each swap entry type use a common function
pfn_swap_entry_to_page(). Also open-code the various entry_to_pfn()
functions as this results in shorter code that is easier to understand.
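
A typical call site then collapses from one branch per entry type to a
single test (a sketch mirroring the fs/proc/task_mmu.c hunks below):

/* Before: one accessor per special swap entry type */
if (is_migration_entry(swpent))
	page = migration_entry_to_page(swpent);
else if (is_device_private_entry(swpent))
	page = device_private_entry_to_page(swpent);

/* After: both entry types store a pfn, so one helper covers both */
if (is_pfn_swap_entry(swpent))
	page = pfn_swap_entry_to_page(swpent);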

Signed-off-by: Alistair Popple 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 

---

v7:
* Reworded commit message to include pfn_swap_entry_to_page()
* Added Christoph's Reviewed-by

v6:
* Removed redundant compound_page() call from inside PageLocked()
* Fixed a minor build issue for s390 reported by kernel test bot

v4:
* Added pfn_swap_entry_to_page()
* Reinstated check that migration entries point to locked pages
* Removed #define swapcache_prepare which isn't needed for CONFIG_SWAP=0
  builds
---
 arch/s390/mm/pgtable.c  |  2 +-
 fs/proc/task_mmu.c  | 23 +-
 include/linux/swap.h|  4 +--
 include/linux/swapops.h | 69 ++---
 mm/hmm.c|  5 ++-
 mm/huge_memory.c|  4 +--
 mm/memcontrol.c |  2 +-
 mm/memory.c | 10 +++---
 mm/migrate.c|  6 ++--
 mm/page_vma_mapped.c|  6 ++--
 10 files changed, 50 insertions(+), 81 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 18205f851c24..eec3a9d7176e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -691,7 +691,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, swp_entry_t entry)
if (!non_swap_entry(entry))
dec_mm_counter(mm, MM_SWAPENTS);
else if (is_migration_entry(entry)) {
-   struct page *page = migration_entry_to_page(entry);
+   struct page *page = pfn_swap_entry_to_page(entry);
 
dec_mm_counter(mm, mm_counter(page));
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3cec6fbef725..08ee59d945c0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -514,10 +514,8 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
} else {
mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
}
-   } else if (is_migration_entry(swpent))
-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   } else if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
page = xa_load(&vma->vm_file->f_mapping->i_pages,
@@ -549,7 +547,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
if (is_migration_entry(entry))
-   page = migration_entry_to_page(entry);
+   page = pfn_swap_entry_to_page(entry);
}
if (IS_ERR_OR_NULL(page))
return;
@@ -691,10 +689,8 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);
 
-   if (is_migration_entry(swpent))
-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
}
if (page) {
int mapcount = page_mapcount(page);
@@ -1383,11 +1379,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
frame = swp_type(entry) |
(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
flags |= PM_SWAP;
-   if (is_migration_entry(entry))
-   page = migration_entry_to_page(entry);
-
-   if (is_device_private_entry(entry))
-   page = device_private_entry_to_page(entry);
+   if (is_pfn_swap_entry(entry))
+   page = pfn_swap_entry_to_page(entry);
}
 
if (page && !PageAnon(page))
@@ -1444,7 +1437,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
if (pmd_swp_soft_dirty(pmd))
flags |= PM_SOFT_DIRTY;
VM_BUG_ON(!is_pmd_migration_entry(pmd));
-   page = migration_entry_to_page(entry);
+   page = pfn_s

[PATCH v8 5/8] mm: Device exclusive memory access

2021-04-07 Thread Alistair Popple
Some devices require exclusive write access to shared virtual
memory (SVM) ranges to perform atomic operations on that memory. This
requires CPU page tables to be updated to deny access whilst atomic
operations are occurring.

In order to do this introduce a new swap entry
type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
exclusive access by a device all page table mappings for the particular
range are replaced with device exclusive swap entries. This causes any
CPU access to the page to result in a fault.

Faults are resolved by replacing the faulting entry with the original
mapping. This results in MMU notifiers being called which a driver uses
to update access permissions such as revoking atomic access. After
notifiers have been called the device will no longer have exclusive
access to the region.
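
Condensed, the CPU fault side added by this patch has roughly this shape
(a sketch with most details elided):

entry = pte_to_swp_entry(vmf->orig_pte);
if (is_device_exclusive_entry(entry)) {
	vmf->page = pfn_swap_entry_to_page(entry);
	/* Fires MMU notifiers so the driver can revoke its mapping, then
	 * restores the original PTE so CPU access can proceed. */
	ret = remove_device_exclusive_entry(vmf);
}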

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 

---

v8:
* Remove device exclusive entries on fork rather than copy them.

v7:
* Added Christoph's Reviewed-by.
* Minor cosmetic cleanups suggested by Christoph.
* Replace mmu_notifier_range_init_migrate/exclusive with
  mmu_notifier_range_init_owner as suggested by Christoph.
* Replaced lock_page() with lock_page_retry() when handling faults.
* Restrict to anonymous pages for now.

v6:
* Fixed a bisectablity issue due to incorrectly applying the rename of
  migrate_pgmap_owner to the wrong patches for Nouveau and hmm_test.

v5:
* Renamed range->migrate_pgmap_owner to range->owner.
* Added MMU_NOTIFY_EXCLUSIVE to allow passing of a driver cookie which
  allows notifiers called as a result of make_device_exclusive_range() to
  be ignored.
* Added a check to try_to_protect_one() to detect if the pages originally
  returned from get_user_pages() have been unmapped or not.
* Removed check_device_exclusive_range() as it is no longer required with
  the other changes.
* Documentation update.

v4:
* Add function to check that mappings are still valid and exclusive.
* s/long/unsigned long/ in make_device_exclusive_entry().
---
 Documentation/vm/hmm.rst  |  19 ++-
 drivers/gpu/drm/nouveau/nouveau_svm.c |   2 +-
 include/linux/mmu_notifier.h  |  26 ++--
 include/linux/rmap.h  |   4 +
 include/linux/swap.h  |   4 +-
 include/linux/swapops.h   |  44 +-
 lib/test_hmm.c|   2 +-
 mm/hmm.c  |   5 +
 mm/memory.c   | 176 +++--
 mm/migrate.c  |  10 +-
 mm/mprotect.c |   8 +
 mm/page_vma_mapped.c  |   9 +-
 mm/rmap.c | 210 ++
 13 files changed, 487 insertions(+), 32 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 09e28507f5b2..a14c2938e7af 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -332,7 +332,7 @@ between device driver specific code and shared common code:
walks to fill in the ``args->src`` array with PFNs to be migrated.
The ``invalidate_range_start()`` callback is passed a
``struct mmu_notifier_range`` with the ``event`` field set to
-   ``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
+   ``MMU_NOTIFY_MIGRATE`` and the ``owner`` field set to
the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This is
allows the device driver to skip the invalidation callback and only
invalidate device private MMU mappings that are actually migrating.
@@ -405,6 +405,23 @@ between device driver specific code and shared common code:
 
The lock can now be released.
 
+Exclusive access memory
+===
+
+Some devices have features such as atomic PTE bits that can be used to implement
+atomic access to system memory. To support atomic operations to a shared virtual
+memory page such a device needs access to that page which is exclusive of any
+userspace access from the CPU. The ``make_device_exclusive_range()`` function
+can be used to make a memory range inaccessible from userspace.
+
+This replaces all mappings for pages in the given range with special swap
+entries. Any attempt to access the swap entry results in a fault which is
+resolved by replacing the entry with the original mapping. A driver gets
+notified that the mapping has been changed by MMU notifiers, after which point
+it will no longer have exclusive access to the page. Exclusive access is
+guaranteed to last until the driver drops the page lock and page reference, at
+which point any CPU faults on the page may proceed as described.
+
 Memory cgroup (memcg) and rss accounting
 
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index f18bd53da052..94f841026c3b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -

Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

2021-03-31 Thread Alistair Popple
On Thursday, 1 April 2021 3:56:05 PM AEDT Muchun Song wrote:
> External email: Use caution opening links or attachments
> 
> 
> On Fri, Mar 26, 2021 at 9:22 AM Alistair Popple  wrote:
> >
> > request_free_mem_region() is used to find an empty range of physical
> > addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> > over the range of possible addresses using region_intersects() to see if
> > the range is free.
> >
> > region_intersects() obtains a read lock before walking the resource tree
> > to protect against concurrent changes. However it drops the lock prior
> > to returning. This means by the time request_mem_region() is called in
> > request_free_mem_region() another thread may have already reserved the
> > requested region resulting in unexpected failures and a message in the
> > kernel log from hitting this condition:
> >
> > /*
> >  * mm/hmm.c reserves physical addresses which then
> >  * become unavailable to other users.  Conflicts are
> >  * not expected.  Warn to aid debugging if encountered.
> >  */
> > if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
> > pr_warn("Unaddressable device %s %pR conflicts with %pR",
> > conflict->name, conflict, res);
> >
> > To fix this create versions of region_intersects() and
> > request_mem_region() that allow the caller to take the appropriate lock
> > such that it may be held over the required calls.
> >
> > Instead of creating another version of devm_request_mem_region() that
> > doesn't take the lock open-code it to allow the caller to pre-allocate
> > the required memory prior to taking the lock.
> >
> > Fixes: 0c385190392d8 ("resource: add a not device managed request_free_mem_region variant")
> > Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region helper")
> > Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
> > Signed-off-by: Alistair Popple 
> 
> +cc my email (songmuc...@bytedance.com).
> 
> Hi Alistair,
> 
> Thanks for your patch. But there is a deadlock that should be fixed.
> Please see the following scenario.
> 
> __request_region
> write_lock(&resource_lock)
> request_region_locked
> revoke_iomem
> devmem_is_allowed (arch/x86/mm/init.c)
> region_intersects
> read_lock(&resource_lock)   // deadlock

Thanks for the report and apologies for the breakage. The kernel test robot
caught it pretty quickly - see
https://lore.kernel.org/linux-mm/20210330003842.18948-1-apop...@nvidia.com/
for an updated version that fixes this.
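
For anyone skimming the thread, the bug is a classic check-then-act race;
in rough outline (simplified, using the function names from the patches):

/* Before: the resource_lock is not held across check and reservation */
if (region_intersects(addr, size, flags, desc) == REGION_DISJOINT)
	/* another thread may reserve [addr, addr+size) right here */
	request_mem_region(addr, size, name);

/* After: take the lock once and hold it over both steps */
write_lock(&resource_lock);
if (__region_intersects(addr, size, flags, desc) == REGION_DISJOINT)
	request_region_locked(...);
write_unlock(&resource_lock);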

 - Alistair

> >
> > ---
> >
> > v2:
> >  - Added Fixes tag
> >
> > ---
> >  kernel/resource.c | 146 +-
> >  1 file changed, 94 insertions(+), 52 deletions(-)
> >
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 627e61b0c124..2d4652383dd2 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
> >  }
> >  EXPORT_SYMBOL_GPL(page_is_ram);
> >
> > +static int __region_intersects(resource_size_t start, size_t size,
> > +  unsigned long flags, unsigned long desc)
> > +{
> > +   struct resource res;
> > +   int type = 0; int other = 0;
> > +   struct resource *p;
> > +
> > +   res.start = start;
> > +   res.end = start + size - 1;
> > +
> > +   for (p = iomem_resource.child; p ; p = p->sibling) {
> > +   bool is_type = (((p->flags & flags) == flags) &&
> > +   ((desc == IORES_DESC_NONE) ||
> > +(desc == p->desc)));
> > +
> > +   if (resource_overlaps(p, &res))
> > +   is_type ? type++ : other++;
> > +   }
> > +
> > +   if (type == 0)
> > +   return REGION_DISJOINT;
> > +
> > +   if (other == 0)
> > +   return REGION_INTERSECTS;
> > +
> > +   return REGION_MIXED;
> > +}
> > +
> >  /**
> >   * region_intersects() - determine intersection of region with known resources
> >   * @start: region start address
> > @@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
> >  int region_intersects(resource_size_t start, size_t size, unsigned long flags,
> > 

Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-31 Thread Alistair Popple
On Wednesday, 31 March 2021 10:57:46 PM AEDT Jason Gunthorpe wrote:
> On Wed, Mar 31, 2021 at 03:15:47PM +1100, Alistair Popple wrote:
> > On Wednesday, 31 March 2021 2:56:38 PM AEDT John Hubbard wrote:
> > > On 3/30/21 3:56 PM, Alistair Popple wrote:
> > > ...
> > > >> +1 for renaming "munlock*" items to "mlock*", where applicable. good grief.
> > > > 
> > > > At least the situation was weird enough to prompt further investigation :)
> > > > 
> > > > Renaming to mlock* doesn't feel like the right solution to me either though. I
> > > > am not sure if you saw me responding to myself earlier but I am thinking
> > > > renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() ->
> > > > page_mlock_one() might be better. Thoughts?
> > > > 
> > > 
> > > Quite confused by this naming idea. Because: try_to_munlock() returns
> > > void, so a boolean-style name such as "page_mlocked()" is already not a
> > > good fit.
> > > 
> > > Even more important, though, is that try_to_munlock() is mlock-ing the
> > > page, right? Is there some subtle point I'm missing? It really is doing
> > > an mlock to the best of my knowledge here. Although the kerneldoc
> > > comment for try_to_munlock() seems questionable too:
> > 
> > It's mlocking the page if it turns out it still needs to be locked after 
> > unlocking it. But I don't think you're missing anything.
> 
> It is really searching all VMA's to see if the VMA flag is set and if
> any are found then it mlocks the page.
> 
> But presenting this rountine in its simplified form raises lots of
> questions:
> 
>  - What locking is being used to read the VMA flag?
>  - Why do we need to manipulate global struct page flags under the
>page table locks of a single VMA?

I was wondering that and questioned it in an earlier version of this series. I 
have done some digging and the commit log for b87537d9e2fe ("mm: rmap use pte 
lock not mmap_sem to set PageMlocked") provides the original justification.

It's fairly long so I won't quote it here but the summary seems to be that 
among other things the combination of page lock and ptl makes this safe. I 
have yet to verify if everything there still holds and is sensible, but the 
last paragraph certainly is :-)

"Stopped short of separating try_to_munlock_one() from try_to_munmap_one()
on this occasion, but that's probably the sensible next step - with a
rename, given that try_to_munlock()'s business is to try to set Mlocked."

>  - Why do we need to check for huge pages inside the VMA loop, not
>before going to the rmap? PageTransCompoundHead() is not sensitive to
>the PTEs. (and what happens if the huge page breaks up concurrently?)
>  - Why do we clear the mlock bit then run around to try and set it?

I don't have an answer for that as I'm not (yet) across all the mlock code 
paths, but I'm hoping this patch at least won't change anything.

>Feels racey.
>
> Jason
> 






Re: [PATCH v7 5/8] mm: Device exclusive memory access

2021-03-31 Thread Alistair Popple
On Thursday, 1 April 2021 11:48:13 AM AEDT Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 11:45:57AM +1100, Alistair Popple wrote:
> > On Thursday, 1 April 2021 12:46:04 AM AEDT Jason Gunthorpe wrote:
> > > On Thu, Apr 01, 2021 at 12:27:52AM +1100, Alistair Popple wrote:
> > > > On Thursday, 1 April 2021 12:18:54 AM AEDT Jason Gunthorpe wrote:
> > > > > On Wed, Mar 31, 2021 at 11:59:28PM +1100, Alistair Popple wrote:
> > > > > 
> > > > > > I guess that makes sense as the split could go either way at the
> > > > > > moment but I should add a check to make sure this isn't used with
> > > > > > pinned pages anyway.
> > > > > 
> > > > > Is it possible to have a pinned page under one of these things? If I
> > > > > pin it before you migrate it then it remains pinned but hidden under
> > > > > the swap entry?
> > > > 
> > > > At the moment yes. But I had planned (and this reminded me) to add a check
> > > > to prevent marking pinned pages for exclusive access. 
> > > 
> > > How do you even do that without races with GUP fast?
> > 
> > Unless I've missed something I think I've convinced myself it should be safe
> > to do the pin check after make_device_exclusive() has replaced all the PTEs
> > with exclusive entries.
> > 
> > GUP fast sequence:
> > 1. Read PTE
> > 2. Pin page
> > 3. Check PTE
> > 4. if PTE changed -> unpin and fallback
> > 
> > If make_device_exclusive() runs after (1) it will either succeed or see the
> > pin from (2) and fail (as desired). GUP should always see the PTE change and
> > fallback which will revoke the exclusive access.
> 
> AFAICT the user can trigger fork at that instant and fork will try to
> copy the deposited migration entry before it has been checked

In that case the child will get a read-only exclusive entry and eventually a 
page copy via do_wp_page() and GUP will fallback (or fail in the case of fast 
only) so the parent's exclusive entry will get removed before the page can be 
pinned and therefore shouldn't split the wrong way.

But that is sounding rather complex, and I am not convinced I haven't missed a 
corner case. It also seems like it shouldn't be necessary to copy exclusive 
entries anyway. I could just remove them and restore the original entry, which 
would be far simpler.

> Jason
> 






Re: [PATCH v7 5/8] mm: Device exclusive memory access

2021-03-31 Thread Alistair Popple
On Thursday, 1 April 2021 12:46:04 AM AEDT Jason Gunthorpe wrote:
> On Thu, Apr 01, 2021 at 12:27:52AM +1100, Alistair Popple wrote:
> > On Thursday, 1 April 2021 12:18:54 AM AEDT Jason Gunthorpe wrote:
> > > On Wed, Mar 31, 2021 at 11:59:28PM +1100, Alistair Popple wrote:
> > > 
> > > > I guess that makes sense as the split could go either way at the
> > > > moment but I should add a check to make sure this isn't used with
> > > > pinned pages anyway.
> > > 
> > > Is it possible to have a pinned page under one of these things? If I
> > > pin it before you migrate it then it remains pinned but hidden under
> > > the swap entry?
> > 
> > At the moment yes. But I had planned (and this reminded me) to add a check to
> > prevent marking pinned pages for exclusive access. 
> 
> How do you even do that without races with GUP fast?

Unless I've missed something I think I've convinced myself it should be safe 
to do the pin check after make_device_exclusive() has replaced all the PTEs 
with exclusive entries.

GUP fast sequence:
1. Read PTE
2. Pin page
3. Check PTE
4. if PTE changed -> unpin and fallback

If make_device_exclusive() runs after (1) it will either succeed or see the 
pin from (2) and fail (as desired). GUP should always see the PTE change and 
fallback which will revoke the exclusive access.
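
As a standalone model of why this ordering works, with atomics standing in
for the PTE and the page pin count (purely illustrative, not kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static _Atomic unsigned long pte = 1;	/* stand-in for the mapped PTE */
static _Atomic int pincount;		/* stand-in for the page pin */

/* GUP fast: read PTE, pin, re-check, fall back if the PTE changed. */
static bool gup_fast_pin(void)
{
	unsigned long before = atomic_load(&pte);	/* 1. read PTE  */
	atomic_fetch_add(&pincount, 1);			/* 2. pin page  */
	if (atomic_load(&pte) != before) {		/* 3. check PTE */
		atomic_fetch_sub(&pincount, 1);		/* 4. unpin and */
		return false;				/*    fall back */
	}
	return true;
}

/* make_device_exclusive(): replace the PTE first and check pins second,
 * so either GUP sees the changed PTE or this code sees GUP's pin. */
static bool make_device_exclusive(void)
{
	atomic_store(&pte, 2);			/* install exclusive entry */
	return atomic_load(&pincount) == 0;	/* pinned -> fail as desired */
}

int main(void)
{
	bool pinned = gup_fast_pin();
	bool excl = make_device_exclusive();
	printf("pinned=%d exclusive=%d\n", pinned, excl);
	return 0;
}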

 - Alistair

> Jason
> 






Re: [PATCH v7 5/8] mm: Device exclusive memory access

2021-03-31 Thread Alistair Popple
On Thursday, 1 April 2021 12:18:54 AM AEDT Jason Gunthorpe wrote:
> On Wed, Mar 31, 2021 at 11:59:28PM +1100, Alistair Popple wrote:
> 
> > I guess that makes sense as the split could go either way at the
> > moment but I should add a check to make sure this isn't used with
> > pinned pages anyway.
> 
> Is it possible to have a pinned page under one of these things? If I
> pin it before you migrate it then it remains pinned but hidden under
> the swap entry?

At the moment yes. But I had planned (and this reminded me) to add a check to 
prevent marking pinned pages for exclusive access. This check was in the 
original migration based implementation as I don't think it makes much sense 
to allow exclusive access to pinned pages given it indicates another device is 
possibly using it. 

> So the special logic is needed and the pinned page has to be copied
> and written as a normal pte, not dropped as a migration entry

Yep, if we end up allowing pinned pages to exist under these then that makes 
sense. Thanks for the clarification.

 - Alistair

> Jason
> 





Re: [PATCH v7 5/8] mm: Device exclusive memory access

2021-03-31 Thread Alistair Popple
On Wednesday, 31 March 2021 6:32:34 AM AEDT Jason Gunthorpe wrote:
> On Fri, Mar 26, 2021 at 11:08:02AM +1100, Alistair Popple wrote:
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 3a5705cfc891..33d11527ef77 100644
> > +++ b/mm/memory.c
> > @@ -781,6 +781,27 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > pte = pte_swp_mkuffd_wp(pte);
> > set_pte_at(src_mm, addr, src_pte, pte);
> > }
> > +   } else if (is_device_exclusive_entry(entry)) {
> > +   page = pfn_swap_entry_to_page(entry);
> > +
> > +   get_page(page);
> > +   rss[mm_counter(page)]++;
> > +
> > +   if (is_writable_device_exclusive_entry(entry) &&
> > +   is_cow_mapping(vm_flags)) {
> > +   /*
> > +* COW mappings require pages in both
> > +* parent and child to be set to read.
> > +*/
> > +   entry = make_readable_device_exclusive_entry(
> > +   swp_offset(entry));
> > +   pte = swp_entry_to_pte(entry);
> > +   if (pte_swp_soft_dirty(*src_pte))
> > +   pte = pte_swp_mksoft_dirty(pte);
> > +   if (pte_swp_uffd_wp(*src_pte))
> > +   pte = pte_swp_mkuffd_wp(pte);
> > +   set_pte_at(src_mm, addr, src_pte, pte);
> > +   }
> 
> This needs to have the same logic as we now have in
> copy_present_page(). The page *is* present and we can't copy the PTE
> value hidden in a swap entry if we can't copy the PTE normally.

You're saying we need to use copy_present_page() to make sure the split goes 
the right way for pinned pages? I guess that makes sense as the split could go 
either way at the moment but I should add a check to make sure this isn't used 
with pinned pages anyway.

 - Alistair

> The code should be shared because nobody is going to remember about
> this corner case.
> 
> Jason
> 






Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

2021-03-31 Thread Alistair Popple
On Tuesday, 30 March 2021 8:13:32 PM AEDT David Hildenbrand wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 29.03.21 03:37, Alistair Popple wrote:
> > On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
> >> On 26.03.21 02:20, Alistair Popple wrote:
> >>> request_free_mem_region() is used to find an empty range of physical
> >>> addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> >>> over the range of possible addresses using region_intersects() to see if
> >>> the range is free.
> >>
> >> Just a high-level question: how does this iteract with memory
> >> hot(un)plug? IOW, how defines and manages the "range of possible
> >> addresses" ?
> >
> > Both the driver and the maximum physical address bits available define the
> > range of possible addresses for device private memory. From
> > __request_free_mem_region():
> >
> > end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
> > addr = end - size + 1UL;
> >
> > There is no lower address range bound here so it is effectively zero. The code
> > will try to allocate the highest possible physical address first and continue
> > searching down for a free block. Does that answer your question?
> 
> Oh, sorry, the fist time I had a look I got it wrong - I thought (1UL <<
> MAX_PHYSMEM_BITS) would be the lower address limit. That looks indeed
> problematic to me.
> 
> You might end up reserving an iomem region that could be used e.g., by
> memory hotplug code later. If someone plugs a DIMM or adds memory via
> different approaches (virtio-mem), memory hotplug (via add_memory())
> would fail.
> 
> You never should be touching physical memory area reserved for memory
> hotplug, i.e., via SRAT.
> 
> What is the expectation here?

Most drivers call request_free_mem_region() with iomem_resource as the base. 
So zone device private pages currently tend to get allocated from the top of 
that.

By definition ZONE_DEVICE private pages are unaddressable from the CPU. So in 
terms of expectation I think all that is really required for ZONE_DEVICE 
private pages (at least for Nouveau) is a valid range of physical addresses 
that allow page_to_pfn() and pfn_to_page() to work correctly. To make this 
work drivers add the pages via memremap_pages() -> pagemap_range() -> 
add_pages().
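
In code, that chain looks roughly like this for a driver (a sketch against
the ~v5.12 API; allocation, pgmap->ops and error handling all elided):

static struct dev_pagemap pgmap;
struct resource *res;
void *addr;

/* Find a free physical range, searched from the top down. */
res = request_free_mem_region(&iomem_resource, size, "device-private");

pgmap.type = MEMORY_DEVICE_PRIVATE;
pgmap.range.start = res->start;
pgmap.range.end = res->end;
pgmap.nr_range = 1;
/* pgmap.ops must provide page_free() etc. for DEVICE_PRIVATE */

/* Creates struct pages so page_to_pfn()/pfn_to_page() work; the CPU
 * never loads or stores through this range directly. */
addr = memremap_pages(&pgmap, numa_node_id());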

 - Alistair

> --
> Thanks,
> 
> David / dhildenb
> 






Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread Alistair Popple
On Wednesday, 31 March 2021 2:56:38 PM AEDT John Hubbard wrote:
> On 3/30/21 3:56 PM, Alistair Popple wrote:
> ...
> >> +1 for renaming "munlock*" items to "mlock*", where applicable. good grief.
> > 
> > At least the situation was weird enough to prompt further investigation :)
> > 
> > Renaming to mlock* doesn't feel like the right solution to me either though. I
> > am not sure if you saw me responding to myself earlier but I am thinking
> > renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() ->
> > page_mlock_one() might be better. Thoughts?
> > 
> 
> Quite confused by this naming idea. Because: try_to_munlock() returns
> void, so a boolean-style name such as "page_mlocked()" is already not a
> good fit.
> 
> Even more important, though, is that try_to_munlock() is mlock-ing the
> page, right? Is there some subtle point I'm missing? It really is doing
> an mlock to the best of my knowledge here. Although the kerneldoc
> comment for try_to_munlock() seems questionable too:

It's mlocking the page if it turns out it still needs to be locked after 
unlocking it. But I don't think you're missing anything.

> /**
>   * try_to_munlock - try to munlock a page
>   * @page: the page to be munlocked
>   *
>   * Called from munlock code.  Checks all of the VMAs mapping the page
>   * to make sure nobody else has this page mlocked. The page will be
>   * returned with PG_mlocked cleared if no other vmas have it mlocked.
>   */
> 
> ...because I don't see where, in *this* routine, it clears PG_mlocked!
>
> Obviously we agree that a routine should be named based on what it does,
> rather than on who calls it. So I think that still leads to:
> 
>   try_to_munlock() --> try_to_mlock()
>   try_to_munlock_one() --> try_to_mlock_one()
> 
> Sorry if I'm missing something really obvious.

Nope, I confused things somewhat by blindly quoting the documentation whilst 
forgetting that try_to_munlock() returns void rather than a bool.

> > This is actually inspired from a suggestion in
> > Documentation/vm/unevictable-lru.rst which warns about this problem:
> > 
> > try_to_munlock() Reverse Map Scan
> > -
> > 
> > .. warning::
> > [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
> > page_referenced() reverse map walker.
> > 
> 
> This is actually rather bad advice! page_referenced() returns an
> int-that-is-really-a-boolean, whereas try_to_munlock(), at least as it
> stands now, returns void. Usually when I'm writing a TODO item, I'm in a
> hurry, and I think that's what probably happened here, too. :)

So I think we're in agreement. The naming is bad and the advice in the 
documentation is also questionable :-)

Thanks for the thoughts, I will re-send this with naming and documentation 
updates.

> >> Although, it seems reasonable to tack such renaming patches onto the tail end
> >> of this series. But whatever works.
> > 
> > Unless anyone objects strongly I will roll the rename into this patch as there
> > is only one caller of try_to_munlock.
> > 
> >   - Alistair
> > 
> 
> No objections here. :)
> 
> thanks,
> 






Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread Alistair Popple
On Wednesday, 31 March 2021 9:43:19 AM AEDT John Hubbard wrote:
> On 3/30/21 3:24 PM, Jason Gunthorpe wrote:
> ...
> >> As far as I can tell this has always been called try_to_munlock() even though
> >> it appears to do the opposite.
> > 
> > Maybe we should change it then?
> > 
> >>> /**
> >>>   * try_to_munlock - try to munlock a page
> >>>   * @page: the page to be munlocked
> >>>   *
> >>>   * Called from munlock code.  Checks all of the VMAs mapping the page
> >>>   * to make sure nobody else has this page mlocked. The page will be
> >>>   * returned with PG_mlocked cleared if no other vmas have it mlocked.
> >>>   */
> >>
> >> In other words it sets PG_mlocked if one or more vmas has it mlocked. So
> >> try_to_mlock() might be a better name, except that seems to have the 
potential
> >> for confusion as well because it's only called from the munlock code path 
and
> >> never for mlock.
> > 
> > That explanation makes more sense.. This function looks like it is
> > 'set PG_mlocked of the page if any vm->flags has VM_LOCKED'
> > 
> > Maybe call it check_vm_locked or something then and reword the above
> > comment?
> > 
> > (and why is it OK to read vm->flags for this without any locking?)
> > 
> >>> Something needs attention here..
> >>
> >> I think the code is correct, but perhaps the naming could be better. Would be
> >> interested hearing any thoughts on renaming try_to_munlock() to try_to_mlock()
> >> as the current name appears based on the context it is called from (munlock)
> >> rather than what it does (mlock).
> > 
> > The point of this patch is to make it clearer, after all, so I'd
> > change something and maybe slightly clarify the comment.
> > 

Yep, agree with that.
 
> I'd add that, after looking around the calling code, this is a really unhappy
> pre-existing situation. Anyone reading this has to remember at which point in the
> call stack the naming transitions from "do the opposite of what the name says",
> to "do what the name says".
>
> +1 for renaming "munlock*" items to "mlock*", where applicable. good grief.

At least the situation was weird enough to prompt further investigation :) 

Renaming to mlock* doesn't feel like the right solution to me either though. I 
am not sure if you saw me responding to myself earlier but I am thinking 
renaming try_to_munlock() -> page_mlocked() and try_to_munlock_one() -> 
page_mlock_one() might be better. Thoughts?

This is actually inspired from a suggestion in
Documentation/vm/unevictable-lru.rst which warns about this problem:

try_to_munlock() Reverse Map Scan
-

.. warning::
   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
   page_referenced() reverse map walker.

> Although, it seems reasonable to tack such renaming patches onto the tail end
> of this series. But whatever works.

Unless anyone objects strongly I will roll the rename into this patch as there 
is only one caller of try_to_munlock.

 - Alistair

> thanks,
> 






Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread Alistair Popple
On Wednesday, 31 March 2021 9:09:30 AM AEDT Alistair Popple wrote:
> On Wednesday, 31 March 2021 5:49:03 AM AEDT Jason Gunthorpe wrote:
> > On Fri, Mar 26, 2021 at 11:08:00AM +1100, Alistair Popple wrote:



> > So what clears PG_mlocked on this call path?
> 
> See munlock_vma_page(). munlock works by clearing PG_mlocked, then calling 
> try_to_munlock to check if any VMAs still need it locked in which case 
> PG_mlocked gets set again. There are no other callers of try_to_munlock().
> 
> > Something needs attention here..
> 
> I think the code is correct, but perhaps the naming could be better. Would be 
> interested hearing any thoughts on renaming try_to_munlock() to 
> try_to_mlock() 
> as the current name appears based on the context it is called from (munlock) 
> rather than what it does (mlock).

Actually Documentation/vm/unevictable-lru.rst contains a better suggestion:

try_to_munlock() Reverse Map Scan
-

.. warning::
   [!] TODO/FIXME: a better name might be page_mlocked() - analogous to the
   page_referenced() reverse map walker.

Thoughts on renaming try_to_munlock() -> page_mlocked() and
try_to_munlock_one() -> page_mlock_one()?

>  - Alistair
> 
> > Jason
> > 
> 
> 
> 
> 
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 






Re: [PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-30 Thread Alistair Popple
On Wednesday, 31 March 2021 5:49:03 AM AEDT Jason Gunthorpe wrote:
> On Fri, Mar 26, 2021 at 11:08:00AM +1100, Alistair Popple wrote:
> 
> > +static bool try_to_munlock_one(struct page *page, struct vm_area_struct *vma,
> > +unsigned long address, void *arg)
> > +{
> 
> Is this function name right?

Perhaps. This is called from try_to_munlock() hence the name, but see below 
for some commentary on that naming.

> > +   struct page_vma_mapped_walk pvmw = {
> > +   .page = page,
> > +   .vma = vma,
> > +   .address = address,
> > +   };
> > +
> > +   /* munlock has nothing to gain from examining un-locked vmas */
> > +   if (!(vma->vm_flags & VM_LOCKED))
> > +   return true;
> > +
> > +   while (page_vma_mapped_walk()) {
> > +   /* PTE-mapped THP are never mlocked */
> > +   if (!PageTransCompound(page)) {
> > +   /*
> > +* Holding pte lock, we do *not* need
> > +* mmap_lock here
> > +*/
> > +   mlock_vma_page(page);
> 
> Because the only action this function seems to take is to call
> *mlock*_vma_page()
> 
> > +   }
> > +   page_vma_mapped_walk_done();
> > +
> > +   /* found a mlocked page, no point continuing munlock check */
> > +   return false;
> > +   }
> > +
> > +   return true;
> > +}
> > +
> >  /**
> >   * try_to_munlock - try to munlock a page
> >   * @page: the page to be munlocked
> > @@ -1796,8 +1821,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
> >  void try_to_munlock(struct page *page)
> >  {
> 
> But this is also called try_to_munlock ??

As far as I can tell this has always been called try_to_munlock() even though 
it appears to do the opposite.

> /**
>  * try_to_munlock - try to munlock a page
>  * @page: the page to be munlocked
>  *
>  * Called from munlock code.  Checks all of the VMAs mapping the page
>  * to make sure nobody else has this page mlocked. The page will be
>  * returned with PG_mlocked cleared if no other vmas have it mlocked.
>  */

In other words it sets PG_mlocked if one or more vmas has it mlocked. So 
try_to_mlock() might be a better name, except that seems to have the potential 
for confusion as well because it's only called from the munlock code path and 
never for mlock.

> So what clears PG_mlocked on this call path?

See munlock_vma_page(). munlock works by clearing PG_mlocked, then calling 
try_to_munlock to check if any VMAs still need it locked in which case 
PG_mlocked gets set again. There are no other callers of try_to_munlock().

> Something needs attention here..

I think the code is correct, but perhaps the naming could be better. Would be
interested in hearing any thoughts on renaming try_to_munlock() to
try_to_mlock() as the current name appears based on the context it is called
from (munlock) rather than what it does (mlock).

 - Alistair

> Jason
> 






Re: [PATCH v3] kernel/resource: Fix locking in request_free_mem_region

2021-03-29 Thread Alistair Popple
On Tuesday, 30 March 2021 2:42:34 PM AEDT John Hubbard wrote:
> On 3/29/21 5:38 PM, Alistair Popple wrote:
> > request_free_mem_region() is used to find an empty range of physical
> > addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> > over the range of possible addresses using region_intersects() to see if
> > the range is free.
> > 
> > region_intersects() obtains a read lock before walking the resource tree
> > to protect against concurrent changes. However it drops the lock prior
> > to returning. This means by the time request_mem_region() is called in
> > request_free_mem_region() another thread may have already reserved the
> > requested region resulting in unexpected failures and a message in the
> > kernel log from hitting this condition:
> > 
> >  /*
> >   * mm/hmm.c reserves physical addresses which then
> >   * become unavailable to other users.  Conflicts are
> >   * not expected.  Warn to aid debugging if encountered.
> >   */
> >  if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
> >  pr_warn("Unaddressable device %s %pR conflicts with %pR",
> >  conflict->name, conflict, res);
> > 
> > To fix this create versions of region_intersects() and
> > request_mem_region() that allow the caller to take the appropriate lock
> > such that it may be held over the required calls.
> > 
> > Instead of creating another version of devm_request_mem_region() that
> > doesn't take the lock open-code it to allow the caller to pre-allocate
> > the required memory prior to taking the lock.
> > 
> > Fixes: 0c385190392d8 ("resource: add a not device managed request_free_mem_region variant")
> > Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region helper")
> 
> Hi Alistair!
> 
> The above "Fixes:" tag looks wrong to me, because that commit did not create
> the broken locking that this patch fixes. Therefore, I think that particular
> line should be removed from the commit description.

Right, the last "Fixes:" tag is the origin of the bug but the refactoring into 
different functions and files made it non-obvious how this patch was related. 
Happy to drop these though.
 
> Another note below:
> 
> > Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
> > Signed-off-by: Alistair Popple 
> > Acked-by: Balbir Singh 
> > Reported-by: kernel test robot 
> > 
> > ---
> > 
> > Hi Andrew,
> > 
> > This fixes a boot issue reported by the kernel test robot with the
> > previous version of the patch on x86 with CONFIG_IO_STRICT_DEVMEM=y.
> > This was due to the platform specific implementation of
> > devmem_is_allowed() creating a recursive lock which I missed. I notice
> > you have put v2 in mmotm so apologies for the churn but can you please
> > use this version instead? Thanks.
> > 
> >   - Alistair
> > ---
> >   kernel/resource.c | 142 ++
> >   1 file changed, 92 insertions(+), 50 deletions(-)
> > 
> > diff --git a/kernel/resource.c b/kernel/resource.c
> > index 627e61b0c124..7061b9f903ca 100644
> > --- a/kernel/resource.c
> > +++ b/kernel/resource.c
> > @@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
> >   }
> >   EXPORT_SYMBOL_GPL(page_is_ram);
> >   
> > +static int __region_intersects(resource_size_t start, size_t size,
> > +  unsigned long flags, unsigned long desc)
> > +{
> > +   struct resource res;
> > +   int type = 0; int other = 0;
> > +   struct resource *p;
> > +
> > +   res.start = start;
> > +   res.end = start + size - 1;
> > +
> > +   for (p = iomem_resource.child; p ; p = p->sibling) {
> > +   bool is_type = (((p->flags & flags) == flags) &&
> > +   ((desc == IORES_DESC_NONE) ||
> > +(desc == p->desc)));
> > +
> > +   if (resource_overlaps(p, &res))
> > +   is_type ? type++ : other++;
> > +   }
> > +
> > +   if (type == 0)
> > +   return REGION_DISJOINT;
> > +
> > +   if (other == 0)
> > +   return REGION_INTERSECTS;
> > +
> > +   return REGION_MIXED;
> > +}
> > +
> >   /**
> >* region_intersects() - determine intersection of region with known resources
> >* @start: region start address

[PATCH v3] kernel/resource: Fix locking in request_free_mem_region

2021-03-29 Thread Alistair Popple
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

region_intersects() obtains a read lock before walking the resource tree
to protect against concurrent changes. However it drops the lock prior
to returning. This means by the time request_mem_region() is called in
request_free_mem_region() another thread may have already reserved the
requested region resulting in unexpected failures and a message in the
kernel log from hitting this condition:

/*
 * mm/hmm.c reserves physical addresses which then
 * become unavailable to other users.  Conflicts are
 * not expected.  Warn to aid debugging if encountered.
 */
if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
pr_warn("Unaddressable device %s %pR conflicts with %pR",
conflict->name, conflict, res);

To fix this create versions of region_intersects() and
request_mem_region() that allow the caller to take the appropriate lock
such that it may be held over the required calls.

Instead of creating another version of devm_request_mem_region() that
doesn't take the lock open-code it to allow the caller to pre-allocate
the required memory prior to taking the lock.

Fixes: 0c385190392d8 ("resource: add a not device managed request_free_mem_region variant")
Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region helper")
Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
Signed-off-by: Alistair Popple 
Acked-by: Balbir Singh 
Reported-by: kernel test robot 

---

Hi Andrew,

This fixes a boot issue reported by the kernel test robot with the
previous version of the patch on x86 with CONFIG_IO_STRICT_DEVMEM=y.
This was due to the platform specific implementation of
devmem_is_allowed() creating a recursive lock which I missed. I notice
you have put v2 in mmotm so apologies for the churn but can you please
use this version instead? Thanks.

 - Alistair
---
 kernel/resource.c | 142 ++
 1 file changed, 92 insertions(+), 50 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..7061b9f903ca 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+static int __region_intersects(resource_size_t start, size_t size,
+  unsigned long flags, unsigned long desc)
+{
+   struct resource res;
+   int type = 0; int other = 0;
+   struct resource *p;
+
+   res.start = start;
+   res.end = start + size - 1;
+
+   for (p = iomem_resource.child; p ; p = p->sibling) {
+   bool is_type = (((p->flags & flags) == flags) &&
+   ((desc == IORES_DESC_NONE) ||
+(desc == p->desc)));
+
+   if (resource_overlaps(p, &res))
+   is_type ? type++ : other++;
+   }
+
+   if (type == 0)
+   return REGION_DISJOINT;
+
+   if (other == 0)
+   return REGION_INTERSECTS;
+
+   return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
  unsigned long desc)
 {
-   struct resource res;
-   int type = 0; int other = 0;
-   struct resource *p;
-
-   res.start = start;
-   res.end = start + size - 1;
+   int rc;
 
read_lock(&resource_lock);
-   for (p = iomem_resource.child; p ; p = p->sibling) {
-   bool is_type = (((p->flags & flags) == flags) &&
-   ((desc == IORES_DESC_NONE) ||
-(desc == p->desc)));
-
-   if (resource_overlaps(p, &res))
-   is_type ? type++ : other++;
-   }
+   rc = __region_intersects(start, size, flags, desc);
read_unlock(&resource_lock);
-
-   if (type == 0)
-   return REGION_DISJOINT;
-
-   if (other == 0)
-   return REGION_INTERSECTS;
-
-   return REGION_MIXED;
+   return rc;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
@@ -1171,31 +1180,16 @@ struct address_space *iomem_get_mapping(void)
return smp_load_acquire(&iomem_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct 

Re: [PATCH v4 04/10] Input: wacom_i2c - Add touchscreen properties

2021-03-29 Thread Alistair Francis
 On Mon, Mar 29, 2021 at 3:08 PM Dmitry Torokhov wrote:
>
> On Thu, Mar 25, 2021 at 09:52:24PM -0400, Alistair Francis wrote:
> > Connect touchscreen properties to the wacom_i2c.
> >
> > Signed-off-by: Alistair Francis 
> > ---
> > v4:
> >  - Add touchscreen_report_pos() as well
> >
> >  drivers/input/touchscreen/wacom_i2c.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/input/touchscreen/wacom_i2c.c 
> > b/drivers/input/touchscreen/wacom_i2c.c
> > index eada68770671..ee1829dd35f4 100644
> > --- a/drivers/input/touchscreen/wacom_i2c.c
> > +++ b/drivers/input/touchscreen/wacom_i2c.c
> > @@ -11,6 +11,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -33,6 +34,7 @@ struct wacom_features {
> >  struct wacom_i2c {
> >   struct i2c_client *client;
> >   struct input_dev *input;
> > + struct touchscreen_properties props;
> >   u8 data[WACOM_QUERY_SIZE];
> >   bool prox;
> >   int tool;
> > @@ -188,6 +190,9 @@ static int wacom_i2c_probe(struct i2c_client *client,
> >   __set_bit(BTN_STYLUS2, input->keybit);
> >   __set_bit(BTN_TOUCH, input->keybit);
> >
> > + touchscreen_parse_properties(input, true, &wac_i2c->props);
> > + touchscreen_report_pos(input, &wac_i2c->props, features.x_max,
> > +features.y_max, true);
>
> ??? This goes into wacom_i2c_irq() where it previously used
> input_report_abs() for X and Y so that transformations (swap, mirror)
> requested via device properties are applied to the coordinates.

Ah sorry. I misunderstood what touchscreen_report_pos() does (and
didn't read it).

Looking at the actual code it seems that I need to remove

input_report_abs(input, ABS_Y, y);
input_report_abs(input, ABS_X, x);

from wacom_i2c_irq() and add touchscreen_report_pos() to wacom_i2c_irq() instead
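
Roughly the following (an untested sketch; x and y as already parsed from the
report):

	/* Let the helper apply any swap/invert transforms before reporting. */
	touchscreen_report_pos(input, &wac_i2c->props, x, y, false);
	input_report_abs(input, ABS_PRESSURE, pressure);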

I'll do that in the next version.

Alistair

>
> Thanks.
>
> --
> Dmitry


Re: [kernel/resource] cf1e4e12c9: WARNING:possible_recursive_locking_detected

2021-03-29 Thread Alistair Popple
Not sure why I didn't hit this in testing but the problem is obvious: I missed 
that revoke_iomem() calls devmem_is_allowed() which on x86 calls 
region_intersects(). I guess I must have forgotten to do a boot test with 
CONFIG_IO_STRICT_DEVMEM. Will put a fix together.
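
For reference, the recursive call chain looks like this (reconstructed from
the lockdep splat below; each line calls the one beneath it):

	__request_region()		/* takes resource_lock */
	  revoke_iomem()
	    devmem_is_allowed()		/* x86 implementation */
	      region_intersects()	/* takes resource_lock again -> deadlock */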

 - Alistair

On Monday, 29 March 2021 4:42:30 PM AEDT kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed the following commit (built with gcc-9):
> 
> commit: cf1e4e12c95dec0bb945df3eb138190fc353460f ("[PATCH v2] kernel/
resource: Fix locking in request_free_mem_region")
> url: 
> https://github.com/0day-ci/linux/commits/Alistair-Popple/kernel-resource-Fix-locking-in-request_free_mem_region/20210326-092150
> base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 
a74e6a014c9d4d4161061f770c9b4f98372ac778
> 
> in testcase: boot
> 
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> 
> caused below changes (please refer to attached dmesg/kmsg for entire log/
backtrace):
> 
> 
> +--+++
> |  | a74e6a014c | cf1e4e12c9 |
> +--+++
> | boot_successes   | 6  | 0  |
> | boot_failures| 0  | 6  |
> | WARNING:possible_recursive_locking_detected  | 0  | 6  |
> | INFO:rcu_sched_self-detected_stall_on_CPU| 0  | 6  |
> | INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 0  | 1  |
> | EIP:queued_read_lock_slowpath| 0  | 1  |
> +--+++
> 
> 
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot 
> 
> 
> [9.616229] WARNING: possible recursive locking detected
> [9.617758] 5.12.0-rc2-00297-gcf1e4e12c95d #1 Not tainted
> [9.617758] 
> [9.617758] swapper/0/1 is trying to acquire lock:
> [9.617758] 41bb9674 (resource_lock){}-{2:2}, at: region_intersects 
(kbuild/src/consumer/kernel/resource.c:534 kbuild/src/consumer/kernel/
resource.c:580)
> [9.619753]
> [9.619753] but task is already holding lock:
> [9.619753] 41bb9674 (resource_lock){}-{2:2}, at: __request_region 
(kbuild/src/consumer/kernel/resource.c:1188 kbuild/src/consumer/kernel/
resource.c:1255)
> [9.621757]
> [9.621757] other info that might help us debug this:
> [9.621757]  Possible unsafe locking scenario:
> [9.621757]
> [9.621757]CPU0
> [9.621757]
> [9.623721]   lock(resource_lock);
> [9.623747]   lock(resource_lock);
> [9.623747]
> [9.623747]  *** DEADLOCK ***
> [9.623747]
> [9.623747]  May be due to missing lock nesting notation
> [9.623747]
> [9.625725] 2 locks held by swapper/0/1:
> [9.625759] #0: 42e1f160 (&dev->mutex){}-{3:3}, at: device_lock 
(kbuild/src/consumer/include/linux/device.h:741)
> [9.625759] #1: 41bb9674 (resource_lock){}-{2:2}, at: 
__request_region (kbuild/src/consumer/kernel/resource.c:1188 kbuild/src/
consumer/kernel/resource.c:1255)
> [9.625759]
> [9.625759] stack backtrace:
> [9.627748] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc2-00297-
gcf1e4e12c95d #1
> [9.627748] Call Trace:
> [9.627748] ? dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
> [9.627748] ? validate_chain (kbuild/src/consumer/kernel/locking/
lockdep.c:2829 kbuild/src/consumer/kernel/locking/lockdep.c:2872 kbuild/src/
consumer/kernel/locking/lockdep.c:3661)
> [9.629761] ? __lock_acquire (kbuild/src/consumer/kernel/locking/
lockdep.c:4900)
> [9.629761] ? lock_acquire (kbuild/src/consumer/kernel/locking/lockdep.c:
437 kbuild/src/consumer/kernel/locking/lockdep.c:5512 kbuild/src/consumer/
kernel/locking/lockdep.c:5475)
> [9.629761] ? region_intersects (kbuild/src/consumer/kernel/resource.c:
534 kbuild/src/consumer/kernel/resource.c:580)
> [9.629761] ? lock_acquire (kbuild/src/consumer/kernel/locking/lockdep.c:
437 kbuild/src/consumer/kernel/locking/lockdep.c:5512 kbuild/src/consumer/
kernel/locking/lockdep.c:5475)
> [9.629761] ? lock_is_held_type (kbuild/src/consumer/kernel/locking/
lockdep.c:5253 kbuild/src/consumer/kernel/locking/lockdep.c:5549)
> [9.631752] ? _raw_read_lock (kbuild/src/consumer/include/linux/
rwlock_api_smp.h:150 kbuild/src/consumer/kernel/locking/spinlock.c:223)
> [9.631752] ? region_intersects (kbuild/src/consumer/kernel/resource.c:
534 kbuild/src/consumer/kernel/resource.c:580)
> [9.631752] ? devmem_is_allowed (kbuild/src/consumer/arch/x86/mm

Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

2021-03-28 Thread Alistair Popple
On Friday, 26 March 2021 4:15:36 PM AEDT Balbir Singh wrote:
> On Fri, Mar 26, 2021 at 12:20:35PM +1100, Alistair Popple wrote:
> > +static int __region_intersects(resource_size_t start, size_t size,
> > +unsigned long flags, unsigned long desc)
> > +{
> > + struct resource res;
> > + int type = 0; int other = 0;
> > + struct resource *p;
> > +
> > + res.start = start;
> > + res.end = start + size - 1;
> > +
> > + for (p = iomem_resource.child; p ; p = p->sibling) {
> > + bool is_type = (((p->flags & flags) == flags) &&
> > + ((desc == IORES_DESC_NONE) ||
> > +  (desc == p->desc)));
> 
> is_type is a bad name, are we saying "is" as in boolean question?
> Or is it short for something like intersection_type? I know you've
> just moved the code over :)

Yeah, I'm not a fan of that name either but I was just moving code over and 
couldn't come up with anything better :)

It is a boolean question though - it is checking to see if resource *p is the 
same type (flags+desc) of region as what is being checked for intersection.
 
> > +
> > + if (resource_overlaps(p, &res))
> > + is_type ? type++ : other++;
> > + }
> > +
> > + if (type == 0)
> > + return REGION_DISJOINT;
> > +
> > + if (other == 0)
> > + return REGION_INTERSECTS;
> > +
> > + return REGION_MIXED;
> > +}
> > +
> >  /**
> >   * region_intersects() - determine intersection of region with known resources
> >   * @start: region start address
> > @@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
> >  int region_intersects(resource_size_t start, size_t size, unsigned long 
flags,
> > unsigned long desc)
> >  {
> > - struct resource res;
> > - int type = 0; int other = 0;
> > - struct resource *p;
> > -
> > - res.start = start;
> > - res.end = start + size - 1;
> > + int rc;
> >
> >   read_lock(&resource_lock);
> > - for (p = iomem_resource.child; p ; p = p->sibling) {
> > - bool is_type = (((p->flags & flags) == flags) &&
> > - ((desc == IORES_DESC_NONE) ||
> > -  (desc == p->desc)));
> > -
> > - if (resource_overlaps(p, &res))
> > - is_type ? type++ : other++;
> > - }
> > + rc = __region_intersects(start, size, flags, desc);
> >   read_unlock(&resource_lock);
> > -
> > - if (type == 0)
> > - return REGION_DISJOINT;
> > -
> > - if (other == 0)
> > - return REGION_INTERSECTS;
> > -
> > - return REGION_MIXED;
> > + return rc;
> >  }
> >  EXPORT_SYMBOL_GPL(region_intersects);
> >
> > @@ -1171,31 +1180,17 @@ struct address_space *iomem_get_mapping(void)
> >   return smp_load_acquire(&iomem_inode)->i_mapping;
> >  }
> >
> > -/**
> > - * __request_region - create a new busy resource region
> > - * @parent: parent resource descriptor
> > - * @start: resource start address
> > - * @n: resource region size
> > - * @name: reserving caller's ID string
> > - * @flags: IO resource flags
> > - */
> > -struct resource * __request_region(struct resource *parent,
> > -resource_size_t start, resource_size_t n,
> > -const char *name, int flags)
> > +static bool request_region_locked(struct resource *parent,
> > +	struct resource *res, resource_size_t start,
> > +	resource_size_t n, const char *name, int flags)
> >  {
> > - DECLARE_WAITQUEUE(wait, current);
> > - struct resource *res = alloc_resource(GFP_KERNEL);
> >   struct resource *orig_parent = parent;
> > -
> > - if (!res)
> > - return NULL;
> > + DECLARE_WAITQUEUE(wait, current);
> 
> This part of the diff looks confusing, do we have a waitqueue and we call
> schedule() within a function called with the lock held?

Good point. schedule() does get called but the lock is dropped first:

if (conflict->flags & flags & IORESOURCE_MUXED) {
add_wait_queue(&muxed_resource_wait, &wait);
write_unlock(_lock);
set_current_state(TASK_UNINTERRUPTIBLE);
schedule();

Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region

2021-03-28 Thread Alistair Popple
On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
> On 26.03.21 02:20, Alistair Popple wrote:
> > request_free_mem_region() is used to find an empty range of physical
> > addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
> > over the range of possible addresses using region_intersects() to see if
> > the range is free.
> 
> Just a high-level question: how does this interact with memory
> hot(un)plug? IOW, who defines and manages the "range of possible
> addresses" ?

Both the driver and the maximum physical address bits available define the 
range of possible addresses for device private memory. From 
__request_free_mem_region():

end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
addr = end - size + 1UL;

There is no lower address range bound here so it is effectively zero. The code 
will try to allocate the highest possible physical address first and continue 
searching down for a free block. Does that answer your question?
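
Sketched out, the search loop is essentially (simplified from
__request_free_mem_region()):

	for (; addr > size && addr >= base->start; addr -= size) {
		if (region_intersects(addr, size, 0, IORES_DESC_NONE) !=
		    REGION_DISJOINT)
			continue;
		/* Free range found - reserve it and stop searching. */
		break;
	}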

> >
> > region_intersects() obtains a read lock before walking the resource tree
> > to protect against concurrent changes. However it drops the lock prior
> > to returning. This means by the time request_mem_region() is called in
> > request_free_mem_region() another thread may have already reserved the
> > requested region resulting in unexpected failures and a message in the
> > kernel log from hitting this condition:
> 
> I am confused. Why can't we return an error to the caller and let the
> caller continue searching? This feels much simpler than what you propose
> here. What am I missing?

The search occurs as part of the allocation. To allocate memory, free space
needs to be located and allocated as a single operation. However in this case
the lock is dropped between locating a free region and allocating it, resulting
in an extra debug check firing and a subsequent failure.

I did originally consider just allowing the caller to retry, but in the end it
didn't seem any simpler. Callers would have to differentiate between transient
and permanent failures and figure out how often to retry, and no doubt each
caller would do this differently. There is also the issue of starvation if one
thread constantly loses the race to allocate after the search. Overall it
seems simpler to me to just have a call that allocates a region (or fails due
to lack of free space).
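
At a high level the fix is just (a hedged sketch; the helper names are the
ones introduced by the patch):

	write_lock(&resource_lock);
	if (__region_intersects(addr, size, 0, IORES_DESC_NONE) ==
	    REGION_DISJOINT)
		request_region_locked(&iomem_resource, res, addr, size,
				      name, 0);
	write_unlock(&resource_lock);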

I also don't think what I am proposing is particularly complex. I agree the 
diff makes it look complex, but at a high level all I'm doing is moving the 
locking to outer function calls. It ends up looking more complex because there 
are some memory allocations which need reordering, but I don't think if things 
were originally written this way it would be considered complex.

 - Alistair

> --
> Thanks,
> 
> David / dhildenb
> 






[PATCH v4 4/5] ARM: imx_v6_v7_defconfig: Enable silergy,sy7636a

2021-03-25 Thread Alistair Francis
Enable the silergy,sy7636a and silergy,sy7636a-regulator for the
reMarkable2.

Signed-off-by: Alistair Francis 
---
v3:
 - Change patch title
v2:
 - N/A

 arch/arm/configs/imx_v6_v7_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index cd80e85d37cf..bafd1d7b4ad5 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -245,6 +245,7 @@ CONFIG_MFD_MC13XXX_I2C=y
 CONFIG_MFD_RN5T618=y
 CONFIG_MFD_STMPE=y
 CONFIG_REGULATOR=y
+CONFIG_MFD_SY7636A=y
 CONFIG_REGULATOR_FIXED_VOLTAGE=y
 CONFIG_REGULATOR_ANATOP=y
 CONFIG_REGULATOR_DA9052=y
@@ -255,6 +256,7 @@ CONFIG_REGULATOR_MC13783=y
 CONFIG_REGULATOR_MC13892=y
 CONFIG_REGULATOR_PFUZE100=y
 CONFIG_REGULATOR_RN5T618=y
+CONFIG_REGULATOR_SY7636A=y
 CONFIG_RC_CORE=y
 CONFIG_RC_DEVICES=y
 CONFIG_IR_GPIO_CIR=y
-- 
2.31.0



[PATCH v4 5/5] ARM: dts: imx7d: remarkable2: Enable silergy,sy7636a

2021-03-25 Thread Alistair Francis
Enable the silergy,sy7636a and silergy,sy7636a-regulator on the
reMarkable2.

Signed-off-by: Alistair Francis 
---
v3:
 - Change patch title
v2:
 - N/A

 arch/arm/boot/dts/imx7d-remarkable2.dts | 61 +
 1 file changed, 61 insertions(+)

diff --git a/arch/arm/boot/dts/imx7d-remarkable2.dts 
b/arch/arm/boot/dts/imx7d-remarkable2.dts
index 791ad55281cc..37834bc7fc72 100644
--- a/arch/arm/boot/dts/imx7d-remarkable2.dts
+++ b/arch/arm/boot/dts/imx7d-remarkable2.dts
@@ -22,6 +22,27 @@ memory@80000000 {
	reg = <0x80000000 0x40000000>;
};
 
+   thermal-zones {
+   epd-thermal {
+   thermal-sensors = <&epd_pmic>;
+   polling-delay-passive = <30000>;
+   polling-delay = <30000>;
+   trips {
+   trip0 {
+   temperature = <49000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+
+   trip1 {
+   temperature = <50000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+   };
+   };
+
reg_brcm: regulator-brcm {
compatible = "regulator-fixed";
regulator-name = "brcm_reg";
@@ -86,6 +107,32 @@ wacom_digitizer: digitizer@9 {
};
 };
 
+ {
+   clock-frequency = <100000>;
+   pinctrl-names = "default", "sleep";
+   pinctrl-0 = <&pinctrl_i2c4>;
+   pinctrl-1 = <&pinctrl_i2c4>;
+   status = "okay";
+
+   epd_pmic: sy7636a@62 {
+   compatible = "silergy,sy7636a";
+   reg = <0x62>;
+   status = "okay";
+   pinctrl-names = "default";
+   pinctrl-0 = <&pinctrl_epdpmic>;
+   #thermal-sensor-cells = <0>;
+
+   epd-pwr-good-gpios = <&gpio6 21 GPIO_ACTIVE_HIGH>;
+   regulators {
+   compatible = "silergy,sy7636a-regulator";
+   reg_epdpmic: vcom {
+   regulator-name = "vcom";
+   regulator-boot-on;
+   };
+   };
+   };
+};
+
&snvs_pwrkey {
status = "okay";
 };
@@ -179,6 +226,13 @@ MX7D_PAD_SAI1_TX_BCLK__GPIO6_IO13  0x14
>;
};
 
+   pinctrl_epdpmic: epdpmicgrp {
+   fsl,pins = <
+   MX7D_PAD_SAI2_RX_DATA__GPIO6_IO21 0x0074
+   MX7D_PAD_ENET1_RGMII_TXC__GPIO7_IO11 0x0014
+   >;
+   };
+
pinctrl_i2c1: i2c1grp {
fsl,pins = <
MX7D_PAD_I2C1_SDA__I2C1_SDA 0x4000007f
@@ -186,6 +240,13 @@ MX7D_PAD_I2C1_SCL__I2C1_SCL 0x4000007f
>;
};
 
+   pinctrl_i2c4: i2c4grp {
+   fsl,pins = <
+   MX7D_PAD_I2C4_SDA__I2C4_SDA 0x4000007f
+   MX7D_PAD_I2C4_SCL__I2C4_SCL 0x4000007f
+   >;
+   };
+
pinctrl_uart1: uart1grp {
fsl,pins = <
MX7D_PAD_UART1_TX_DATA__UART1_DCE_TX	0x79
-- 
2.31.0



[PATCH v4 3/5] regulator: sy7636a: Initial commit

2021-03-25 Thread Alistair Francis
Initial support for the Silergy SY7636A-regulator Power Management chip.

Signed-off-by: Alistair Francis 
---
v3:
 - Move sysfs power from mfd to regulator
 - Add ABI documentation
v2:
 - N/A
 .../testing/sysfs-driver-sy7636a-regulator|  21 ++
 drivers/regulator/Kconfig |   6 +
 drivers/regulator/Makefile|   1 +
 drivers/regulator/sy7636a-regulator.c | 354 ++
 include/linux/mfd/sy7636a.h   |   1 +
 5 files changed, 383 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-sy7636a-regulator
 create mode 100644 drivers/regulator/sy7636a-regulator.c

diff --git a/Documentation/ABI/testing/sysfs-driver-sy7636a-regulator 
b/Documentation/ABI/testing/sysfs-driver-sy7636a-regulator
new file mode 100644
index ..ab534a8ea21a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-sy7636a-regulator
@@ -0,0 +1,21 @@
+What:  /sys/bus/regulator/drivers/sy7636a-regulator/state
+Date:  April 2021
+KernelVersion: 5.12
+Contact:   alist...@alistair23.me
+Description:
+   This file allows you to see the current power rail state.
+
+What:  /sys/bus/regulator/drivers/sy7636a-regulator/power_good
+Date:  April 2021
+KernelVersion: 5.12
+Contact:   alist...@alistair23.me
+Description:
+   This file allows you to see the current state of the regulator
+   as either ON or OFF.
+
+What:  /sys/bus/regulator/drivers/sy7636a-regulator/vcom
+Date:  April 2021
+KernelVersion: 5.12
+Contact:   alist...@alistair23.me
+Description:
+   This file allows you to see and set the current voltage in mV.
diff --git a/drivers/regulator/Kconfig b/drivers/regulator/Kconfig
index 77c43134bc9e..6d501ce921a8 100644
--- a/drivers/regulator/Kconfig
+++ b/drivers/regulator/Kconfig
@@ -1130,6 +1130,12 @@ config REGULATOR_STW481X_VMMC
  This driver supports the internal VMMC regulator in the STw481x
  PMIC chips.
 
+config REGULATOR_SY7636A
+   tristate "Silergy SY7636A voltage regulator"
+   depends on MFD_SY7636A
+   help
+ This driver supports Silergy SY7636A voltage regulator.
+
 config REGULATOR_SY8106A
tristate "Silergy SY8106A regulator"
depends on I2C && (OF || COMPILE_TEST)
diff --git a/drivers/regulator/Makefile b/drivers/regulator/Makefile
index 44d2f8bf4b74..5a981036a9f0 100644
--- a/drivers/regulator/Makefile
+++ b/drivers/regulator/Makefile
@@ -134,6 +134,7 @@ obj-$(CONFIG_REGULATOR_STM32_VREFBUF) += stm32-vrefbuf.o
 obj-$(CONFIG_REGULATOR_STM32_PWR) += stm32-pwr.o
 obj-$(CONFIG_REGULATOR_STPMIC1) += stpmic1_regulator.o
 obj-$(CONFIG_REGULATOR_STW481X_VMMC) += stw481x-vmmc.o
+obj-$(CONFIG_REGULATOR_SY7636A) += sy7636a-regulator.o
 obj-$(CONFIG_REGULATOR_SY8106A) += sy8106a-regulator.o
 obj-$(CONFIG_REGULATOR_SY8824X) += sy8824x.o
 obj-$(CONFIG_REGULATOR_SY8827N) += sy8827n.o
diff --git a/drivers/regulator/sy7636a-regulator.c 
b/drivers/regulator/sy7636a-regulator.c
new file mode 100644
index ..0ec6f852cb3d
--- /dev/null
+++ b/drivers/regulator/sy7636a-regulator.c
@@ -0,0 +1,354 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// Functions to access SY7636A power management chip voltages
+//
+// Copyright (C) 2019 reMarkable AS - http://www.remarkable.com/
+//
+// Authors: Lars Ivar Miljeteig 
+//  Alistair Francis 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const char * const states[] = {
+   "no fault event",
+   "UVP at VP rail",
+   "UVP at VN rail",
+   "UVP at VPOS rail",
+   "UVP at VNEG rail",
+   "UVP at VDDH rail",
+   "UVP at VEE rail",
+   "SCP at VP rail",
+   "SCP at VN rail",
+   "SCP at VPOS rail",
+   "SCP at VNEG rail",
+   "SCP at VDDH rail",
+   "SCP at VEE rail",
+   "SCP at V COM rail",
+   "UVLO",
+   "Thermal shutdown",
+};
+
+static int sy7636a_get_vcom_voltage_mv(struct regmap *regmap)
+{
+   int ret;
+   unsigned int val, val_h;
+
+   ret = regmap_read(regmap, SY7636A_REG_VCOM_ADJUST_CTRL_L, &val);
+   if (ret)
+   return ret;
+
+   ret = regmap_read(regmap, SY7636A_REG_VCOM_ADJUST_CTRL_H, &val_h);
+   if (ret)
+   return ret;
+
+   val |= (val_h << VCOM_ADJUST_CTRL_SHIFT);
+
+   return (val & VCOM_ADJUST_CTRL_MASK) * VCOM_ADJUST_CTRL_SCAL;
+}
+
+static int sy7636a_set_vcom_voltage_mv(struct regmap *regmap, unsigned int 
vcom)
+{
+   int ret;
+   unsigned int val;
+
+   if (vcom < VCOM_MIN || vcom > VCOM_MAX)
+   return -EINVAL;
+
+   val = (unsigned int)(vcom / VCOM_ADJUST_CTRL_SCAL) & VCOM_ADJUST_CTRL_MASK;
+
+   ret = regmap_write(

[PATCH v4 1/5] dt-bindings: mfd: Initial commit of silergy,sy7636a.yaml

2021-03-25 Thread Alistair Francis
Initial support for the Silergy SY7636A Power Management chip
and regulator.

Signed-off-by: Alistair Francis 
---
v3:
 - No change
v2:
 - N/A

 .../bindings/mfd/silergy,sy7636a.yaml | 63 +++
 1 file changed, 63 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml

diff --git a/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml 
b/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml
new file mode 100644
index ..f260a8eae226
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/silergy,sy7636a.yaml
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/mfd/silergy,sy7636a.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: silergy sy7636a PMIC
+
+maintainers:
+  - Alistair Francis 
+
+properties:
+  compatible:
+const: silergy,sy7636a
+
+  reg:
+maxItems: 1
+
+  '#thermal-sensor-cells':
+const: 0
+
+  regulators:
+type: object
+$ref: /schemas/regulator/regulator.yaml#
+
+properties:
+  compatible:
+const: silergy,sy7636a-regulator
+
+  regulator-name:
+pattern: "vcom"
+
+required:
+  - compatible
+  - reg
+  - '#thermal-sensor-cells'
+
+additionalProperties: false
+
+examples:
+  - |
+i2c {
+  #address-cells = <1>;
+  #size-cells = <0>;
+
+  sy7636a@62 {
+compatible = "silergy,sy7636a";
+reg = <0x62>;
+status = "okay";
+pinctrl-names = "default";
pinctrl-0 = <&pinctrl_epdpmic>;
+#thermal-sensor-cells = <0>;
+
+regulators {
+  compatible = "silergy,sy7636a-regulator";
+  reg_epdpmic: vcom {
+regulator-name = "vcom";
+regulator-boot-on;
+  };
+};
+  };
+};
+...
-- 
2.31.0



[PATCH v4 2/5] mfd: sy7636a: Initial commit

2021-03-25 Thread Alistair Francis
Initial support for the Silergy SY7636A Power Management chip.

Signed-off-by: Alistair Francis 
---
v3:
 - Update copyright year
 - Move power parts to regulator
 - Change Kconfig depends to be tristate
v2:
 - Address comments from review

 drivers/mfd/Kconfig | 10 +
 drivers/mfd/Makefile|  1 +
 drivers/mfd/sy7636a.c   | 82 +
 include/linux/mfd/sy7636a.h | 46 +
 4 files changed, 139 insertions(+)
 create mode 100644 drivers/mfd/sy7636a.c
 create mode 100644 include/linux/mfd/sy7636a.h

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index b74efa469e90..ac09b40e1724 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -1351,6 +1351,16 @@ config MFD_SYSCON
  Select this option to enable accessing system control registers
  via regmap.
 
+config MFD_SY7636A
+   tristate "Silergy SY7636A Power Management chip"
+   select MFD_CORE
+   select REGMAP_I2C
+   select REGMAP_IRQ
+   depends on I2C
+   help
+ Select this option to enable support for the Silergy SY7636A
+ Power Management chip.
+
 config MFD_DAVINCI_VOICECODEC
tristate
select MFD_CORE
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 834f5463af28..5bfa0d6e5dc5 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -265,6 +265,7 @@ obj-$(CONFIG_MFD_STMFX) += stmfx.o
 obj-$(CONFIG_MFD_KHADAS_MCU)   += khadas-mcu.o
 obj-$(CONFIG_MFD_ACER_A500_EC) += acer-ec-a500.o
 
+obj-$(CONFIG_MFD_SY7636A)  += sy7636a.o
 obj-$(CONFIG_SGI_MFD_IOC3) += ioc3.o
 obj-$(CONFIG_MFD_SIMPLE_MFD_I2C)   += simple-mfd-i2c.o
 obj-$(CONFIG_MFD_INTEL_M10_BMC)   += intel-m10-bmc.o
diff --git a/drivers/mfd/sy7636a.c b/drivers/mfd/sy7636a.c
new file mode 100644
index ..e08f29ea63f8
--- /dev/null
+++ b/drivers/mfd/sy7636a.c
@@ -0,0 +1,82 @@
+// SPDX-License-Identifier: GPL-2.0+
+//
+// MFD parent driver for SY7636A chip
+//
+// Copyright (C) 2021 reMarkable AS - http://www.remarkable.com/
+//
+// Authors: Lars Ivar Miljeteig 
+//  Alistair Francis 
+//
+// Based on the lp87565 driver by Keerthy 
+
+#include <linux/i2c.h>
+#include <linux/mfd/core.h>
+#include <linux/module.h>
+#include <linux/regmap.h>
+
+#include <linux/mfd/sy7636a.h>
+
+static const struct regmap_config sy7636a_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+};
+
+static const struct mfd_cell sy7636a_cells[] = {
+   { .name = "sy7636a-regulator", },
+   { .name = "sy7636a-temperature", },
+   { .name = "sy7636a-thermal", },
+};
+
+static const struct of_device_id of_sy7636a_match_table[] = {
+   { .compatible = "silergy,sy7636a", },
+   {}
+};
+MODULE_DEVICE_TABLE(of, of_sy7636a_match_table);
+
+static int sy7636a_probe(struct i2c_client *client,
+const struct i2c_device_id *ids)
+{
+   struct sy7636a *sy7636a;
+   int ret;
+
+   sy7636a = devm_kzalloc(&client->dev, sizeof(*sy7636a), GFP_KERNEL);
+   if (!sy7636a)
+   return -ENOMEM;
+
+   sy7636a->dev = &client->dev;
+
+   sy7636a->regmap = devm_regmap_init_i2c(client, &sy7636a_regmap_config);
+   if (IS_ERR(sy7636a->regmap)) {
+   ret = PTR_ERR(sy7636a->regmap);
+   dev_err(sy7636a->dev,
+   "Failed to initialize register map: %d\n", ret);
+   return ret;
+   }
+
+   i2c_set_clientdata(client, sy7636a);
+
+   ret = devm_mfd_add_devices(sy7636a->dev, PLATFORM_DEVID_AUTO,
+   sy7636a_cells, ARRAY_SIZE(sy7636a_cells),
+   NULL, 0, NULL);
+   return ret;
+}
+
+static const struct i2c_device_id sy7636a_id_table[] = {
+   { "sy7636a", 0 },
+   { },
+};
+MODULE_DEVICE_TABLE(i2c, sy7636a_id_table);
+
+static struct i2c_driver sy7636a_driver = {
+   .driver = {
+   .name   = "sy7636a",
+   .of_match_table = of_sy7636a_match_table,
+   },
+   .probe = sy7636a_probe,
+   .id_table = sy7636a_id_table,
+};
+module_i2c_driver(sy7636a_driver);
+
+MODULE_AUTHOR("Lars Ivar Miljeteig ");
+MODULE_DESCRIPTION("Silergy SY7636A Multi-Function Device Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/mfd/sy7636a.h b/include/linux/mfd/sy7636a.h
new file mode 100644
index ..a5ec5d911b3a
--- /dev/null
+++ b/include/linux/mfd/sy7636a.h
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
 * Functions to access SY7636A power management chip.
+ *
+ * Copyright (C) 2021 reMarkable AS - http://www.remarkable.com/
+ */
+
+#ifndef __MFD_SY7636A_H
+#define __MFD_SY7636A_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define SY7636A_REG_OPERATION_MODE_CRL 0x00
+#define SY7636A_OPERATION_MODE_CRL_VCOMCTL BIT(6)
+#define SY7636A_OPERATION_MODE_CRL_ONOFF   BIT(7)
+#define SY7636A_REG_VCOM_ADJUST_CT

[PATCH v4 07/10] Input: wacom_i2c - Add support for reset control

2021-03-25 Thread Alistair Francis
From: Alistair Francis 

Signed-off-by: Alistair Francis 
---
v4:
 - Initial commit

 drivers/input/touchscreen/wacom_i2c.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 84c7ccb737bd..28004b1180c9 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -55,6 +55,7 @@ struct wacom_features {
 struct wacom_i2c {
struct i2c_client *client;
struct input_dev *input;
+   struct reset_control *rstc;
struct touchscreen_properties props;
u8 data[WACOM_QUERY_SIZE];
bool prox;
@@ -175,6 +176,8 @@ static int wacom_i2c_open(struct input_dev *dev)
struct wacom_i2c *wac_i2c = input_get_drvdata(dev);
struct i2c_client *client = wac_i2c->client;
 
+   reset_control_reset(wac_i2c->rstc);
+
enable_irq(client->irq);
 
return 0;
@@ -193,6 +196,7 @@ static int wacom_i2c_probe(struct i2c_client *client,
 {
struct wacom_i2c *wac_i2c;
struct input_dev *input;
+   struct reset_control *rstc;
struct wacom_features features = { 0 };
int error;
 
@@ -201,6 +205,12 @@ static int wacom_i2c_probe(struct i2c_client *client,
return -EIO;
}
 
+   rstc = devm_reset_control_get_optional_exclusive(&client->dev, NULL);
+   if (IS_ERR(rstc)) {
+   dev_err(&client->dev, "Failed to get reset control before init\n");
+   return PTR_ERR(rstc);
+   }
+
error = wacom_query_device(client, );
if (error)
return error;
@@ -214,6 +224,7 @@ static int wacom_i2c_probe(struct i2c_client *client,
 
wac_i2c->client = client;
wac_i2c->input = input;
+   wac_i2c->rstc = rstc;
 
input->name = "Wacom I2C Digitizer";
input->id.bustype = BUS_I2C;
-- 
2.31.0



[PATCH v4 08/10] Input: wacom_i2c - Add support for vdd regulator

2021-03-25 Thread Alistair Francis
Add support for a VDD regulator. This allows the kernel to probe the
Wacom-I2C device on the rM2.

Signed-off-by: Alistair Francis 
---
v4:
 - Don't double allocate wac_i2c

 drivers/input/touchscreen/wacom_i2c.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 28004b1180c9..c78195b6b3b1 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include <linux/regulator/consumer.h>
 #include 
 #include 
 #include 
@@ -57,6 +58,7 @@ struct wacom_i2c {
struct input_dev *input;
struct reset_control *rstc;
struct touchscreen_properties props;
+   struct regulator *vdd;
u8 data[WACOM_QUERY_SIZE];
bool prox;
int tool;
@@ -222,6 +224,20 @@ static int wacom_i2c_probe(struct i2c_client *client,
goto err_free_mem;
}
 
+   wac_i2c->vdd = regulator_get(&client->dev, "vdd");
+   if (IS_ERR(wac_i2c->vdd)) {
+   error = PTR_ERR(wac_i2c->vdd);
+   kfree(wac_i2c);
+   return error;
+   }
+
+   error = regulator_enable(wac_i2c->vdd);
+   if (error) {
+   regulator_put(wac_i2c->vdd);
+   kfree(wac_i2c);
+   return error;
+   }
+
wac_i2c->client = client;
wac_i2c->input = input;
wac_i2c->rstc = rstc;
@@ -281,6 +297,8 @@ static int wacom_i2c_probe(struct i2c_client *client,
 err_free_irq:
free_irq(client->irq, wac_i2c);
 err_free_mem:
+   regulator_disable(wac_i2c->vdd);
+   regulator_put(wac_i2c->vdd);
input_free_device(input);
kfree(wac_i2c);
 
-- 
2.31.0



[PATCH v4 09/10] ARM: imx_v6_v7_defconfig: Enable Wacom I2C

2021-03-25 Thread Alistair Francis
Enable the Wacom I2C in the imx defconfig as it is used by the
reMarkable2 tablet.

Signed-off-by: Alistair Francis 
---
 arch/arm/configs/imx_v6_v7_defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/configs/imx_v6_v7_defconfig 
b/arch/arm/configs/imx_v6_v7_defconfig
index 70928cc48939..cd80e85d37cf 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -174,6 +174,7 @@ CONFIG_TOUCHSCREEN_DA9052=y
 CONFIG_TOUCHSCREEN_EGALAX=y
 CONFIG_TOUCHSCREEN_GOODIX=y
 CONFIG_TOUCHSCREEN_ILI210X=y
+CONFIG_TOUCHSCREEN_WACOM_I2C=y
 CONFIG_TOUCHSCREEN_MAX11801=y
 CONFIG_TOUCHSCREEN_IMX6UL_TSC=y
 CONFIG_TOUCHSCREEN_EDT_FT5X06=y
-- 
2.31.0



[PATCH v4 05/10] Input: wacom_i2c - Add support for distance and tilt x/y

2021-03-25 Thread Alistair Francis
This is based on the out of tree rM2 driver.

Signed-off-by: Alistair Francis 
---
 drivers/input/touchscreen/wacom_i2c.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index ee1829dd35f4..3b4bc514dc3f 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -22,12 +22,16 @@
 #define WACOM_CMD_QUERY3   0x02
 #define WACOM_CMD_THROW0   0x05
 #define WACOM_CMD_THROW1   0x00
-#define WACOM_QUERY_SIZE   19
+#define WACOM_QUERY_SIZE   22
 
 struct wacom_features {
int x_max;
int y_max;
int pressure_max;
+   int distance_max;
+   int distance_physical_max;
+   int tilt_x_max;
+   int tilt_y_max;
char fw_version;
 };
 
@@ -79,6 +83,10 @@ static int wacom_query_device(struct i2c_client *client,
features->y_max = get_unaligned_le16(&data[5]);
features->pressure_max = get_unaligned_le16(&data[11]);
features->fw_version = get_unaligned_le16(&data[13]);
+   features->distance_max = data[15];
+   features->distance_physical_max = data[16];
+   features->tilt_x_max = get_unaligned_le16(&data[17]);
+   features->tilt_y_max = get_unaligned_le16(&data[19]);
 
dev_dbg(>dev,
"x_max:%d, y_max:%d, pressure:%d, fw:%d\n",
@@ -95,6 +103,7 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
u8 *data = wac_i2c->data;
unsigned int x, y, pressure;
unsigned char tsw, f1, f2, ers;
+   short tilt_x, tilt_y, distance;
int error;
 
error = i2c_master_recv(wac_i2c->client,
@@ -109,6 +118,11 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
x = le16_to_cpup((__le16 *)&data[4]);
y = le16_to_cpup((__le16 *)&data[6]);
pressure = le16_to_cpup((__le16 *)&data[8]);
+   distance = data[10];
+
+   /* Signed */
+   tilt_x = le16_to_cpup((__le16 *)&data[11]);
+   tilt_y = le16_to_cpup((__le16 *)&data[13]);
 
if (!wac_i2c->prox)
wac_i2c->tool = (data[3] & 0x0c) ?
@@ -123,6 +137,9 @@ static irqreturn_t wacom_i2c_irq(int irq, void *dev_id)
input_report_abs(input, ABS_X, x);
input_report_abs(input, ABS_Y, y);
input_report_abs(input, ABS_PRESSURE, pressure);
+   input_report_abs(input, ABS_DISTANCE, distance);
+   input_report_abs(input, ABS_TILT_X, tilt_x);
+   input_report_abs(input, ABS_TILT_Y, tilt_y);
input_sync(input);
 
 out:
@@ -197,7 +214,11 @@ static int wacom_i2c_probe(struct i2c_client *client,
input_set_abs_params(input, ABS_Y, 0, features.y_max, 0, 0);
input_set_abs_params(input, ABS_PRESSURE,
 0, features.pressure_max, 0, 0);
-
+   input_set_abs_params(input, ABS_DISTANCE, 0, features.distance_max, 0, 0);
+   input_set_abs_params(input, ABS_TILT_X, -features.tilt_x_max,
+features.tilt_x_max, 0, 0);
+   input_set_abs_params(input, ABS_TILT_Y, -features.tilt_y_max,
+features.tilt_y_max, 0, 0);
input_set_drvdata(input, wac_i2c);
 
error = request_threaded_irq(client->irq, NULL, wacom_i2c_irq,
-- 
2.31.0



[PATCH v4 04/10] Input: wacom_i2c - Add touchscreen properties

2021-03-25 Thread Alistair Francis
Connect touchscreen properties to the wacom_i2c.

Signed-off-by: Alistair Francis 
---
v4:
 - Add touchscreen_report_pos() as well

 drivers/input/touchscreen/wacom_i2c.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index eada68770671..ee1829dd35f4 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <linux/input/touchscreen.h>
 #include 
 #include 
 #include 
@@ -33,6 +34,7 @@ struct wacom_features {
 struct wacom_i2c {
struct i2c_client *client;
struct input_dev *input;
+   struct touchscreen_properties props;
u8 data[WACOM_QUERY_SIZE];
bool prox;
int tool;
@@ -188,6 +190,9 @@ static int wacom_i2c_probe(struct i2c_client *client,
__set_bit(BTN_STYLUS2, input->keybit);
__set_bit(BTN_TOUCH, input->keybit);
 
+   touchscreen_parse_properties(input, true, &wac_i2c->props);
+   touchscreen_report_pos(input, &wac_i2c->props, features.x_max,
+  features.y_max, true);
input_set_abs_params(input, ABS_X, 0, features.x_max, 0, 0);
input_set_abs_params(input, ABS_Y, 0, features.y_max, 0, 0);
input_set_abs_params(input, ABS_PRESSURE,
-- 
2.31.0



[PATCH v4 03/10] Input: wacom_i2c - Add device tree support to wacom_i2c

2021-03-25 Thread Alistair Francis
Allow the wacom-i2c device to be exposed via device tree.

Signed-off-by: Alistair Francis 
---
v4:
 - Avoid unused variable warning by not using of_match_ptr()

 drivers/input/touchscreen/wacom_i2c.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 1afc6bde2891..eada68770671 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include <linux/of.h>
 #include 
 
 #define WACOM_CMD_QUERY0   0x04
@@ -262,10 +263,17 @@ static const struct i2c_device_id wacom_i2c_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, wacom_i2c_id);
 
+static const struct of_device_id wacom_i2c_of_match_table[] = {
+   { .compatible = "wacom,generic" },
+   {}
+};
+MODULE_DEVICE_TABLE(of, wacom_i2c_of_match_table);
+
 static struct i2c_driver wacom_i2c_driver = {
.driver = {
.name   = "wacom_i2c",
.pm = _i2c_pm,
+   .of_match_table = wacom_i2c_of_match_table,
},
 
.probe  = wacom_i2c_probe,
-- 
2.31.0



[PATCH v4 06/10] Input: wacom_i2c - Clean up the query device fields

2021-03-25 Thread Alistair Francis
Improve the query device fields to be more verbose.

Signed-off-by: Alistair Francis 
---
v4:
 - Remove the reset_control_reset() logic

 drivers/input/touchscreen/wacom_i2c.c | 64 ++-
 1 file changed, 44 insertions(+), 20 deletions(-)

diff --git a/drivers/input/touchscreen/wacom_i2c.c 
b/drivers/input/touchscreen/wacom_i2c.c
index 3b4bc514dc3f..84c7ccb737bd 100644
--- a/drivers/input/touchscreen/wacom_i2c.c
+++ b/drivers/input/touchscreen/wacom_i2c.c
@@ -13,15 +13,32 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
-#define WACOM_CMD_QUERY0   0x04
-#define WACOM_CMD_QUERY1   0x00
-#define WACOM_CMD_QUERY2   0x33
-#define WACOM_CMD_QUERY3   0x02
-#define WACOM_CMD_THROW0   0x05
-#define WACOM_CMD_THROW1   0x00
+// Registers
+#define WACOM_COMMAND_LSB   0x04
+#define WACOM_COMMAND_MSB   0x00
+
+#define WACOM_DATA_LSB  0x05
+#define WACOM_DATA_MSB  0x00
+
+// Report types
+#define REPORT_FEATURE  0x30
+
+// Requests / operations
+#define OPCODE_GET_REPORT   0x02
+
+// Power settings
+#define POWER_ON0x00
+#define POWER_SLEEP 0x01
+
+// Input report ids
+#define WACOM_PEN_DATA_REPORT   2
+#define WACOM_SHINONOME_REPORT  26
+
+#define WACOM_QUERY_REPORT 3
 #define WACOM_QUERY_SIZE   22
 
 struct wacom_features {
@@ -48,27 +65,30 @@ static int wacom_query_device(struct i2c_client *client,
  struct wacom_features *features)
 {
int ret;
-   u8 cmd1[] = { WACOM_CMD_QUERY0, WACOM_CMD_QUERY1,
-   WACOM_CMD_QUERY2, WACOM_CMD_QUERY3 };
-   u8 cmd2[] = { WACOM_CMD_THROW0, WACOM_CMD_THROW1 };
u8 data[WACOM_QUERY_SIZE];
+
+   u8 get_query_data_cmd[] = {
+   WACOM_COMMAND_LSB,
+   WACOM_COMMAND_MSB,
+   REPORT_FEATURE | WACOM_QUERY_REPORT,
+   OPCODE_GET_REPORT,
+   WACOM_DATA_LSB,
+   WACOM_DATA_MSB,
+   };
+
struct i2c_msg msgs[] = {
+   // Request reading of feature ReportID: 3 (Pen Query Data)
{
.addr = client->addr,
.flags = 0,
-   .len = sizeof(cmd1),
-   .buf = cmd1,
-   },
-   {
-   .addr = client->addr,
-   .flags = 0,
-   .len = sizeof(cmd2),
-   .buf = cmd2,
+   .len = sizeof(get_query_data_cmd),
+   .buf = get_query_data_cmd,
},
+   // Read 21 bytes
{
.addr = client->addr,
.flags = I2C_M_RD,
-   .len = sizeof(data),
+   .len = WACOM_QUERY_SIZE - 1,
.buf = data,
},
};
@@ -89,9 +109,13 @@ static int wacom_query_device(struct i2c_client *client,
features->tilt_y_max = get_unaligned_le16([19]);
 
dev_dbg(>dev,
-   "x_max:%d, y_max:%d, pressure:%d, fw:%d\n",
+   "x_max:%d, y_max:%d, pressure:%d, fw:%d, "
+   "distance: %d, phys distance: %d, "
+   "tilt_x_max: %d, tilt_y_max: %d\n",
features->x_max, features->y_max,
-   features->pressure_max, features->fw_version);
+   features->pressure_max, features->fw_version,
+   features->distance_max, features->distance_physical_max,
+   features->tilt_x_max, features->tilt_y_max);
 
return 0;
 }
-- 
2.31.0



[PATCH v4 01/10] dt-bindings: Add Wacom to vendor bindings

2021-03-25 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml 
b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index a8e1e8d2ef20..996f4de2fff5 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -1216,6 +1216,8 @@ patternProperties:
 description: Vision Optical Technology Co., Ltd.
   "^vxt,.*":
 description: VXT Ltd
+  "^wacom,.*":
+description: Wacom Co., Ltd
   "^wand,.*":
 description: Wandbord (Technexion)
   "^waveshare,.*":
-- 
2.31.0



[PATCH v4 02/10] dt-bindings: touchscreen: Initial commit of wacom,generic

2021-03-25 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 .../input/touchscreen/wacom,generic.yaml  | 48 +++
 1 file changed, 48 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml

diff --git 
a/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml 
b/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml
new file mode 100644
index ..19bbfc55ed76
--- /dev/null
+++ b/Documentation/devicetree/bindings/input/touchscreen/wacom,generic.yaml
@@ -0,0 +1,48 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/input/touchscreen/wacom,generic.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Wacom I2C Controller
+
+maintainers:
+  - Alistair Francis 
+
+allOf:
+  - $ref: touchscreen.yaml#
+
+properties:
+  compatible:
+const: wacom,generic
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  vdd-supply:
+maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+
+additionalProperties: false
+
+examples:
+  - |
+#include "dt-bindings/interrupt-controller/irq.h"
+i2c {
+#address-cells = <1>;
+#size-cells = <0>;
+digitiser@9 {
+compatible = "wacom,generic";
+reg = <0x9>;
+interrupt-parent = <>;
+interrupts = <9 IRQ_TYPE_LEVEL_LOW>;
+vdd-supply = <_touch>;
+};
+};
-- 
2.31.0



[PATCH v2] kernel/resource: Fix locking in request_free_mem_region

2021-03-25 Thread Alistair Popple
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

region_intersects() obtains a read lock before walking the resource tree
to protect against concurrent changes. However it drops the lock prior
to returning. This means by the time request_mem_region() is called in
request_free_mem_region() another thread may have already reserved the
requested region resulting in unexpected failures and a message in the
kernel log from hitting this condition:

/*
 * mm/hmm.c reserves physical addresses which then
 * become unavailable to other users.  Conflicts are
 * not expected.  Warn to aid debugging if encountered.
 */
if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
pr_warn("Unaddressable device %s %pR conflicts with %pR",
conflict->name, conflict, res);

To fix this create versions of region_intersects() and
request_mem_region() that allow the caller to take the appropriate lock
such that it may be held over the required calls.

Instead of creating another version of devm_request_mem_region() that
doesn't take the lock open-code it to allow the caller to pre-allocate
the required memory prior to taking the lock.

Fixes: 0c385190392d8 ("resource: add a not device managed request_free_mem_region variant")
Fixes: 0092908d16c60 ("mm: factor out a devm_request_free_mem_region helper")
Fixes: 4ef589dc9b10c ("mm/hmm/devmem: device memory hotplug using ZONE_DEVICE")
Signed-off-by: Alistair Popple 

---

v2:
 - Added Fixes tag

---
 kernel/resource.c | 146 +-
 1 file changed, 94 insertions(+), 52 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..2d4652383dd2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+static int __region_intersects(resource_size_t start, size_t size,
+  unsigned long flags, unsigned long desc)
+{
+   struct resource res;
+   int type = 0; int other = 0;
+   struct resource *p;
+
+   res.start = start;
+   res.end = start + size - 1;
+
+   for (p = iomem_resource.child; p ; p = p->sibling) {
+   bool is_type = (((p->flags & flags) == flags) &&
+   ((desc == IORES_DESC_NONE) ||
+(desc == p->desc)));
+
+   if (resource_overlaps(p, &res))
+   is_type ? type++ : other++;
+   }
+
+   if (type == 0)
+   return REGION_DISJOINT;
+
+   if (other == 0)
+   return REGION_INTERSECTS;
+
+   return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
  unsigned long desc)
 {
-   struct resource res;
-   int type = 0; int other = 0;
-   struct resource *p;
-
-   res.start = start;
-   res.end = start + size - 1;
+   int rc;
 
read_lock(&resource_lock);
-   for (p = iomem_resource.child; p ; p = p->sibling) {
-   bool is_type = (((p->flags & flags) == flags) &&
-   ((desc == IORES_DESC_NONE) ||
-(desc == p->desc)));
-
-   if (resource_overlaps(p, &res))
-   is_type ? type++ : other++;
-   }
+   rc = __region_intersects(start, size, flags, desc);
read_unlock(&resource_lock);
-
-   if (type == 0)
-   return REGION_DISJOINT;
-
-   if (other == 0)
-   return REGION_INTERSECTS;
-
-   return REGION_MIXED;
+   return rc;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
@@ -1171,31 +1180,17 @@ struct address_space *iomem_get_mapping(void)
return smp_load_acquire(&iomem_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct resource *parent,
-  resource_size_t start, resource_size_t n,
-  const char *name, int flags)
+static bool request_region_locked(struct resource *parent,
+   struct resource *res, resource_size_t start,
+   resource_size_t n, const char *name, int flags)
 {
-   DECLARE_WAITQUEUE(wait, current);

[PATCH] kernel/resource: Fix locking in request_free_mem_region

2021-03-25 Thread Alistair Popple
request_free_mem_region() is used to find an empty range of physical
addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
over the range of possible addresses using region_intersects() to see if
the range is free.

region_intersects() obtains a read lock before walking the resource tree
to protect against concurrent changes. However it drops the lock prior
to returning. This means by the time request_mem_region() is called in
request_free_mem_region() another thread may have already reserved the
requested region resulting in unexpected failures and a message in the
kernel log from hitting this condition:

/*
 * mm/hmm.c reserves physical addresses which then
 * become unavailable to other users.  Conflicts are
 * not expected.  Warn to aid debugging if encountered.
 */
if (conflict->desc == IORES_DESC_DEVICE_PRIVATE_MEMORY) {
pr_warn("Unaddressable device %s %pR conflicts with %pR",
conflict->name, conflict, res);

To fix this create versions of region_intersects() and
request_mem_region() that allow the caller to take the appropriate lock
such that it may be held over the required calls.

Instead of creating another version of devm_request_mem_region() that
doesn't take the lock open-code it to allow the caller to pre-allocate
the required memory prior to taking the lock.

Signed-off-by: Alistair Popple 
---
 kernel/resource.c | 146 +-
 1 file changed, 94 insertions(+), 52 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index 627e61b0c124..2d4652383dd2 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -523,6 +523,34 @@ int __weak page_is_ram(unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(page_is_ram);
 
+static int __region_intersects(resource_size_t start, size_t size,
+  unsigned long flags, unsigned long desc)
+{
+   struct resource res;
+   int type = 0; int other = 0;
+   struct resource *p;
+
+   res.start = start;
+   res.end = start + size - 1;
+
+   for (p = iomem_resource.child; p ; p = p->sibling) {
+   bool is_type = (((p->flags & flags) == flags) &&
+   ((desc == IORES_DESC_NONE) ||
+(desc == p->desc)));
+
+   if (resource_overlaps(p, &res))
+   is_type ? type++ : other++;
+   }
+
+   if (type == 0)
+   return REGION_DISJOINT;
+
+   if (other == 0)
+   return REGION_INTERSECTS;
+
+   return REGION_MIXED;
+}
+
 /**
  * region_intersects() - determine intersection of region with known resources
  * @start: region start address
@@ -546,31 +574,12 @@ EXPORT_SYMBOL_GPL(page_is_ram);
 int region_intersects(resource_size_t start, size_t size, unsigned long flags,
  unsigned long desc)
 {
-   struct resource res;
-   int type = 0; int other = 0;
-   struct resource *p;
-
-   res.start = start;
-   res.end = start + size - 1;
+   int rc;
 
read_lock(&resource_lock);
-   for (p = iomem_resource.child; p ; p = p->sibling) {
-   bool is_type = (((p->flags & flags) == flags) &&
-   ((desc == IORES_DESC_NONE) ||
-(desc == p->desc)));
-
-   if (resource_overlaps(p, &res))
-   is_type ? type++ : other++;
-   }
+   rc = __region_intersects(start, size, flags, desc);
read_unlock(&resource_lock);
-
-   if (type == 0)
-   return REGION_DISJOINT;
-
-   if (other == 0)
-   return REGION_INTERSECTS;
-
-   return REGION_MIXED;
+   return rc;
 }
 EXPORT_SYMBOL_GPL(region_intersects);
 
@@ -1171,31 +1180,17 @@ struct address_space *iomem_get_mapping(void)
return smp_load_acquire(&iomem_inode)->i_mapping;
 }
 
-/**
- * __request_region - create a new busy resource region
- * @parent: parent resource descriptor
- * @start: resource start address
- * @n: resource region size
- * @name: reserving caller's ID string
- * @flags: IO resource flags
- */
-struct resource * __request_region(struct resource *parent,
-  resource_size_t start, resource_size_t n,
-  const char *name, int flags)
+static bool request_region_locked(struct resource *parent,
+   struct resource *res, resource_size_t start,
+   resource_size_t n, const char *name, int 
flags)
 {
-   DECLARE_WAITQUEUE(wait, current);
-   struct resource *res = alloc_resource(GFP_KERNEL);
struct resource *orig_parent = parent;
-
-   if (!res)
-   return NULL;
+   DECLARE_WAITQUEUE(wait, current);
 
res->name = name;
res->start = start;
res->end = start + n - 1;
 

[PATCH v7 8/8] nouveau/svm: Implement atomic SVM access

2021-03-25 Thread Alistair Popple
Some NVIDIA GPUs do not support direct atomic access to system memory
via PCIe. Instead this must be emulated by granting the GPU exclusive
access to the memory. This is achieved by replacing CPU page table
entries with special swap entries that fault on userspace access.

The driver then grants the GPU permission to update the page undergoing
atomic access via the GPU page tables. When CPU access to the page is
required, a CPU fault is raised, which calls into the device driver via
MMU notifiers to revoke the atomic access. The original page table
entries are then restored, allowing CPU access to proceed.
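
As a rough sketch of that flow (illustrative only: gpu_atomic_map() is
a stand-in for the real page table update done through nvif, and error
handling is elided):

	static int emulate_atomic_access(struct mm_struct *mm,
					 unsigned long addr, void *owner)
	{
		struct page *page;

		/* Replace the CPU PTE with an exclusive swap entry. */
		if (make_device_exclusive_range(mm, addr, addr + PAGE_SIZE,
						&page, owner) != 1 || !page)
			return -EBUSY;

		/* Grant the GPU an atomic-capable mapping of the page. */
		gpu_atomic_map(page);

		/*
		 * Once the lock and reference are dropped, a CPU touch
		 * faults, restores the original PTE and fires MMU notifiers
		 * so the driver can revoke the GPU's atomic access.
		 */
		unlock_page(page);
		put_page(page);
		return 0;
	}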

Signed-off-by: Alistair Popple 

---

v7:
* Removed magic values for fault access levels
* Improved readability of fault comparison code

v4:
* Check that page table entries haven't changed before mapping on the
  device
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 126 --
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
 4 files changed, 123 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h 
b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index d6dd40f21eed..9c7ff56831c5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 {
#define NVIF_VMM_PFNMAP_V0_APER   0x00000000000000f0ULL
#define NVIF_VMM_PFNMAP_V0_HOST   0x0000000000000000ULL
#define NVIF_VMM_PFNMAP_V0_VRAM   0x0000000000000010ULL
+#define NVIF_VMM_PFNMAP_V0_A 0x0000000000000004ULL
#define NVIF_VMM_PFNMAP_V0_W  0x0000000000000002ULL
#define NVIF_VMM_PFNMAP_V0_V  0x0000000000000001ULL
#define NVIF_VMM_PFNMAP_V0_NONE   0x0000000000000000ULL
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a195e48c9aee..81526d65b4e2 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -35,6 +35,7 @@
#include <linux/sched/mm.h>
#include <linux/sort.h>
#include <linux/hmm.h>
+#include <linux/rmap.h>
 
 struct nouveau_svm {
struct nouveau_drm *drm;
@@ -67,6 +68,11 @@ struct nouveau_svm {
} buffer[1];
 };
 
+#define FAULT_ACCESS_READ 0
+#define FAULT_ACCESS_WRITE 1
+#define FAULT_ACCESS_ATOMIC 2
+#define FAULT_ACCESS_PREFETCH 3
+
 #define SVM_DBG(s,f,a...) NV_DEBUG((s)->drm, "svm: "f"\n", ##a)
 #define SVM_ERR(s,f,a...) NV_WARN((s)->drm, "svm: "f"\n", ##a)
 
@@ -411,6 +417,24 @@ nouveau_svm_fault_cancel_fault(struct nouveau_svm *svm,
  fault->client);
 }
 
+static int
+nouveau_svm_fault_priority(u8 fault)
+{
+   switch (fault) {
+   case FAULT_ACCESS_PREFETCH:
+   return 0;
+   case FAULT_ACCESS_READ:
+   return 1;
+   case FAULT_ACCESS_WRITE:
+   return 2;
+   case FAULT_ACCESS_ATOMIC:
+   return 3;
+   default:
+   WARN_ON_ONCE(1);
+   return -1;
+   }
+}
+
 static int
 nouveau_svm_fault_cmp(const void *a, const void *b)
 {
@@ -421,9 +445,8 @@ nouveau_svm_fault_cmp(const void *a, const void *b)
return ret;
if ((ret = (s64)fa->addr - fb->addr))
return ret;
-   /*XXX: atomic? */
-   return (fa->access == 0 || fa->access == 3) -
-  (fb->access == 0 || fb->access == 3);
+   return nouveau_svm_fault_priority(fa->access) -
+   nouveau_svm_fault_priority(fb->access);
 }
 
 static void
@@ -487,6 +510,10 @@ static bool nouveau_svm_range_invalidate(struct 
mmu_interval_notifier *mni,
struct svm_notifier *sn =
container_of(mni, struct svm_notifier, notifier);
 
+   if (range->event == MMU_NOTIFY_EXCLUSIVE &&
+   range->owner == sn->svmm->vmm->cli->drm->dev)
+   return true;
+
/*
 * serializes the update to mni->invalidate_seq done by caller and
 * prevents invalidation of the PTE from progressing while HW is being
@@ -555,6 +582,71 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm 
*drm,
args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
+static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
+  struct nouveau_drm *drm,
+  struct nouveau_pfnmap_args *args, u32 size,
+  struct svm_notifier *notifier)
+{
+   unsigned long timeout =
+   jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+   struct mm_struct *mm = svmm->notifier.mm;
+   struct page *page;
+   unsigned long start = args->p.addr;
+

[PATCH v7 2/8] mm/swapops: Rework swap entry manipulation code

2021-03-25 Thread Alistair Popple
Both migration and device private pages use special swap entries that
are manipulated by a range of inline functions. The arguments to these
are somewhat inconsistent, so rework them to remove flag type arguments
and to make the arguments similar for both read and write entry
creation.
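
As a before-and-after sketch of a typical call site (illustrative;
actual call sites vary):

	/* Before: a flag argument selects the entry type. */
	entry = make_migration_entry(page, vma->vm_flags & VM_WRITE);

	/* After: the intent is in the name, the argument is the offset. */
	if (vma->vm_flags & VM_WRITE)
		entry = make_writable_migration_entry(page_to_pfn(page));
	else
		entry = make_readable_migration_entry(page_to_pfn(page));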

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Ralph Campbell 
---
 include/linux/swapops.h | 56 ++---
 mm/debug_vm_pgtable.c   | 12 -
 mm/hmm.c|  2 +-
 mm/huge_memory.c| 26 +--
 mm/hugetlb.c| 10 +---
 mm/memory.c | 10 +---
 mm/migrate.c| 26 ++-
 mm/mprotect.c   | 10 +---
 mm/rmap.c   | 10 +---
 9 files changed, 100 insertions(+), 62 deletions(-)

diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 139be8235ad2..4dfd807ae52a 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -100,35 +100,35 @@ static inline void *swp_to_radix_entry(swp_entry_t entry)
 }
 
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
-static inline swp_entry_t make_device_private_entry(struct page *page, bool 
write)
+static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
-   return swp_entry(write ? SWP_DEVICE_WRITE : SWP_DEVICE_READ,
-page_to_pfn(page));
+   return swp_entry(SWP_DEVICE_READ, offset);
 }
 
-static inline bool is_device_private_entry(swp_entry_t entry)
+static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset)
 {
-   int type = swp_type(entry);
-   return type == SWP_DEVICE_READ || type == SWP_DEVICE_WRITE;
+   return swp_entry(SWP_DEVICE_WRITE, offset);
 }
 
-static inline void make_device_private_entry_read(swp_entry_t *entry)
+static inline bool is_device_private_entry(swp_entry_t entry)
 {
-   *entry = swp_entry(SWP_DEVICE_READ, swp_offset(*entry));
+   int type = swp_type(entry);
+   return type == SWP_DEVICE_READ || type == SWP_DEVICE_WRITE;
 }
 
-static inline bool is_write_device_private_entry(swp_entry_t entry)
+static inline bool is_writable_device_private_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_DEVICE_WRITE);
 }
 #else /* CONFIG_DEVICE_PRIVATE */
-static inline swp_entry_t make_device_private_entry(struct page *page, bool 
write)
+static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
return swp_entry(0, 0);
 }
 
-static inline void make_device_private_entry_read(swp_entry_t *entry)
+static inline swp_entry_t make_writable_device_private_entry(pgoff_t offset)
 {
+   return swp_entry(0, 0);
 }
 
 static inline bool is_device_private_entry(swp_entry_t entry)
@@ -136,35 +136,32 @@ static inline bool is_device_private_entry(swp_entry_t 
entry)
return false;
 }
 
-static inline bool is_write_device_private_entry(swp_entry_t entry)
+static inline bool is_writable_device_private_entry(swp_entry_t entry)
 {
return false;
 }
 #endif /* CONFIG_DEVICE_PRIVATE */
 
 #ifdef CONFIG_MIGRATION
-static inline swp_entry_t make_migration_entry(struct page *page, int write)
-{
-   BUG_ON(!PageLocked(compound_head(page)));
-
-   return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ,
-   page_to_pfn(page));
-}
-
 static inline int is_migration_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_MIGRATION_READ ||
swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
-static inline int is_write_migration_entry(swp_entry_t entry)
+static inline int is_writable_migration_entry(swp_entry_t entry)
 {
return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE);
 }
 
-static inline void make_migration_entry_read(swp_entry_t *entry)
+static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
 {
-   *entry = swp_entry(SWP_MIGRATION_READ, swp_offset(*entry));
+   return swp_entry(SWP_MIGRATION_READ, offset);
+}
+
+static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(SWP_MIGRATION_WRITE, offset);
 }
 
 extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
@@ -174,21 +171,28 @@ extern void migration_entry_wait(struct mm_struct *mm, 
pmd_t *pmd,
 extern void migration_entry_wait_huge(struct vm_area_struct *vma,
struct mm_struct *mm, pte_t *pte);
 #else
+static inline swp_entry_t make_readable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(0, 0);
+}
+
+static inline swp_entry_t make_writable_migration_entry(pgoff_t offset)
+{
+   return swp_entry(0, 0);
+}
 
-#define make_migration_entry(page, write) swp_entry(0, 0)
 static inline int is_migration_entry(swp_entry_t swp)
 {
return 0;
 }
 
-static inline void make_migration_entry_read(swp_entry_t *entryp) { }
 static inline void __migration_entry_wait(struct mm_struct *mm

[PATCH v7 7/8] nouveau/svm: Refactor nouveau_range_fault

2021-03-25 Thread Alistair Popple
Call mmu_interval_notifier_insert() as part of nouveau_range_fault().
This doesn't introduce any functional change but makes it easier for a
subsequent patch to alter the behaviour of nouveau_range_fault() to
support GPU atomic operations.

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 34 ---
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 94f841026c3b..a195e48c9aee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -567,18 +567,27 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
unsigned long hmm_pfns[1];
struct hmm_range range = {
.notifier = &notifier->notifier,
-   .start = notifier->notifier.interval_tree.start,
-   .end = notifier->notifier.interval_tree.last + 1,
.default_flags = hmm_flags,
.hmm_pfns = hmm_pfns,
.dev_private_owner = drm->dev,
};
-   struct mm_struct *mm = notifier->notifier.mm;
+   struct mm_struct *mm = svmm->notifier.mm;
int ret;
 
+   ret = mmu_interval_notifier_insert(&notifier->notifier, mm,
+   args->p.addr, args->p.size,
+   &nouveau_svm_mni_ops);
+   if (ret)
+   return ret;
+
+   range.start = notifier->notifier.interval_tree.start;
+   range.end = notifier->notifier.interval_tree.last + 1;
+
while (true) {
-   if (time_after(jiffies, timeout))
-   return -EBUSY;
+   if (time_after(jiffies, timeout)) {
+   ret = -EBUSY;
+   goto out;
+   }
 
range.notifier_seq = mmu_interval_read_begin(range.notifier);
mmap_read_lock(mm);
@@ -587,7 +596,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
if (ret) {
if (ret == -EBUSY)
continue;
-   return ret;
+   goto out;
}
 
mutex_lock(&svmm->mutex);
@@ -606,6 +615,9 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
svmm->vmm->vmm.object.client->super = false;
mutex_unlock(&svmm->mutex);
 
+out:
+   mmu_interval_notifier_remove(&notifier->notifier);
+
return ret;
 }
 
@@ -727,14 +739,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
}
 
notifier.svmm = svmm;
-   ret = mmu_interval_notifier_insert(&notifier.notifier, mm,
-  args.i.p.addr, args.i.p.size,
-  &nouveau_svm_mni_ops);
-   if (!ret) {
-   ret = nouveau_range_fault(svmm, svm->drm, &args.i,
-   sizeof(args), hmm_flags, &notifier);
-   mmu_interval_notifier_remove(&notifier.notifier);
-   }
+   ret = nouveau_range_fault(svmm, svm->drm, &args.i,
+   sizeof(args), hmm_flags, &notifier);
mmput(mm);
 
limit = args.i.p.addr + args.i.p.size;
-- 
2.20.1



[PATCH v7 6/8] mm: Selftests for exclusive device memory

2021-03-25 Thread Alistair Popple
Adds some selftests for exclusive device memory.

Signed-off-by: Alistair Popple 
Acked-by: Jason Gunthorpe 
Tested-by: Ralph Campbell 
Reviewed-by: Ralph Campbell 
---
 lib/test_hmm.c | 124 +++
 lib/test_hmm_uapi.h|   2 +
 tools/testing/selftests/vm/hmm-tests.c | 158 +
 3 files changed, 284 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 5c9f5a020c1d..305a9d9e2b4c 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -25,6 +25,7 @@
#include <linux/swapops.h>
#include <linux/sched/mm.h>
#include <linux/platform_device.h>
+#include <linux/rmap.h>
 
 #include "test_hmm_uapi.h"
 
@@ -46,6 +47,7 @@ struct dmirror_bounce {
unsigned long   cpages;
 };
 
+#define DPT_XA_TAG_ATOMIC 1UL
 #define DPT_XA_TAG_WRITE 3UL
 
 /*
@@ -619,6 +621,54 @@ static void dmirror_migrate_alloc_and_copy(struct 
migrate_vma *args,
}
 }
 
+static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
+unsigned long end)
+{
+   unsigned long pfn;
+
+   for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++) {
+   void *entry;
+   struct page *page;
+
+   entry = xa_load(&dmirror->pt, pfn);
+   page = xa_untag_pointer(entry);
+   if (xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC)
+   return -EPERM;
+   }
+
+   return 0;
+}
+
+static int dmirror_atomic_map(unsigned long start, unsigned long end,
+ struct page **pages, struct dmirror *dmirror)
+{
+   unsigned long pfn, mapped = 0;
+   int i;
+
+   /* Map the migrated pages into the device's page tables. */
+   mutex_lock(&dmirror->mutex);
+
+   for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); 
pfn++, i++) {
+   void *entry;
+
+   if (!pages[i])
+   continue;
+
+   entry = pages[i];
+   entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
+   entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
+   if (xa_is_err(entry)) {
+   mutex_unlock(&dmirror->mutex);
+   return xa_err(entry);
+   }
+
+   mapped++;
+   }
+
+   mutex_unlock(&dmirror->mutex);
+   return mapped;
+}
+
 static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
struct dmirror *dmirror)
 {
@@ -661,6 +711,71 @@ static int dmirror_migrate_finalize_and_map(struct 
migrate_vma *args,
return 0;
 }
 
+static int dmirror_exclusive(struct dmirror *dmirror,
+struct hmm_dmirror_cmd *cmd)
+{
+   unsigned long start, end, addr;
+   unsigned long size = cmd->npages << PAGE_SHIFT;
+   struct mm_struct *mm = dmirror->notifier.mm;
+   struct page *pages[64];
+   struct dmirror_bounce bounce;
+   unsigned long next;
+   int ret;
+
+   start = cmd->addr;
+   end = start + size;
+   if (end < start)
+   return -EINVAL;
+
+   /* Since the mm is for the mirrored process, get a reference first. */
+   if (!mmget_not_zero(mm))
+   return -EINVAL;
+
+   mmap_read_lock(mm);
+   for (addr = start; addr < end; addr = next) {
+   int i, mapped;
+
+   if (end < addr + (ARRAY_SIZE(pages) << PAGE_SHIFT))
+   next = end;
+   else
+   next = addr + (ARRAY_SIZE(pages) << PAGE_SHIFT);
+
+   ret = make_device_exclusive_range(mm, addr, next, pages, NULL);
+   mapped = dmirror_atomic_map(addr, next, pages, dmirror);
+   for (i = 0; i < ret; i++) {
+   if (pages[i]) {
+   unlock_page(pages[i]);
+   put_page(pages[i]);
+   }
+   }
+
+   if (addr + (mapped << PAGE_SHIFT) < next) {
+   mmap_read_unlock(mm);
+   mmput(mm);
+   return -EBUSY;
+   }
+   }
+   mmap_read_unlock(mm);
+   mmput(mm);
+
+   /* Return the migrated data for verification. */
+   ret = dmirror_bounce_init(&bounce, start, size);
+   if (ret)
+   return ret;
+   mutex_lock(&dmirror->mutex);
+   ret = dmirror_do_read(dmirror, start, end, &bounce);
+   mutex_unlock(&dmirror->mutex);
+   if (ret == 0) {
+   if (copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr,
+bounce.size))
+   ret = -EFAULT;
+   }
+
+   cmd->cpages = bounce.cpages;
+   dmirror_bounce_fini(&bounce);
+   return ret;
+}
+
 static int dmirror_migrate(struct dmirror *dmirror,
   struct hmm_dmirror_cmd *cmd)
 {
@@ -949,6 +1064,15 @@ static long dmi

[PATCH v7 5/8] mm: Device exclusive memory access

2021-03-25 Thread Alistair Popple
Some devices require exclusive write access to shared virtual
memory (SVM) ranges to perform atomic operations on that memory. This
requires CPU page tables to be updated to deny access whilst atomic
operations are occurring.

In order to do this, introduce a new swap entry
type (SWP_DEVICE_EXCLUSIVE). When an SVM range needs to be marked for
exclusive access by a device, all page table mappings for the particular
range are replaced with device exclusive swap entries. This causes any
CPU access to the page to result in a fault.

Faults are resolved by replacing the faulting entry with the original
mapping. This results in MMU notifiers being called, which a driver uses
to update access permissions such as revoking atomic access. After the
notifiers have been called the device will no longer have exclusive
access to the region.
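
Roughly, the CPU fault side then has the following shape (a sketch
only; the helper names follow the series' naming convention but the
exact hunk in mm/memory.c may differ):

	/*
	 * In do_swap_page(), sketched: unlike a migration entry, a device
	 * exclusive entry is restored immediately rather than waited on.
	 */
	entry = pte_to_swp_entry(vmf->orig_pte);
	if (is_device_exclusive_entry(entry)) {
		vmf->page = pfn_swap_entry_to_page(entry);
		/* Restores the original PTE and fires MMU_NOTIFY_EXCLUSIVE. */
		ret = remove_device_exclusive_entry(vmf);
	}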

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 

---

v7:
* Added Christoph's Reviewed-by.
* Minor cosmetic cleanups suggested by Christoph.
* Replace mmu_notifier_range_init_migrate/exclusive with
  mmu_notifier_range_init_owner as suggested by Christoph.
* Replaced lock_page() with lock_page_retry() when handling faults.
* Restrict to anonymous pages for now.

v6:
* Fixed a bisectablity issue due to incorrectly applying the rename of
  migrate_pgmap_owner to the wrong patches for Nouveau and hmm_test.

v5:
* Renamed range->migrate_pgmap_owner to range->owner.
* Added MMU_NOTIFY_EXCLUSIVE to allow passing of a driver cookie which
  allows notifiers called as a result of make_device_exclusive_range() to
  be ignored.
* Added a check to try_to_protect_one() to detect if the pages originally
  returned from get_user_pages() have been unmapped or not.
* Removed check_device_exclusive_range() as it is no longer required with
  the other changes.
* Documentation update.

v4:
* Add function to check that mappings are still valid and exclusive.
* s/long/unsigned long/ in make_device_exclusive_entry().
---
 Documentation/vm/hmm.rst  |  19 ++-
 drivers/gpu/drm/nouveau/nouveau_svm.c |   2 +-
 include/linux/mmu_notifier.h  |  26 ++--
 include/linux/rmap.h  |   4 +
 include/linux/swap.h  |   4 +-
 include/linux/swapops.h   |  44 +-
 lib/test_hmm.c|   2 +-
 mm/hmm.c  |   5 +
 mm/memory.c   | 108 -
 mm/migrate.c  |  10 +-
 mm/mprotect.c |   8 +
 mm/page_vma_mapped.c  |   9 +-
 mm/rmap.c | 210 ++
 13 files changed, 426 insertions(+), 25 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 09e28507f5b2..a14c2938e7af 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -332,7 +332,7 @@ between device driver specific code and shared common code:
walks to fill in the ``args->src`` array with PFNs to be migrated.
The ``invalidate_range_start()`` callback is passed a
``struct mmu_notifier_range`` with the ``event`` field set to
-   ``MMU_NOTIFY_MIGRATE`` and the ``migrate_pgmap_owner`` field set to
+   ``MMU_NOTIFY_MIGRATE`` and the ``owner`` field set to
the ``args->pgmap_owner`` field passed to migrate_vma_setup(). This
allows the device driver to skip the invalidation callback and only
invalidate device private MMU mappings that are actually migrating.
@@ -405,6 +405,23 @@ between device driver specific code and shared common code:
 
The lock can now be released.
 
+Exclusive access memory
+=======================
+
+Some devices have features such as atomic PTE bits that can be used to implement
+atomic access to system memory. To support atomic operations to a shared virtual
+memory page such a device needs access to that page which is exclusive of any
+userspace access from the CPU. The ``make_device_exclusive_range()`` function
+can be used to make a memory range inaccessible from userspace.
+
+This replaces all mappings for pages in the given range with special swap
+entries. Any attempt to access the swap entry results in a fault which is
+resolved by replacing the entry with the original mapping. A driver gets
+notified that the mapping has been changed by MMU notifiers, after which point
+it will no longer have exclusive access to the page. Exclusive access is
+guaranteed to last until the driver drops the page lock and page reference, at
+which point any CPU faults on the page may proceed as described.
+
 Memory cgroup (memcg) and rss accounting
 
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c 
b/drivers/gpu/drm/nouveau/nouveau_svm.c
index f18bd53da052..94f841026c3b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -265,7 +265,7 @@ nouveau_svmm_invalidate_range_start(struct mmu_notifier *mn,
 * the invalidation is han

[PATCH v7 1/8] mm: Remove special swap entry functions

2021-03-25 Thread Alistair Popple
Remove multiple similar inline functions for dealing with different
types of special swap entries.

Both migration and device private swap entries use the swap offset to
store a pfn. Instead of multiple inline functions to obtain a struct
page for each swap entry type, use a common function
pfn_swap_entry_to_page(). Also open-code the various entry_to_pfn()
functions, as this results in shorter code that is easier to understand.
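
For example, call sites that previously dispatched on the entry type
collapse to a single check (this mirrors the fs/proc/task_mmu.c hunks
below):

	/* Before: one helper per special swap entry type. */
	if (is_migration_entry(swpent))
		page = migration_entry_to_page(swpent);
	else if (is_device_private_entry(swpent))
		page = device_private_entry_to_page(swpent);

	/* After: both entry types store a pfn, so one helper suffices. */
	if (is_pfn_swap_entry(swpent))
		page = pfn_swap_entry_to_page(swpent);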

Signed-off-by: Alistair Popple 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 

---

v7:
* Reworded commit message to include pfn_swap_entry_to_page()
* Added Christoph's Reviewed-by

v6:
* Removed redundant compound_page() call from inside PageLocked()
* Fixed a minor build issue for s390 reported by kernel test bot

v4:
* Added pfn_swap_entry_to_page()
* Reinstated check that migration entries point to locked pages
* Removed #define swapcache_prepare which isn't needed for CONFIG_SWAP=0
  builds
---
 arch/s390/mm/pgtable.c  |  2 +-
 fs/proc/task_mmu.c  | 23 +-
 include/linux/swap.h|  4 +--
 include/linux/swapops.h | 69 ++---
 mm/hmm.c|  5 ++-
 mm/huge_memory.c|  4 +--
 mm/memcontrol.c |  2 +-
 mm/memory.c | 10 +++---
 mm/migrate.c|  6 ++--
 mm/page_vma_mapped.c|  6 ++--
 10 files changed, 50 insertions(+), 81 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 18205f851c24..eec3a9d7176e 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -691,7 +691,7 @@ static void ptep_zap_swap_entry(struct mm_struct *mm, 
swp_entry_t entry)
if (!non_swap_entry(entry))
dec_mm_counter(mm, MM_SWAPENTS);
else if (is_migration_entry(entry)) {
-   struct page *page = migration_entry_to_page(entry);
+   struct page *page = pfn_swap_entry_to_page(entry);
 
dec_mm_counter(mm, mm_counter(page));
}
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3cec6fbef725..08ee59d945c0 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -514,10 +514,8 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
} else {
mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
}
-   } else if (is_migration_entry(swpent))
-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   } else if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
page = xa_load(&vma->vm_file->f_mapping->i_pages,
@@ -549,7 +547,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
swp_entry_t entry = pmd_to_swp_entry(*pmd);
 
if (is_migration_entry(entry))
-   page = migration_entry_to_page(entry);
+   page = pfn_swap_entry_to_page(entry);
}
if (IS_ERR_OR_NULL(page))
return;
@@ -691,10 +689,8 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long 
hmask,
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);
 
-   if (is_migration_entry(swpent))
-   page = migration_entry_to_page(swpent);
-   else if (is_device_private_entry(swpent))
-   page = device_private_entry_to_page(swpent);
+   if (is_pfn_swap_entry(swpent))
+   page = pfn_swap_entry_to_page(swpent);
}
if (page) {
int mapcount = page_mapcount(page);
@@ -1383,11 +1379,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct 
pagemapread *pm,
frame = swp_type(entry) |
(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
flags |= PM_SWAP;
-   if (is_migration_entry(entry))
-   page = migration_entry_to_page(entry);
-
-   if (is_device_private_entry(entry))
-   page = device_private_entry_to_page(entry);
+   if (is_pfn_swap_entry(entry))
+   page = pfn_swap_entry_to_page(entry);
}
 
if (page && !PageAnon(page))
@@ -1444,7 +1437,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long 
addr, unsigned long end,
if (pmd_swp_soft_dirty(pmd))
flags |= PM_SOFT_DIRTY;
VM_BUG_ON(!is_pmd_migration_entry(pmd));
-   page = migration_entry_to_page(entry);
+   page = pfn_swap_entry_to_page(entry);

[PATCH v7 4/8] mm/rmap: Split migration into its own function

2021-03-25 Thread Alistair Popple
Migration is currently implemented as a mode of operation for
try_to_unmap_one(), generally specified by passing the TTU_MIGRATION
flag or, in the case of splitting a huge anonymous page,
TTU_SPLIT_FREEZE.

However it does not have much in common with the rest of the unmap
functionality of try_to_unmap_one() and thus splitting it into a
separate function reduces the complexity of try_to_unmap_one() making it
more readable.

Several simplifications can also be made in try_to_migrate_one() based
on the following observations:

 - All users of TTU_MIGRATION also set TTU_IGNORE_MLOCK.
 - No users of TTU_MIGRATION ever set TTU_IGNORE_HWPOISON.
 - No users of TTU_MIGRATION ever set TTU_BATCH_FLUSH.

TTU_SPLIT_FREEZE is a special case of migration used when splitting an
anonymous page. This is most easily dealt with by calling the correct
function from unmap_page() in mm/huge_memory.c: either try_to_migrate()
for PageAnon or try_to_unmap() otherwise.
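
At call sites the change is mechanical (a sketch; compare the
mm/migrate.c hunks below):

	/* Before: migration was a mode of try_to_unmap(). */
	try_to_unmap(page, TTU_MIGRATION | TTU_IGNORE_MLOCK);

	/* After: a dedicated entry point; TTU_IGNORE_MLOCK is implied. */
	try_to_migrate(page, 0);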

Signed-off-by: Alistair Popple 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Ralph Campbell 

---

v5:
* Added comments about how PMD splitting works for migration vs.
  unmapping
* Tightened up the flag check in try_to_migrate() to be explicit about
  which TTU_XXX flags are supported.
---
 include/linux/rmap.h |   4 +-
 mm/huge_memory.c |  15 +-
 mm/migrate.c |   9 +-
 mm/rmap.c| 358 ---
 4 files changed, 280 insertions(+), 106 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e26ac2d71346..6062e0cfca2d 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -86,8 +86,6 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-   TTU_MIGRATION   = 0x1,  /* migration mode */
-
TTU_SPLIT_HUGE_PMD  = 0x4,  /* split huge PMD if any */
TTU_IGNORE_MLOCK= 0x8,  /* ignore mlock */
TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */
@@ -96,7 +94,6 @@ enum ttu_flags {
 * do a final flush if necessary */
TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock:
 * caller holds it */
-   TTU_SPLIT_FREEZE= 0x100,/* freeze pte under 
splitting thp */
 };
 
 #ifdef CONFIG_MMU
@@ -193,6 +190,7 @@ static inline void page_dup_rmap(struct page *page, bool 
compound)
 int page_referenced(struct page *, int is_locked,
struct mem_cgroup *memcg, unsigned long *vm_flags);
 
+bool try_to_migrate(struct page *page, enum ttu_flags flags);
 bool try_to_unmap(struct page *, enum ttu_flags flags);
 
 /* Avoid racy checks */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 89af065cea5b..eab004331b97 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2357,16 +2357,21 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
 
 static void unmap_page(struct page *page)
 {
-   enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK |
-   TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
+   enum ttu_flags ttu_flags = TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD;
bool unmap_success;
 
VM_BUG_ON_PAGE(!PageHead(page), page);
 
if (PageAnon(page))
-   ttu_flags |= TTU_SPLIT_FREEZE;
-
-   unmap_success = try_to_unmap(page, ttu_flags);
+   unmap_success = try_to_migrate(page, ttu_flags);
+   else
+   /*
+* Don't install migration entries for file backed pages. This
+* helps handle cases when i_size is in the middle of the page
+* as there is no need to unmap pages beyond i_size manually.
+*/
+   unmap_success = try_to_unmap(page, ttu_flags |
+   TTU_IGNORE_MLOCK);
VM_BUG_ON_PAGE(!unmap_success, page);
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index b752543adb64..cc4612e2a246 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1130,7 +1130,7 @@ static int __unmap_and_move(struct page *page, struct 
page *newpage,
/* Establish migration ptes */
VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
page);
-   try_to_unmap(page, TTU_MIGRATION|TTU_IGNORE_MLOCK);
+   try_to_migrate(page, 0);
page_was_mapped = 1;
}
 
@@ -1332,7 +1332,7 @@ static int unmap_and_move_huge_page(new_page_t 
get_new_page,
 
if (page_mapped(hpage)) {
bool mapping_locked = false;
-   enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK;
+   enum ttu_flags ttu = 0;
 
if (!PageAnon(hpage)) {
/*
@@ -1349,7 +1349,7 @@ static int unmap_and_move_huge_page(new_page_t 
get_new_page,
ttu |= TTU_RMAP_LOCKED;
}
 
-   try_to_unmap(hpage, ttu);
+   try_to_migrate(hpage, ttu);
   

[PATCH v7 3/8] mm/rmap: Split try_to_munlock from try_to_unmap

2021-03-25 Thread Alistair Popple
The behaviour of try_to_unmap_one() is difficult to follow because it
performs different operations based on a fairly large set of flags used
in different combinations.

TTU_MUNLOCK is one such flag. However, it is used exclusively by
try_to_munlock(), which specifies no other flags. Therefore, rather than
overload try_to_unmap_one() with unrelated behaviour, split this out
into its own function and remove the flag.
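
The rmap walk then uses a single-purpose callback instead of flag
plumbing (a before-and-after sketch of two alternatives, not
sequential code; the full version is in the diff below):

	/* Before: try_to_unmap_one() had to check TTU_MUNLOCK internally. */
	struct rmap_walk_control rwc = {
		.rmap_one = try_to_unmap_one,
		.arg = (void *)TTU_MUNLOCK,
		.done = page_not_mapped,
		.anon_lock = page_lock_anon_vma_read,
	};

	/* After: a dedicated callback, no flag argument needed. */
	struct rmap_walk_control rwc = {
		.rmap_one = try_to_munlock_one,
		.done = page_not_mapped,
		.anon_lock = page_lock_anon_vma_read,
	};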

Signed-off-by: Alistair Popple 
Reviewed-by: Ralph Campbell 
Reviewed-by: Christoph Hellwig 

---

v7:
* Added Christoph's Reviewed-by

v4:
* Removed redundant check for VM_LOCKED
---
 include/linux/rmap.h |  1 -
 mm/rmap.c| 40 
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index def5c62c93b3..e26ac2d71346 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -87,7 +87,6 @@ struct anon_vma_chain {
 
 enum ttu_flags {
TTU_MIGRATION   = 0x1,  /* migration mode */
-   TTU_MUNLOCK = 0x2,  /* munlock mode */
 
TTU_SPLIT_HUGE_PMD  = 0x4,  /* split huge PMD if any */
TTU_IGNORE_MLOCK= 0x8,  /* ignore mlock */
diff --git a/mm/rmap.c b/mm/rmap.c
index 977e70803ed8..d02bade5245b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1405,10 +1405,6 @@ static bool try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
struct mmu_notifier_range range;
enum ttu_flags flags = (enum ttu_flags)(long)arg;
 
-   /* munlock has nothing to gain from examining un-locked vmas */
-   if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
-   return true;
-
if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
is_zone_device_page(page) && !is_device_private_page(page))
return true;
@@ -1469,8 +1465,6 @@ static bool try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
page_vma_mapped_walk_done(&pvmw);
break;
}
-   if (flags & TTU_MUNLOCK)
-   continue;
}
 
/* Unexpected PMD-mapped THP? */
@@ -1784,6 +1778,37 @@ bool try_to_unmap(struct page *page, enum ttu_flags 
flags)
return !page_mapcount(page) ? true : false;
 }
 
+static bool try_to_munlock_one(struct page *page, struct vm_area_struct *vma,
+unsigned long address, void *arg)
+{
+   struct page_vma_mapped_walk pvmw = {
+   .page = page,
+   .vma = vma,
+   .address = address,
+   };
+
+   /* munlock has nothing to gain from examining un-locked vmas */
+   if (!(vma->vm_flags & VM_LOCKED))
+   return true;
+
+   while (page_vma_mapped_walk(&pvmw)) {
+   /* PTE-mapped THP are never mlocked */
+   if (!PageTransCompound(page)) {
+   /*
+* Holding pte lock, we do *not* need
+* mmap_lock here
+*/
+   mlock_vma_page(page);
+   }
+   page_vma_mapped_walk_done(&pvmw);
+
+   /* found a mlocked page, no point continuing munlock check */
+   return false;
+   }
+
+   return true;
+}
+
 /**
  * try_to_munlock - try to munlock a page
  * @page: the page to be munlocked
@@ -1796,8 +1821,7 @@ bool try_to_unmap(struct page *page, enum ttu_flags flags)
 void try_to_munlock(struct page *page)
 {
struct rmap_walk_control rwc = {
-   .rmap_one = try_to_unmap_one,
-   .arg = (void *)TTU_MUNLOCK,
+   .rmap_one = try_to_munlock_one,
.done = page_not_mapped,
.anon_lock = page_lock_anon_vma_read,
 
-- 
2.20.1



[PATCH v7 0/8] Add support for SVM atomics in Nouveau

2021-03-25 Thread Alistair Popple
This is the seventh version of a series to add support to Nouveau for
atomic memory operations on OpenCL shared virtual memory (SVM) regions.

This version primarily improves readability of the Nouveau fault priority
calculation code along with other minor functional and cosmetic
improvements listed in the changelogs.

Exclusive device access is implemented by adding a new swap entry type
(SWP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The main
difference is that on fault the original entry is immediately restored by
the fault handler instead of waiting.

Restoring the entry triggers calls to MMU notifiers, which allow a device
driver to revoke the atomic access permission from the GPU prior to the CPU
finalising the entry.
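
One subtlety worth sketching: notifier callbacks fired while the
exclusive entries are being installed must be ignored by the driver
that requested them, which is what the owner cookie is for
(illustrative only; my_device_cookie is hypothetical, patch 8 has the
real Nouveau version):

	static bool driver_range_invalidate(struct mmu_interval_notifier *mni,
					    const struct mmu_notifier_range *range,
					    unsigned long cur_seq)
	{
		/* Skip invalidations caused by our own exclusive-entry setup. */
		if (range->event == MMU_NOTIFY_EXCLUSIVE &&
		    range->owner == my_device_cookie)
			return true;

		/* ...otherwise revoke the device's access here... */
		return true;
	}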

Patches 1 & 2 refactor existing migration and device private entry
functions.

Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
functionality into separate functions - try_to_migrate_one() and
try_to_munlock_one(). These should not change any functionality, but any
help testing would be much appreciated as I have not been able to test
every usage of try_to_unmap_one().

Patch 5 contains the bulk of the implementation for device exclusive
memory.

Patch 6 contains some additions to the HMM selftests to ensure everything
works as expected.

Patch 7 is a cleanup for the Nouveau SVM implementation.

Patch 8 contains the implementation of atomic access for the Nouveau
driver.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
SVM buffer whilst also writing to the same buffer from the CPU.
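
A minimal sketch of that kind of test, assuming hypothetical helpers
for the GPU side (the real test is an OpenCL program and is not part
of this series):

	/*
	 * CPU and GPU increment the same SVM counter; the final total
	 * checks that no atomic update was lost on either side.
	 */
	volatile int *counter = svm_alloc(sizeof(int));	/* hypothetical */

	*counter = 0;
	gpu_launch_atomic_incs(counter, GPU_ITERS);	/* hypothetical */
	for (int i = 0; i < CPU_ITERS; i++)
		__sync_fetch_and_add((int *)counter, 1);
	gpu_wait();					/* hypothetical */
	assert(*counter == GPU_ITERS + CPU_ITERS);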

Alistair Popple (8):
  mm: Remove special swap entry functions
  mm/swapops: Rework swap entry manipulation code
  mm/rmap: Split try_to_munlock from try_to_unmap
  mm/rmap: Split migration into its own function
  mm: Device exclusive memory access
  mm: Selftests for exclusive device memory
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst  |  19 +-
 arch/s390/mm/pgtable.c|   2 +-
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 156 -
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
 fs/proc/task_mmu.c|  23 +-
 include/linux/mmu_notifier.h  |  26 +-
 include/linux/rmap.h  |   9 +-
 include/linux/swap.h  |   8 +-
 include/linux/swapops.h   | 123 ++--
 lib/test_hmm.c| 126 +++-
 lib/test_hmm_uapi.h   |   2 +
 mm/debug_vm_pgtable.c |  12 +-
 mm/hmm.c  |  12 +-
 mm/huge_memory.c  |  45 +-
 mm/hugetlb.c  |  10 +-
 mm/memcontrol.c   |   2 +-
 mm/memory.c   | 128 +++-
 mm/migrate.c  |  51 +-
 mm/mprotect.c |  18 +-
 mm/page_vma_mapped.c  |  15 +-
 mm/rmap.c | 604 +++---
 tools/testing/selftests/vm/hmm-tests.c| 158 +
 24 files changed, 1282 insertions(+), 275 deletions(-)

-- 
2.20.1



Re: [PATCH 2/6] mfd: Initial commit of sy7636a

2021-03-25 Thread Alistair Francis
On Tue, Mar 23, 2021 at 5:35 AM Lee Jones  wrote:
>
> On Sat, 20 Mar 2021, Alistair Francis wrote:
>
> > On Thu, Feb 4, 2021 at 5:31 AM Lee Jones  wrote:
> > >
> > > On Sat, 16 Jan 2021, Alistair Francis wrote:
> > >
> > > > Initial support for the Silergy SY7636A Power Management chip
> > > > driver.
> > >
> > > Please remove "driver", as this is not support for the driver, it *is*
> > > the driver which supports the chip.
> >
> > Sorry for the long delay here.
> >
> > I have addressed your comments.
>
> [...]
>
> > > > diff --git a/drivers/mfd/sy7636a.c b/drivers/mfd/sy7636a.c
> > > > new file mode 100644
> > > > index ..39aac965d854
> > > > --- /dev/null
> > > > +++ b/drivers/mfd/sy7636a.c
> > > > @@ -0,0 +1,252 @@
> > > > +// SPDX-License-Identifier: GPL-2.0+
> > > > +/*
> > > > + * MFD driver for SY7636A chip
> > >
> > > "Parent driver".
> > >
> > > > + * Copyright (C) 2019 reMarkable AS - http://www.remarkable.com/
> > >
> > > This is quite out of date.  Please update.
> >
> > I don't own this copyright, so I would rather not change it.
>
> I'm not comfortable taking a new driver with an old Copyright.
>
> Maybe ask reMarkable if it's okay to bump it.
>
> > > > + * Author: Lars Ivar Miljeteig 
>
> Or ping this guy.

I reached out to him and have permission to bump the year.

>
> [...]
>
> > > > +int set_vcom_voltage_mv(struct regmap *regmap, unsigned int vcom)
> > > > +{
> > > > + int ret;
> > > > + unsigned int val;
> > > > +
> > > > + if (vcom < 0 || vcom > 5000)
> > >
> > > Please define min/max values.
> > >
> > > > + return -EINVAL;
> > > > +
> > > > + val = (unsigned int)(vcom / 10) & 0x1ff;
> > >
> > > As above.
> >
> > I have used defines for all of these.
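
For illustration, such limits might look like the following sketch; the
names are hypothetical and the exact v2 defines may differ:

	#define SY7636A_VCOM_MIN_MV		0
	#define SY7636A_VCOM_MAX_MV		5000
	#define SY7636A_VCOM_ADJUST_MASK	0x1ff

	if (vcom < SY7636A_VCOM_MIN_MV || vcom > SY7636A_VCOM_MAX_MV)
		return -EINVAL;

	val = (vcom / 10) & SY7636A_VCOM_ADJUST_MASK;
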
> >
> > >
> > > > + ret = regmap_write(regmap, SY7636A_REG_VCOM_ADJUST_CTRL_L, val);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + ret = regmap_write(regmap, SY7636A_REG_VCOM_ADJUST_CTRL_H, val >> 
> > > > 8);
> > > > + if (ret)
> > > > + return ret;
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > Who calls these?
> >
> > The sysfs store and show functions.
>
> They should be in a power/regulator driver really.

Ok, I have moved these to the regulator.

>
> [...]
>
> > > > + if (val >= ARRAY_SIZE(states)) {
> > > > + dev_err(sy7636a->dev, "Unexpected value read from device: 
> > > > %u\n", val);
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + return snprintf(buf, PAGE_SIZE, "%s\n", states[val]);
> > > > +}
> > > > +static DEVICE_ATTR(state, 0444, state_show, NULL);
> > >
> > > You need to document new sysfs entries.
> >
> > I'm not sure how to document this. Do you mind pointing out an example
> > I can use?
>
> See the final paragraph in:
>
>   Documentation/filesystems/sysfs.rst

Thanks!

>
> [...]
>
> > > > +static struct attribute *sy7636a_sysfs_attrs[] = {
> > > > + &dev_attr_state.attr,
> > > > + &dev_attr_power_good.attr,
> > > > + &dev_attr_vcom.attr,
> > > > + NULL,
> > > > +};
> > >
> > > These all look like power options?  Do they really belong here?
> >
> > From what I can tell I think they do. Let me know if you don't think so.
>
> As above, I think they should be in power or regulator.

Done.

Alistair

>
> [...]
>
> --
> Lee Jones [李琼斯]
> Senior Technical Lead - Developer Services
> Linaro.org │ Open source software for Arm SoCs
> Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 3/5] mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()

2021-03-23 Thread Alistair Popple
On Tuesday, 23 March 2021 9:26:43 PM AEDT David Hildenbrand wrote:
> On 20.03.21 10:36, Miaohe Lin wrote:
> > If the zone device page does not belong to un-addressable device memory,
> > the variable entry will be uninitialized and lead to indeterminate pte
> > entry ultimately. Fix this unexpectant case and warn about it.
> 
> s/unexpectant/unexpected/
> 
> > 
> > Fixes: df6ad69838fc ("mm/device-public-memory: device memory cache 
coherent with CPU")
> > Signed-off-by: Miaohe Lin 
> > ---
> >   mm/migrate.c | 7 +++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index 20a3bf75270a..271081b014cb 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2972,6 +2972,13 @@ static void migrate_vma_insert_page(struct 
migrate_vma *migrate,
> >   
> > swp_entry = make_device_private_entry(page, 
> > vma->vm_flags & 
VM_WRITE);
> > entry = swp_entry_to_pte(swp_entry);
> > +   } else {
> > +   /*
> > +* For now we only support migrating to un-addressable
> > +* device memory.
> > +*/
> > +   WARN_ON(1);
> > +   goto abort;
> 
> Fix it by crashing the kernel with panic_on_warn? :)
> 
> If this case can actual happen, than no WARN_ON() - rather a 
> pr_warn_once(). If this case cannot happen, why do we even care (it's 
> not a fix then)?

There is also already a check for this case in migrate_vma_pages(). The
problem is that it happens after the call to migrate_vma_insert_page(). I
wonder if instead it would be better just to move the existing check to
before that call?
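
That is, roughly (a sketch of the suggestion, not a tested patch; the
surrounding loop in migrate_vma_pages() is elided):

	if (!page) {
		/*
		 * Reject unsupported ZONE_DEVICE destinations before
		 * migrate_vma_insert_page() can install a pte for them.
		 */
		if (newpage && is_zone_device_page(newpage) &&
		    !is_device_private_page(newpage)) {
			migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
			continue;
		}
		migrate_vma_insert_page(migrate, addr, newpage,
					&migrate->src[i],
					&migrate->dst[i]);
		continue;
	}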





Re: [PATCH 1/2] Input: wacom_i2c - do not force interrupt trigger

2021-03-22 Thread Alistair
On Sun, Mar 21, 2021, at 6:00 PM, Dmitry Torokhov wrote:
> Instead of forcing interrupt trigger to "level low" rely on the
> platform to set it up according to how it is wired on the given
> board.
> 
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Reviewed-by: Alistair Francis 

Alistair

> ---
> drivers/input/touchscreen/wacom_i2c.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/wacom_i2c.c 
> b/drivers/input/touchscreen/wacom_i2c.c
> index 1afc6bde2891..609ff84e7693 100644
> --- a/drivers/input/touchscreen/wacom_i2c.c
> +++ b/drivers/input/touchscreen/wacom_i2c.c
> @@ -195,8 +195,7 @@ static int wacom_i2c_probe(struct i2c_client *client,
> input_set_drvdata(input, wac_i2c);
>  
> error = request_threaded_irq(client->irq, NULL, wacom_i2c_irq,
> -  IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> -  "wacom_i2c", wac_i2c);
> +  IRQF_ONESHOT, "wacom_i2c", wac_i2c);
> if (error) {
> dev_err(>dev,
> "Failed to enable IRQ, error: %d\n", error);
> -- 
> 2.31.0.rc2.261.g7f71774620-goog
> 
> 


Re: [PATCH 2/2] Input: wacom_i2c - switch to using managed resources

2021-03-22 Thread Alistair
On Sun, Mar 21, 2021, at 6:00 PM, Dmitry Torokhov wrote:
> This simplifies error unwinding path and allows us to get rid of
> remove() method.
> 
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Reviewed-by: Alistair Francis 

Alistair

> ---
> drivers/input/touchscreen/wacom_i2c.c | 55 +--
> 1 file changed, 17 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/input/touchscreen/wacom_i2c.c 
> b/drivers/input/touchscreen/wacom_i2c.c
> index 609ff84e7693..22826c387da5 100644
> --- a/drivers/input/touchscreen/wacom_i2c.c
> +++ b/drivers/input/touchscreen/wacom_i2c.c
> @@ -145,15 +145,16 @@ static void wacom_i2c_close(struct input_dev *dev)
> }
>  
> static int wacom_i2c_probe(struct i2c_client *client,
> -  const struct i2c_device_id *id)
> +const struct i2c_device_id *id)
> {
> + struct device *dev = &client->dev;
> struct wacom_i2c *wac_i2c;
> struct input_dev *input;
> struct wacom_features features = { 0 };
> int error;
>  
> if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) {
> - dev_err(&client->dev, "i2c_check_functionality error\n");
> + dev_err(dev, "i2c_check_functionality error\n");
> return -EIO;
> }
>  
> @@ -161,21 +162,22 @@ static int wacom_i2c_probe(struct i2c_client *client,
> if (error)
> return error;
>  
> - wac_i2c = kzalloc(sizeof(*wac_i2c), GFP_KERNEL);
> - input = input_allocate_device();
> - if (!wac_i2c || !input) {
> - error = -ENOMEM;
> - goto err_free_mem;
> - }
> + wac_i2c = devm_kzalloc(dev, sizeof(*wac_i2c), GFP_KERNEL);
> + if (!wac_i2c)
> + return -ENOMEM;
>  
> wac_i2c->client = client;
> +
> + input = devm_input_allocate_device(dev);
> + if (!input)
> + return -ENOMEM;
> +
> wac_i2c->input = input;
>  
> input->name = "Wacom I2C Digitizer";
> input->id.bustype = BUS_I2C;
> input->id.vendor = 0x56a;
> input->id.version = features.fw_version;
> - input->dev.parent = &client->dev;
> input->open = wacom_i2c_open;
> input->close = wacom_i2c_close;
>  
> @@ -194,12 +196,11 @@ static int wacom_i2c_probe(struct i2c_client *client,
>  
> input_set_drvdata(input, wac_i2c);
>  
> - error = request_threaded_irq(client->irq, NULL, wacom_i2c_irq,
> -  IRQF_ONESHOT, "wacom_i2c", wac_i2c);
> + error = devm_request_threaded_irq(dev, client->irq, NULL, wacom_i2c_irq,
> +   IRQF_ONESHOT, "wacom_i2c", wac_i2c);
> if (error) {
> - dev_err(&client->dev,
> - "Failed to enable IRQ, error: %d\n", error);
> - goto err_free_mem;
> + dev_err(dev, "Failed to request IRQ: %d\n", error);
> + return error;
> }
>  
> /* Disable the IRQ, we'll enable it in wac_i2c_open() */
> @@ -207,31 +208,10 @@ static int wacom_i2c_probe(struct i2c_client *client,
>  
> error = input_register_device(wac_i2c->input);
> if (error) {
> - dev_err(&client->dev,
> - "Failed to register input device, error: %d\n", error);
> - goto err_free_irq;
> + dev_err(dev, "Failed to register input device: %d\n", error);
> + return error;
> }
>  
> - i2c_set_clientdata(client, wac_i2c);
> - return 0;
> -
> -err_free_irq:
> - free_irq(client->irq, wac_i2c);
> -err_free_mem:
> - input_free_device(input);
> - kfree(wac_i2c);
> -
> - return error;
> -}
> -
> -static int wacom_i2c_remove(struct i2c_client *client)
> -{
> - struct wacom_i2c *wac_i2c = i2c_get_clientdata(client);
> -
> - free_irq(client->irq, wac_i2c);
> - input_unregister_device(wac_i2c->input);
> - kfree(wac_i2c);
> -
> return 0;
> }
>  
> @@ -268,7 +248,6 @@ static struct i2c_driver wacom_i2c_driver = {
> },
>  
> .probe = wacom_i2c_probe,
> - .remove = wacom_i2c_remove,
> .id_table = wacom_i2c_id,
> };
> module_i2c_driver(wacom_i2c_driver);
> -- 
> 2.31.0.rc2.261.g7f71774620-goog
> 
> 


[PATCH v6 3/3] ARM: imx7d-remarkable2.dts: Initial device tree for reMarkable2

2021-03-22 Thread Alistair Francis
The reMarkable2 (https://remarkable.com) is an e-ink tablet based on
the imx7d SoC.

This commit is based on the DTS provided by reMarkable but ported to the
latest kernel (instead of 4.14). I have removed references to
non-upstream devices and have changed the UART so that the console can
be accessed without having to open up the device via the OTG pogo pins.

Currently the kernel boots, but there is no support for the display.

WiFi is untested (no display or UART RX makes it hard to test), but
should work with the current upstream driver. As it's untested it's not
included in this commit.

Signed-off-by: Alistair Francis 
---
v6:
 - Remove unneeded disables (crypt and dma_apbh)
 - Remove uneeded enables (sdma)
 - Fixup memory entry
 - Remove unused reference

 arch/arm/boot/dts/Makefile  |   1 +
 arch/arm/boot/dts/imx7d-remarkable2.dts | 146 
 2 files changed, 147 insertions(+)
 create mode 100644 arch/arm/boot/dts/imx7d-remarkable2.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 8e5d4ab4e75e..dc8e378689af 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -660,6 +660,7 @@ dtb-$(CONFIG_SOC_IMX7D) += \
imx7d-pico-hobbit.dtb \
imx7d-pico-nymph.dtb \
imx7d-pico-pi.dtb \
+   imx7d-remarkable2.dtb \
imx7d-sbc-imx7.dtb \
imx7d-sdb.dtb \
imx7d-sdb-reva.dtb \
diff --git a/arch/arm/boot/dts/imx7d-remarkable2.dts 
b/arch/arm/boot/dts/imx7d-remarkable2.dts
new file mode 100644
index ..8cbae656395c
--- /dev/null
+++ b/arch/arm/boot/dts/imx7d-remarkable2.dts
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+/*
+ * Copyright (C) 2015 Freescale Semiconductor, Inc.
+ * Copyright (C) 2019 reMarkable AS - http://www.remarkable.com/
+ *
+ */
+
+/dts-v1/;
+
+#include "imx7d.dtsi"
+
+/ {
+   model = "reMarkable 2.0";
+   compatible = "remarkable,imx7d-remarkable2", "fsl,imx7d";
+
+   chosen {
+   stdout-path = &uart6;
+   };
+
+   memory@80000000 {
+   device_type = "memory";
+   reg = <0x80000000 0x40000000>;
+   };
+};
+
+ {
+   assigned-clocks = < IMX7D_CLKO2_ROOT_SRC>,
+ < IMX7D_CLKO2_ROOT_DIV>;
+   assigned-clock-parents = < IMX7D_CKIL>;
+   assigned-clock-rates = <0>, <32768>;
+};
+
+&snvs_pwrkey {
+   status = "okay";
+};
+
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <_uart1>;
+   assigned-clocks = < IMX7D_UART1_ROOT_SRC>;
+   assigned-clock-parents = < IMX7D_OSC_24M_CLK>;
+   status = "okay";
+};
+
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <_uart6>;
+   assigned-clocks = < IMX7D_UART6_ROOT_SRC>;
+   assigned-clock-parents = < IMX7D_OSC_24M_CLK>;
+   status = "okay";
+};
+
+ {
+   srp-disable;
+   hnp-disable;
+   status = "okay";
+};
+
+ {
+   pinctrl-names = "default", "state_100mhz", "state_200mhz", "sleep";
+   pinctrl-0 = <_usdhc3>;
+   pinctrl-1 = <_usdhc3_100mhz>;
+   pinctrl-2 = <_usdhc3_200mhz>;
+   pinctrl-3 = <_usdhc3>;
+   assigned-clocks = < IMX7D_USDHC3_ROOT_CLK>;
+   assigned-clock-rates = <4>;
+   bus-width = <8>;
+   non-removable;
+   status = "okay";
+};
+
+ {
+   pinctrl-names = "default";
+   pinctrl-0 = <_wdog>;
+   fsl,ext-reset-output;
+};
+
+ {
+   pinctrl_uart1: uart1grp {
+   fsl,pins = <
+   MX7D_PAD_UART1_TX_DATA__UART1_DCE_TX   0x79
+   MX7D_PAD_UART1_RX_DATA__UART1_DCE_RX   0x79
+   >;
+   };
+
+   pinctrl_uart6: uart6grp {
+   fsl,pins = <
+   MX7D_PAD_EPDC_DATA09__UART6_DCE_TX  0x79
+   MX7D_PAD_EPDC_DATA08__UART6_DCE_RX  0x79
+   >;
+   };
+
+   pinctrl_usdhc3: usdhc3grp {
+   fsl,pins = <
+   MX7D_PAD_SD3_CMD__SD3_CMD   0x59
+   MX7D_PAD_SD3_CLK__SD3_CLK   0x19
+   MX7D_PAD_SD3_DATA0__SD3_DATA0   0x59
+   MX7D_PAD_SD3_DATA1__SD3_DATA1   0x59
+   MX7D_PAD_SD3_DATA2__SD3_DATA2   0x59
+   MX7D_PAD_SD3_DATA3__SD3_DATA3   0x59
+   MX7D_PAD_SD3_DATA4__SD3_DATA4   0x59
+   MX7D_PAD_SD3_DATA5__SD3_DATA5   0x59
+   MX7D_PAD_SD3_DATA6__SD3_DATA6   0x59
+   MX7D_PAD_SD3_DATA7__SD3_DATA7   0x59
+   

[PATCH v6 1/3] dt-bindings: Add vendor prefix for reMarkable

2021-03-22 Thread Alistair Francis
reMarkable AS produces eInk tablets

Signed-off-by: Alistair Francis 
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml 
b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index f6064d84a424..a8e1e8d2ef20 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -932,6 +932,8 @@ patternProperties:
 description: Unisoc Communications, Inc.
   "^realtek,.*":
 description: Realtek Semiconductor Corp.
+  "^remarkable,.*":
+description: reMarkable AS
   "^renesas,.*":
 description: Renesas Electronics Corporation
   "^rex,.*":
-- 
2.30.1



[PATCH v6 2/3] dt-bindings: arm: fsl: Add the reMarkable 2 e-Ink tablet

2021-03-22 Thread Alistair Francis
Signed-off-by: Alistair Francis 
---
 Documentation/devicetree/bindings/arm/fsl.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/arm/fsl.yaml 
b/Documentation/devicetree/bindings/arm/fsl.yaml
index 297c87f45db8..d139440c86b6 100644
--- a/Documentation/devicetree/bindings/arm/fsl.yaml
+++ b/Documentation/devicetree/bindings/arm/fsl.yaml
@@ -617,6 +617,7 @@ properties:
   - kam,imx7d-flex-concentrator   # Kamstrup OMNIA Flex 
Concentrator
   - kam,imx7d-flex-concentrator-mfg   # Kamstrup OMNIA Flex 
Concentrator in manufacturing mode
   - novtech,imx7d-meerkat96   # i.MX7 Meerkat96 Board
+  - remarkable,imx7d-remarkable2  # i.MX7D ReMarkable 2 E-Ink 
Tablet
   - technexion,imx7d-pico-dwarf   # TechNexion i.MX7D Pico-Dwarf
   - technexion,imx7d-pico-hobbit  # TechNexion i.MX7D Pico-Hobbit
   - technexion,imx7d-pico-nymph   # TechNexion i.MX7D Pico-Nymph
-- 
2.30.1



Re: [PATCH v6 5/8] mm: Device exclusive memory access

2021-03-22 Thread Alistair Popple
On Monday, 15 March 2021 6:42:45 PM AEDT Christoph Hellwig wrote:
> > +Not all devices support atomic access to system memory. To support atomic
> > +operations to a shared virtual memory page such a device needs access to 
that
> > +page which is exclusive of any userspace access from the CPU. The
> > +``make_device_exclusive_range()`` function can be used to make a memory 
range
> > +inaccessible from userspace.
> 
> s/Not all devices/Some devices/ ?

I will reword this. What I was trying to convey is that devices may have 
features which allow for atomics to be implemented with SW assistance.

> >  static inline int mm_has_notifiers(struct mm_struct *mm)
> > @@ -528,7 +534,17 @@ static inline void mmu_notifier_range_init_migrate(
> >  {
> > mmu_notifier_range_init(range, MMU_NOTIFY_MIGRATE, flags, vma, mm,
> > start, end);
> > -   range->migrate_pgmap_owner = pgmap;
> > +   range->owner = pgmap;
> > +}
> > +
> > +static inline void mmu_notifier_range_init_exclusive(
> > +   struct mmu_notifier_range *range, unsigned int flags,
> > +   struct vm_area_struct *vma, struct mm_struct *mm,
> > +   unsigned long start, unsigned long end, void *owner)
> > +{
> > +   mmu_notifier_range_init(range, MMU_NOTIFY_EXCLUSIVE, flags, vma, mm,
> > +   start, end);
> > +   range->owner = owner;
> 
> Maybe just replace mmu_notifier_range_init_migrate with a
> mmu_notifier_range_init_owner helper that takes the owner but does
> not hard code a type?

Ok. That does result in a function which takes a fair number of arguments, but 
I guess that's no worse than multiple functions hard coding the different 
types and it does result in less code overall.

> > }
> > +   } else if (is_device_exclusive_entry(entry)) {
> > +   page = pfn_swap_entry_to_page(entry);
> > +
> > +   get_page(page);
> > +   rss[mm_counter(page)]++;
> > +
> > +   if (is_writable_device_exclusive_entry(entry) &&
> > +   is_cow_mapping(vm_flags)) {
> > +   /*
> > +* COW mappings require pages in both
> > +* parent and child to be set to read.
> > +*/
> > +   entry = make_readable_device_exclusive_entry(
> > +   swp_offset(entry));
> > +   pte = swp_entry_to_pte(entry);
> > +   if (pte_swp_soft_dirty(*src_pte))
> > +   pte = pte_swp_mksoft_dirty(pte);
> > +   if (pte_swp_uffd_wp(*src_pte))
> > +   pte = pte_swp_mkuffd_wp(pte);
> > +   set_pte_at(src_mm, addr, src_pte, pte);
> > +   }
> 
> Just cosmetic, but I wonder if should factor this code block into
> a little helper.

In that case there are arguably other bits of this function which should 
be refactored into helpers as well. Unless you feel strongly about it I would 
like to leave this as is and put together a future series to fix this and a 
couple of other areas I've noticed that could do with some refactoring/clean 
ups.

> > +
> > +static bool try_to_protect_one(struct page *page, struct vm_area_struct 
*vma,
> > +   unsigned long address, void *arg)
> > +{
> > +   struct mm_struct *mm = vma->vm_mm;
> > +   struct page_vma_mapped_walk pvmw = {
> > +   .page = page,
> > +   .vma = vma,
> > +   .address = address,
> > +   };
> > +   struct ttp_args *ttp = (struct ttp_args *) arg;
> 
> This cast should not be needed.
> 
> > +   return ttp.valid && (!page_mapcount(page) ? true : false);
> 
> This can be simplified to:
> 
>   return ttp.valid && !page_mapcount(page);
> 
> > +   npages = get_user_pages_remote(mm, start, npages,
> > +  FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
> > +  pages, NULL, NULL);
> > +   for (i = 0; i < npages; i++, start += PAGE_SIZE) {
> > +   if (!trylock_page(pages[i])) {
> > +   put_page(pages[i]);
> > +   pages[i] = NULL;
> > +   continue;
> > +   }
> > +
> > +   if (!try_to_protect(pages[i], mm, start, arg)) {
> > +   unlock_page(pages[i]);
> > +   put_page(pages[i]);
> > +   pages[i] = NULL;
> > +   }
> 
> Should the trylock_page go into try_to_protect to simplify the loop
> a little?  Also I wonder if we need make_device_exclusive_range or
> should just open code the get_user_pages_remote + try_to_protect
> loop in the callers, as that might allow them to also deduct other
> information about the found pages.

This function has evolved over time and putting the trylock_page into 
try_to_protect does simplify things nicely. I'm not sure what other 
information a caller could deduce through open coding though, but I guess in 
some 

Re: [PATCH v6 8/8] nouveau/svm: Implement atomic SVM access

2021-03-22 Thread Alistair Popple
On Monday, 15 March 2021 6:51:13 PM AEDT Christoph Hellwig wrote:
> > -   /*XXX: atomic? */
> > -   return (fa->access == 0 || fa->access == 3) -
> > -  (fb->access == 0 || fb->access == 3);
> > +   /* Atomic access (2) has highest priority */
> > +   return (-1*(fa->access == 2) + (fa->access == 0 || fa->access == 3)) -
> > +  (-1*(fb->access == 2) + (fb->access == 0 || fb->access == 3));
> 
> This looks really unreabable.  If the magic values 0, 2 and 3 had names
> it might become a little more understadable, then factor the duplicated
> calculation of the priority value into a helper and we'll have code that
> mere humans can understand..

Fair enough, will add some definitions for the magic values.

> > +   mutex_lock(>mutex);
> > +   if (mmu_interval_read_retry(>notifier,
> > +   notifier_seq)) {
> > +   mutex_unlock(>mutex);
> > +   continue;
> > +   }
> > +   break;
> > +   }
> 
> This looks good, why not:
> 
>   mutex_lock(>mutex);
>   if (!mmu_interval_read_retry(>notifier,
>notifier_seq))
>   break;
>   mutex_unlock(>mutex);
>   }

I had copied that from nouveau_range_fault() but this suggestion is better. 
Will update, thanks for looking.





Re: [PATCH v6 1/8] mm: Remove special swap entry functions

2021-03-22 Thread Alistair Popple
On Monday, 15 March 2021 6:27:57 PM AEDT Christoph Hellwig wrote:
> On Fri, Mar 12, 2021 at 07:38:44PM +1100, Alistair Popple wrote:
> > Remove the migration and device private entry_to_page() and
> > entry_to_pfn() inline functions and instead open code them directly.
> > This results in shorter code which is easier to understand.
> 
> I think this commit log should mention pfn_swap_entry_to_page() now.

Will add. Thanks for the review.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig 
> 





