Re: [PATCH] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable

2021-05-05 Thread Aneesh Kumar K.V
Vaibhav Jain  writes:

> In case performance stats for an nvdimm are not available, reading the
> 'perf_stats' sysfs file returns an -ENOENT error. A better approach is
> to make the 'perf_stats' file entirely invisible to indicate that
> performance stats for an nvdimm are unavailable.
>
> So this patch updates 'papr_nd_attribute_group' to add an 'is_visible'
> callback, implemented as the newly introduced 'papr_nd_attribute_visible()',
> which returns an appropriate mode in case performance stats aren't
> supported on a given nvdimm.
>
> Also the initialization of 'papr_scm_priv.stat_buffer_len' is moved
> from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
> available when 'papr_nd_attribute_visible()' is called during nvdimm
> initialization.
>
> Fixes: 2d02bf835e57 ("powerpc/papr_scm: Fetch nvdimm performance stats from
> PHYP")
> Signed-off-by: Vaibhav Jain 
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 37 ++++++++++++++++++++++++++-----------
>  1 file changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 12f1513f0fca..90f0af8fefe8 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev,
>  }
>  DEVICE_ATTR_RO(flags);
>  
> +umode_t papr_nd_attribute_visible(struct kobject *kobj, struct attribute 
> *attr,
> +   int n)
> +{
> + struct device *dev = container_of(kobj, typeof(*dev), kobj);
> + struct nvdimm *nvdimm = to_nvdimm(dev);
> + struct papr_scm_priv *p = nvdimm_provider_data(nvdimm);
> +
> + /* For if perf-stats not available remove perf_stats sysfs */
> + if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0)
> + return 0;
> +
> + return attr->mode;
> +}
> +
>  /* papr_scm specific dimm attributes */
>  static struct attribute *papr_nd_attributes[] = {
>   _attr_flags.attr,
> @@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = {
>  
>  static struct attribute_group papr_nd_attribute_group = {
>   .name = "papr",
> + .is_visible = papr_nd_attribute_visible,
>   .attrs = papr_nd_attributes,
>  };
>  
> @@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
>   struct nd_region_desc ndr_desc;
>   unsigned long dimm_flags;
>   int target_nid, online_nid;
> - ssize_t stat_size;
>  
>   p->bus_desc.ndctl = papr_scm_ndctl;
>   p->bus_desc.module = THIS_MODULE;
> @@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv 
> *p)
>   list_add_tail(&p->region_list, &papr_nd_regions);
>   mutex_unlock(&papr_ndr_lock);
>  
> - /* Try retriving the stat buffer and see if its supported */
> - stat_size = drc_pmem_query_stats(p, NULL, 0);
> - if (stat_size > 0) {
> - p->stat_buffer_len = stat_size;
> - dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
> - p->stat_buffer_len);
> - } else {
> - dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
> - }
> -
>   return 0;
>  
>  err: nvdimm_bus_unregister(p->bus);
> @@ -1102,6 +1106,7 @@ static int papr_scm_probe(struct platform_device *pdev)
>   u64 blocks, block_size;
>   struct papr_scm_priv *p;
>   const char *uuid_str;
> + ssize_t stat_size;
>   u64 uuid[2];
>   int rc;
>  
> @@ -1179,6 +1184,16 @@ static int papr_scm_probe(struct platform_device *pdev)
>   p->res.name  = pdev->name;
>   p->res.flags = IORESOURCE_MEM;
>  
> + /* Try retriving the stat buffer and see if its supported */
> + stat_size = drc_pmem_query_stats(p, NULL, 0);
> + if (stat_size > 0) {
> + p->stat_buffer_len = stat_size;
> + dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
> + p->stat_buffer_len);
> + } else {
> + dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
> + }

With this patch
https://lore.kernel.org/linuxppc-dev/20210505191606.51666-1-vaib...@linux.ibm.com
we are adding details of why the performance-stats query hcall failed. Do we
need to print again here? Are we being more verbose than necessary?

-aneesh
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


Re: [PATCH v2] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-05-05 Thread Ira Weiny
On Thu, May 06, 2021 at 12:46:06AM +0530, Vaibhav Jain wrote:
> Currently drc_pmem_query_stats() generates a dev_err in case
> "Enable Performance Information Collection" feature is disabled from
> HMC or performance stats are not available for an nvdimm. The error is
> of the form below:
> 
> papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
>performance stats, Err:-10
> 
> This error message confuses users as it implies a possible problem
> with the nvdimm even though it's due to a disabled/unavailable
> feature. We fix this by explicitly handling the H_AUTHORITY and
> H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.
> 
> In case of H_AUTHORITY error an info message is logged instead of an
> error, saying that "Permission denied while accessing performance
> stats". Also an '-EACCES' error is returned instead of -EPERM.

I thought you clarified before that this was a permission issue.  So why change
the error to EACCES?

> 
> In case of H_UNSUPPORTED error we return a -EPERM error back from
> drc_pmem_query_stats() indicating that performance stats-query
> operation is not supported on this nvdimm.

EPERM seems wrong here too...  ENOTSUP?

Ira


Re: [bug report] system panic at nfit_get_smbios_id+0x6e/0xf0 [nfit] during boot

2021-05-05 Thread Yi Zhang
On Sat, May 1, 2021 at 2:05 PM Dan Williams  wrote:
>
> On Fri, Apr 30, 2021 at 7:28 PM Yi Zhang  wrote:
> >
> > Hi
> >
> > With the latest Linux tree, my DCPMM server boot failed with the
> > below panic log, please help check it, let me know if you need any test
> > for it.
>
> So v5.12 is ok but v5.12+ is not?
>
> Might you be able to bisect?

Hi Dan
This issue was introduced with this patch, let me know if you need more info.

commit cf16b05c607bd716a0a5726dc8d577a89fdc1777
Author: Bob Moore 
Date:   Tue Apr 6 14:30:15 2021 -0700

ACPICA: ACPI 6.4: NFIT: add Location Cookie field

Also, update struct size to reflect these changes in nfit core driver.

ACPICA commit af60199a9a1de9e6844929fd4cc22334522ed195

Link: https://github.com/acpica/acpica/commit/af60199a
Cc: Dan Williams 
Signed-off-by: Bob Moore 
Signed-off-by: Erik Kaneda 
Signed-off-by: Rafael J. Wysocki 

>
> If not can you send the nfit.gz from this command:
>
> acpidump -n NFIT | gzip -c > nfit.gz
>
> Also can you send the full dmesg? I don't suppose you see a message of
> this format before this failure:
>
> dev_err(acpi_desc->dev, "SPA %d missing DCR %d\n",
> spa->range_index, dcr);
>


Re: [PATCH v1 07/11] mm/sparse-vmemmap: populate compound pagemaps

2021-05-05 Thread Dan Williams
On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  wrote:
>
> A compound pagemap is a dev_pagemap with @align > PAGE_SIZE and it
> means that pages are mapped at a given huge page alignment and
> utilize compound pages as opposed to order-0 pages.
>
> To minimize struct page overhead we take advantage of the fact that

I'm not sure if Andrew is a member of the "we" police, but I am on
team "imperative tense".

"Take advantage of the fact that most tail pages look the same (except
the first two) to minimize struct page overhead."

...I think all the other "we"s below can be dropped without losing any meaning.

> most tail pages look the same (except the first two). We allocate a
> separate page for the vmemmap area which contains the head page and
> separate for the next 64 pages. The rest of the subsections then reuse
> this tail vmemmap page to initialize the rest of the tail pages.
>
> Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and
> when initializing compound pagemap with big enough @align (e.g. 1G
> PUD) it  will cross various sections. To be able to reuse tail pages
> across sections belonging to the same gigantic page we fetch the
> @range being mapped (nr_ranges + 1).  If the section being mapped is
> not offset 0 of the @align, then lookup the PFN of the struct page
> address that preceeds it and use that to populate the entire

s/preceeds/precedes/

> section.
>
> On compound pagemaps with 2M align, this lets mechanism saves 6 pages

s/lets mechanism saves 6 pages/this mechanism lets 6 pages be saved/

> out of the 8 necessary PFNs necessary to set the subsection's 512
> struct pages being mapped. On a 1G compound pagemap it saves
> 4094 pages.
>
> Altmap isn't supported yet, given various restrictions in altmap pfn
> allocator, thus fallback to the already in use vmemmap_populate().

Perhaps also note that altmap for devmap mappings was there to relieve
the pressure of inordinate amounts of memmap space to map terabytes of
PMEM. With compound pages the motivation for altmaps for PMEM is
reduced.

>
> Signed-off-by: Joao Martins 
> ---
>  include/linux/mm.h  |   2 +-
>  mm/memremap.c   |   1 +
>  mm/sparse-vmemmap.c | 139 
>  3 files changed, 131 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 61474602c2b1..49d717ae40ae 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3040,7 +3040,7 @@ p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long 
> addr, int node);
>  pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node);
>  pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node);
>  pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node,
> -   struct vmem_altmap *altmap);
> +   struct vmem_altmap *altmap, void *block);
>  void *vmemmap_alloc_block(unsigned long size, int node);
>  struct vmem_altmap;
>  void *vmemmap_alloc_block_buf(unsigned long size, int node,
> diff --git a/mm/memremap.c b/mm/memremap.c
> index d160853670c4..2e6bc0b1ff00 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -345,6 +345,7 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
>  {
> struct mhp_params params = {
> .altmap = pgmap_altmap(pgmap),
> +   .pgmap = pgmap,
> .pgprot = PAGE_KERNEL,
> };
> const int nr_range = pgmap->nr_range;
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 8814876edcfa..f57c5eada099 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -141,16 +141,20 @@ void __meminit vmemmap_verify(pte_t *pte, int node,
>  }
>
>  pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int 
> node,
> -  struct vmem_altmap *altmap)
> +  struct vmem_altmap *altmap, void 
> *block)

For type-safety I would make this argument a 'struct page *' and drop
the virt_to_page().

>  {
> pte_t *pte = pte_offset_kernel(pmd, addr);
> if (pte_none(*pte)) {
> pte_t entry;
> -   void *p;
> -
> -   p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
> -   if (!p)
> -   return NULL;
> +   void *p = block;
> +
> +   if (!block) {
> +   p = vmemmap_alloc_block_buf(PAGE_SIZE, node, altmap);
> +   if (!p)
> +   return NULL;
> +   } else if (!altmap) {
> +   get_page(virt_to_page(block));

This is either super subtle, or there is a missing put_page(). I'm
assuming that in the shutdown path the same page will be freed
multiple times because the same page is mapped multiple times.

Have you validated that the accounting is correct and all allocated
pages get freed? I feel this needs more than a comment, it 

Re: [PATCH v1 05/11] mm/sparse-vmemmap: add a pgmap argument to section activation

2021-05-05 Thread Dan Williams
On Wed, May 5, 2021 at 3:38 PM Joao Martins  wrote:
>
>
>
> On 5/5/21 11:34 PM, Dan Williams wrote:
> > On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  
> > wrote:
> >>
> >> @altmap is stored in a dev_pagemap, but it will be repurposed for
> >> hotplug memory for storing the memmap in the hotplugged memory[*] and
> >> reusing the altmap infrastructure to that end. This is to say that
> >> altmap can't be replaced with a @pgmap as it is going to cover more than
> >> dev_pagemap backend altmaps.
> >
> > I was going to say, just pass the pgmap and lookup the altmap from
> > pgmap, but Oscar added a use case for altmap independent of pgmap. So
> > you might refresh this commit message to clarify why passing pgmap by
> > itself is not sufficient.
> >
> Isn't that what I am doing above with that exact paragraph? I even
> reference his series at the end of commit description :) in [*]

Oh, sorry, it didn't hit me explicitly that you were talking about
Oscar's work I thought you were referring to your own changes in this
set. I see it now... at a minimum the tense needs updating since
Oscar's changes are in the past not the future anymore. If it helps,
the following reads more direct to me: "In support of using compound
pages for devmap mappings, plumb the pgmap down to the
vmemmap_populate* implementation. Note that while altmap is
retrievable from pgmap, the memory hotplug code passes altmap without
pgmap, so both need to be independently plumbed."


Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Dan Williams
On Wed, May 5, 2021 at 3:36 PM Joao Martins  wrote:
[..]
> > Ah yup, my eyes glazed over that. I think this is another place that
> > benefits from a more specific name than "align". "pfns_per_compound"
> > "compound_pfns"?
> >
> We are still describing a page, just not a base page. So perhaps 
> @pfns_per_hpage ?
>
> I am fine with @pfns_per_compound or @compound_pfns as well.

My only concern about hpage is that hpage implies PMD, where compound
is generic across PMD and PUD.


Re: [PATCH] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable

2021-05-05 Thread kernel test robot
Hi Vaibhav,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.12 next-20210505]
[cannot apply to scottwood/next mpe/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Vaibhav-Jain/powerpc-papr_scm-Make-perf_stats-invisible-if-perf-stats-unavailable/20210506-031853
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://github.com/0day-ci/linux/commit/8e7ce04ec36dea95fad178585b697a9c5b5c259d
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Vaibhav-Jain/powerpc-papr_scm-Make-perf_stats-invisible-if-perf-stats-unavailable/20210506-031853
git checkout 8e7ce04ec36dea95fad178585b697a9c5b5c259d
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross W=1 
ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> arch/powerpc/platforms/pseries/papr_scm.c:903:9: warning: no previous 
>> prototype for 'papr_nd_attribute_visible' [-Wmissing-prototypes]
 903 | umode_t papr_nd_attribute_visible(struct kobject *kobj, struct 
attribute *attr,
 | ^


vim +/papr_nd_attribute_visible +903 arch/powerpc/platforms/pseries/papr_scm.c

   902  
 > 903  umode_t papr_nd_attribute_visible(struct kobject *kobj, struct 
 > attribute *attr,
   904int n)
   905  {
   906  struct device *dev = container_of(kobj, typeof(*dev), kobj);
   907  struct nvdimm *nvdimm = to_nvdimm(dev);
   908  struct papr_scm_priv *p = nvdimm_provider_data(nvdimm);
   909  
   910  /* For if perf-stats not available remove perf_stats sysfs */
   911  if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0)
   912  return 0;
   913  
   914  return attr->mode;
   915  }
   916  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH v1 06/11] mm/sparse-vmemmap: refactor vmemmap_populate_basepages()

2021-05-05 Thread Dan Williams
I suspect it's a good sign I'm only finding cosmetic and changelog
changes in the review... I have some more:

A year from now, if I'm tracking down a problem and looking through mm
commits I would appreciate a subject line like the following:
"refactor core of vmemmap_populate_basepages() to helper" that gives
an idea of the impact and side effects of the change.

On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  wrote:
>

I would add a lead in phrase like: "In preparation for describing a
memmap with compound pages, move the actual..."

> Move the actual pte population logic into a separate function
> vmemmap_populate_address() and have vmemmap_populate_basepages()
> walk through all base pages it needs to populate.

Aside from the above, looks good.

>
> Signed-off-by: Joao Martins 
> ---
>  mm/sparse-vmemmap.c | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 370728c206ee..8814876edcfa 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -216,33 +216,41 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long 
> addr, int node)
> return pgd;
>  }
>
> -int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long 
> end,
> -int node, struct vmem_altmap *altmap)
> +static int __meminit vmemmap_populate_address(unsigned long addr, int node,
> + struct vmem_altmap *altmap)
>  {
> -   unsigned long addr = start;
> pgd_t *pgd;
> p4d_t *p4d;
> pud_t *pud;
> pmd_t *pmd;
> pte_t *pte;
>
> +   pgd = vmemmap_pgd_populate(addr, node);
> +   if (!pgd)
> +   return -ENOMEM;
> +   p4d = vmemmap_p4d_populate(pgd, addr, node);
> +   if (!p4d)
> +   return -ENOMEM;
> +   pud = vmemmap_pud_populate(p4d, addr, node);
> +   if (!pud)
> +   return -ENOMEM;
> +   pmd = vmemmap_pmd_populate(pud, addr, node);
> +   if (!pmd)
> +   return -ENOMEM;
> +   pte = vmemmap_pte_populate(pmd, addr, node, altmap);
> +   if (!pte)
> +   return -ENOMEM;
> +   vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
> +}
> +
> +int __meminit vmemmap_populate_basepages(unsigned long start, unsigned long 
> end,
> +int node, struct vmem_altmap *altmap)
> +{
> +   unsigned long addr = start;
> +
> for (; addr < end; addr += PAGE_SIZE) {
> -   pgd = vmemmap_pgd_populate(addr, node);
> -   if (!pgd)
> -   return -ENOMEM;
> -   p4d = vmemmap_p4d_populate(pgd, addr, node);
> -   if (!p4d)
> -   return -ENOMEM;
> -   pud = vmemmap_pud_populate(p4d, addr, node);
> -   if (!pud)
> -   return -ENOMEM;
> -   pmd = vmemmap_pmd_populate(pud, addr, node);
> -   if (!pmd)
> -   return -ENOMEM;
> -   pte = vmemmap_pte_populate(pmd, addr, node, altmap);
> -   if (!pte)
> +   if (vmemmap_populate_address(addr, node, altmap))
> return -ENOMEM;
> -   vmemmap_verify(pte, node, addr, addr + PAGE_SIZE);
> }
>
> return 0;
> --
> 2.17.1
>


Re: [PATCH v1 05/11] mm/sparse-vmemmap: add a pgmap argument to section activation

2021-05-05 Thread Joao Martins



On 5/5/21 11:34 PM, Dan Williams wrote:
> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  
> wrote:
>>
>> @altmap is stored in a dev_pagemap, but it will be repurposed for
>> hotplug memory for storing the memmap in the hotplugged memory[*] and
>> reusing the altmap infrastructure to that end. This is to say that
>> altmap can't be replaced with a @pgmap as it is going to cover more than
>> dev_pagemap backend altmaps.
> 
> I was going to say, just pass the pgmap and lookup the altmap from
> pgmap, but Oscar added a use case for altmap independent of pgmap. So
> you might refresh this commit message to clarify why passing pgmap by
> itself is not sufficient.
> 
Isn't that what I am doing above with that exact paragraph? I even
reference his series at the end of commit description :) in [*]

> Other than that, looks good.
> 
/me nods


Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Joao Martins
On 5/5/21 11:20 PM, Dan Williams wrote:
> On Wed, May 5, 2021 at 12:50 PM Joao Martins  
> wrote:
>> On 5/5/21 7:44 PM, Dan Williams wrote:
>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  
>>> wrote:

 Add a new align property for struct dev_pagemap which specifies that a
 pagemap is composed of a set of compound pages of size @align, instead
 of base pages. When these pages are initialised, most are initialised as
 tail pages instead of order-0 pages.

 For certain ZONE_DEVICE users like device-dax which have a fixed page
 size, this creates an opportunity to optimize GUP and GUP-fast walkers,
 treating it the same way as THP or hugetlb pages.

 Signed-off-by: Joao Martins 
 ---
 include/linux/memremap.h | 13 +++++++++++++
 mm/memremap.c            |  8 ++++++--
 mm/page_alloc.c          | 24 +++++++++++++++++++++++-
  3 files changed, 42 insertions(+), 3 deletions(-)

 diff --git a/include/linux/memremap.h b/include/linux/memremap.h
 index b46f63dcaed3..bb28d82dda5e 100644
 --- a/include/linux/memremap.h
 +++ b/include/linux/memremap.h
 @@ -114,6 +114,7 @@ struct dev_pagemap {
 struct completion done;
 enum memory_type type;
 unsigned int flags;
 +   unsigned long align;
>>>
>>> I think this wants some kernel-doc above to indicate that non-zero
>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
>>> means "use non-compound base pages".
>>
>> Got it. Are you thinking a kernel_doc on top of the variable above, or 
>> preferably on top
>> of pgmap_align()?
> 
> I was thinking in dev_pagemap because this value is more than just a
> plain alignment, it's restructuring the layout and construction of the
> memmap(), but when I say it that way it seems much less like a vanilla
> align value compared to a construction / geometry mode setting.
> 
/me nods

>>
>>> The non-zero value must be
>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
>>> Hmm, maybe it should be an
>>> enum:
>>>
>>> enum devmap_geometry {
>>> DEVMAP_PTE,
>>> DEVMAP_PMD,
>>> DEVMAP_PUD,
>>> }
>>>
>> I suppose a converter between devmap_geometry and page_size would be needed 
>> too? And maybe
>> the whole dax/nvdimm align values change meanwhile (as a followup 
>> improvement)?
> 
> I think it is ok for dax/nvdimm to continue to maintain their align
> value because it should be ok to have 4MB align if the device really
> wanted. However, when it goes to map that alignment with
> memremap_pages() it can pick a mode. For example, it's already the
> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
> they're already separate concepts that can stay separate.
> 
Gotcha.

>>
>> Although to be fair we only ever care about compound page size in this 
>> series (and
>> similarly dax/nvdimm @align properties).
>>
>>> ...because it's more than just an alignment it's a structural
>>> definition of how the memmap is laid out.
>>>

Somehow I missed this other part of your response.

 const struct dev_pagemap_ops *ops;
 void *owner;
 int nr_range;
 @@ -130,6 +131,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct 
 dev_pagemap *pgmap)
 return NULL;
  }

 +static inline unsigned long pgmap_align(struct dev_pagemap *pgmap)
 +{
 +   if (!pgmap || !pgmap->align)
 +   return PAGE_SIZE;
 +   return pgmap->align;
 +}
 +
 +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
 +{
 +   return PHYS_PFN(pgmap_align(pgmap));
 +}
 +
  #ifdef CONFIG_ZONE_DEVICE
  bool pfn_zone_device_reserved(unsigned long pfn);
  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
 diff --git a/mm/memremap.c b/mm/memremap.c
 index 805d761740c4..d160853670c4 100644
 --- a/mm/memremap.c
 +++ b/mm/memremap.c
 @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, 
 struct mhp_params *params,
 memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
 PHYS_PFN(range->start),
 PHYS_PFN(range_len(range)), pgmap);
 -   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
 -   - pfn_first(pgmap, range_id));
 +   if (pgmap_align(pgmap) > PAGE_SIZE)
 +   percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
 +   - pfn_first(pgmap, range_id)) / 
 pgmap_pfn_align(pgmap));
 +   else
 +   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
 +   - pfn_first(pgmap, range_id));
 return 0;

  err_add_memory:
 diff --git a/mm/page_alloc.c b/mm/page_alloc.c
 index 58974067bbd4..3a77f9e43f3a 100644
 --- a/mm/page_alloc.c
 +++ b/mm/page_alloc.c

Re: [PATCH v1 05/11] mm/sparse-vmemmap: add a pgmap argument to section activation

2021-05-05 Thread Dan Williams
On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  wrote:
>
> @altmap is stored in a dev_pagemap, but it will be repurposed for
> hotplug memory for storing the memmap in the hotplugged memory[*] and
> reusing the altmap infrastructure to that end. This is to say that
> altmap can't be replaced with a @pgmap as it is going to cover more than
> dev_pagemap backend altmaps.

I was going to say, just pass the pgmap and lookup the altmap from
pgmap, but Oscar added a use case for altmap independent of pgmap. So
you might refresh this commit message to clarify why passing pgmap by
itself is not sufficient.

Other than that, looks good.


Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Dan Williams
On Wed, May 5, 2021 at 12:50 PM Joao Martins  wrote:
>
>
>
> On 5/5/21 7:44 PM, Dan Williams wrote:
> > On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  
> > wrote:
> >>
> >> Add a new align property for struct dev_pagemap which specifies that a
> >> pagemap is composed of a set of compound pages of size @align, instead
> >> of base pages. When these pages are initialised, most are initialised as
> >> tail pages instead of order-0 pages.
> >>
> >> For certain ZONE_DEVICE users like device-dax which have a fixed page
> >> size, this creates an opportunity to optimize GUP and GUP-fast walkers,
> >> treating it the same way as THP or hugetlb pages.
> >>
> >> Signed-off-by: Joao Martins 
> >> ---
> >>  include/linux/memremap.h | 13 +++++++++++++
> >>  mm/memremap.c            |  8 ++++++--
> >>  mm/page_alloc.c          | 24 +++++++++++++++++++++++-
> >>  3 files changed, 42 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> index b46f63dcaed3..bb28d82dda5e 100644
> >> --- a/include/linux/memremap.h
> >> +++ b/include/linux/memremap.h
> >> @@ -114,6 +114,7 @@ struct dev_pagemap {
> >> struct completion done;
> >> enum memory_type type;
> >> unsigned int flags;
> >> +   unsigned long align;
> >
> > I think this wants some kernel-doc above to indicate that non-zero
> > means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
> > means "use non-compound base pages".
>
> Got it. Are you thinking a kernel_doc on top of the variable above, or 
> preferably on top
> of pgmap_align()?

I was thinking in dev_pagemap because this value is more than just a
plain alignment, it's restructuring the layout and construction of the
memmap(), but when I say it that way it seems much less like a vanilla
align value compared to a construction / geometry mode setting.

>
> > The non-zero value must be
> > PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
> > Hmm, maybe it should be an
> > enum:
> >
> > enum devmap_geometry {
> > DEVMAP_PTE,
> > DEVMAP_PMD,
> > DEVMAP_PUD,
> > }
> >
> I suppose a converter between devmap_geometry and page_size would be needed 
> too? And maybe
> the whole dax/nvdimm align values change meanwhile (as a followup 
> improvement)?

I think it is ok for dax/nvdimm to continue to maintain their align
value because it should be ok to have 4MB align if the device really
wanted. However, when it goes to map that alignment with
memremap_pages() it can pick a mode. For example, it's already the
case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
they're already separate concepts that can stay separate.

>
> Although to be fair we only ever care about compound page size in this series 
> (and
> similarly dax/nvdimm @align properties).
>
> > ...because it's more than just an alignment it's a structural
> > definition of how the memmap is laid out.
> >
> >> const struct dev_pagemap_ops *ops;
> >> void *owner;
> >> int nr_range;
> >> @@ -130,6 +131,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct 
> >> dev_pagemap *pgmap)
> >> return NULL;
> >>  }
> >>
> >> +static inline unsigned long pgmap_align(struct dev_pagemap *pgmap)
> >> +{
> >> +   if (!pgmap || !pgmap->align)
> >> +   return PAGE_SIZE;
> >> +   return pgmap->align;
> >> +}
> >> +
> >> +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
> >> +{
> >> +   return PHYS_PFN(pgmap_align(pgmap));
> >> +}
> >> +
> >>  #ifdef CONFIG_ZONE_DEVICE
> >>  bool pfn_zone_device_reserved(unsigned long pfn);
> >>  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> >> diff --git a/mm/memremap.c b/mm/memremap.c
> >> index 805d761740c4..d160853670c4 100644
> >> --- a/mm/memremap.c
> >> +++ b/mm/memremap.c
> >> @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, 
> >> struct mhp_params *params,
> >> memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> >> PHYS_PFN(range->start),
> >> PHYS_PFN(range_len(range)), pgmap);
> >> -   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> >> -   - pfn_first(pgmap, range_id));
> >> +   if (pgmap_align(pgmap) > PAGE_SIZE)
> >> +   percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> >> +   - pfn_first(pgmap, range_id)) / 
> >> pgmap_pfn_align(pgmap));
> >> +   else
> >> +   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> >> +   - pfn_first(pgmap, range_id));
> >> return 0;
> >>
> >>  err_add_memory:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 58974067bbd4..3a77f9e43f3a 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6285,6 +6285,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
> >> unsigned long pfn, end_pfn = start_pfn + nr_pages;
> >>  

Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Joao Martins



On 5/5/21 7:44 PM, Dan Williams wrote:
> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  
> wrote:
>>
>> Add a new align property for struct dev_pagemap which specifies that a
>> pagemap is composed of a set of compound pages of size @align, instead
>> of base pages. When these pages are initialised, most are initialised as
>> tail pages instead of order-0 pages.
>>
>> For certain ZONE_DEVICE users like device-dax which have a fixed page
>> size, this creates an opportunity to optimize GUP and GUP-fast walkers,
>> treating it the same way as THP or hugetlb pages.
>>
>> Signed-off-by: Joao Martins 
>> ---
>>  include/linux/memremap.h | 13 +
>>  mm/memremap.c            |  8 ++--
>>  mm/page_alloc.c          | 24 +++-
>>  3 files changed, 42 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index b46f63dcaed3..bb28d82dda5e 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -114,6 +114,7 @@ struct dev_pagemap {
>> struct completion done;
>> enum memory_type type;
>> unsigned int flags;
>> +   unsigned long align;
> 
> I think this wants some kernel-doc above to indicate that non-zero
> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
> means "use non-compound base pages". 

Got it. Are you thinking a kernel-doc comment on top of the variable above,
or preferably on top of pgmap_align()?

> The non-zero value must be
> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE. 
> Hmm, maybe it should be an
> enum:
> 
> enum devmap_geometry {
> DEVMAP_PTE,
> DEVMAP_PMD,
> DEVMAP_PUD,
> }
> 
I suppose a converter between devmap_geometry and page_size would be needed 
too? And maybe
the whole dax/nvdimm align values change meanwhile (as a followup improvement)?

Although to be fair we only ever care about compound page size in this series 
(and
similarly dax/nvdimm @align properties).

> ...because it's more than just an alignment it's a structural
> definition of how the memmap is laid out.
> 
>> const struct dev_pagemap_ops *ops;
>> void *owner;
>> int nr_range;
>> @@ -130,6 +131,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct 
>> dev_pagemap *pgmap)
>> return NULL;
>>  }
>>
>> +static inline unsigned long pgmap_align(struct dev_pagemap *pgmap)
>> +{
>> +   if (!pgmap || !pgmap->align)
>> +   return PAGE_SIZE;
>> +   return pgmap->align;
>> +}
>> +
>> +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
>> +{
>> +   return PHYS_PFN(pgmap_align(pgmap));
>> +}
>> +
>>  #ifdef CONFIG_ZONE_DEVICE
>>  bool pfn_zone_device_reserved(unsigned long pfn);
>>  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
>> diff --git a/mm/memremap.c b/mm/memremap.c
>> index 805d761740c4..d160853670c4 100644
>> --- a/mm/memremap.c
>> +++ b/mm/memremap.c
>> @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, 
>> struct mhp_params *params,
>> memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
>> PHYS_PFN(range->start),
>> PHYS_PFN(range_len(range)), pgmap);
>> -   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
>> -   - pfn_first(pgmap, range_id));
>> +   if (pgmap_align(pgmap) > PAGE_SIZE)
>> +   percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
>> +   - pfn_first(pgmap, range_id)) / 
>> pgmap_pfn_align(pgmap));
>> +   else
>> +   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
>> +   - pfn_first(pgmap, range_id));
>> return 0;
>>
>>  err_add_memory:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 58974067bbd4..3a77f9e43f3a 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6285,6 +6285,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
>> unsigned long pfn, end_pfn = start_pfn + nr_pages;
>> struct pglist_data *pgdat = zone->zone_pgdat;
>> struct vmem_altmap *altmap = pgmap_altmap(pgmap);
>> +   unsigned int pfn_align = pgmap_pfn_align(pgmap);
>> +   unsigned int order_align = order_base_2(pfn_align);
>> unsigned long zone_idx = zone_idx(zone);
>> unsigned long start = jiffies;
>> int nid = pgdat->node_id;
>> @@ -6302,10 +6304,30 @@ void __ref memmap_init_zone_device(struct zone *zone,
>> nr_pages = end_pfn - start_pfn;
>> }
>>
>> -   for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>> +   for (pfn = start_pfn; pfn < end_pfn; pfn += pfn_align) {
> 
> pfn_align is in bytes and pfn is in pages... is there a "pfn_align >>=
> PAGE_SHIFT" I missed somewhere?
> 
@pfn_align is in pages too. It's pgmap_align() which is in bytes:

+static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
+{
+   return PHYS_PFN(pgmap_align(pgmap));
+}

[PATCH] powerpc/papr_scm: Make 'perf_stats' invisible if perf-stats unavailable

2021-05-05 Thread Vaibhav Jain
In case performance stats for an nvdimm are not available, reading the
'perf_stats' sysfs file returns an -ENOENT error. A better approach is
to make the 'perf_stats' file entirely invisible to indicate that
performance stats for an nvdimm are unavailable.

So this patch updates 'papr_nd_attribute_group' to add an 'is_visible'
callback, implemented as the newly introduced 'papr_nd_attribute_visible()',
that returns an appropriate mode in case performance stats aren't
supported on a given nvdimm.

Also, the initialization of 'papr_scm_priv.stat_buffer_len' is moved
from papr_scm_nvdimm_init() to papr_scm_probe() so that its value is
available when 'papr_nd_attribute_visible()' is called during nvdimm
initialization.

Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from 
PHYP')
Signed-off-by: Vaibhav Jain 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 37 ---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 12f1513f0fca..90f0af8fefe8 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -907,6 +907,20 @@ static ssize_t flags_show(struct device *dev,
 }
 DEVICE_ATTR_RO(flags);
 
+umode_t papr_nd_attribute_visible(struct kobject *kobj, struct attribute *attr,
+ int n)
+{
+   struct device *dev = container_of(kobj, typeof(*dev), kobj);
+   struct nvdimm *nvdimm = to_nvdimm(dev);
+   struct papr_scm_priv *p = nvdimm_provider_data(nvdimm);
+
+   /* Hide the perf_stats attribute if perf-stats are unavailable */
+   if (attr == &dev_attr_perf_stats.attr && p->stat_buffer_len == 0)
+   return 0;
+
+   return attr->mode;
+}
+
 /* papr_scm specific dimm attributes */
 static struct attribute *papr_nd_attributes[] = {
&dev_attr_flags.attr,
@@ -916,6 +930,7 @@ static struct attribute *papr_nd_attributes[] = {
 
 static struct attribute_group papr_nd_attribute_group = {
.name = "papr",
+   .is_visible = papr_nd_attribute_visible,
.attrs = papr_nd_attributes,
 };
 
@@ -931,7 +946,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
struct nd_region_desc ndr_desc;
unsigned long dimm_flags;
int target_nid, online_nid;
-   ssize_t stat_size;
 
p->bus_desc.ndctl = papr_scm_ndctl;
p->bus_desc.module = THIS_MODULE;
@@ -1016,16 +1030,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
list_add_tail(&p->region_list, &papr_nd_regions);
mutex_unlock(&papr_ndr_lock);
 
-   /* Try retriving the stat buffer and see if its supported */
-   stat_size = drc_pmem_query_stats(p, NULL, 0);
-   if (stat_size > 0) {
-   p->stat_buffer_len = stat_size;
-   dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
-   p->stat_buffer_len);
-   } else {
-   dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
-   }
-
return 0;
 
 err:   nvdimm_bus_unregister(p->bus);
@@ -1102,6 +1106,7 @@ static int papr_scm_probe(struct platform_device *pdev)
u64 blocks, block_size;
struct papr_scm_priv *p;
const char *uuid_str;
+   ssize_t stat_size;
u64 uuid[2];
int rc;
 
@@ -1179,6 +1184,16 @@ static int papr_scm_probe(struct platform_device *pdev)
p->res.name  = pdev->name;
p->res.flags = IORESOURCE_MEM;
 
+   /* Try retriving the stat buffer and see if its supported */
+   stat_size = drc_pmem_query_stats(p, NULL, 0);
+   if (stat_size > 0) {
+   p->stat_buffer_len = stat_size;
+   dev_dbg(&p->pdev->dev, "Max perf-stat size %lu-bytes\n",
+   p->stat_buffer_len);
+   } else {
+   dev_info(&p->pdev->dev, "Dimm performance stats unavailable\n");
+   }
+
rc = papr_scm_nvdimm_init(p);
if (rc)
goto err2;
-- 
2.31.1
___
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-le...@lists.01.org


[PATCH v2] powerpc/papr_scm: Reduce error severity if nvdimm stats inaccessible

2021-05-05 Thread Vaibhav Jain
Currently drc_pmem_query_stats() generates a dev_err in case the
"Enable Performance Information Collection" feature is disabled from
the HMC or performance stats are not available for an nvdimm. The
error is of the form below:

papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Failed to query
 performance stats, Err:-10

This error message confuses users as it implies a possible problem
with the nvdimm even though it's due to a disabled/unavailable
feature. We fix this by explicitly handling the H_AUTHORITY and
H_UNSUPPORTED errors from the H_SCM_PERFORMANCE_STATS hcall.

In case of an H_AUTHORITY error, an info message is logged instead of
an error, saying "Permission denied while accessing performance
stats". Also, an '-EACCES' error is returned instead of -EPERM.

In case of an H_UNSUPPORTED error we return a -EPERM error from
drc_pmem_query_stats(), indicating that the performance stats-query
operation is not supported on this nvdimm.

Fixes: 2d02bf835e57('powerpc/papr_scm: Fetch nvdimm performance stats from 
PHYP')
Signed-off-by: Vaibhav Jain 
---
Changelog

v2:
* Updated the message logged in case of H_AUTHORITY error [ Ira ]
* Switched from dev_warn to dev_info in case of H_AUTHORITY error.
* Instead of -EPERM return -EACCES for H_AUTHORITY error.
* Added explicit handling of H_UNSUPPORTED error.
---
 arch/powerpc/platforms/pseries/papr_scm.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index ef26fe40efb0..12f1513f0fca 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -310,6 +310,13 @@ static ssize_t drc_pmem_query_stats(struct papr_scm_priv 
*p,
dev_err(&p->pdev->dev,
"Unknown performance stats, Err:0x%016lX\n", ret[0]);
return -ENOENT;
+   } else if (rc == H_AUTHORITY) {
+   dev_info(&p->pdev->dev,
+"Permission denied while accessing performance stats");
+   return -EACCES;
+   } else if (rc == H_UNSUPPORTED) {
+   dev_dbg(&p->pdev->dev, "Performance stats unsupported\n");
+   return -EPERM;
} else if (rc != H_SUCCESS) {
dev_err(&p->pdev->dev,
"Failed to query performance stats, Err:%lld\n", rc);
-- 
2.31.1


Re: [PATCH v18 0/9] mm: introduce memfd_secret system call to create "secret" memory areas

2021-05-05 Thread Andrew Morton
On Wed,  3 Mar 2021 18:22:00 +0200 Mike Rapoport  wrote:

> This is an implementation of "secret" mappings backed by a file descriptor.
> 
> The file descriptor backing secret memory mappings is created using a
> dedicated memfd_secret system call. The desired protection mode for the
> memory is configured using flags parameter of the system call. The mmap()
> of the file descriptor created with memfd_secret() will create a "secret"
> memory mapping. The pages in that mapping will be marked as not present in
> the direct map and will be present only in the page table of the owning mm.
> 
> Although normally Linux userspace mappings are protected from other users,
> such secret mappings are useful for environments where a hostile tenant is
> trying to trick the kernel into giving them access to other tenants'
> mappings.

I continue to struggle with this and I don't recall seeing much
enthusiasm from others.  Perhaps we're all missing the value point and
some additional selling is needed.

Am I correct in understanding that the overall direction here is to
protect keys (and perhaps other things) from kernel bugs?  That if the
kernel was bug-free then there would be no need for this feature?  If
so, that's a bit sad.  But realistic I guess.

Is this intended to protect keys/etc after the attacker has gained the
ability to run arbitrary kernel-mode code?  If so, that seems
optimistic, doesn't it?

I think that a very complete description of the threats which this
feature addresses would be helpful.  


Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Matthew Wilcox
On Wed, May 05, 2021 at 11:44:29AM -0700, Dan Williams wrote:
> > @@ -6285,6 +6285,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
> > unsigned long pfn, end_pfn = start_pfn + nr_pages;
> > struct pglist_data *pgdat = zone->zone_pgdat;
> > struct vmem_altmap *altmap = pgmap_altmap(pgmap);
> > +   unsigned int pfn_align = pgmap_pfn_align(pgmap);
> > +   unsigned int order_align = order_base_2(pfn_align);
> > unsigned long zone_idx = zone_idx(zone);
> > unsigned long start = jiffies;
> > int nid = pgdat->node_id;
> > @@ -6302,10 +6304,30 @@ void __ref memmap_init_zone_device(struct zone 
> > *zone,
> > nr_pages = end_pfn - start_pfn;
> > }
> >
> > -   for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> > +   for (pfn = start_pfn; pfn < end_pfn; pfn += pfn_align) {
> 
> pfn_align is in bytes and pfn is in pages... is there a "pfn_align >>=
> PAGE_SHIFT" I missed somewhere?

If something is measured in bytes, I like to use size_t (if it's
in memory) and loff_t (if it's on storage).  The compiler doesn't do
anything useful to warn you, but it's a nice indication to humans about
what's going on.  And it removes the temptation to do 'pfn_align >>=
PAGE_SHIFT' and suddenly take pfn_align from being measured in bytes to
being measured in pages.


Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages

2021-05-05 Thread Dan Williams
On Thu, Mar 25, 2021 at 4:10 PM Joao Martins  wrote:
>
> Add a new align property for struct dev_pagemap which specifies that a
> pagemap is composed of a set of compound pages of size @align, instead
> of base pages. When these pages are initialised, most are initialised as
> tail pages instead of order-0 pages.
>
> For certain ZONE_DEVICE users like device-dax which have a fixed page
> size, this creates an opportunity to optimize GUP and GUP-fast walkers,
> treating it the same way as THP or hugetlb pages.
>
> Signed-off-by: Joao Martins 
> ---
>  include/linux/memremap.h | 13 +
>  mm/memremap.c            |  8 ++--
>  mm/page_alloc.c          | 24 +++-
>  3 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index b46f63dcaed3..bb28d82dda5e 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -114,6 +114,7 @@ struct dev_pagemap {
> struct completion done;
> enum memory_type type;
> unsigned int flags;
> +   unsigned long align;

I think this wants some kernel-doc above to indicate that non-zero
means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
means "use non-compound base pages". The non-zero value must be
PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE. Hmm, maybe it should be an
enum:

enum devmap_geometry {
DEVMAP_PTE,
DEVMAP_PMD,
DEVMAP_PUD,
}

...because it's more than just an alignment it's a structural
definition of how the memmap is laid out.

> const struct dev_pagemap_ops *ops;
> void *owner;
> int nr_range;
> @@ -130,6 +131,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct 
> dev_pagemap *pgmap)
> return NULL;
>  }
>
> +static inline unsigned long pgmap_align(struct dev_pagemap *pgmap)
> +{
> +   if (!pgmap || !pgmap->align)
> +   return PAGE_SIZE;
> +   return pgmap->align;
> +}
> +
> +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
> +{
> +   return PHYS_PFN(pgmap_align(pgmap));
> +}
> +
>  #ifdef CONFIG_ZONE_DEVICE
>  bool pfn_zone_device_reserved(unsigned long pfn);
>  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 805d761740c4..d160853670c4 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, 
> struct mhp_params *params,
> memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> PHYS_PFN(range->start),
> PHYS_PFN(range_len(range)), pgmap);
> -   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> -   - pfn_first(pgmap, range_id));
> +   if (pgmap_align(pgmap) > PAGE_SIZE)
> +   percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> +   - pfn_first(pgmap, range_id)) / 
> pgmap_pfn_align(pgmap));
> +   else
> +   percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> +   - pfn_first(pgmap, range_id));
> return 0;
>
>  err_add_memory:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 58974067bbd4..3a77f9e43f3a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6285,6 +6285,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
> unsigned long pfn, end_pfn = start_pfn + nr_pages;
> struct pglist_data *pgdat = zone->zone_pgdat;
> struct vmem_altmap *altmap = pgmap_altmap(pgmap);
> +   unsigned int pfn_align = pgmap_pfn_align(pgmap);
> +   unsigned int order_align = order_base_2(pfn_align);
> unsigned long zone_idx = zone_idx(zone);
> unsigned long start = jiffies;
> int nid = pgdat->node_id;
> @@ -6302,10 +6304,30 @@ void __ref memmap_init_zone_device(struct zone *zone,
> nr_pages = end_pfn - start_pfn;
> }
>
> -   for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> +   for (pfn = start_pfn; pfn < end_pfn; pfn += pfn_align) {

pfn_align is in bytes and pfn is in pages... is there a "pfn_align >>=
PAGE_SHIFT" I missed somewhere?


