Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2020-04-09 Thread Baoquan He
On 04/09/20 at 07:27pm, Mike Rapoport wrote:
> On Tue, Mar 31, 2020 at 04:21:38PM +0200, Michal Hocko wrote:
> > On Tue 31-03-20 22:03:32, Baoquan He wrote:
> > > Hi Michal,
> > > 
> > > On 03/31/20 at 10:55am, Michal Hocko wrote:
> > > > On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > > > > Maybe I mis-read the code, but I don't see how this could happen. In the
> > > > > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > > > > calculate_node_totalpages() that ensures that node->node_zones are
> > > > > entirely within the node because this is checked in
> > > > > zone_spanned_pages_in_node().
> > > > 
> > > > zone_spanned_pages_in_node does check that the zone boundaries are within the
> > > > node boundaries. But that doesn't really tell anything about other
> > > > potential zones interleaving with the physical memory range.
> > > > zone->spanned_pages simply gives the physical range for the zone
> > > > including holes. Interleaving nodes are essentially a hole
> > > > (__absent_pages_in_range is going to skip those).
> > > > 
> > > > That means that free_area_init_core simply goes over the whole
> > > > physical zone range including holes, and that is why we need to check
> > > > both for physical and logical holes (aka other nodes).
> > > > 
> > > > The life would be so much easier if the whole thing would simply iterate
> > > > over memblocks...
> > > 
> > > Iterating over memblocks sounds like a great idea. I tried putting the
> > > memblock iteration in the upper layer, memmap_init(), which is used for
> > > boot mem only anyway. Do you think it's doable and OK? If yes, I can
> > > work out a formal patch to make this simpler as you said. The draft code
> > > is below. Like this it uses the existing code and involves little change.
> > 
> > Doing this would be a step in the right direction! I haven't checked the
> > code very closely though. The below sounds way too simple to be true, I
> > am afraid. First, for_each_mem_pfn_range is available only for
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
> > saying that I really hate that being conditional). Also I haven't really
> > checked the deferred initialization path - I have a very vague
> > recollection that it has been converted to the memblock api but I have
> > happily dropped all that memory.
> 
> Baoquan's patch almost did it, at least for the simple case of qemu with 2
> nodes. It's only missing the adjustment to the size passed to
> memmap_init_zone(), as it may change because of the clamping.

Right, the size needs to be adjusted after the start and end clamping.
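
To make that concrete, here is a sketch of what the adjusted call might
look like (illustrative only; the draft hunk quoted below still passes
the unadjusted, zone-wide size):

	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
		if (end_pfn > start_pfn)
			/* pass the clamped extent, not the full "size" */
			memmap_init_zone(end_pfn - start_pfn, nid, zone,
					 start_pfn, MEMMAP_EARLY, NULL);
	}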

> 
> I've drafted something that removes HAVE_MEMBLOCK_NODE_MAP and added this
> patch there [1]. For several memory configurations I could emulate with
> qemu it worked.
> I'm going to wait a bit to see if kbuild is happy and then I'll send the
> patches.
> 
> Baoquan, I took the liberty of adding your SoB, hope you don't mind.
> 
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=memblock/all-have-node-map
>  

Of course not. Thanks for doing this; I look forward to seeing your
formal patchset posting when it's ready.

>   
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 138a56c0f48f..558d421f294b 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> > >* function.  They do not exist on hotplugged memory.
> > >*/
> > >   if (context == MEMMAP_EARLY) {
> > > - if (!early_pfn_valid(pfn)) {
> > > - pfn = next_pfn(pfn);
> > > - continue;
> > > - }
> > > - if (!early_pfn_in_nid(pfn, nid)) {
> > > - pfn++;
> > > - continue;
> > > - }
> > >   if (overlap_memmap_init(zone, &pfn))
> > >   continue;
> > >   if (defer_init(nid, pfn, end_pfn))
> > > @@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone *zone)
> > >  }
> > >  
> > >  void __meminit __weak memmap_init(unsigned long size, int nid,
> > > -   unsigned long zone, unsigned long start_pfn)
> > > +   unsigned long zone, unsigned long range_start_pfn)
> > >  {
> > > - memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > > + unsigned long start_pfn, end_pfn;
> > > + unsigned long range_end_pfn = range_start_pfn + size;
> > > + int i;
> > > + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> > > + start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> > > + end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> > > + if (end_pfn > start_pfn)
> > > + memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > > + }
> > >  }

Re: [PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP (was: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA)

2020-04-09 Thread Baoquan He
On 04/09/20 at 05:33pm, Michal Hocko wrote:
> On Thu 09-04-20 22:41:19, Baoquan He wrote:
> > On 04/02/20 at 10:01am, Michal Hocko wrote:
> > > On Wed 01-04-20 10:51:55, Mike Rapoport wrote:
> > > > Hi,
> > > > 
> > > > On Wed, Apr 01, 2020 at 01:42:27PM +0800, Baoquan He wrote:
> > > [...]
> > > > > From the above information, we can remove HAVE_MEMBLOCK_NODE_MAP and
> > > > > replace it with CONFIG_NUMA. It sounds more sensible to store the nid
> > > > > in memblock when NUMA support is enabled.
> > > >  
> > > > Replacing CONFIG_HAVE_MEMBLOCK_NODE_MAP with CONFIG_NUMA will work, but
> > > > this will not help clean up the whole node/zone initialization mess,
> > > > and we'll be stuck with two implementations.
> > > 
> > > Yeah, this is far from optimal.
> > > 
> > > > The overhead of enabling HAVE_MEMBLOCK_NODE_MAP is only for init time, as
> > > > most architectures will discard the entire memblock anyway, so having it
> > > > in a UMA arch won't be a problem. The only exception is arm, which uses
> > > > memblock for pfn_valid(); here we may also think about a solution to
> > > > compensate for the addition of nid to the memblock structures.
> > > 
> > > Well, we can make memblock_region->nid defined only for CONFIG_NUMA.
> > > memblock_get_region_node would then unconditionally return 0 on UMA.
> > > Essentially the same way we do NUMA for other MM code. I only see a few
> > > direct usages of region->nid.
> > 
> > I checked the code again; it seems HAVE_MEMBLOCK_NODE_MAP is selected
> > directly by all arches which support it. That means HAVE_MEMBLOCK_NODE_MAP
> > is enabled by default on those arches and has no dependency on CONFIG_NUMA
> > at all. E.g. on x86, it just calls free_area_init_nodes() in the generic
> > code path, while free_area_init_nodes() is defined within the
> > CONFIG_HAVE_MEMBLOCK_NODE_MAP ifdeffery scope. So I tend to agree with
> > Mike to remove HAVE_MEMBLOCK_NODE_MAP first on all arches. We can check
> > whether it's worth defining memblock_region->nid only for the CONFIG_NUMA
> > case after HAVE_MEMBLOCK_NODE_MAP is removed.
> 
> This can surely go in separate patches. What I meant to say is that
> region->nid is by definition 0 on !CONFIG_NUMA.

I see, thanks.



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Sergey Senozhatsky
On (20/04/09 10:08), Minchan Kim wrote:
> > > Even though I don't know how many use cases we have for zsmalloc as a
> > > module (I've heard of it only once, for a dumb reason), it could affect
> > > existing users. Thus, please include a concrete explanation in the patch
> > > to justify the change in case complaints occur.
> > 
> > The justification is 'we can unexport functions that have no sane reason
> > for being exported in the first place'.
> > 
> > The Changelog pretty much says that.
> 
> Okay, I hope no users are affected by this patch.
> If someone is, they need to provide a sane reason why they want
> to have zsmalloc as a module.

I'm one of those who use zsmalloc as a module - mainly because I use zram
as a compressing general-purpose block device, not as a swap device.
I create zram0, mkfs, mount, checkout and compile code; once done, I
umount and rmmod. This reduces the number of writes to the SSD. Some
people use tmpfs, but zram device(s) can be much larger in size. That's a
niche use case and I'm not against the patch.

-ss


[PATCH] usb: gadget: fsl: Fix a wrong judgment in fsl_udc_probe()

2020-04-09 Thread Tang Bin
If platform_get_irq() fails, the negative value it returns (including
-EPROBE_DEFER) is not detected here, so callers fail to get the correct
error code. Thus it must be fixed.

Signed-off-by: Tang Bin 
Signed-off-by: Shengju Zhang 
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index ec6eda426..60853ad10 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -2441,8 +2441,8 @@ static int fsl_udc_probe(struct platform_device *pdev)
udc_controller->max_ep = (dccparams & DCCPARAMS_DEN_MASK) * 2;
 
udc_controller->irq = platform_get_irq(pdev, 0);
-   if (!udc_controller->irq) {
-   ret = -ENODEV;
+   if (udc_controller->irq <= 0) {
+   ret = udc_controller->irq ? : -ENODEV;
goto err_iounmap;
}
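
For context, a hedged sketch of the intended semantics (not part of the
patch): platform_get_irq() returns a positive IRQ number on success or a
negative errno such as -EPROBE_DEFER on failure. The GNU "?:" shorthand
above propagates a negative error code as-is and only falls back to
-ENODEV in the degenerate irq == 0 case:

	int irq = platform_get_irq(pdev, 0);

	if (irq <= 0) {
		/* irq < 0: propagate the errno (e.g. -EPROBE_DEFER);
		 * irq == 0: treat as invalid and return -ENODEV
		 */
		return irq ? irq : -ENODEV;
	}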
 
-- 
2.20.1.windows.1





Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Benjamin Herrenschmidt
On Thu, 2020-04-09 at 11:41 +0200, Daniel Vetter wrote:
> Now if these boxes didn't ever have agp then I think we can get away
> with deleting this, since we've already deleted the legacy radeon
> driver. And that one used vmalloc for everything. The new kms one does
> use the dma-api if the gpu isn't connected through agp.

Definitely no AGP there.

Cheers
Ben.




Re: [PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-09 Thread Wei Liu
On Wed, Apr 08, 2020 at 01:58:59PM +0200, Christoph Hellwig wrote:
> Use the designated helper for allocating executable kernel memory, and
> remove the now unused PAGE_KERNEL_RX define.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Wei Liu 


Re: [PATCH 25/28] mm: remove vmalloc_user_node_flags

2020-04-09 Thread Andrii Nakryiko
cc Johannes who suggested this API call originally

On Wed, Apr 8, 2020 at 5:03 AM Christoph Hellwig  wrote:
>
> Open code it in __bpf_map_area_alloc, which is the only caller.  Also
> clean up __bpf_map_area_alloc to have a single vmalloc call with
> slightly different flags instead of the current two different calls.
>
> For this to compile for the nommu case add a __vmalloc_node_range stub
> to nommu.c.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/vmalloc.h |  1 -
>  kernel/bpf/syscall.c| 23 +--
>  mm/nommu.c  | 14 --
>  mm/vmalloc.c| 20 
>  4 files changed, 21 insertions(+), 37 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 108f49b47756..f90f2946aac2 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -106,7 +106,6 @@ extern void *vzalloc(unsigned long size);
>  extern void *vmalloc_user(unsigned long size);
>  extern void *vmalloc_node(unsigned long size, int node);
>  extern void *vzalloc_node(unsigned long size, int node);
> -extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags);
>  extern void *vmalloc_exec(unsigned long size);
>  extern void *vmalloc_32(unsigned long size);
>  extern void *vmalloc_32_user(unsigned long size);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 48d98ea8fad6..249d9bd43321 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -281,26 +281,29 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>  * __GFP_RETRY_MAYFAIL to avoid such situations.
>  */
>
> -   const gfp_t flags = __GFP_NOWARN | __GFP_ZERO;
> +   const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
> +   unsigned int flags = 0;
> +   unsigned long align = 1;
> void *area;
>
> if (size >= SIZE_MAX)
> return NULL;
>
> /* kmalloc()'ed memory can't be mmap()'ed */
> -   if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -   area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
> +   if (mmapable) {
> +   BUG_ON(!PAGE_ALIGNED(size));
> +   align = SHMLBA;
> +   flags = VM_USERMAP;
> +   } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> +   area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY,
> numa_node);
> if (area != NULL)
> return area;
> }
> -   if (mmapable) {
> -   BUG_ON(!PAGE_ALIGNED(size));
> -   return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
> -  __GFP_RETRY_MAYFAIL | flags);
> -   }
> -   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | flags,
> - numa_node, __builtin_return_address(0));
> +
> +   return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
> +   gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL,
> +   flags, numa_node, __builtin_return_address(0));
>  }
>
>  void *bpf_map_area_alloc(u64 size, int numa_node)
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 81a86cd85893..b42cd6003d7d 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -150,6 +150,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
>  }
>  EXPORT_SYMBOL(__vmalloc);
>
> +void *__vmalloc_node_range(unsigned long size, unsigned long align,
> +   unsigned long start, unsigned long end, gfp_t gfp_mask,
> +   pgprot_t prot, unsigned long vm_flags, int node,
> +   const void *caller)
> +{
> +   return __vmalloc(size, gfp_mask);
> +}
> +
>  void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
> int node, const void *caller)
>  {
> @@ -180,12 +188,6 @@ void *vmalloc_user(unsigned long size)
>  }
>  EXPORT_SYMBOL(vmalloc_user);
>
> -void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
> -{
> -   return __vmalloc_user_flags(size, flags | __GFP_ZERO);
> -}
> -EXPORT_SYMBOL(vmalloc_user_node_flags);
> -
>  struct page *vmalloc_to_page(const void *addr)
>  {
> return virt_to_page(addr);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 333fbe77255a..f6f2acdaf70c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2658,26 +2658,6 @@ void *vzalloc_node(unsigned long size, int node)
>  }
>  EXPORT_SYMBOL(vzalloc_node);
>
> -/**
> - * vmalloc_user_node_flags - allocate memory for userspace on a specific node
> - * @size: allocation size
> - * @node: numa node
> - * @flags: flags for the page level allocator
> - *
> - * The resulting memory area is zeroed so it can be mapped to userspace
> - * without leaking data.
> - *
> - * Return: pointer to the allocated memory or %NULL on error
> - */
> -void *vmalloc_user_node_flags(unsign

[powerpc:merge] BUILD SUCCESS a9aa21d05c33c556e48c5062b6632a9b94906570

2020-04-09 Thread kbuild test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: a9aa21d05c33c556e48c5062b6632a9b94906570  Automatic merge of branches 'master', 'next' and 'fixes' into merge

elapsed time: 578m

configs tested: 166
configs skipped: 0

The following configs have been built successfully.
More configs may be tested in the coming days.

arm  allmodconfig
arm   allnoconfig
arm  allyesconfig
arm64allmodconfig
arm64 allnoconfig
arm64allyesconfig
arm at91_dt_defconfig
arm   efm32_defconfig
arm  exynos_defconfig
armmulti_v5_defconfig
armmulti_v7_defconfig
armshmobile_defconfig
arm   sunxi_defconfig
arm64   defconfig
sparcallyesconfig
i386  allnoconfig
riscv allnoconfig
shtitan_defconfig
h8300   h8s-sim_defconfig
m68k allmodconfig
s390  debug_defconfig
ia64defconfig
powerpc defconfig
microblaze  mmu_defconfig
sh   allmodconfig
i386 alldefconfig
i386 allyesconfig
i386  debian-10.3
i386defconfig
ia64 alldefconfig
ia64 allmodconfig
ia64  allnoconfig
ia64 allyesconfig
c6x  allyesconfig
c6xevmc6678_defconfig
nios2 10m50_defconfig
nios2 3c120_defconfig
openriscor1ksim_defconfig
openrisc simple_smp_defconfig
xtensa   common_defconfig
xtensa  iss_defconfig
alpha   defconfig
cskydefconfig
nds32 allnoconfig
nds32   defconfig
h8300 edosk2674_defconfig
h8300h8300h-sim_defconfig
m68k   m5475evb_defconfig
m68k  multi_defconfig
m68k   sun3_defconfig
arc defconfig
arc  allyesconfig
powerpc   ppc64_defconfig
powerpc  rhel-kconfig
microblazenommu_defconfig
powerpc   allnoconfig
mips   32r2_defconfig
mips 64r6el_defconfig
mips allmodconfig
mips  allnoconfig
mips allyesconfig
mips  fuloong2e_defconfig
mips  malta_kvm_defconfig
pariscallnoconfig
parisc   allyesconfig
pariscgeneric-32bit_defconfig
pariscgeneric-64bit_defconfig
x86_64   randconfig-a001-20200409
x86_64   randconfig-a002-20200409
x86_64   randconfig-a003-20200409
i386                 randconfig-a001-20200409
i386                 randconfig-a002-20200409
i386                 randconfig-a003-20200409
alpha                randconfig-a001-20200409
m68k                 randconfig-a001-20200409
mips                 randconfig-a001-20200409
nds32                randconfig-a001-20200409
parisc               randconfig-a001-20200409
riscv                randconfig-a001-20200409
c6x                  randconfig-a001-20200409
h8300                randconfig-a001-20200409
microblaze           randconfig-a001-20200409
nios2                randconfig-a001-20200409
sparc64              randconfig-a001-20200409
s390                 randconfig-a001-20200409
xtensa               randconfig-a001-20200409
csky                 randconfig-a001-20200409
openrisc             randconfig-a001-20200409
sh                   randconfig-a001-20200409
x86_64               randconfig-b001-20200409
x86_64               randconfig-b002-20200409
x86_64               randconfig-b003-20200409
i386                 randconfig-b001-20200409
i386                 randconfig-b002-20200409
i386                 randconfig-b003-20200409
x86_64               randconfig-c001-20200409
x86_64               randconfig-c002-20200409
x86_64               randconfig-c003-20200409
i386                 randconfig-c001-20200409
i386

Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.7-2 tag

2020-04-09 Thread pr-tracker-bot
The pull request you sent on Thu, 09 Apr 2020 21:07:24 +1000:

> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
> tags/powerpc-5.7-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e4da01d8333e500e15a674d75885a9dfcfd31e77

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Minchan Kim
On Thu, Apr 09, 2020 at 06:50:30PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote:
> > On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> > > This allows to unexport map_vm_area and unmap_kernel_range, which are
> > > rather deep internal and should not be available to modules.
> > 
> > Even though I don't know how many use cases we have for zsmalloc as a
> > module (I've heard of it only once, for a dumb reason), it could affect
> > existing users. Thus, please include a concrete explanation in the patch
> > to justify the change in case complaints occur.
> 
> The justification is 'we can unexport functions that have no sane reason
> for being exported in the first place'.
> 
> The Changelog pretty much says that.

Okay, I hope no users are affected by this patch.
If someone is, they need to provide a sane reason why they want
to have zsmalloc as a module.

Acked-by: Minchan Kim 


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Peter Zijlstra
On Thu, Apr 09, 2020 at 09:08:26AM -0700, Minchan Kim wrote:
> On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> > This allows to unexport map_vm_area and unmap_kernel_range, which are
> > rather deep internal and should not be available to modules.
> 
> Even though I don't know how many use cases we have for zsmalloc as a
> module (I've heard of it only once, for a dumb reason), it could affect
> existing users. Thus, please include a concrete explanation in the patch
> to justify the change in case complaints occur.

The justification is 'we can unexport functions that have no sane reason
for being exported in the first place'.

The Changelog pretty much says that.


Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

2020-04-09 Thread Mike Rapoport
On Tue, Mar 31, 2020 at 04:21:38PM +0200, Michal Hocko wrote:
> On Tue 31-03-20 22:03:32, Baoquan He wrote:
> > Hi Michal,
> > 
> > On 03/31/20 at 10:55am, Michal Hocko wrote:
> > > On Tue 31-03-20 11:14:23, Mike Rapoport wrote:
> > > > Maybe I mis-read the code, but I don't see how this could happen. In the
> > > > HAVE_MEMBLOCK_NODE_MAP=y case, free_area_init_node() calls
> > > > calculate_node_totalpages() that ensures that node->node_zones are entirely
> > > > within the node because this is checked in zone_spanned_pages_in_node().
> > > 
> > > zone_spanned_pages_in_node does check that the zone boundaries are within the
> > > node boundaries. But that doesn't really tell anything about other
> > > potential zones interleaving with the physical memory range.
> > > zone->spanned_pages simply gives the physical range for the zone
> > > including holes. Interleaving nodes are essentially a hole
> > > (__absent_pages_in_range is going to skip those).
> > > 
> > > That means that free_area_init_core simply goes over the whole
> > > physical zone range including holes, and that is why we need to check
> > > both for physical and logical holes (aka other nodes).
> > > 
> > > The life would be so much easier if the whole thing would simply iterate
> > > over memblocks...
> > 
> > Iterating over memblocks sounds like a great idea. I tried putting the
> > memblock iteration in the upper layer, memmap_init(), which is used for
> > boot mem only anyway. Do you think it's doable and OK? If yes, I can
> > work out a formal patch to make this simpler as you said. The draft code
> > is below. Like this it uses the existing code and involves little change.
> 
> Doing this would be a step in the right direction! I haven't checked the
> code very closely though. The below sounds way too simple to be true, I
> am afraid. First, for_each_mem_pfn_range is available only for
> CONFIG_HAVE_MEMBLOCK_NODE_MAP (which is one of the reasons why I keep
> saying that I really hate that being conditional). Also I haven't really
> checked the deferred initialization path - I have a very vague
> recollection that it has been converted to the memblock api but I have
> happily dropped all that memory.

Baoquan's patch almost did it, at least for the simple case of qemu with 2
nodes. It's only missing the adjustment to the size passed to
memmap_init_zone(), as it may change because of the clamping.

I've drafted something that removes HAVE_MEMBLOCK_NODE_MAP and added this
patch there [1]. For several memory configurations I could emulate with
qemu it worked.
I'm going to wait a bit to see if kbuild is happy and then I'll send the
patches.

Baoquan, I took the liberty of adding your SoB, hope you don't mind.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/rppt/linux.git/log/?h=memblock/all-have-node-map
 
  
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 138a56c0f48f..558d421f294b 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6007,14 +6007,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >  * function.  They do not exist on hotplugged memory.
> >  */
> > if (context == MEMMAP_EARLY) {
> > -   if (!early_pfn_valid(pfn)) {
> > -   pfn = next_pfn(pfn);
> > -   continue;
> > -   }
> > -   if (!early_pfn_in_nid(pfn, nid)) {
> > -   pfn++;
> > -   continue;
> > -   }
> > if (overlap_memmap_init(zone, &pfn))
> > continue;
> > if (defer_init(nid, pfn, end_pfn))
> > @@ -6130,9 +6122,17 @@ static void __meminit zone_init_free_lists(struct zone *zone)
> >  }
> >  
> >  void __meminit __weak memmap_init(unsigned long size, int nid,
> > - unsigned long zone, unsigned long start_pfn)
> > + unsigned long zone, unsigned long range_start_pfn)
> >  {
> > -   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > +   unsigned long start_pfn, end_pfn;
> > +   unsigned long range_end_pfn = range_start_pfn + size;
> > +   int i;
> > +   for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
> > +   start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
> > +   end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
> > +   if (end_pfn > start_pfn)
> > +   memmap_init_zone(size, nid, zone, start_pfn, MEMMAP_EARLY, NULL);
> > +   }
> >  }
> >  
> >  static int zone_batchsize(struct zone *zone)
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Sincerely yours,
Mike.



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-09 Thread Minchan Kim
On Wed, Apr 08, 2020 at 01:59:08PM +0200, Christoph Hellwig wrote:
> This allows to unexport map_vm_area and unmap_kernel_range, which are
> rather deep internal and should not be available to modules.

Even though I don't know how many use cases we have for zsmalloc as a
module (I've heard of it only once, for a dumb reason), it could affect
existing users. Thus, please include a concrete explanation in the patch
to justify the change in case complaints occur.

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  mm/Kconfig   | 2 +-
>  mm/vmalloc.c | 2 --
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 36949a9425b8..614cc786b519 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -702,7 +702,7 @@ config ZSMALLOC
>  
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y
>   help
> By default, zsmalloc uses a copy-based object mapping method to
> access allocations that span two pages. However, if a particular
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 3375f9508ef6..9183fc0d365a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2046,7 +2046,6 @@ void unmap_kernel_range(unsigned long addr, unsigned long size)
>   vunmap_page_range(addr, end);
>   flush_tlb_kernel_range(addr, end);
>  }
> -EXPORT_SYMBOL_GPL(unmap_kernel_range);
>  
>  int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
>  {
> @@ -2058,7 +2057,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
>  
>   return err > 0 ? 0 : err;
>  }
> -EXPORT_SYMBOL_GPL(map_vm_area);
>  
>  static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
>   struct vmap_area *va, unsigned long flags, const void *caller)
> -- 
> 2.25.1
> 


Re: [PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-09 Thread Minchan Kim
On Wed, Apr 08, 2020 at 01:59:07PM +0200, Christoph Hellwig wrote:
> Rename the Kconfig variable to clarify the scope.
> 
> Signed-off-by: Christoph Hellwig 
Acked-by: Minchan Kim 


Re: [PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP (was: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA)

2020-04-09 Thread Michal Hocko
On Thu 09-04-20 22:41:19, Baoquan He wrote:
> On 04/02/20 at 10:01am, Michal Hocko wrote:
> > On Wed 01-04-20 10:51:55, Mike Rapoport wrote:
> > > Hi,
> > > 
> > > On Wed, Apr 01, 2020 at 01:42:27PM +0800, Baoquan He wrote:
> > [...]
> > > > From the above information, we can remove HAVE_MEMBLOCK_NODE_MAP and
> > > > replace it with CONFIG_NUMA. It sounds more sensible to store the nid
> > > > in memblock when NUMA support is enabled.
> > >  
> > > Replacing CONFIG_HAVE_MEMBLOCK_NODE_MAP with CONFIG_NUMA will work, but
> > > this will not help clean up the whole node/zone initialization mess,
> > > and we'll be stuck with two implementations.
> > 
> > Yeah, this is far from optimal.
> > 
> > > The overhead of enabling HAVE_MEMBLOCK_NODE_MAP is only for init time, as
> > > most architectures will discard the entire memblock anyway, so having it
> > > in a UMA arch won't be a problem. The only exception is arm, which uses
> > > memblock for pfn_valid(); here we may also think about a solution to
> > > compensate for the addition of nid to the memblock structures.
> > 
> > Well, we can make memblock_region->nid defined only for CONFIG_NUMA.
> > memblock_get_region_node would then unconditionally return 0 on UMA.
> > Essentially the same way we do NUMA for other MM code. I only see a few
> > direct usages of region->nid.
> 
> I checked the code again; it seems HAVE_MEMBLOCK_NODE_MAP is selected
> directly by all arches which support it. That means HAVE_MEMBLOCK_NODE_MAP
> is enabled by default on those arches and has no dependency on CONFIG_NUMA
> at all. E.g. on x86, it just calls free_area_init_nodes() in the generic
> code path, while free_area_init_nodes() is defined within the
> CONFIG_HAVE_MEMBLOCK_NODE_MAP ifdeffery scope. So I tend to agree with
> Mike to remove HAVE_MEMBLOCK_NODE_MAP first on all arches. We can check
> whether it's worth defining memblock_region->nid only for the CONFIG_NUMA
> case after HAVE_MEMBLOCK_NODE_MAP is removed.

This can surely go in separate patches. What I meant to say is that
region->nid is by definition 0 on !CONFIG_NUMA.
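
A rough sketch of that arrangement (an illustration of the idea, not a
patch from this thread):

	struct memblock_region {
		phys_addr_t base;
		phys_addr_t size;
		enum memblock_flags flags;
	#ifdef CONFIG_NUMA
		int nid;	/* only carried when NUMA is enabled */
	#endif
	};

	static inline int memblock_get_region_node(const struct memblock_region *r)
	{
	#ifdef CONFIG_NUMA
		return r->nid;
	#else
		return 0;	/* UMA: everything is node 0 */
	#endif
	}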

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Daniel Vetter
On Thu, Apr 9, 2020 at 4:19 PM Alex Deucher  wrote:
>
> On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter  wrote:
> >
> > On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
> >  wrote:
> > >
> > > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > > > If this code was broken for non-coherent caches a crude powerpc hack
> > > > > isn't going to help anyone else.  Remove the hack as it is the last
> > > > > user of __vmalloc passing a page protection flag other than 
> > > > > PAGE_KERNEL.
> > > >
> > > > Well, Ben added this to make stuff work on ppc; ofc the home grown dma
> > > > layer in drm from back then isn't going to work in other places. I guess
> > > > we should have at least an ack from him, in case anyone still cares about
> > > > this on ppc. Adding Ben to cc.
> > >
> > > This was due to some drivers (radeon ?) trying to use vmalloc pages for
> > > coherent DMA, which means that on those 4xx powerpcs the memory needs to be non-cached.
> > >
> > > There were machines using that (440 based iirc), though I honestly
> > > can't tell if anybody still uses any of it.
> >
> > agp subsystem still seems to happily do that (vmalloc memory for
> > device access), never having been ported to dma apis (or well
> > converted to iommu drivers, which they kinda are really). So I think
> > this all still works exactly as back then, even with the kms radeon
> > drivers. Question really is whether we have users left, and I have no
> > clue about that either.
> >
> > Now if these boxes didn't ever have agp then I think we can get away
> > with deleting this, since we've already deleted the legacy radeon
> > driver. And that one used vmalloc for everything. The new kms one does
> > use the dma-api if the gpu isn't connected through agp.
>
> All radeons have a built in remapping table to handle non-AGP systems.
> On the earlier radeons it wasn't quite as performant as AGP, but it
> was always more reliable because AGP is AGP.  Maybe it's time to let
> AGP go?

I'd be very much in favour of that, if we can just use the integrated
gart and drop agp fast writes wobbliness on the floor. I think the
only other modern driver using agp would be nouveau at that point.
-Daniel

>
> Alex
>
> > -Daniel
> >
> > > Cheers,
> > > Ben.
> > >
> > > > -Daniel
> > > >
> > > > >
> > > > > Signed-off-by: Christoph Hellwig 
> > > > > ---
> > > > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > > > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > > > > index ca520028b2cb..f4e6184d1877 100644
> > > > > --- a/drivers/gpu/drm/drm_scatter.c
> > > > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > > > @@ -43,15 +43,6 @@
> > > > >
> > > > >  #define DEBUG_SCATTER 0
> > > > >
> > > > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > > > -{
> > > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > > > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > > > > -#else
> > > > > -   return vmalloc_32(size);
> > > > > -#endif
> > > > > -}
> > > > > -
> > > > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > > > >  {
> > > > > struct page *page;
> > > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
> > > > > return -ENOMEM;
> > > > > }
> > > > >
> > > > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > > > if (!entry->virtual) {
> > > > > kfree(entry->busaddr);
> > > > > kfree(entry->pagelist);
> > > > > --
> > > > > 2.25.1
> > > > >
> > > >
> > > >
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > ___
> > dri-devel mailing list
> > dri-de...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


[PATCH v3] powerpc/fadump: fix race between pstore write and fadump crash trigger

2020-04-09 Thread Sourabh Jain
When we enter the fadump crash path via system reset, we fail to update
the pstore.

On the system reset path we first update the pstore, then we proceed to
the fadump crash. The problem is that when all the CPUs try to take the
pstore lock to initiate the pstore write, only one CPU acquires the lock
and proceeds with the write. Since this is NMI context, the CPUs that
fail to get the lock do not wait for their turn to write to the pstore
and simply proceed with the next operation, the fadump crash. One of the
CPUs that proceeded down the fadump crash path then triggers the crash
without waiting for the CPU holding the pstore lock to complete the
pstore update.

The following timeline depicts the sequence of events that leads to an
unsuccessful pstore update when we hit the fadump crash path via system
reset:

 - All n CPU threads reach the system reset path and try to acquire
   the pstore lock.
 - One CPU gets the pstore lock and begins the pstore update; the CPUs
   that didn't get the lock move ahead on the fadump crash path.
 - One of those CPUs gets the chance to trigger the crash while the
   lock holder is still writing to pstore (the remaining CPUs relax).
 - Race: the crash is triggered before the pstore update completes.

To avoid the race between the CPU that proceeds with the pstore update and
the CPU that triggers the crash, a delay of 500 milliseconds is added on
the fadump crash path to allow the pstore update to complete before we
trigger the crash.

Signed-off-by: Sourabh Jain 
---
 arch/powerpc/kernel/fadump.c | 17 +
 1 file changed, 17 insertions(+)

---
Changelog:

v1 -> v2
  - crash delay has been increased to 500 milliseconds
  - the race happens in NMI context, so the commit message was updated
    to convey that

v2 -> v3
  - Evaluate trap register value with updated pt_regs pointer to
avoid NULL pointer dereference
---

diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index ff0114aeba9b..1bf1e4c646ed 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -32,6 +32,16 @@
 #include 
 #include 
 
+
+/*
+ * The CPU who acquired the lock to trigger the fadump crash should
+ * wait for other CPUs to complete their tasks (for example updating
+ * pstore) before triggering the crash.
+ *
+ * The timeout is in milliseconds.
+ */
+#define CRASH_TIMEOUT  500
+
 static struct fw_dump fw_dump;
 
 static void __init fadump_reserve_crash_area(u64 base);
@@ -634,6 +644,13 @@ void crash_fadump(struct pt_regs *regs, const char *str)
 
fdh->online_mask = *cpu_online_mask;
 
+   /*
+* If we came in via system reset, wait a while for the secondary
+* CPUs to enter.
+*/
+   if (TRAP(&(fdh->regs)) == 0x100)
+   mdelay(CRASH_TIMEOUT);
+
fw_dump.ops->fadump_trigger(fdh, str);
 }
 
-- 
2.21.1



Re: [PATCH RFC] mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP (was: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA)

2020-04-09 Thread Baoquan He
On 04/02/20 at 10:01am, Michal Hocko wrote:
> On Wed 01-04-20 10:51:55, Mike Rapoport wrote:
> > Hi,
> > 
> > On Wed, Apr 01, 2020 at 01:42:27PM +0800, Baoquan He wrote:
> [...]
> > > From the above information, we can remove HAVE_MEMBLOCK_NODE_MAP and
> > > replace it with CONFIG_NUMA. It sounds more sensible to store the nid
> > > in memblock when NUMA support is enabled.
> >  
> > Replacing CONFIG_HAVE_MEMBLOCK_NODE_MAP with CONFIG_NUMA will work, but
> > this will not help clean up the whole node/zone initialization mess,
> > and we'll be stuck with two implementations.
> 
> Yeah, this is far from optimal.
> 
> > The overhead of enabling HAVE_MEMBLOCK_NODE_MAP is only for init time, as
> > most architectures will discard the entire memblock anyway, so having it
> > in a UMA arch won't be a problem. The only exception is arm, which uses
> > memblock for pfn_valid(); here we may also think about a solution to
> > compensate for the addition of nid to the memblock structures.
> 
> Well, we can make memblock_region->nid defined only for CONFIG_NUMA.
> memblock_get_region_node would then unconditionally return 0 on UMA.
> Essentially the same way we do NUMA for other MM code. I only see a few
> direct usages of region->nid.

I checked the code again; it seems HAVE_MEMBLOCK_NODE_MAP is selected
directly by all arches which support it. That means HAVE_MEMBLOCK_NODE_MAP
is enabled by default on those arches and has no dependency on CONFIG_NUMA
at all. E.g. on x86, it just calls free_area_init_nodes() in the generic
code path, while free_area_init_nodes() is defined within the
CONFIG_HAVE_MEMBLOCK_NODE_MAP ifdeffery scope. So I tend to agree with
Mike to remove HAVE_MEMBLOCK_NODE_MAP first on all arches. We can check
whether it's worth defining memblock_region->nid only for the CONFIG_NUMA
case after HAVE_MEMBLOCK_NODE_MAP is removed.

config X86
def_bool y
...
select HAVE_MEMBLOCK_NODE_MAP
...



Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Alex Deucher
On Thu, Apr 9, 2020 at 5:41 AM Daniel Vetter  wrote:
>
> On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
>  wrote:
> >
> > On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > > If this code was broken for non-coherent caches a crude powerpc hack
> > > > isn't going to help anyone else.  Remove the hack as it is the last
> > > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> > >
> > > Well, Ben added this to make stuff work on ppc; ofc the home grown dma
> > > layer in drm from back then isn't going to work in other places. I guess
> > > we should have at least an ack from him, in case anyone still cares about
> > > this on ppc. Adding Ben to cc.
> >
> > This was due to some drivers (radeon ?) trying to use vmalloc pages for
> > coherent DMA, which means that on those 4xx powerpcs the memory needs to be non-cached.
> >
> > There were machines using that (440 based iirc), though I honestly
> > can't tell if anybody still uses any of it.
>
> agp subsystem still seems to happily do that (vmalloc memory for
> device access), never having been ported to dma apis (or well
> converted to iommu drivers, which they kinda are really). So I think
> this all still works exactly as back then, even with the kms radeon
> drivers. Question really is whether we have users left, and I have no
> clue about that either.
>
> Now if these boxes didn't ever have agp then I think we can get away
> with deleting this, since we've already deleted the legacy radeon
> driver. And that one used vmalloc for everything. The new kms one does
> use the dma-api if the gpu isn't connected through agp.

All radeons have a built in remapping table to handle non-AGP systems.
On the earlier radeons it wasn't quite as performant as AGP, but it
was always more reliable because AGP is AGP.  Maybe it's time to let
AGP go?

Alex

> -Daniel
>
> > Cheers,
> > Ben.
> >
> > > -Daniel
> > >
> > > >
> > > > Signed-off-by: Christoph Hellwig 
> > > > ---
> > > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > > > index ca520028b2cb..f4e6184d1877 100644
> > > > --- a/drivers/gpu/drm/drm_scatter.c
> > > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > > @@ -43,15 +43,6 @@
> > > >
> > > >  #define DEBUG_SCATTER 0
> > > >
> > > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > > -{
> > > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > > > -#else
> > > > -   return vmalloc_32(size);
> > > > -#endif
> > > > -}
> > > > -
> > > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > > >  {
> > > > struct page *page;
> > > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
> > > > return -ENOMEM;
> > > > }
> > > >
> > > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > > if (!entry->virtual) {
> > > > kfree(entry->busaddr);
> > > > kfree(entry->pagelist);
> > > > --
> > > > 2.25.1
> > > >
> > >
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.

2020-04-09 Thread Steven Rostedt
On Thu, 9 Apr 2020 06:06:35 -0400
Qian Cai  wrote:

> >> I'll go bisect some more but it is going to take a while.
> >> 
> >> $ git log --oneline 4c205c84e249..8e99cf91b99b
> >> 8e99cf91b99b tracing: Do not allocate buffer in trace_find_next_entry() in atomic
> >> 2ab2a0924b99 tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid
> >> ebed9628f5c2 selftests/ftrace: Add test to test new set_event_notrace_pid file
> >> ed8839e072b8 selftests/ftrace: Add test to test new set_ftrace_notrace_pid file
> >> 276836260301 tracing: Create set_event_notrace_pid to not trace tasks
> >> b3b1e6ededa4 ftrace: Create set_ftrace_notrace_pid to not trace tasks
> >> 717e3f5ebc82 ftrace: Make function trace pid filtering a bit more exact
> > 
> > If it is affecting function tracing, it is probably one of the above two
> > commits.
> 
> OK, it was narrowed down to one of those messed with mcount here,

Thing is, nothing here touches mcount.

> 
> 8e99cf91b99b tracing: Do not allocate buffer in trace_find_next_entry() in atomic

Touches reading the trace buffer.

> 2ab2a0924b99 tracing: Add documentation on set_ftrace_notrace_pid and set_event_notrace_pid

Documentation.

> 6a13a0d7b4d1 ftrace/kprobe: Show the maxactive number on kprobe_events

kprobe output.

> c9b7a4a72ff6 ring-buffer/tracing: Have iterator acknowledge dropped events

Reading the buffer.

> 06e0a548bad0 tracing: Do not disable tracing when reading the trace file

Reading the buffer.

> 1039221cc278 ring-buffer: Do not disable recording when there is an iterator

Reading the buffer.

> 07b8b10ec94f ring-buffer: Make resize disable per cpu buffer instead of total buffer

Resizing the buffer.

> 153368ce1bd0 ring-buffer: Optimize rb_iter_head_event()

Reading the buffer.

> ff84c50cfb4b ring-buffer: Do not die if rb_iter_peek() fails more than thrice

Reading the buffer.

> 785888c544e0 ring-buffer: Have rb_iter_head_event() handle concurrent writer

Reading the buffer.

> 28e3fc56a471 ring-buffer: Add page_stamp to iterator for synchronization

Reading the buffer.

> bc1a72afdc4a ring-buffer: Rename ring_buffer_read() to read_buffer_iter_advance()

Reading the buffer.

> ead6ecfddea5 ring-buffer: Have ring_buffer_empty() not depend on tracing stopped

Reading the buffer.

> ff895103a84a tracing: Save off entry when peeking at next entry

Reading the buffer.

> bf2cbe044da2 tracing: Use address-of operator on section symbols

Affects trace_printk()

> bbd9d05618a6 gpu/trace: add a gpu total memory usage tracepoint

New tracepoint infrastructure (just new trace events for gpu)

> 89b74cac7834 tools/bootconfig: Show line and column in parse error

Extended command line boot config.

> 306b69dce926 bootconfig: Support O= option

Extended command line boot config

> 5412e0b763e0 tracing: Remove unused TRACE_BUFFER bits

Removed unused enums.

> b396bfdebffc tracing: Have hwlat ts be first instance and record count of instances

Affects only the hardware latency detector (most likely not even
configured in the kernel).

So I don't understand how any of the above commits can cause a problem.

-- Steve


Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread piliu



On 04/09/2020 03:26 PM, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
 In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
 blocks as removable"), the user space interface to compute whether a memory
 block can be offlined (exposed via
 /sys/devices/system/memory/memoryX/removable) has effectively been
 deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>

 When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
 we'll start by:
 1. Testing if it contains any holes, and reject if so
 2. Testing if pages belong to different zones, and reject if so
 3. Isolating the page range, checking if it contains any unmovable pages

 Using is_mem_section_removable() before trying to offline is not only racy,
 it can easily result in false positives/negatives. Let's stop manually
 checking is_mem_section_removable(), and let device_offline() handle it
 completely instead. We can remove the racy is_mem_section_removable()
 implementation next.

 We now take more locks (e.g., memory hotplug lock when offlining and the
 zone lock when isolating), but maybe we should optimize that
 implementation instead if this ever becomes a real problem (after all,
 memory unplug is already an expensive operation). We started using
 is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
 Implement memory hotplug remove in the kernel"), with the initial
 hotremove support of lmbs.

 Cc: Nathan Fontenot 
 Cc: Michael Ellerman 
 Cc: Benjamin Herrenschmidt 
 Cc: Paul Mackerras 
 Cc: Michal Hocko 
 Cc: Andrew Morton 
 Cc: Oscar Salvador 
 Cc: Baoquan He 
 Cc: Wei Yang 
 Signed-off-by: David Hildenbrand 
 ---
  .../platforms/pseries/hotplug-memory.c| 26 +++
  1 file changed, 3 insertions(+), 23 deletions(-)

 diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
 index b2cde1732301..5ace2f9a277e 100644
 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
 +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
 @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
  
  static bool lmb_is_removable(struct drmem_lmb *lmb)
  {
 -  int i, scns_per_block;
 -  bool rc = true;
 -  unsigned long pfn, block_sz;
 -  u64 phys_addr;
 -
if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
return false;
  
 -  block_sz = memory_block_size_bytes();
 -  scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 -  phys_addr = lmb->base_addr;
 -
  #ifdef CONFIG_FA_DUMP
/*
 * Don't hot-remove memory that falls in fadump boot memory area
 * and memory that is reserved for capturing old kernel memory.
 */
 -  if (is_fadump_memory_area(phys_addr, block_sz))
 +  if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
return false;
  #endif
 -
 -  for (i = 0; i < scns_per_block; i++) {
 -  pfn = PFN_DOWN(phys_addr);
 -  if (!pfn_in_present_section(pfn)) {
 -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
 -  continue;
 -  }
 -
 -  rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
 -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
 -  }
 -
 -  return rc;
 +  /* device_offline() will determine if we can actually remove this lmb */
 +  return true;
>> So I think here swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but here should be more effort to re-arrange
>> the code.
>>
> 
> Thanks Pingfan,
> 
> 1. "swaps the check and do sequence":
> 
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.
> 
> 2. "breaks dlpar_memory_remove_by_count()"
> 
> Can you elaborate why it "breaks" it? It will simply try to
> offline+remove lmbs, detect that it wasn't able to offline+remove as
> much as it wanted (which could happen before as well easily), and re-add
> the already offlined+removed ones.
> 
I overlooked the re-add logic. Then I think
dlpar_memory_remove_by_count() is OK with this patch.
> 3. "more effort to re-arrange the code"
> 
> What would be your suggestion?
> 
I had thought about merging the two loo

Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Gerhard Pircher
Am 09.04.20 um 10:54 schrieb Benjamin Herrenschmidt:
> On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
>> On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
>>> If this code was broken for non-coherent caches a crude powerpc hack
>>> isn't going to help anyone else.  Remove the hack as it is the last
>>> user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
>>
>> Well, Ben added this to make stuff work on ppc; ofc the home grown dma
>> layer in drm from back then isn't going to work in other places. I guess
>> we should have at least an ack from him, in case anyone still cares about
>> this on ppc. Adding Ben to cc.
>
> This was due to some drivers (radeon ?) trying to use vmalloc pages for
> coherent DMA, which means that on those 4xx powerpcs the memory needs to be non-cached.
>
> There were machines using that (440 based iirc), though I honestly
> can't tell if anybody still uses any of it.
The first-gen amigaone platform (6xx/32-bit book3s) uses the radeon driver
together with non-coherent DMA. However, this only ever worked reliably
for DRI1.

br,
Gerhard

> Cheers,
> Ben.
>
>> -Daniel
>>
>>>
>>> Signed-off-by: Christoph Hellwig 
>>> ---
>>>  drivers/gpu/drm/drm_scatter.c | 11 +--
>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
>>> index ca520028b2cb..f4e6184d1877 100644
>>> --- a/drivers/gpu/drm/drm_scatter.c
>>> +++ b/drivers/gpu/drm/drm_scatter.c
>>> @@ -43,15 +43,6 @@
>>>
>>>  #define DEBUG_SCATTER 0
>>>
>>> -static inline void *drm_vmalloc_dma(unsigned long size)
>>> -{
>>> -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
>>> -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
>>> -#else
>>> -   return vmalloc_32(size);
>>> -#endif
>>> -}
>>> -
>>>  static void drm_sg_cleanup(struct drm_sg_mem * entry)
>>>  {
>>> struct page *page;
>>> @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
>>> return -ENOMEM;
>>> }
>>>
>>> -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
>>> +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
>>> if (!entry->virtual) {
>>> kfree(entry->busaddr);
>>> kfree(entry->pagelist);
>>> --
>>> 2.25.1
>>>
>>
>>
>



Re: [PATCH 03/35] docs: fix broken references to text files

2020-04-09 Thread Federico Vaga
On Wednesday, April 8, 2020 5:45:55 PM CEST Mauro Carvalho Chehab wrote:
> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Reviewed-by: Mathieu Poirier  # hwtracing/coresight/Kconfig
> Signed-off-by: Mauro Carvalho Chehab 
> 
> ---
>  Documentation/memory-barriers.txt|  2 +-
>  Documentation/process/submit-checklist.rst   |  2 +-
>  .../translations/it_IT/process/submit-checklist.rst  |  2 +-

Acked-by: Federico Vaga 





Re: [PATCH] powerpcs: perf: consolidate perf_callchain_user_64 and perf_callchain_user_32

2020-04-09 Thread Michal Suchánek
On Tue, Apr 07, 2020 at 07:21:06AM +0200, Christophe Leroy wrote:
> 
> 
> Le 06/04/2020 à 23:00, Michal Suchanek a écrit :
> > perf_callchain_user_64 and perf_callchain_user_32 are nearly identical.
> > Consolidate into one function with thin wrappers.
> > 
> > Suggested-by: Nicholas Piggin 
> > Signed-off-by: Michal Suchanek 
> > ---
> >   arch/powerpc/perf/callchain.h| 24 +++-
> >   arch/powerpc/perf/callchain_32.c | 21 ++---
> >   arch/powerpc/perf/callchain_64.c | 14 --
> >   3 files changed, 29 insertions(+), 30 deletions(-)
> > 
> > diff --git a/arch/powerpc/perf/callchain.h b/arch/powerpc/perf/callchain.h
> > index 7a2cb9e1181a..7540bb71cb60 100644
> > --- a/arch/powerpc/perf/callchain.h
> > +++ b/arch/powerpc/perf/callchain.h
> > @@ -2,7 +2,7 @@
> >   #ifndef _POWERPC_PERF_CALLCHAIN_H
> >   #define _POWERPC_PERF_CALLCHAIN_H
> > -int read_user_stack_slow(void __user *ptr, void *buf, int nb);
> > +int read_user_stack_slow(const void __user *ptr, void *buf, int nb);
> 
> Does the constification of ptr have to be in this patch?
It was in the original patch. The code is touched anyway.
> Wouldn't it be better to have it as a separate patch?
Don't care much either way. Can resend it as separate patches.
> 
> >   void perf_callchain_user_64(struct perf_callchain_entry_ctx *entry,
> > struct pt_regs *regs);
> >   void perf_callchain_user_32(struct perf_callchain_entry_ctx *entry,
> > @@ -16,4 +16,26 @@ static inline bool invalid_user_sp(unsigned long sp)
> > return (!sp || (sp & mask) || (sp > top));
> >   }
> > +/*
> > + * On 32-bit we just access the address and let hash_page create a
> > + * HPTE if necessary, so there is no need to fall back to reading
> > + * the page tables.  Since this is called at interrupt level,
> > + * do_page_fault() won't treat a DSI as a page fault.
> > + */
> > +static inline int __read_user_stack(const void __user *ptr, void *ret,
> > +   size_t size)
> > +{
> > +   int rc;
> > +
> > +   if ((unsigned long)ptr > TASK_SIZE - size ||
> > +   ((unsigned long)ptr & (size - 1)))
> > +   return -EFAULT;
> > +   rc = probe_user_read(ret, ptr, size);
> > +
> > +   if (rc && IS_ENABLED(CONFIG_PPC64))
> 
> gcc is probably smart enough to deal with it efficiently, but it would
> be more correct to test rc after checking CONFIG_PPC64.
IS_ENABLED(CONFIG_PPC64) is constant so that part of the check should be
compiled out in any case.
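
For illustration: since IS_ENABLED(CONFIG_PPC64) expands to a
compile-time constant 0 or 1, both operand orders reduce to the same
generated code. A sketch of the surrounding logic (the slow-path call is
inferred from the read_user_stack_slow() declaration quoted above):

	rc = probe_user_read(ret, ptr, size);

	if (rc && IS_ENABLED(CONFIG_PPC64))
		/* On PPC32 the second operand is a constant 0, so the
		 * compiler drops this branch and the call entirely,
		 * whichever operand is tested first.
		 */
		return read_user_stack_slow(ptr, ret, size);

	return rc;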

Thanks

Michal


[GIT PULL] Please pull powerpc/linux.git powerpc-5.7-2 tag

2020-04-09 Thread Michael Ellerman

Hi Linus,

Please pull some more powerpc updates for 5.7.

The bulk of this is the series to make CONFIG_COMPAT user-selectable; it's
been around for a long time but was blocked behind the syscall-in-C series.
Plus there are also a few fixes and other minor things.

cheers


The following changes since commit c17eb4dca5a353a9dbbb8ad6934fe57af7165e91:

  powerpc: Make setjmp/longjmp signature standard (2020-04-01 14:30:51 +1100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-5.7-2

for you to fetch changes up to 6ba4a2d3591039aea1cb45c7c42262d26351a2fa:

  selftests/powerpc: Always build the tm-poison test 64-bit (2020-04-04 21:41:40 +1100)

----------------------------------------------------------------
powerpc updates for 5.7 #2

 - A fix for a crash in machine check handling on pseries (ie. guests)

 - A small series to make it possible to disable CONFIG_COMPAT, and turn it off
   by default for ppc64le where it's not used.

 - A few other miscellaneous fixes and small improvements.

Thanks to:
  Alexey Kardashevskiy, Anju T Sudhakar, Arnd Bergmann, Christophe Leroy, Dan
  Carpenter, Ganesh Goudar, Geert Uytterhoeven, Geoff Levand, Mahesh Salgaonkar,
  Markus Elfring, Michal Suchanek, Nicholas Piggin, Stephen Boyd, Wen Xiong.

----------------------------------------------------------------
Alexey Kardashevskiy (1):
  powerpc/pseries/ddw: Extend upper limit for huge DMA window for 
persistent memory

Anju T Sudhakar (2):
  powerpc/perf: Implement a global lock to avoid races between trace, core 
and thread imc events.
  powerpc/powernv: Re-enable imc trace-mode in kernel

Dan Carpenter (1):
  powerpc/ps3: Remove an unneeded NULL check

Ganesh Goudar (1):
  powerpc/pseries: Fix MCE handling on pseries

Geert Uytterhoeven (1):
  powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>

Geoff Levand (1):
  powerpc/ps3: Set CONFIG_UEVENT_HELPER=y in ps3_defconfig

Markus Elfring (1):
  powerpc/ps3: Remove duplicate error message

Michael Ellerman (2):
  selftests/eeh: Skip ahci adapters
  selftests/powerpc: Always build the tm-poison test 64-bit

Michal Suchanek (7):
  powerpc: Add back __ARCH_WANT_SYS_LLSEEK macro
  powerpc: move common register copy functions from signal_32.c to signal.c
  powerpc/perf: consolidate read_user_stack_32
  powerpc/perf: consolidate valid_user_sp -> invalid_user_sp
  powerpc/64: make buildable without CONFIG_COMPAT
  powerpc/64: Make COMPAT user-selectable disabled on littleendian by 
default.
  powerpc/perf: split callchain.c by bitness

Nicholas Piggin (3):
  powerpc/64s: Fix doorbell wakeup msgclr optimisation
  Revert "powerpc/64: irq_work avoid interrupt when called with hardware 
irqs enabled"
  powerpc: Improve ppc_save_regs()


 arch/powerpc/Kconfig |   5 +-
 arch/powerpc/configs/ps3_defconfig   |   2 +
 arch/powerpc/include/asm/thread_info.h   |   4 +-
 arch/powerpc/include/asm/unistd.h|   1 +
 arch/powerpc/kernel/Makefile |   5 +-
 arch/powerpc/kernel/entry_64.S   |   2 +
 arch/powerpc/kernel/exceptions-64s.S |  19 --
 arch/powerpc/kernel/irq.c|  13 +
 arch/powerpc/kernel/ppc_save_regs.S  |   6 +-
 arch/powerpc/kernel/ptrace/Makefile  |   2 +-
 arch/powerpc/kernel/signal.c | 144 +++-
 arch/powerpc/kernel/signal_32.c  | 140 
 arch/powerpc/kernel/syscall_64.c |   6 +-
 arch/powerpc/kernel/time.c   |  48 +--
 arch/powerpc/kernel/vdso.c   |   3 +-
 arch/powerpc/perf/Makefile   |   5 +-
 arch/powerpc/perf/callchain.c| 356 +---
 arch/powerpc/perf/callchain.h|  19 ++
 arch/powerpc/perf/callchain_32.c | 196 +++
 arch/powerpc/perf/callchain_64.c | 174 ++
 arch/powerpc/perf/imc-pmu.c  | 173 --
 arch/powerpc/platforms/powernv/opal-imc.c|   9 +-
 arch/powerpc/platforms/ps3/os-area.c |   4 +-
 arch/powerpc/platforms/pseries/iommu.c   |   9 +
 arch/powerpc/platforms/pseries/ras.c |  11 +
 drivers/ps3/sys-manager-core.c   |   2 +-
 fs/read_write.c  |   3 +-
 tools/testing/selftests/powerpc/eeh/eeh-basic.sh |   5 +
 tools/testing/selftests/powerpc/tm/Makefile  |   1 +
 29 files changed, 766 insertions(+), 601 deletions(-)
 create mode 100644 arch/powerpc/perf/callchain.h
 create mode 100644 arch/powerpc/perf/callchain_32.c
 create mode 100644 arch/powerpc/perf/callchain_64.c

Re: Linux-next POWER9 NULL pointer NIP since 1st Apr.

2020-04-09 Thread Qian Cai



> On Apr 7, 2020, at 9:30 AM, Steven Rostedt  wrote:
> 
> On Tue, 7 Apr 2020 09:01:10 -0400
> Qian Cai  wrote:
> 
>> + Steven
>> 
>>> On Apr 7, 2020, at 8:42 AM, Michael Ellerman  wrote:
>>> 
>>> Qian Cai  writes:  
 Ever since 1st Apr, linux-next has started to trigger a NULL pointer NIP on
 POWER9 below using
 this config,
 
 https://raw.githubusercontent.com/cailca/linux-mm/master/powerpc.config
 
 It takes a while to reproduce, so before I bury myself in bisecting, I am
 just sending a heads-up
 to see if anyone spots anything obvious.
 
 [  206.744625][T13224] LTP: starting fallocate04
 [  207.601583][T27684] /dev/zero: Can't open blockdev
 [  208.674301][T27684] EXT4-fs (loop0): mounting ext3 file system using 
 the ext4 subsystem
 [  208.680347][T27684] BUG: Unable to handle kernel instruction fetch 
 (NULL pointer?)
 [  208.680383][T27684] Faulting instruction address: 0x
 [  208.680406][T27684] Oops: Kernel access of bad area, sig: 11 [#1]
 [  208.680439][T27684] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
 DEBUG_PAGEALLOC NUMA PowerNV
 [  208.680474][T27684] Modules linked in: ext4 crc16 mbcache jbd2 loop 
 kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci libahci mdio tg3 
 libata libphy firmware_class dm_mirror dm_region_hash dm_log dm_mod
 [  208.680576][T27684] CPU: 117 PID: 27684 Comm: fallocate04 Tainted: G
 W 5.6.0-next-20200401+ #288
 [  208.680614][T27684] NIP:   LR: c008102c0048 CTR: 
 
 [  208.680657][T27684] REGS: c000200361def420 TRAP: 0400   Tainted: G  
   W  (5.6.0-next-20200401+)
 [  208.680700][T27684] MSR:  90004280b033 
   CR: 4208  XER: 2004
 [  208.680760][T27684] CFAR: c0081032c494 IRQMASK: 0 
 [  208.680760][T27684] GPR00: c05ac3f8 c000200361def6b0 
 c165c200 c00020107dae0bd0 
 [  208.680760][T27684] GPR04:  0400 
   
 [  208.680760][T27684] GPR08: c000200361def6e8 c008102c0040 
 7fff c1614e80 
 [  208.680760][T27684] GPR12:  c000201fff671280 
  0002 
 [  208.680760][T27684] GPR16: 0002 00040001 
 c00020030f5a1000 c00020030f5a1548 
 [  208.680760][T27684] GPR20: c15fbad8 c168c654 
 c000200361def818 c05b4c10 
 [  208.680760][T27684] GPR24:  c008103365b8 
 c00020107dae0bd0 0400 
 [  208.680760][T27684] GPR28: c168c3a8  
   
 [  208.681014][T27684] NIP [] 0x0
 [  208.681065][T27684] LR [c008102c0048] ext4_iomap_end+0x8/0x30 
 [ext4]  
>>> 
>>> That LR looks like it's pointing to the return from _mcount in
>>> ext4_iomap_end(), which means we have probably crashed in ftrace
>>> somewhere.
>>> 
>>> Did you have tracing enabled when you ran the test? Or does it do
>>> tracing itself?  
>> 
>> Yes, it runs ftrace at first before running LTP to trigger it,
>> 
>> https://github.com/cailca/linux-mm/blob/master/test.sh
>> 
>> echo function > /sys/kernel/debug/tracing/current_tracer
>> echo nop > /sys/kernel/debug/tracing/current_tracer
>> 
>> There is another crash with even a non-NULL NIP, but then the symbol behaves weirdly.
>> 
>> # ./scripts/faddr2line vmlinux sysctl_net_busy_read+0x0/0x4
>> skipping sysctl_net_busy_read address at 0xc16804ac due to 
>> non-function symbol of type 'D'
>> 
>> [  148.110969][T13115] LTP: starting chown04_16
>> [  148.255048][T13380] kernel tried to execute exec-protected page 
>> (c16804ac) - exploit attempt? (uid: 0)
>> [  148.255099][T13380] BUG: Unable to handle kernel instruction fetch
>> [  148.255122][T13380] Faulting instruction address: 0xc16804ac
>> [  148.255136][T13380] Oops: Kernel access of bad area, sig: 11 [#1]
>> [  148.255157][T13380] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 
>> DEBUG_PAGEALLOC NUMA PowerNV
>> [  148.255171][T13380] Modules linked in: loop kvm_hv kvm xfs sd_mod bnx2x 
>> mdio ahci tg3 libahci libphy libata firmware_class dm_mirror dm_region_hash 
>> dm_log dm_mod
>> [  148.255213][T13380] CPU: 45 PID: 13380 Comm: chown04_16 Tainted: G
>> W 5.6.0+ #7
>> [  148.255236][T13380] NIP:  c16804ac LR: c0080fa60408 CTR: 
>> c16804ac
>> [  148.255250][T13380] REGS: c010a6fafa00 TRAP: 0400   Tainted: G
>> W  (5.6.0+)
>> [  148.255281][T13380] MSR:  900010009033   CR: 
>> 84000248  XER: 2004
>> [  148.255310][T13380] CFAR: c0080fa66534 IRQMASK: 0 
>> [  148.255310][T13380] GPR00: c0973268 c010a6fafc90 
>> c1648200  
>> [  148.255310][T13380] GPR04: c00d8a22dc00 c010a6fafd30 
>> b5e98331 00012c9f 
>> [  148.25

Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Daniel Vetter
On Thu, Apr 9, 2020 at 10:54 AM Benjamin Herrenschmidt
 wrote:
>
> On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> > On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > > If this code was broken for non-coherent caches a crude powerpc hack
> > > isn't going to help anyone else.  Remove the hack as it is the last
> > > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> >
> > Well Ben added this to make stuff work on ppc, ofc the home grown dma
> > layer in drm from back then isn't going to work in other places. I guess we
> > should have at least an ack from him, in case anyone still cares about
> > this on ppc. Adding Ben to cc.
>
> This was due to some drivers (radeon ?) trying to use vmalloc pages for
> coherent DMA, which means on those 4xx powerpcs the pages need to be non-cached.
>
> There were machines using that (440 based iirc), though I honestly
> can't tell if anybody still uses any of it.

The agp subsystem still seems to happily do that (vmalloc memory for
device access), never having been ported to the dma apis (or, well,
converted to iommu drivers, which is kinda what they really are). So I
think this all still works exactly as back then, even with the kms radeon
drivers. The question really is whether we have any users left, and I
have no clue about that either.

Now if these boxes didn't ever have agp then I think we can get away
with deleting this, since we've already deleted the legacy radeon
driver. And that one used vmalloc for everything. The new kms one does
use the dma-api if the gpu isn't connected through agp.
-Daniel

> Cheers,
> Ben.
>
> > -Daniel
> >
> > >
> > > Signed-off-by: Christoph Hellwig 
> > > ---
> > >  drivers/gpu/drm/drm_scatter.c | 11 +--
> > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > > index ca520028b2cb..f4e6184d1877 100644
> > > --- a/drivers/gpu/drm/drm_scatter.c
> > > +++ b/drivers/gpu/drm/drm_scatter.c
> > > @@ -43,15 +43,6 @@
> > >
> > >  #define DEBUG_SCATTER 0
> > >
> > > -static inline void *drm_vmalloc_dma(unsigned long size)
> > > -{
> > > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > > -#else
> > > -   return vmalloc_32(size);
> > > -#endif
> > > -}
> > > -
> > >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> > >  {
> > > struct page *page;
> > > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
> > > *data,
> > > return -ENOMEM;
> > > }
> > >
> > > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > > if (!entry->virtual) {
> > > kfree(entry->busaddr);
> > > kfree(entry->pagelist);
> > > --
> > > 2.25.1
> > >
> >
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread David Hildenbrand
On 09.04.20 09:26, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
 In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
 blocks as removable"), the user space interface to compute whether a memory
 block can be offlined (exposed via
 /sys/devices/system/memory/memoryX/removable) has effectively been
 deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>

 When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
 we'll start by:
 1. Testing if it contains any holes, and reject if so
 2. Testing if pages belong to different zones, and reject if so
 3. Isolating the page range, checking if it contains any unmovable pages

 Using is_mem_section_removable() before trying to offline is not only racy,
 it can easily result in false positives/negatives. Let's stop manually
 checking is_mem_section_removable(), and let device_offline() handle it
 completely instead. We can remove the racy is_mem_section_removable()
 implementation next.

 We now take more locks (e.g., memory hotplug lock when offlining and the
 zone lock when isolating), but maybe we should optimize that
 implementation instead if this ever becomes a real problem (after all,
 memory unplug is already an expensive operation). We started using
 is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
 Implement memory hotplug remove in the kernel"), with the initial
 hotremove support of lmbs.

 Cc: Nathan Fontenot 
 Cc: Michael Ellerman 
 Cc: Benjamin Herrenschmidt 
 Cc: Paul Mackerras 
 Cc: Michal Hocko 
 Cc: Andrew Morton 
 Cc: Oscar Salvador 
 Cc: Baoquan He 
 Cc: Wei Yang 
 Signed-off-by: David Hildenbrand 
 ---
  .../platforms/pseries/hotplug-memory.c| 26 +++
  1 file changed, 3 insertions(+), 23 deletions(-)

 diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
 b/arch/powerpc/platforms/pseries/hotplug-memory.c
 index b2cde1732301..5ace2f9a277e 100644
 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
 +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
 @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct 
 device_node *np)
  
  static bool lmb_is_removable(struct drmem_lmb *lmb)
  {
 -  int i, scns_per_block;
 -  bool rc = true;
 -  unsigned long pfn, block_sz;
 -  u64 phys_addr;
 -
if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
return false;
  
 -  block_sz = memory_block_size_bytes();
 -  scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
 -  phys_addr = lmb->base_addr;
 -
  #ifdef CONFIG_FA_DUMP
/*
 * Don't hot-remove memory that falls in fadump boot memory area
 * and memory that is reserved for capturing old kernel memory.
 */
 -  if (is_fadump_memory_area(phys_addr, block_sz))
 +  if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
return false;
  #endif
 -
 -  for (i = 0; i < scns_per_block; i++) {
 -  pfn = PFN_DOWN(phys_addr);
 -  if (!pfn_in_present_section(pfn)) {
 -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
 -  continue;
 -  }
 -
 -  rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
 -  phys_addr += MIN_MEMORY_BLOCK_SIZE;
 -  }
 -
 -  return rc;
 +  /* device_offline() will determine if we can actually remove this lmb */
 +  return true;
>> So I think this swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but more effort is needed here to re-arrange
>> the code.
>>
> 
> Thanks Pingfan,
> 
> 1. "swaps the check and do sequence":
> 
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.

Sorry, s/dlpar_remove_lmb/lmb_is_removable/


-- 
Thanks,

David / dhildenb



Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-09 Thread Benjamin Herrenschmidt
On Wed, 2020-04-08 at 14:25 +0200, Daniel Vetter wrote:
> On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> > If this code was broken for non-coherent caches a crude powerpc hack
> > isn't going to help anyone else.  Remove the hack as it is the last
> > user of __vmalloc passing a page protection flag other than PAGE_KERNEL.
> 
> Well Ben added this to make stuff work on ppc, ofc the home grown dma
> layer in drm from back then isn't going to work in other places. I guess we
> should have at least an ack from him, in case anyone still cares about
> this on ppc. Adding Ben to cc.

This was due to some drivers (radeon ?) trying to use vmalloc pages for
coherent DMA, which means on those 4xx powerpcs the pages need to be non-cached.

There were machines using that (440 based iirc), though I honestly
can't tell if anybody still uses any of it.

Cheers,
Ben.

> -Daniel
> 
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  drivers/gpu/drm/drm_scatter.c | 11 +--
> >  1 file changed, 1 insertion(+), 10 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> > index ca520028b2cb..f4e6184d1877 100644
> > --- a/drivers/gpu/drm/drm_scatter.c
> > +++ b/drivers/gpu/drm/drm_scatter.c
> > @@ -43,15 +43,6 @@
> >  
> >  #define DEBUG_SCATTER 0
> >  
> > -static inline void *drm_vmalloc_dma(unsigned long size)
> > -{
> > -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> > -   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> > -#else
> > -   return vmalloc_32(size);
> > -#endif
> > -}
> > -
> >  static void drm_sg_cleanup(struct drm_sg_mem * entry)
> >  {
> > struct page *page;
> > @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
> > *data,
> > return -ENOMEM;
> > }
> >  
> > -   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> > +   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
> > if (!entry->virtual) {
> > kfree(entry->busaddr);
> > kfree(entry->pagelist);
> > -- 
> > 2.25.1
> > 
> 
> 



Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread Michal Hocko
On Thu 09-04-20 10:12:20, David Hildenbrand wrote:
> On 09.04.20 09:59, Michal Hocko wrote:
> > On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
> >> David Hildenbrand  writes:
> >>
> >>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> >>> blocks as removable"), the user space interface to compute whether a 
> >>> memory
> >>> block can be offlined (exposed via
> >>> /sys/devices/system/memory/memoryX/removable) has effectively been
> >>> deprecated. We want to remove the leftovers of the kernel implementation.
> >>>
> >>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> >>> we'll start by:
> >>> 1. Testing if it contains any holes, and reject if so
> >>> 2. Testing if pages belong to different zones, and reject if so
> >>> 3. Isolating the page range, checking if it contains any unmovable pages
> >>>
> >>> Using is_mem_section_removable() before trying to offline is not only 
> >>> racy,
> >>> it can easily result in false positives/negatives. Let's stop manually
> >>> checking is_mem_section_removable(), and let device_offline() handle it
> >>> completely instead. We can remove the racy is_mem_section_removable()
> >>> implementation next.
> >>>
> >>> We now take more locks (e.g., memory hotplug lock when offlining and the
> >>> zone lock when isolating), but maybe we should optimize that
> >>> implementation instead if this ever becomes a real problem (after all,
> >>> memory unplug is already an expensive operation). We started using
> >>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> >>> Implement memory hotplug remove in the kernel"), with the initial
> >>> hotremove support of lmbs.
> >>
> >> It's also not very pretty in dmesg.
> >>
> >> Before:
> >>
> >>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
> >>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
> >>   dlpar: Could not handle DLPAR request "memory add count 10"
> > 
> > Yeah, there is more output but isn't that useful? Or to put it differently,
> > what is the actual problem with having those messages in the kernel log?
> > 
> > From the below you can clearly tell that there are kernel allocations
> > which prevent hot remove from happening.
> > 
> > If the overall size of the debugging output is a concern then we can
> > think of a way to reduce it. E.g. once you have a couple of pages
> > reported then all others from the same block are likely not very interesting.
> > 
> 
> IIRC, we only report one page per block already. (and stop, as we
> detected something unmovable)

You are right.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread David Hildenbrand
On 09.04.20 09:59, Michal Hocko wrote:
> On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
>> David Hildenbrand  writes:
>>
>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>> blocks as removable"), the user space interface to compute whether a memory
>>> block can be offlined (exposed via
>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>> we'll start by:
>>> 1. Testing if it contains any holes, and reject if so
>>> 2. Testing if pages belong to different zones, and reject if so
>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>
>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>> it can easily result in false positives/negatives. Let's stop manually
>>> checking is_mem_section_removable(), and let device_offline() handle it
>>> completely instead. We can remove the racy is_mem_section_removable()
>>> implementation next.
>>>
>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>> zone lock when isolating), but maybe we should optimize that
>>> implementation instead if this ever becomes a real problem (after all,
>>> memory unplug is already an expensive operation). We started using
>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>> Implement memory hotplug remove in the kernel"), with the initial
>>> hotremove support of lmbs.
>>
>> It's also not very pretty in dmesg.
>>
>> Before:
>>
>>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>>   dlpar: Could not handle DLPAR request "memory add count 10"
> 
> Yeah, there is more output but isn't that useful? Or to put it differently,
> what is the actual problem with having those messages in the kernel log?
> 
> From the below you can clearly tell that there are kernel allocations
> which prevent hot remove from happening.
> 
> If the overall size of the debugging output is a concern then we can
> think of a way to reduce it. E.g. once you have a couple of pages
> reported then all others from the same block are likely not very interesting.
> 

IIRC, we only report one page per block already. (and stop, as we
detected something unmovable)

-- 
Thanks,

David / dhildenb



Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread Michal Hocko
On Thu 09-04-20 17:26:01, Michael Ellerman wrote:
> David Hildenbrand  writes:
> 
> > In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> > blocks as removable"), the user space interface to compute whether a memory
> > block can be offlined (exposed via
> > /sys/devices/system/memory/memoryX/removable) has effectively been
> > deprecated. We want to remove the leftovers of the kernel implementation.
> >
> > When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> > we'll start by:
> > 1. Testing if it contains any holes, and reject if so
> > 2. Testing if pages belong to different zones, and reject if so
> > 3. Isolating the page range, checking if it contains any unmovable pages
> >
> > Using is_mem_section_removable() before trying to offline is not only racy,
> > it can easily result in false positives/negatives. Let's stop manually
> > checking is_mem_section_removable(), and let device_offline() handle it
> > completely instead. We can remove the racy is_mem_section_removable()
> > implementation next.
> >
> > We now take more locks (e.g., memory hotplug lock when offlining and the
> > zone lock when isolating), but maybe we should optimize that
> > implementation instead if this ever becomes a real problem (after all,
> > memory unplug is already an expensive operation). We started using
> > is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> > Implement memory hotplug remove in the kernel"), with the initial
> > hotremove support of lmbs.
> 
> It's also not very pretty in dmesg.
> 
> Before:
> 
>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>   dlpar: Could not handle DLPAR request "memory add count 10"

Yeah, there is more output but isn't that useful? Or to put it differently,
what is the actual problem with having those messages in the kernel log?

From the below you can clearly tell that there are kernel allocations
which prevent hot remove from happening.

If the overall size of the debugging output is a concern then we can
think of a way to reduce it. E.g. once you have a couple of pages
reported then all others from the same block are likely not very interesting.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] papr/scm: Add bad memory ranges to nvdimm bad ranges

2020-04-09 Thread Santosh Sivaraj
On Wed, Apr 1, 2020 at 1:18 PM Santosh Sivaraj  wrote:

> Subscribe to the MCE notification and add the physical address which
> generated a memory error to nvdimm bad range.
>
> Signed-off-by: Santosh Sivaraj 
> ---
>

Any comments on this?

Thanks,
Santosh


> This patch depends on "powerpc/mce: Add MCE notification chain" [1].
>
> Unlike the previous series[2], the patch adds badblock registration only
> for the pseries scm driver. Handling badblocks for baremetal (powernv)
> PMEM will be done later and, if possible, the badblock handling will be
> made common code.
>
> [1]
> https://lore.kernel.org/linuxppc-dev/20200330071219.12284-1-ganes...@linux.ibm.com/
> [2]
> https://lore.kernel.org/linuxppc-dev/20190820023030.18232-1-sant...@fossix.org/
>
> arch/powerpc/platforms/pseries/papr_scm.c | 96 ++-
>  1 file changed, 95 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index 0b4467e378e5..5012cbf4606e 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -12,6 +12,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
>  #include 
>
> @@ -39,8 +41,12 @@ struct papr_scm_priv {
> struct resource res;
> struct nd_region *region;
> struct nd_interleave_set nd_set;
> +   struct list_head region_list;
>  };
>
> +LIST_HEAD(papr_nd_regions);
> +DEFINE_MUTEX(papr_ndr_lock);
> +
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
> unsigned long ret[PLPAR_HCALL_BUFSIZE];
> @@ -372,6 +378,10 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv
> *p)
> dev_info(dev, "Region registered with target node %d and
> online node %d",
>  target_nid, online_nid);
>
> +   mutex_lock(&papr_ndr_lock);
> +   list_add_tail(&p->region_list, &papr_nd_regions);
> +   mutex_unlock(&papr_ndr_lock);
> +
> return 0;
>
>  err:   nvdimm_bus_unregister(p->bus);
> @@ -379,6 +389,68 @@ err:   nvdimm_bus_unregister(p->bus);
> return -ENXIO;
>  }
>
> +void papr_scm_add_badblock(struct nd_region *region, struct nvdimm_bus
> *bus,
> +  u64 phys_addr)
> +{
> +   u64 aligned_addr = ALIGN_DOWN(phys_addr, L1_CACHE_BYTES);
> +
> +   if (nvdimm_bus_add_badrange(bus, aligned_addr, L1_CACHE_BYTES)) {
> +   pr_err("Bad block registration for 0x%llx failed\n",
> phys_addr);
> +   return;
> +   }
> +
> +   pr_debug("Add memory range (0x%llx - 0x%llx) as bad range\n",
> +aligned_addr, aligned_addr + L1_CACHE_BYTES);
> +
> +   nvdimm_region_notify(region, NVDIMM_REVALIDATE_POISON);
> +}
> +
> +static int handle_mce_ue(struct notifier_block *nb, unsigned long val,
> +void *data)
> +{
> +   struct machine_check_event *evt = data;
> +   struct papr_scm_priv *p;
> +   u64 phys_addr;
> +   bool found = false;
> +
> +   if (evt->error_type != MCE_ERROR_TYPE_UE)
> +   return NOTIFY_DONE;
> +
> +   if (list_empty(&papr_nd_regions))
> +   return NOTIFY_DONE;
> +
> +   phys_addr = evt->u.ue_error.physical_address +
> +   (evt->u.ue_error.effective_address & ~PAGE_MASK);
> +
> +   if (!evt->u.ue_error.physical_address_provided ||
> +   !is_zone_device_page(pfn_to_page(phys_addr >> PAGE_SHIFT)))
> +   return NOTIFY_DONE;
> +
> +   /* mce notifier is called from a process context, so mutex is safe
> */
> +   mutex_lock(&papr_ndr_lock);
> +   list_for_each_entry(p, &papr_nd_regions, region_list) {
> +   struct resource res = p->res;
> +
> +   if (phys_addr >= res.start && phys_addr <= res.end) {
> +   found = true;
> +   break;
> +   }
> +   }
> +
> +   mutex_unlock(&papr_ndr_lock);
> +
> +   if (!found)
> +   return NOTIFY_DONE;
> +
> +   papr_scm_add_badblock(p->region, p->bus, phys_addr);
> +
> +   return NOTIFY_OK;
> +}
> +
> +static struct notifier_block mce_ue_nb = {
> +   .notifier_call = handle_mce_ue
> +};
> +
>  static int papr_scm_probe(struct platform_device *pdev)
>  {
> struct device_node *dn = pdev->dev.of_node;
> @@ -476,6 +548,10 @@ static int papr_scm_remove(struct platform_device
> *pdev)
>  {
> struct papr_scm_priv *p = platform_get_drvdata(pdev);
>
> +   mutex_lock(&papr_ndr_lock);
> +   list_del(&(p->region_list));
> +   mutex_unlock(&papr_ndr_lock);
> +
> nvdimm_bus_unregister(p->bus);
> drc_pmem_unbind(p);
> kfree(p->bus_desc.provider_name);
> @@ -498,7 +574,25 @@ static struct platform_driver papr_scm_driver = {
> },
>  };
>
> -module_platform_driver(papr_scm_driver);
> +static int __init papr_scm_init(void)
> +{
> +   int ret;
> +
> +   ret = platform_driver_register(&papr_scm_dri

Re: [PATCH 03/35] docs: fix broken references to text files

2020-04-09 Thread Marc Zyngier
On Wed,  8 Apr 2020 17:45:55 +0200
Mauro Carvalho Chehab  wrote:

> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Reviewed-by: Mathieu Poirier  # 
> hwtracing/coresight/Kconfig
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
>  virt/kvm/arm/vgic/vgic.h |  4 ++--

Acked-by: Marc Zyngier  # kvm/arm64

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread David Hildenbrand
> It's also not very pretty in dmesg.
> 
> Before:
> 
>   pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
>   pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
>   dlpar: Could not handle DLPAR request "memory add count 10"
> 

Thanks for running it through the mill.

Here you test "hot-add", below you test "hot-remove". But yeah, there is
a change in the amount of dmesg output.

> After:
> 
>   pseries-hotplug-mem: Attempting to hot-remove 10 LMB(s)
>   page:c00c01ca8200 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc0072a080180
>   flags: 0x700200(slab)
>   raw: 00700200 c00c01cffd48 c00781c51410 c00793327580
>   raw: c0072a080180 01550001 0001 
>   page dumped because: unmovable page
>   page:c00c01cc4a80 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc0079b110080
>   flags: 0x70()
>   raw: 0070 5deadbeef100 5deadbeef122 
>   raw: c0079b110080  0001 
>   page dumped because: unmovable page
>   page:c00c01d08200 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc0074208ff00
>   flags: 0x700200(slab)
>   raw: 00700200 c00781c5f150 c00c01d37f88 c00798a24880
>   raw: c0074208ff00 01550002 0001 
>   page dumped because: unmovable page
>   page:c00c01d40140 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc00750059c00
>   flags: 0x700200(slab)
>   raw: 00700200 c00c01dfcfc8 c00c01d3c288 c007851c2d00
>   raw: c00750059c00 0103 0001 
>   page dumped because: unmovable page
>   page:c00c01d9c000 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc2370080
>   flags: 0x70()
>   raw: 0070 5deadbeef100 5deadbeef122 
>   raw: c2370080  0001 
>   page dumped because: unmovable page
>   page:c00c01dc0200 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc2370080
>   flags: 0x70()
>   raw: 0070 5deadbeef100 5deadbeef122 
>   raw: c2370080  0001 
>   page dumped because: unmovable page
>   page:c00c01e0 refcount:1 mapcount:0 mapping:18c4a547 
> index:0x0
>   flags: 0x700200(slab)
>   raw: 00700200 5deadbeef100 5deadbeef122 c007a801f500
>   raw:  08000800 0001 
>   page dumped because: unmovable page
>   page:c00c01e40440 refcount:1 mapcount:0 mapping:18c4a547 
> index:0x0
>   flags: 0x700200(slab)
>   raw: 00700200 5deadbeef100 5deadbeef122 c007a801e380
>   raw:  00400040 0001 
>   page dumped because: unmovable page
>   page:c00c01e8 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc007a640
>   flags: 0x700200(slab)
>   raw: 00700200 c00c01e5af48 c00c01e80408 c00f42d00a00
>   raw: c007a640 066600a2 0001 
>   page dumped because: unmovable page
>   page:c00c03c89d40 refcount:2 mapcount:1 mapping:18c4a547 
> index:0x10a41
>   anon flags: 0x1780024(uptodate|active|swapbacked)
>   raw: 01780024 5deadbeef100 5deadbeef122 c007986b03c9
>   raw: 00010a41  0002 c340b000
>   page dumped because: unmovable page
>   page->mem_cgroup:c340b000
>   page:c00c03cc refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc00f3000fd38
>   flags: 0x1700200(slab)
>   raw: 01700200 c00f3c911890 c00f3c911890 c0079fffd980
>   raw: c00f3000fd38 0073 0001 
>   page dumped because: unmovable page
>   page:c00c03d0 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc007a2ec
>   flags: 0x170()
>   raw: 0170 5deadbeef100 5deadbeef122 
>   raw: c007a2ec  0001 
>   page dumped because: unmovable page
>   page:c00c03e2c000 refcount:1 mapcount:0 mapping:18c4a547 
> index:0xc00f8b008400
>   flags: 0x2700200(slab)
>   raw: 02700200 c00f8e000190 c00f8e000190 c007a801e380
>   raw: c00f8b008400 00400038 0001 
>   page dumped because: unmovable page
>   page:c00c03fec000 refcount:1 mapcount:0 mapping:18c4a547 
> index:0x0
>   flags: 0x370()
>   raw: 0370 5deadbeef100 5deadbeef122 
>   raw:   000

Re: [PATCH v5 00/21] Initial Prefixed Instruction support

2020-04-09 Thread Jordan Niethe
On Thu, Apr 9, 2020 at 4:39 PM Christophe Leroy  wrote:
>
>
>
> On 04/06/2020 08:09 AM, Jordan Niethe wrote:
> > A future revision of the ISA will introduce prefixed instructions. A
> > prefixed instruction is composed of a 4-byte prefix followed by a
> > 4-byte suffix.
> >
> > All prefixes have the major opcode 1. A prefix will never be a valid
> > word instruction. A suffix may be an existing word instruction or a
> > new instruction.
> >
> > This series enables prefixed instructions and extends the instruction
> > emulation to support them. Then the places where prefixed instructions
> > might need to be emulated are updated.
> >
> > v5 is based on feedback from Nicholas Piggin, Michael Ellerman, Balamuruhan
> > Suriyakumar and Alistair Popple.
> > The major changes:
> >  - The ppc instruction type is now a struct
> >  - Series now just based on next
> >  - ppc_inst_masked() dropped
> >  - Space for xmon breakpoints allocated in an assembly file
> >  - "Add prefixed instructions to instruction data type" patch separated
> >into smaller patches
> >  - Calling convention for create_branch() is changed
> >  - Some places which had not been updated to use the data type are now 
> > updated
>
> Build fails. I have not investigated why:
Thanks, I will check it out.
>
>CC  arch/powerpc/kernel/process.o
> In file included from ./arch/powerpc/include/asm/code-patching.h:14:0,
>   from arch/powerpc/kernel/process.c:60:
> ./arch/powerpc/include/asm/inst.h:69:38: error: unknown type name ‘ppc_inst’
>   static inline bool ppc_inst_prefixed(ppc_inst x)
>^
> ./arch/powerpc/include/asm/inst.h:79:19: error: redefinition of
> ‘ppc_inst_val’
>   static inline u32 ppc_inst_val(struct ppc_inst x)
> ^
> ./arch/powerpc/include/asm/inst.h:21:19: note: previous definition of
> ‘ppc_inst_val’ was here
>   static inline u32 ppc_inst_val(struct ppc_inst x)
> ^
> ./arch/powerpc/include/asm/inst.h: In function ‘ppc_inst_len’:
> ./arch/powerpc/include/asm/inst.h:103:10: error: implicit declaration of
> function ‘ppc_inst_prefixed’ [-Werror=implicit-function-declaration]
>return (ppc_inst_prefixed(x)) ? 8  : 4;
>^
>
> Christophe
>
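FWIW the 'unknown type name' error looks like a missing 'struct' keyword
on the parameter, and the ppc_inst_val() redefinition suggests the 32-bit
definition lacks an #ifdef/#else guard against the 64-bit one. A
hypothetical sketch of the first fix (my guess, not taken from the series):

static inline bool ppc_inst_prefixed(struct ppc_inst x)
{
	/* all prefixes carry major opcode 1, per the cover letter */
	return (ppc_inst_val(x) >> 26) == 1;
}
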
> >
> > v4 is based on feedback from Nicholas Piggin, Christophe Leroy and Daniel
> > Axtens.
> > The major changes:
> >  - Move xmon breakpoints from data section to text section
> >  - Introduce a data type for instructions on powerpc
> >
> > v3 is based on feedback from Christophe Leroy. The major changes:
> >  - Completely replacing store_inst() with patch_instruction() in
> >xmon
> >  - Improve implementation of mread_instr() to not use mread().
> >  - Base the series on top of
> >https://patchwork.ozlabs.org/patch/1232619/ as this will effect
> >kprobes.
> >  - Some renaming and simplification of conditionals.
> >
> > v2 incorporates feedback from Daniel Axtens and and Balamuruhan
> > S. The major changes are:
> >  - Squashing together all commits about SRR1 bits
> >  - Squashing all commits for supporting prefixed load stores
> >  - Changing abbreviated references to sufx/prfx -> suffix/prefix
> >  - Introducing macros for returning the length of an instruction
> >  - Removing sign extension flag from pstd/pld in sstep.c
> >  - Dropping patch  "powerpc/fault: Use analyse_instr() to check for
> >store with updates to sp" from the series, it did not really fit
> >with prefixed enablement in the first place and as reported by Greg
> >Kurz did not work correctly.
> >
> >
> > Alistair Popple (1):
> >powerpc: Enable Prefixed Instructions
> >
> > Jordan Niethe (20):
> >powerpc/xmon: Remove store_inst() for patch_instruction()
> >powerpc/xmon: Move out-of-line instructions to text section
> >powerpc: Change calling convention for create_branch() et. al.
> >powerpc: Use a macro for creating instructions from u32s
> >powerpc: Use a function for getting the instruction op code
> >powerpc: Use an accessor for instructions
> >powerpc: Use a function for byte swapping instructions
> >powerpc: Introduce functions for instruction equality
> >powerpc: Use a datatype for instructions
> >powerpc: Use a function for reading instructions
> >powerpc: Define and use __get_user_instr{,inatomic}()
> >powerpc: Introduce a function for reporting instruction length
> >powerpc/xmon: Use a function for reading instructions
> >powerpc/xmon: Move insertion of breakpoint for xol'ing
> >powerpc: Make test_translate_branch() independent of instruction
> >  length
> >powerpc: Define new SRR1 bits for a future ISA version
> >powerpc64: Add prefixed instructions to instruction data type
> >powerpc: Support prefixed instructions in alignment handler
> >powerpc sstep: Add support for prefixed load/stores
> >  

Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread David Hildenbrand
On 09.04.20 04:59, piliu wrote:
> 
> 
> On 04/08/2020 10:46 AM, Baoquan He wrote:
>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>
>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>> blocks as removable"), the user space interface to compute whether a memory
>>> block can be offlined (exposed via
>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>
>> Pingfan, can you have a look at this change on PPC?  Please feel free to
>> give comments if any concern, or offer ack if it's OK to you.
>>
>>>
>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>> we'll start by:
>>> 1. Testing if it contains any holes, and reject if so
>>> 2. Testing if pages belong to different zones, and reject if so
>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>
>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>> it can easily result in false positives/negatives. Let's stop manually
>>> checking is_mem_section_removable(), and let device_offline() handle it
>>> completely instead. We can remove the racy is_mem_section_removable()
>>> implementation next.
>>>
>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>> zone lock when isolating), but maybe we should optimize that
>>> implementation instead if this ever becomes a real problem (after all,
>>> memory unplug is already an expensive operation). We started using
>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>> Implement memory hotplug remove in the kernel"), with the initial
>>> hotremove support of lmbs.
>>>
>>> Cc: Nathan Fontenot 
>>> Cc: Michael Ellerman 
>>> Cc: Benjamin Herrenschmidt 
>>> Cc: Paul Mackerras 
>>> Cc: Michal Hocko 
>>> Cc: Andrew Morton 
>>> Cc: Oscar Salvador 
>>> Cc: Baoquan He 
>>> Cc: Wei Yang 
>>> Signed-off-by: David Hildenbrand 
>>> ---
>>>  .../platforms/pseries/hotplug-memory.c| 26 +++
>>>  1 file changed, 3 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
>>> b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> index b2cde1732301..5ace2f9a277e 100644
>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node 
>>> *np)
>>>  
>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>  {
>>> -   int i, scns_per_block;
>>> -   bool rc = true;
>>> -   unsigned long pfn, block_sz;
>>> -   u64 phys_addr;
>>> -
>>> if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>> return false;
>>>  
>>> -   block_sz = memory_block_size_bytes();
>>> -   scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>> -   phys_addr = lmb->base_addr;
>>> -
>>>  #ifdef CONFIG_FA_DUMP
>>> /*
>>>  * Don't hot-remove memory that falls in fadump boot memory area
>>>  * and memory that is reserved for capturing old kernel memory.
>>>  */
>>> -   if (is_fadump_memory_area(phys_addr, block_sz))
>>> +   if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>> return false;
>>>  #endif
>>> -
>>> -   for (i = 0; i < scns_per_block; i++) {
>>> -   pfn = PFN_DOWN(phys_addr);
>>> -   if (!pfn_in_present_section(pfn)) {
>>> -   phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -   continue;
>>> -   }
>>> -
>>> -   rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>> -   phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>> -   }
>>> -
>>> -   return rc;
>>> +   /* device_offline() will determine if we can actually remove this lmb */
>>> +   return true;
> So I think this swaps the check and do sequence. At least it breaks
> dlpar_memory_remove_by_count(). It is doable to remove
> is_mem_section_removable(), but more effort is needed here to re-arrange
> the code.
> 

Thanks Pingfan,

1. "swaps the check and do sequence":

Partially. Any caller of dlpar_remove_lmb() already has to deal with
false positives. device_offline() can easily fail after
dlpar_remove_lmb() == true. It's inherently racy.

2. "breaks dlpar_memory_remove_by_count()"

Can you elaborate why it "breaks" it? It will simply try to
offline+remove lmbs, detect that it wasn't able to offline+remove as
much as it wanted (which could easily happen before as well), and re-add
the already offlined+removed ones.

3. "more effort to re-arrange the code"

What would be your suggestion?

We would rip out that racy check if we can remove as much memory as
requested in dlpar_memory_remove_by_count() and simply always try to
remove + recover.
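
The resulting shape of dlpar_memory_remove_by_count() would then roughly
be try-then-recover -- a paraphrase using the existing drmem/dlpar helpers
in hotplug-memory.c, not a drop-in patch:

	lmbs_removed = 0;
	for_each_drmem_lmb(lmb) {
		if (lmbs_removed == lmbs_to_remove)
			break;
		/* no pre-check: just try, and let device_offline() decide */
		if (dlpar_remove_lmb(lmb))
			continue;
		drmem_mark_lmb_reserved(lmb);
		lmbs_removed++;
	}

	if (lmbs_removed != lmbs_to_remove) {
		/* recover: re-add every lmb we did manage to remove */
		for_each_drmem_lmb(lmb) {
			if (!drmem_lmb_reserved(lmb))
				continue;
			dlpar_add_lmb(lmb);
			drmem_remove_lmb_reservation(lmb);
		}
		rc = -EINVAL;
	}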


-- 
Thanks,

David / dhildenb



Re: [PATCH v5 02/21] powerpc/xmon: Move out-of-line instructions to text section

2020-04-09 Thread Jordan Niethe
On Thu, Apr 9, 2020 at 4:11 PM Christophe Leroy  wrote:
>
>
>
> Le 06/04/2020 à 10:09, Jordan Niethe a écrit :
> > To execute an instruction out of line after a breakpoint, the NIP is set
> > to the address of struct bpt::instr. Here a copy of the instruction that
> > was replaced with a breakpoint is kept, along with a trap so normal flow
> > can be resumed after XOLing. The struct bpt's are located within the
> > data section. This is problematic as the data section may be marked as
> > no execute.
> >
> > Instead of each struct bpt holding the instructions to be XOL'd, make a
> > new array, bpt_table[], with enough space to hold instructions for the
> > number of supported breakpoints. Place this array in the text section.
> > Make struct bpt::instr a pointer to the instructions in bpt_table[]
> > associated with that breakpoint. This association is a simple mapping:
> > bpts[n] -> bpt_table[n * words per breakpoint]. Currently we only need
> > the copied instruction followed by a trap, so 2 words per breakpoint.
> >
> > Signed-off-by: Jordan Niethe 
> > ---
> > v4: New to series
> > v5: - Do not use __section(), use a .space directive in .S file
>
> I was going to comment to use __section() instead of creating a
> dedicated .S file.
>
> Why did you change that in v5 ?
I noticed with some toolchains I was getting this message:
Warning: setting incorrect section attributes for .text.xmon_bpts
I was talking with mpe about it and he said that the usual way for
doing things like this was with .space in a .S file so I changed to
that way.
>
> >  - Simplify in_breakpoint_table() calculation
> >  - Define BPT_SIZE
> > ---
> >   arch/powerpc/xmon/Makefile|  2 +-
> >   arch/powerpc/xmon/xmon.c  | 23 +--
> >   arch/powerpc/xmon/xmon_bpts.S |  8 
> >   arch/powerpc/xmon/xmon_bpts.h |  8 
> >   4 files changed, 30 insertions(+), 11 deletions(-)
> >   create mode 100644 arch/powerpc/xmon/xmon_bpts.S
> >   create mode 100644 arch/powerpc/xmon/xmon_bpts.h
> >
> > diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
> > index c3842dbeb1b7..515a13ea6f28 100644
> > --- a/arch/powerpc/xmon/Makefile
> > +++ b/arch/powerpc/xmon/Makefile
> > @@ -21,7 +21,7 @@ endif
> >
> >   ccflags-$(CONFIG_PPC64) := $(NO_MINIMAL_TOC)
> >
> > -obj-y+= xmon.o nonstdio.o spr_access.o
> > +obj-y+= xmon.o nonstdio.o spr_access.o xmon_bpts.o
> >
> >   ifdef CONFIG_XMON_DISASSEMBLY
> >   obj-y   += ppc-dis.o ppc-opc.o
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index 02e3bd62cab4..049375206510 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -62,6 +62,7 @@
> >
> >   #include "nonstdio.h"
> >   #include "dis-asm.h"
> > +#include "xmon_bpts.h"
> >
> >   #ifdef CONFIG_SMP
> >   static cpumask_t cpus_in_xmon = CPU_MASK_NONE;
> > @@ -97,7 +98,7 @@ static long *xmon_fault_jmp[NR_CPUS];
> >   /* Breakpoint stuff */
> >   struct bpt {
> >   unsigned long   address;
> > - unsigned intinstr[2];
> > + unsigned int*instr;
> >   atomic_tref_count;
> >   int enabled;
> >   unsigned long   pad;
> > @@ -108,7 +109,6 @@ struct bpt {
> >   #define BP_TRAP 2
> >   #define BP_DABR 4
> >
> > -#define NBPTS256
> >   static struct bpt bpts[NBPTS];
> >   static struct bpt dabr;
> >   static struct bpt *iabr;
> > @@ -116,6 +116,10 @@ static unsigned bpinstr = 0x7fe8;/* trap */
> >
> >   #define BP_NUM(bp)  ((bp) - bpts + 1)
> >
> > +#define BPT_SIZE (sizeof(unsigned int) * 2)
> > +#define BPT_WORDS(BPT_SIZE / sizeof(unsigned int))
>
> Wouldn't it make more sense to do the following ? :
>
> #define BPT_WORDS   2
> #define BPT_SIZE(BPT_WORDS * sizeof(unsigned int))
I defined it like that so that when we change unsigned int -> struct
ppc_inst it will be the correct size whether or not the struct
includes a suffix.
Otherwise it would be more straightforward to do it as you suggest.
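
For example, roughly this shape once the conversion lands (the struct
layout here is my assumption for illustration, not the final one):

struct ppc_inst {
	u32 val;
#ifdef CONFIG_PPC64
	u32 suffix;
#endif
};

/* sizeof() keeps BPT_SIZE correct whether or not a suffix is present */
#define BPT_SIZE	(sizeof(struct ppc_inst) * 2)
#define BPT_WORDS	(BPT_SIZE / sizeof(unsigned int))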
>
> > +extern unsigned int bpt_table[NBPTS * BPT_WORDS];
>
> Should go in xmon_bpts.h if we keep the definition in xmon_bpts.S
Right.
>
> > +
> >   /* Prototypes */
> >   static int cmds(struct pt_regs *);
> >   static int mread(unsigned long, void *, int);
> > @@ -853,15 +857,13 @@ static struct bpt *in_breakpoint_table(unsigned long 
> > nip, unsigned long *offp)
> >   {
> >   unsigned long off;
> >
> > - off = nip - (unsigned long) bpts;
> > - if (off >= sizeof(bpts))
> > + off = nip - (unsigned long) bpt_table;
> > + if (off >= sizeof(bpt_table))
> >   return NULL;
> > - off %= sizeof(struct bpt);
> > - if (off != offsetof(struct bpt, instr[0])
> > - && off != offsetof(struct bpt, instr[1]))
> > + *offp = off % BPT_SIZE;
>
> Can we use logical operation instead ?
Sure, I was just adapting how it was already but that would probably be clearer.
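Something like this, valid only while BPT_SIZE stays a power of two
(it is 8 bytes here):

	*offp = off & (BPT_SIZE - 1);	/* off % BPT_SIZE as a mask */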
>

Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()

2020-04-09 Thread Michael Ellerman
David Hildenbrand  writes:

> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
> blocks as removable"), the user space interface to compute whether a memory
> block can be offlined (exposed via
> /sys/devices/system/memory/memoryX/removable) has effectively been
> deprecated. We want to remove the leftovers of the kernel implementation.
>
> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
> we'll start by:
> 1. Testing if it contains any holes, and reject if so
> 2. Testing if pages belong to different zones, and reject if so
> 3. Isolating the page range, checking if it contains any unmovable pages
>
> Using is_mem_section_removable() before trying to offline is not only racy,
> it can easily result in false positives/negatives. Let's stop manually
> checking is_mem_section_removable(), and let device_offline() handle it
> completely instead. We can remove the racy is_mem_section_removable()
> implementation next.
>
> We now take more locks (e.g., memory hotplug lock when offlining and the
> zone lock when isolating), but maybe we should optimize that
> implementation instead if this ever becomes a real problem (after all,
> memory unplug is already an expensive operation). We started using
> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
> Implement memory hotplug remove in the kernel"), with the initial
> hotremove support of lmbs.

It's also not very pretty in dmesg.

Before:

  pseries-hotplug-mem: Attempting to hot-add 10 LMB(s)
  pseries-hotplug-mem: Memory hot-add failed, removing any added LMBs
  dlpar: Could not handle DLPAR request "memory add count 10"

After:

  pseries-hotplug-mem: Attempting to hot-remove 10 LMB(s)
  page:c00c01ca8200 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc0072a080180
  flags: 0x700200(slab)
  raw: 00700200 c00c01cffd48 c00781c51410 c00793327580
  raw: c0072a080180 01550001 0001 
  page dumped because: unmovable page
  page:c00c01cc4a80 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc0079b110080
  flags: 0x70()
  raw: 0070 5deadbeef100 5deadbeef122 
  raw: c0079b110080  0001 
  page dumped because: unmovable page
  page:c00c01d08200 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc0074208ff00
  flags: 0x700200(slab)
  raw: 00700200 c00781c5f150 c00c01d37f88 c00798a24880
  raw: c0074208ff00 01550002 0001 
  page dumped because: unmovable page
  page:c00c01d40140 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc00750059c00
  flags: 0x700200(slab)
  raw: 00700200 c00c01dfcfc8 c00c01d3c288 c007851c2d00
  raw: c00750059c00 0103 0001 
  page dumped because: unmovable page
  page:c00c01d9c000 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc2370080
  flags: 0x70()
  raw: 0070 5deadbeef100 5deadbeef122 
  raw: c2370080  0001 
  page dumped because: unmovable page
  page:c00c01dc0200 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc2370080
  flags: 0x70()
  raw: 0070 5deadbeef100 5deadbeef122 
  raw: c2370080  0001 
  page dumped because: unmovable page
  page:c00c01e0 refcount:1 mapcount:0 mapping:18c4a547 index:0x0
  flags: 0x700200(slab)
  raw: 00700200 5deadbeef100 5deadbeef122 c007a801f500
  raw:  08000800 0001 
  page dumped because: unmovable page
  page:c00c01e40440 refcount:1 mapcount:0 mapping:18c4a547 index:0x0
  flags: 0x700200(slab)
  raw: 00700200 5deadbeef100 5deadbeef122 c007a801e380
  raw:  00400040 0001 
  page dumped because: unmovable page
  page:c00c01e8 refcount:1 mapcount:0 mapping:18c4a547 
index:0xc007a640
  flags: 0x700200(slab)
  raw: 00700200 c00c01e5af48 c00c01e80408 c00f42d00a00
  raw: c007a640 066600a2 0001 
  page dumped because: unmovable page
  page:c00c03c89d40 refcount:2 mapcount:1 mapping:18c4a547 
index:0x10a41
  anon flags: 0x1780024(uptodate|active|swapbacked)
  raw: 01780024 5deadbeef100 5deadbeef122 c007986b03c9
  raw: 00010a41  0002 c340b000
  page dumped because: unmovable page
  page->mem_cgroup:c340b000
  page:c00c03cc refcount:1 mapcount:0 mapping:18c4a547 
index:0xc00f3000fd38
  

Re: usb: gadget: fsl_udc_core: Checking for a failed platform_get_irq() call in fsl_udc_probe()

2020-04-09 Thread Tang Bin
On Thu, Apr 9, 2020 08:28:28 Markus Elfring  wrote:


> I was unsure if I noticed another programming mistake.

> Do other contributors know the affected software module better than me?

I discovered this problem a few days ago, and I am doing experiments on the hardware
to test my idea.

Thanks
Tang Bin



Re: [PATCH] powerpc/powernv: Add a print indicating when an IODA PE is released

2020-04-09 Thread Sam Bobroff
On Wed, Apr 08, 2020 at 09:22:13PM +1000, Oliver O'Halloran wrote:
> Quite useful to know in some cases.
> 
> Signed-off-by: Oliver O'Halloran 

Agreed.
Reviewed-by: Sam Bobroff 

> ---
>  arch/powerpc/platforms/powernv/pci-ioda.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
> b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 3d81c01..82e5098 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -3475,6 +3475,8 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
>   struct pnv_phb *phb = pe->phb;
>   struct pnv_ioda_pe *slave, *tmp;
>  
> + pe_info(pe, "Releasing PE\n");
> +
>   mutex_lock(&phb->ioda.pe_list_mutex);
>   list_del(&pe->list);
>   mutex_unlock(&phb->ioda.pe_list_mutex);
> -- 
> 2.9.5
> 

