Re: ##freemail## Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-08-21 Thread Wanpeng Li

On Wed, 21 Aug 2019 at 13:41, Naoya Horiguchi wrote:
>
> On Tue, Aug 20, 2019 at 03:03:55PM +0800, Wanpeng Li wrote:
> > Cc Mel Gorman, Kirill, Dave Hansen,
> > On Tue, 11 Jun 2019 at 07:51, Naoya Horiguchi wrote:
> > >
> > > On Wed, May 29, 2019 at 04:31:01PM -0700, Mike Kravetz wrote:
> > > > On 5/28/19 2:49 AM, Wanpeng Li wrote:
> > > > > Cc Paolo,
> > > > > Hi all,
> > > > > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz wrote:
> > > > >>
> > > > >> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> > > > >>> Andrew Morton writes:
> > > > >>>
> > > > >>>> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal wrote:
> > > > >>>>
> > > > >>>>>>
> > > > >>>>>> So I don't think that the above test result means that errors are
> > > > >>>>>> properly handled, and the proposed patch should help for arm64.
> > > > >>>>>
> > > > >>>>> Although the deviation of pud_huge() avoids a kernel crash, the code
> > > > >>>>> would be easier to maintain and reason about if arm64 helpers are
> > > > >>>>> consistent with expectations by core code.
> > > > >>>>>
> > > > >>>>> I'll look to update the arm64 helpers once this patch gets merged. But
> > > > >>>>> it would be helpful if there was a clear expression of semantics for
> > > > >>>>> pud_huge() for various cases. Is there any version that can be used as
> > > > >>>>> reference?
> > > > >>>>
> > > > >>>> Is that an ack or tested-by?
> > > > >>>>
> > > > >>>> Mike keeps plaintively asking the powerpc developers to take a look,
> > > > >>>> but they remain steadfastly in hiding.
> > > > >>>
> > > > >>> Cc'ing linuxppc-dev is always a good idea :)
> > > > >>>
> > > > >>
> > > > >> Thanks Michael,
> > > > >>
> > > > >> I was mostly concerned about use cases for soft/hard offline of huge pages
> > > > >> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
> > > > >> huge pages, and soft/hard offline support was specifically added for this.
> > > > >> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
> > > > >> at PGD level"
> > > > >>
> > > > >> This patch will disable that functionality.  So, at a minimum this is a
> > > > >> 'heads up'.  If there are actual use cases that depend on this, then more
> > > > >> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
> > > > >> support, I can not tell if there is a real use case or this is just a
> > > > >> 'nice to have'.
> > > > >
> > > > > 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployments, and we
> > > > > have hit gup_pud_range() panics several times in production environments.
> > > > > Is there any plan to re-enable this and fix the arch code?
> > > >
> > > > I too am aware of slightly more interest in 1G huge pages.  Suspect that as
> > > > Intel MMU capacity increases to handle more TLB entries there will be more
> > > > and more interest.
> > > >
> > > > Personally, I am not looking at this issue.  Perhaps Naoya will comment as
> > > > he knows most about this code.
> > >
> > > Thanks for forwarding this to me. It seems to me that memory error handling
> > > on 1GB hugepages is demanded by real use cases.
> > >
> > > >
> > > > > In addition, see
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> > > > > The memory in the guest can be 1GB/2MB/4KB even though the host backing
> > > > > memory is 1GB hugetlbfs pages. Once the PUD panic above is fixed,
> > > > > try_to_unmap(), which is called in the MCA recovery path, will mark the
> > > > > PUD as a hwpoison entry. The guest will then vmexit and retry endlessly
> > > > > when accessing any guest memory that is backed by this poisoned 1GB
> > > > > hugetlbfs page. We plan to split this 1GB hugetlbfs page into 2MB
> > > > > hugetlbfs pages/4KB pages, maybe by remapping the file to a virtual
> > > > > address range with 2MB/4KB page granularity, and also to split the KVM
> > > > > MMU 1GB SPTE into 2MB/4KB SPTEs and mark the offending SPTE with a
> > > > > hwpoison flag, so that a SIGBUS is delivered to the VM at the next page
> > > > > fault on the offending SPTE. Is this proposal acceptable?
> > > >
> > > > I am not sure of the error handling design, but this does sound reasonable.
> > >
> > > I agree that that's better.
> > >
> > > > That block of code which potentially dissolves a huge page on memory error
> > > > is hard to understand and I'm not sure if that is even the 'normal'
> > > > functionality.  Certainly, we would hate to waste/poison an entire 1G page
> > > > for an error on a small subsection.
> > >
> > > Yes, that's not practical, so we first need to establish the code base for
> > > 2MB hugetlb splitting and then extend it to 1GB.
> >
> > I found it is not easy to split. There is a unique hugetlb page size
> > that is

Re: ##freemail## Re: [PATCH v2] mm: hwpoison: disable memory error handling on 1GB hugepage

2019-08-20 Thread Naoya Horiguchi

On Tue, Aug 20, 2019 at 03:03:55PM +0800, Wanpeng Li wrote:
> Cc Mel Gorman, Kirill, Dave Hansen,
> On Tue, 11 Jun 2019 at 07:51, Naoya Horiguchi wrote:
> >
> > On Wed, May 29, 2019 at 04:31:01PM -0700, Mike Kravetz wrote:
> > > On 5/28/19 2:49 AM, Wanpeng Li wrote:
> > > > Cc Paolo,
> > > > Hi all,
> > > > On Wed, 14 Feb 2018 at 06:34, Mike Kravetz wrote:
> > > >>
> > > >> On 02/12/2018 06:48 PM, Michael Ellerman wrote:
> > > >>> Andrew Morton writes:
> > > >>>
> > > >>>> On Thu, 08 Feb 2018 12:30:45 + Punit Agrawal wrote:
> > > >>>>
> > > >>>>>>
> > > >>>>>> So I don't think that the above test result means that errors are
> > > >>>>>> properly handled, and the proposed patch should help for arm64.
> > > >>>>>
> > > >>>>> Although the deviation of pud_huge() avoids a kernel crash, the code
> > > >>>>> would be easier to maintain and reason about if arm64 helpers are
> > > >>>>> consistent with expectations by core code.
> > > >>>>>
> > > >>>>> I'll look to update the arm64 helpers once this patch gets merged. But
> > > >>>>> it would be helpful if there was a clear expression of semantics for
> > > >>>>> pud_huge() for various cases. Is there any version that can be used as
> > > >>>>> reference?
> > > >>>>
> > > >>>> Is that an ack or tested-by?
> > > >>>>
> > > >>>> Mike keeps plaintively asking the powerpc developers to take a look,
> > > >>>> but they remain steadfastly in hiding.
> > > >>>
> > > >>> Cc'ing linuxppc-dev is always a good idea :)
> > > >>>
> > > >>
> > > >> Thanks Michael,
> > > >>
> > > >> I was mostly concerned about use cases for soft/hard offline of huge pages
> > > >> larger than PMD_SIZE on powerpc.  I know that powerpc supports PGD_SIZE
> > > >> huge pages, and soft/hard offline support was specifically added for this.
> > > >> See, 94310cbcaa3c "mm/madvise: enable (soft|hard) offline of HugeTLB pages
> > > >> at PGD level"
> > > >>
> > > >> This patch will disable that functionality.  So, at a minimum this is a
> > > >> 'heads up'.  If there are actual use cases that depend on this, then more
> > > >> work/discussions will need to happen.  From the e-mail thread on PGD_SIZE
> > > >> support, I can not tell if there is a real use case or this is just a
> > > >> 'nice to have'.
> > > >
> > > > 1GB hugetlbfs pages are used by DPDK and VMs in cloud deployments, and we
> > > > have hit gup_pud_range() panics several times in production environments.
> > > > Is there any plan to re-enable this and fix the arch code?
> > >
> > > I too am aware of slightly more interest in 1G huge pages.  Suspect that as
> > > Intel MMU capacity increases to handle more TLB entries there will be more
> > > and more interest.
> > >
> > > Personally, I am not looking at this issue.  Perhaps Naoya will comment as
> > > he knows most about this code.
> >
> > Thanks for forwarding this to me. It seems to me that memory error handling
> > on 1GB hugepages is demanded by real use cases.
> >
> > >
> > > > In addition, see
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kvm/mmu.c#n3213
> > > > The memory in the guest can be 1GB/2MB/4KB even though the host backing
> > > > memory is 1GB hugetlbfs pages. Once the PUD panic above is fixed,
> > > > try_to_unmap(), which is called in the MCA recovery path, will mark the
> > > > PUD as a hwpoison entry. The guest will then vmexit and retry endlessly
> > > > when accessing any guest memory that is backed by this poisoned 1GB
> > > > hugetlbfs page. We plan to split this 1GB hugetlbfs page into 2MB
> > > > hugetlbfs pages/4KB pages, maybe by remapping the file to a virtual
> > > > address range with 2MB/4KB page granularity, and also to split the KVM
> > > > MMU 1GB SPTE into 2MB/4KB SPTEs and mark the offending SPTE with a
> > > > hwpoison flag, so that a SIGBUS is delivered to the VM at the next page
> > > > fault on the offending SPTE. Is this proposal acceptable?
> > >
> > > I am not sure of the error handling design, but this does sound reasonable.
> >
> > I agree that that's better.
> >
> > > That block of code which potentially dissolves a huge page on memory error
> > > is hard to understand and I'm not sure if that is even the 'normal'
> > > functionality.  Certainly, we would hate to waste/poison an entire 1G page
> > > for an error on a small subsection.
> >
> > Yes, that's not practical, so we first need to establish the code base for
> > 2MB hugetlb splitting and then extend it to 1GB.
> 
> I found that it is not easy to split. A mounted hugetlbfs filesystem is
> associated with a single hugetlb page size, and remapping the file to
> 2MB/4KB pages would break this. How about hard-offlining the 1GB hugetlb
> page the way soft offline already does: replace the corrupted 1GB page
> with a new 1GB page through page migration; the offending/corrupted area
> in the original 1GB page doesn't need to be copied into the new page,
> the offending/corrupted area in
