Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 12:18:43PM -0500, Sasha Levin wrote: > On 11/18/2014 11:56 AM, Aneesh Kumar K.V wrote: > >>> 4. Similarly, does the kernel boot properly without without patches? > >> > > >> > Yes, the kernel works fine without the patches both with and without fake > >> > numa. > > > > Hmm that is interesting. I am not sure how writeback_fid can be > > related. We use writeback fid to enable client side caching with 9p > > (cache=loose). We use this fid to write back dirty pages later. Can you > > share the qemu command line used, 9p mount options and the test details ? > > I'm using kvmtool rather than qemu. rootfs is created via kernel parameters: > > root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p > > The test is just running trinity, there's no 9p or mm specific test going on. > > I've attached my .config. > Ok, based on that I was able to reproduce the problem. I hope to have a V2 before the end of the week. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 12:18:43PM -0500, Sasha Levin wrote: On 11/18/2014 11:56 AM, Aneesh Kumar K.V wrote: 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. Hmm that is interesting. I am not sure how writeback_fid can be related. We use writeback fid to enable client side caching with 9p (cache=loose). We use this fid to write back dirty pages later. Can you share the qemu command line used, 9p mount options and the test details ? I'm using kvmtool rather than qemu. rootfs is created via kernel parameters: root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p The test is just running trinity, there's no 9p or mm specific test going on. I've attached my .config. Ok, based on that I was able to reproduce the problem. I hope to have a V2 before the end of the week. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/18/2014 11:56 AM, Aneesh Kumar K.V wrote: >>> 4. Similarly, does the kernel boot properly without without patches? >> > >> > Yes, the kernel works fine without the patches both with and without fake >> > numa. > > Hmm that is interesting. I am not sure how writeback_fid can be > related. We use writeback fid to enable client side caching with 9p > (cache=loose). We use this fid to write back dirty pages later. Can you > share the qemu command line used, 9p mount options and the test details ? I'm using kvmtool rather than qemu. rootfs is created via kernel parameters: root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p The test is just running trinity, there's no 9p or mm specific test going on. I've attached my .config. Thanks, Sasha config-sasha.gz Description: application/gzip
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 10:26:41PM +0530, Aneesh Kumar K.V wrote: > Sasha Levin writes: > > > On 11/18/2014 10:42 AM, Mel Gorman wrote: > >> 1. I'm assuming this is a KVM setup but can you confirm? > > > > Yes. > > > >> 2. Are you using numa=fake=N? > > > > Yes. numa=fake=24, which is probably way more nodes on any physical machine > > than the new code was tested on? > > > >> 3. If you are using fake NUMA, what happens if you boot without it as > >>that should make the patches a no-op? > > > > Nope, still seeing it without fake numa. > > > >> 4. Similarly, does the kernel boot properly without without patches? > > > > Yes, the kernel works fine without the patches both with and without fake > > numa. > > > Hmm that is interesting. I am not sure how writeback_fid can be > related. We use writeback fid to enable client side caching with 9p > (cache=loose). We use this fid to write back dirty pages later. Can you > share the qemu command line used, 9p mount options and the test details ? > It would help if the test details included the kernel config. I got KVM working again on an server with an older installation and while it doesn't use 9p, I'm not seeing any other oddities either yet while running trinity. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 10:03:30PM +0530, Aneesh Kumar K.V wrote: > > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c > > index 5a236f0..46152aa 100644 > > --- a/arch/powerpc/mm/copro_fault.c > > +++ b/arch/powerpc/mm/copro_fault.c > > @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned > > long ea, > > if (!(vma->vm_flags & VM_WRITE)) > > goto out_unlock; > > } else { > > - if (dsisr & DSISR_PROTFAULT) > > + /* > > +* protfault should only happen due to us > > +* mapping a region readonly temporarily. PROT_NONE > > +* is also covered by the VMA check above. > > +*/ > > + if (WARN_ON_ONCE(dsisr & DSISR_PROTFAULT)) > > goto out_unlock; > > if (!(vma->vm_flags & (VM_READ | VM_EXEC))) > > goto out_unlock; > > > we should do that DSISR_PROTFAILT check after vma->vm_flags. It is not > that we will not hit DSISR_PROTFAULT, what we want to ensure here is that > we get a prot fault only for cases convered by that vma check. So > everything should be taking the if (!(vma->vm_flags & (VM_READ | > VM_EXEC))) branch if it is a protfault. If not we would like to know > about that. And hence the idea of not using WARN_ON_ONCE. I was also not > sure whether we want to enable that always. The reason for keeping that > within CONFIG_DEBUG_VM is to make sure that nobody ends up depending on > PROTFAULT outside the vma check convered. So expectations is that > developers working on feature will run with DEBUG_VM enable and finds > this warning. We don't expect to hit this otherwise. > /me slaps self. It's clear now and updated accordingly. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Sasha Levin writes: > On 11/18/2014 10:42 AM, Mel Gorman wrote: >> 1. I'm assuming this is a KVM setup but can you confirm? > > Yes. > >> 2. Are you using numa=fake=N? > > Yes. numa=fake=24, which is probably way more nodes on any physical machine > than the new code was tested on? > >> 3. If you are using fake NUMA, what happens if you boot without it as >>that should make the patches a no-op? > > Nope, still seeing it without fake numa. > >> 4. Similarly, does the kernel boot properly without without patches? > > Yes, the kernel works fine without the patches both with and without fake > numa. Hmm that is interesting. I am not sure how writeback_fid can be related. We use writeback fid to enable client side caching with 9p (cache=loose). We use this fid to write back dirty pages later. Can you share the qemu command line used, 9p mount options and the test details ? -aneesh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/18/2014 10:42 AM, Mel Gorman wrote: > 1. I'm assuming this is a KVM setup but can you confirm? Yes. > 2. Are you using numa=fake=N? Yes. numa=fake=24, which is probably way more nodes on any physical machine than the new code was tested on? > 3. If you are using fake NUMA, what happens if you boot without it as >that should make the patches a no-op? Nope, still seeing it without fake numa. > 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. > 5. Are any other patches applied because the line numbers are not lining >up exactly? I have quite a few more patches on top of next, but they're debug patches that add VM_BUG_ONs in quite a few places. One thing that was odd is that your patches had merge conflicts when applied on -next in mm/huge-memory.c, so maybe that's where line number differences are coming from. > 6. As my own KVM setup appears broken, can you tell me if the host >kernel has changed recently? If so, does using an older host kernel >make a difference? Nope, I've been using the same host kernel (Ubuntu's 3.16.0-24-generic #32) for a while now. > At the moment I'm scratching my head trying to figure out how the > patches could break 9p like this as I don't believe KVM is doing any > tricks with the same bits that could result in loss. This issue reproduces rather easily, I'd be happy to try out debug patches rather than having you guess at what might have gone wrong. Thanks, Sasha -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Mel Gorman writes: > On Mon, Nov 17, 2014 at 01:56:19PM +0530, Aneesh Kumar K.V wrote: >> Mel Gorman writes: >> >> > This is follow up from the "pipe/page fault oddness" thread. >> > >> > Automatic NUMA balancing depends on being able to protect PTEs to trap a >> > fault and gather reference locality information. Very broadly speaking it >> > would mark PTEs as not present and use another bit to distinguish between >> > NUMA hinting faults and other types of faults. It was universally loved >> > by everybody and caused no problems whatsoever. That last sentence might >> > be a lie. >> > >> > This series is very heavily based on patches from Linus and Aneesh to >> > replace the existing PTE/PMD NUMA helper functions with normal change >> > protections. I did alter and add parts of it but I consider them relatively >> > minor contributions. Note that the signed-offs here need addressing. I >> > couldn't use "From" or Signed-off-by from the original authors as the >> > patches had to be broken up and they were never signed off. I expect the >> > two people involved will just stick their signed-off-by on it. >> >> >> How about the additional change listed below for ppc64 ? One part of the >> patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd >> because we find the _PAGE_PRESENT bit set in case of numa fault. I >> ended up relaxing the check there. >> > > I folded the set_pte_at and set_pmd_at changes into the patch "mm: Convert > p[te|md]_numa users to p[te|md]_protnone_numa" with one change -- both > set_pte_at and set_pmd_at checks are under CONFIG_DEBUG_VM for consistency. > >> Second part of the change is to add a WARN_ON to make sure we are >> not depending on DSISR_PROTFAULT for anything else. We ideally should not >> get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check >> whether the access is allowed by pte before inserting a pte into hash >> page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes >> in hash page table. But it is good to run with VM_WARN_ON ? >> > > Due to the nature of the check and when they are hit, I converted it to > a WARN_ON_ONCE. Due to the exceptional circumstance the overhead should > be non-existant and shouldn't need to be hidden below VM_WARN_ON. I also > noted that with the patch the kernel potentially no longer recovers > from this exceptional cirsumstance and instead falls through. To avoid > this, I preserved the "goto out_unlock". > > Is this still ok? > > ---8<--- > ppc64: Add paranoid warnings for unexpected DSISR_PROTFAULT > > ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected > if they are triggered. This patch adds warnings just in case they > are being accidentally depended upon. > > Requires-signed-off-by: Aneesh Kumar K.V > Signed-off-by: Mel Gorman > --- > arch/powerpc/mm/copro_fault.c | 7 ++- > arch/powerpc/mm/fault.c | 20 +--- > 2 files changed, 15 insertions(+), 12 deletions(-) > > diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c > index 5a236f0..46152aa 100644 > --- a/arch/powerpc/mm/copro_fault.c > +++ b/arch/powerpc/mm/copro_fault.c > @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned > long ea, > if (!(vma->vm_flags & VM_WRITE)) > goto out_unlock; > } else { > - if (dsisr & DSISR_PROTFAULT) > + /* > + * protfault should only happen due to us > + * mapping a region readonly temporarily. PROT_NONE > + * is also covered by the VMA check above. > + */ > + if (WARN_ON_ONCE(dsisr & DSISR_PROTFAULT)) > goto out_unlock; > if (!(vma->vm_flags & (VM_READ | VM_EXEC))) > goto out_unlock; we should do that DSISR_PROTFAILT check after vma->vm_flags. It is not that we will not hit DSISR_PROTFAULT, what we want to ensure here is that we get a prot fault only for cases convered by that vma check. So everything should be taking the if (!(vma->vm_flags & (VM_READ | VM_EXEC))) branch if it is a protfault. If not we would like to know about that. And hence the idea of not using WARN_ON_ONCE. I was also not sure whether we want to enable that always. The reason for keeping that within CONFIG_DEBUG_VM is to make sure that nobody ends up depending on PROTFAULT outside the vma check convered. So expectations is that developers working on feature will run with DEBUG_VM enable and finds this warning. We don't expect to hit this otherwise. > diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c > index 5007497..9d6e0b3 100644 > --- a/arch/powerpc/mm/fault.c > +++ b/arch/powerpc/mm/fault.c > @@ -396,17 +396,6 @@ good_area: > #endif /* CONFIG_8xx */ > > if (is_exec) { > -#ifdef CONFIG_PPC_STD_MMU > - /* Protection fault on exec go straight to failure on > - * Hash based MMUs as they
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Mon, Nov 17, 2014 at 01:56:19PM +0530, Aneesh Kumar K.V wrote: > Mel Gorman writes: > > > This is follow up from the "pipe/page fault oddness" thread. > > > > Automatic NUMA balancing depends on being able to protect PTEs to trap a > > fault and gather reference locality information. Very broadly speaking it > > would mark PTEs as not present and use another bit to distinguish between > > NUMA hinting faults and other types of faults. It was universally loved > > by everybody and caused no problems whatsoever. That last sentence might > > be a lie. > > > > This series is very heavily based on patches from Linus and Aneesh to > > replace the existing PTE/PMD NUMA helper functions with normal change > > protections. I did alter and add parts of it but I consider them relatively > > minor contributions. Note that the signed-offs here need addressing. I > > couldn't use "From" or Signed-off-by from the original authors as the > > patches had to be broken up and they were never signed off. I expect the > > two people involved will just stick their signed-off-by on it. > > > How about the additional change listed below for ppc64 ? One part of the > patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd > because we find the _PAGE_PRESENT bit set in case of numa fault. I > ended up relaxing the check there. > I folded the set_pte_at and set_pmd_at changes into the patch "mm: Convert p[te|md]_numa users to p[te|md]_protnone_numa" with one change -- both set_pte_at and set_pmd_at checks are under CONFIG_DEBUG_VM for consistency. > Second part of the change is to add a WARN_ON to make sure we are > not depending on DSISR_PROTFAULT for anything else. We ideally should not > get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check > whether the access is allowed by pte before inserting a pte into hash > page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes > in hash page table. But it is good to run with VM_WARN_ON ? > Due to the nature of the check and when they are hit, I converted it to a WARN_ON_ONCE. Due to the exceptional circumstance the overhead should be non-existant and shouldn't need to be hidden below VM_WARN_ON. I also noted that with the patch the kernel potentially no longer recovers from this exceptional cirsumstance and instead falls through. To avoid this, I preserved the "goto out_unlock". Is this still ok? ---8<--- ppc64: Add paranoid warnings for unexpected DSISR_PROTFAULT ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if they are triggered. This patch adds warnings just in case they are being accidentally depended upon. Requires-signed-off-by: Aneesh Kumar K.V Signed-off-by: Mel Gorman --- arch/powerpc/mm/copro_fault.c | 7 ++- arch/powerpc/mm/fault.c | 20 +--- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f0..46152aa 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma->vm_flags & VM_WRITE)) goto out_unlock; } else { - if (dsisr & DSISR_PROTFAULT) + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + if (WARN_ON_ONCE(dsisr & DSISR_PROTFAULT)) goto out_unlock; if (!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out_unlock; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 5007497..9d6e0b3 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -396,17 +396,6 @@ good_area: #endif /* CONFIG_8xx */ if (is_exec) { -#ifdef CONFIG_PPC_STD_MMU - /* Protection fault on exec go straight to failure on -* Hash based MMUs as they either don't support per-page -* execute permission, or if they do, it's handled already -* at the hash level. This test would probably have to -* be removed if we change the way this works to make hash -* processors use the same I/D cache coherency mechanism -* as embedded. -*/ -#endif /* CONFIG_PPC_STD_MMU */ - /* * Allow execution from readable areas if the MMU does not * provide separate controls over reading and executing. @@ -421,6 +410,14 @@ good_area: (cpu_has_feature(CPU_FTR_NOEXECUTE) || !(vma->vm_flags & (VM_READ | VM_WRITE goto bad_area; +#ifdef CONFIG_PPC_STD_MMU + /* +* protfault should only happen due to us +
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Fri, Nov 14, 2014 at 10:29:41PM -0500, Sasha Levin wrote: > On 11/14/2014 08:32 AM, Mel Gorman wrote:> This is follow up from the > "pipe/page fault oddness" thread. > > Hi Mel, > > Applying this patch series I've started seeing the following straight away: > > [ 367.547848] page:ea0003fb7db0 count:1007 mapcount:1005 > mapping:8800691f2f58 index:0x37 > [ 367.551481] flags: > 0x5001aa8030202d(locked|referenced|uptodate|lru|writeback|unevictable|mlocked) > [ 367.555382] page dumped because: VM_BUG_ON_PAGE(!v9inode->writeback_fid) > [ 367.558262] page->mem_cgroup:88006d8a1bd8 > [ 367.560403] [ cut here ] > [ 367.562343] kernel BUG at fs/9p/vfs_addr.c:190! > [ 367.564239] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN > [ 367.566991] Dumping ftrace buffer: > [ 367.568481](ftrace buffer empty) > [ 367.569914] Modules linked in: > [ 367.570254] CPU: 3 PID: 8234 Comm: kworker/u52:1 Not tainted > 3.18.0-rc4-next-20141114-sasha-00054-ga9ff95e-dirty #1459 Thanks Sasha. I don't see a next-20141114 so I looked at next-20141113 and assuming they are similar. It does not appear that writeback_fid is a struct page so it's not clear what VM_BUG_ON_PAGE means in this context. Certainly the fields look screwy but I think it's just accessing garbage. I tried reproducing this but my KVM setup appears to be broken after an update and not even able to boot 3.17 properly let alone with the patches. I still have a few questions though. 1. I'm assuming this is a KVM setup but can you confirm? 2. Are you using numa=fake=N? 3. If you are using fake NUMA, what happens if you boot without it as that should make the patches a no-op? 4. Similarly, does the kernel boot properly without without patches? 5. Are any other patches applied because the line numbers are not lining up exactly? 6. As my own KVM setup appears broken, can you tell me if the host kernel has changed recently? If so, does using an older host kernel make a difference? At the moment I'm scratching my head trying to figure out how the patches could break 9p like this as I don't believe KVM is doing any tricks with the same bits that could result in loss. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Fri, Nov 14, 2014 at 10:29:41PM -0500, Sasha Levin wrote: On 11/14/2014 08:32 AM, Mel Gorman wrote: This is follow up from the pipe/page fault oddness thread. Hi Mel, Applying this patch series I've started seeing the following straight away: [ 367.547848] page:ea0003fb7db0 count:1007 mapcount:1005 mapping:8800691f2f58 index:0x37 [ 367.551481] flags: 0x5001aa8030202d(locked|referenced|uptodate|lru|writeback|unevictable|mlocked) [ 367.555382] page dumped because: VM_BUG_ON_PAGE(!v9inode-writeback_fid) [ 367.558262] page-mem_cgroup:88006d8a1bd8 [ 367.560403] [ cut here ] [ 367.562343] kernel BUG at fs/9p/vfs_addr.c:190! [ 367.564239] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [ 367.566991] Dumping ftrace buffer: [ 367.568481](ftrace buffer empty) [ 367.569914] Modules linked in: [ 367.570254] CPU: 3 PID: 8234 Comm: kworker/u52:1 Not tainted 3.18.0-rc4-next-20141114-sasha-00054-ga9ff95e-dirty #1459 Thanks Sasha. I don't see a next-20141114 so I looked at next-20141113 and assuming they are similar. It does not appear that writeback_fid is a struct page so it's not clear what VM_BUG_ON_PAGE means in this context. Certainly the fields look screwy but I think it's just accessing garbage. I tried reproducing this but my KVM setup appears to be broken after an update and not even able to boot 3.17 properly let alone with the patches. I still have a few questions though. 1. I'm assuming this is a KVM setup but can you confirm? 2. Are you using numa=fake=N? 3. If you are using fake NUMA, what happens if you boot without it as that should make the patches a no-op? 4. Similarly, does the kernel boot properly without without patches? 5. Are any other patches applied because the line numbers are not lining up exactly? 6. As my own KVM setup appears broken, can you tell me if the host kernel has changed recently? If so, does using an older host kernel make a difference? At the moment I'm scratching my head trying to figure out how the patches could break 9p like this as I don't believe KVM is doing any tricks with the same bits that could result in loss. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Mon, Nov 17, 2014 at 01:56:19PM +0530, Aneesh Kumar K.V wrote: Mel Gorman mgor...@suse.de writes: This is follow up from the pipe/page fault oddness thread. Automatic NUMA balancing depends on being able to protect PTEs to trap a fault and gather reference locality information. Very broadly speaking it would mark PTEs as not present and use another bit to distinguish between NUMA hinting faults and other types of faults. It was universally loved by everybody and caused no problems whatsoever. That last sentence might be a lie. This series is very heavily based on patches from Linus and Aneesh to replace the existing PTE/PMD NUMA helper functions with normal change protections. I did alter and add parts of it but I consider them relatively minor contributions. Note that the signed-offs here need addressing. I couldn't use From or Signed-off-by from the original authors as the patches had to be broken up and they were never signed off. I expect the two people involved will just stick their signed-off-by on it. How about the additional change listed below for ppc64 ? One part of the patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd because we find the _PAGE_PRESENT bit set in case of numa fault. I ended up relaxing the check there. I folded the set_pte_at and set_pmd_at changes into the patch mm: Convert p[te|md]_numa users to p[te|md]_protnone_numa with one change -- both set_pte_at and set_pmd_at checks are under CONFIG_DEBUG_VM for consistency. Second part of the change is to add a WARN_ON to make sure we are not depending on DSISR_PROTFAULT for anything else. We ideally should not get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check whether the access is allowed by pte before inserting a pte into hash page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes in hash page table. But it is good to run with VM_WARN_ON ? Due to the nature of the check and when they are hit, I converted it to a WARN_ON_ONCE. Due to the exceptional circumstance the overhead should be non-existant and shouldn't need to be hidden below VM_WARN_ON. I also noted that with the patch the kernel potentially no longer recovers from this exceptional cirsumstance and instead falls through. To avoid this, I preserved the goto out_unlock. Is this still ok? ---8--- ppc64: Add paranoid warnings for unexpected DSISR_PROTFAULT ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if they are triggered. This patch adds warnings just in case they are being accidentally depended upon. Requires-signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Mel Gorman mgor...@suse.de --- arch/powerpc/mm/copro_fault.c | 7 ++- arch/powerpc/mm/fault.c | 20 +--- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f0..46152aa 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma-vm_flags VM_WRITE)) goto out_unlock; } else { - if (dsisr DSISR_PROTFAULT) + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + if (WARN_ON_ONCE(dsisr DSISR_PROTFAULT)) goto out_unlock; if (!(vma-vm_flags (VM_READ | VM_EXEC))) goto out_unlock; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 5007497..9d6e0b3 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -396,17 +396,6 @@ good_area: #endif /* CONFIG_8xx */ if (is_exec) { -#ifdef CONFIG_PPC_STD_MMU - /* Protection fault on exec go straight to failure on -* Hash based MMUs as they either don't support per-page -* execute permission, or if they do, it's handled already -* at the hash level. This test would probably have to -* be removed if we change the way this works to make hash -* processors use the same I/D cache coherency mechanism -* as embedded. -*/ -#endif /* CONFIG_PPC_STD_MMU */ - /* * Allow execution from readable areas if the MMU does not * provide separate controls over reading and executing. @@ -421,6 +410,14 @@ good_area: (cpu_has_feature(CPU_FTR_NOEXECUTE) || !(vma-vm_flags (VM_READ | VM_WRITE goto bad_area; +#ifdef CONFIG_PPC_STD_MMU + /* +* protfault should only happen due to us +
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Mel Gorman mgor...@suse.de writes: On Mon, Nov 17, 2014 at 01:56:19PM +0530, Aneesh Kumar K.V wrote: Mel Gorman mgor...@suse.de writes: This is follow up from the pipe/page fault oddness thread. Automatic NUMA balancing depends on being able to protect PTEs to trap a fault and gather reference locality information. Very broadly speaking it would mark PTEs as not present and use another bit to distinguish between NUMA hinting faults and other types of faults. It was universally loved by everybody and caused no problems whatsoever. That last sentence might be a lie. This series is very heavily based on patches from Linus and Aneesh to replace the existing PTE/PMD NUMA helper functions with normal change protections. I did alter and add parts of it but I consider them relatively minor contributions. Note that the signed-offs here need addressing. I couldn't use From or Signed-off-by from the original authors as the patches had to be broken up and they were never signed off. I expect the two people involved will just stick their signed-off-by on it. How about the additional change listed below for ppc64 ? One part of the patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd because we find the _PAGE_PRESENT bit set in case of numa fault. I ended up relaxing the check there. I folded the set_pte_at and set_pmd_at changes into the patch mm: Convert p[te|md]_numa users to p[te|md]_protnone_numa with one change -- both set_pte_at and set_pmd_at checks are under CONFIG_DEBUG_VM for consistency. Second part of the change is to add a WARN_ON to make sure we are not depending on DSISR_PROTFAULT for anything else. We ideally should not get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check whether the access is allowed by pte before inserting a pte into hash page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes in hash page table. But it is good to run with VM_WARN_ON ? Due to the nature of the check and when they are hit, I converted it to a WARN_ON_ONCE. Due to the exceptional circumstance the overhead should be non-existant and shouldn't need to be hidden below VM_WARN_ON. I also noted that with the patch the kernel potentially no longer recovers from this exceptional cirsumstance and instead falls through. To avoid this, I preserved the goto out_unlock. Is this still ok? ---8--- ppc64: Add paranoid warnings for unexpected DSISR_PROTFAULT ppc64 should not be depending on DSISR_PROTFAULT and it's unexpected if they are triggered. This patch adds warnings just in case they are being accidentally depended upon. Requires-signed-off-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com Signed-off-by: Mel Gorman mgor...@suse.de --- arch/powerpc/mm/copro_fault.c | 7 ++- arch/powerpc/mm/fault.c | 20 +--- 2 files changed, 15 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f0..46152aa 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma-vm_flags VM_WRITE)) goto out_unlock; } else { - if (dsisr DSISR_PROTFAULT) + /* + * protfault should only happen due to us + * mapping a region readonly temporarily. PROT_NONE + * is also covered by the VMA check above. + */ + if (WARN_ON_ONCE(dsisr DSISR_PROTFAULT)) goto out_unlock; if (!(vma-vm_flags (VM_READ | VM_EXEC))) goto out_unlock; we should do that DSISR_PROTFAILT check after vma-vm_flags. It is not that we will not hit DSISR_PROTFAULT, what we want to ensure here is that we get a prot fault only for cases convered by that vma check. So everything should be taking the if (!(vma-vm_flags (VM_READ | VM_EXEC))) branch if it is a protfault. If not we would like to know about that. And hence the idea of not using WARN_ON_ONCE. I was also not sure whether we want to enable that always. The reason for keeping that within CONFIG_DEBUG_VM is to make sure that nobody ends up depending on PROTFAULT outside the vma check convered. So expectations is that developers working on feature will run with DEBUG_VM enable and finds this warning. We don't expect to hit this otherwise. diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 5007497..9d6e0b3 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -396,17 +396,6 @@ good_area: #endif /* CONFIG_8xx */ if (is_exec) { -#ifdef CONFIG_PPC_STD_MMU - /* Protection fault on exec go straight to failure on - * Hash based MMUs as they either don't support per-page - * execute permission, or if they do,
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/18/2014 10:42 AM, Mel Gorman wrote: 1. I'm assuming this is a KVM setup but can you confirm? Yes. 2. Are you using numa=fake=N? Yes. numa=fake=24, which is probably way more nodes on any physical machine than the new code was tested on? 3. If you are using fake NUMA, what happens if you boot without it as that should make the patches a no-op? Nope, still seeing it without fake numa. 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. 5. Are any other patches applied because the line numbers are not lining up exactly? I have quite a few more patches on top of next, but they're debug patches that add VM_BUG_ONs in quite a few places. One thing that was odd is that your patches had merge conflicts when applied on -next in mm/huge-memory.c, so maybe that's where line number differences are coming from. 6. As my own KVM setup appears broken, can you tell me if the host kernel has changed recently? If so, does using an older host kernel make a difference? Nope, I've been using the same host kernel (Ubuntu's 3.16.0-24-generic #32) for a while now. At the moment I'm scratching my head trying to figure out how the patches could break 9p like this as I don't believe KVM is doing any tricks with the same bits that could result in loss. This issue reproduces rather easily, I'd be happy to try out debug patches rather than having you guess at what might have gone wrong. Thanks, Sasha -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Sasha Levin sasha.le...@oracle.com writes: On 11/18/2014 10:42 AM, Mel Gorman wrote: 1. I'm assuming this is a KVM setup but can you confirm? Yes. 2. Are you using numa=fake=N? Yes. numa=fake=24, which is probably way more nodes on any physical machine than the new code was tested on? 3. If you are using fake NUMA, what happens if you boot without it as that should make the patches a no-op? Nope, still seeing it without fake numa. 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. Hmm that is interesting. I am not sure how writeback_fid can be related. We use writeback fid to enable client side caching with 9p (cache=loose). We use this fid to write back dirty pages later. Can you share the qemu command line used, 9p mount options and the test details ? -aneesh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 10:03:30PM +0530, Aneesh Kumar K.V wrote: diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f0..46152aa 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,7 +64,12 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma-vm_flags VM_WRITE)) goto out_unlock; } else { - if (dsisr DSISR_PROTFAULT) + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + if (WARN_ON_ONCE(dsisr DSISR_PROTFAULT)) goto out_unlock; if (!(vma-vm_flags (VM_READ | VM_EXEC))) goto out_unlock; we should do that DSISR_PROTFAILT check after vma-vm_flags. It is not that we will not hit DSISR_PROTFAULT, what we want to ensure here is that we get a prot fault only for cases convered by that vma check. So everything should be taking the if (!(vma-vm_flags (VM_READ | VM_EXEC))) branch if it is a protfault. If not we would like to know about that. And hence the idea of not using WARN_ON_ONCE. I was also not sure whether we want to enable that always. The reason for keeping that within CONFIG_DEBUG_VM is to make sure that nobody ends up depending on PROTFAULT outside the vma check convered. So expectations is that developers working on feature will run with DEBUG_VM enable and finds this warning. We don't expect to hit this otherwise. /me slaps self. It's clear now and updated accordingly. Thanks. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Tue, Nov 18, 2014 at 10:26:41PM +0530, Aneesh Kumar K.V wrote: Sasha Levin sasha.le...@oracle.com writes: On 11/18/2014 10:42 AM, Mel Gorman wrote: 1. I'm assuming this is a KVM setup but can you confirm? Yes. 2. Are you using numa=fake=N? Yes. numa=fake=24, which is probably way more nodes on any physical machine than the new code was tested on? 3. If you are using fake NUMA, what happens if you boot without it as that should make the patches a no-op? Nope, still seeing it without fake numa. 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. Hmm that is interesting. I am not sure how writeback_fid can be related. We use writeback fid to enable client side caching with 9p (cache=loose). We use this fid to write back dirty pages later. Can you share the qemu command line used, 9p mount options and the test details ? It would help if the test details included the kernel config. I got KVM working again on an server with an older installation and while it doesn't use 9p, I'm not seeing any other oddities either yet while running trinity. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/18/2014 11:56 AM, Aneesh Kumar K.V wrote: 4. Similarly, does the kernel boot properly without without patches? Yes, the kernel works fine without the patches both with and without fake numa. Hmm that is interesting. I am not sure how writeback_fid can be related. We use writeback fid to enable client side caching with 9p (cache=loose). We use this fid to write back dirty pages later. Can you share the qemu command line used, 9p mount options and the test details ? I'm using kvmtool rather than qemu. rootfs is created via kernel parameters: root=/dev/root rw rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p The test is just running trinity, there's no 9p or mm specific test going on. I've attached my .config. Thanks, Sasha config-sasha.gz Description: application/gzip
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Mel Gorman writes: > This is follow up from the "pipe/page fault oddness" thread. > > Automatic NUMA balancing depends on being able to protect PTEs to trap a > fault and gather reference locality information. Very broadly speaking it > would mark PTEs as not present and use another bit to distinguish between > NUMA hinting faults and other types of faults. It was universally loved > by everybody and caused no problems whatsoever. That last sentence might > be a lie. > > This series is very heavily based on patches from Linus and Aneesh to > replace the existing PTE/PMD NUMA helper functions with normal change > protections. I did alter and add parts of it but I consider them relatively > minor contributions. Note that the signed-offs here need addressing. I > couldn't use "From" or Signed-off-by from the original authors as the > patches had to be broken up and they were never signed off. I expect the > two people involved will just stick their signed-off-by on it. How about the additional change listed below for ppc64 ? One part of the patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd because we find the _PAGE_PRESENT bit set in case of numa fault. I ended up relaxing the check there. Second part of the change is to add a WARN_ON to make sure we are not depending on DSISR_PROTFAULT for anything else. We ideally should not get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check whether the access is allowed by pte before inserting a pte into hash page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes in hash page table. But it is good to run with VM_WARN_ON ? I also added a similar change to handle CAPI. This will also need an ack from Ben and Paul . (added them to Cc:) With the below patch you can add Acked-by: Aneesh Kumar K.V for the respective patches. diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f082c78..2e208afb7f4c 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,10 +64,14 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma->vm_flags & VM_WRITE)) goto out_unlock; } else { - if (dsisr & DSISR_PROTFAULT) - goto out_unlock; if (!(vma->vm_flags & (VM_READ | VM_EXEC))) goto out_unlock; + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + VM_WARN_ON(dsisr & DSISR_PROTFAULT); } ret = 0; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 50074972d555..6df9483e316f 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -396,17 +396,6 @@ good_area: #endif /* CONFIG_8xx */ if (is_exec) { -#ifdef CONFIG_PPC_STD_MMU - /* Protection fault on exec go straight to failure on -* Hash based MMUs as they either don't support per-page -* execute permission, or if they do, it's handled already -* at the hash level. This test would probably have to -* be removed if we change the way this works to make hash -* processors use the same I/D cache coherency mechanism -* as embedded. -*/ -#endif /* CONFIG_PPC_STD_MMU */ - /* * Allow execution from readable areas if the MMU does not * provide separate controls over reading and executing. @@ -421,6 +410,14 @@ good_area: (cpu_has_feature(CPU_FTR_NOEXECUTE) || !(vma->vm_flags & (VM_READ | VM_WRITE goto bad_area; +#ifdef CONFIG_PPC_STD_MMU + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + VM_WARN_ON(error_code & DSISR_PROTFAULT); +#endif /* CONFIG_PPC_STD_MMU */ /* a write */ } else if (is_write) { if (!(vma->vm_flags & VM_WRITE)) @@ -430,6 +427,7 @@ good_area: } else { if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) goto bad_area; + VM_WARN_ON(error_code & DSISR_PROTFAULT); } /* diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index c90e602677c9..75b08098fcf5 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -172,9 +172,13 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { -#ifdef CONFIG_DEBUG_VM -
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
Mel Gorman mgor...@suse.de writes: This is follow up from the pipe/page fault oddness thread. Automatic NUMA balancing depends on being able to protect PTEs to trap a fault and gather reference locality information. Very broadly speaking it would mark PTEs as not present and use another bit to distinguish between NUMA hinting faults and other types of faults. It was universally loved by everybody and caused no problems whatsoever. That last sentence might be a lie. This series is very heavily based on patches from Linus and Aneesh to replace the existing PTE/PMD NUMA helper functions with normal change protections. I did alter and add parts of it but I consider them relatively minor contributions. Note that the signed-offs here need addressing. I couldn't use From or Signed-off-by from the original authors as the patches had to be broken up and they were never signed off. I expect the two people involved will just stick their signed-off-by on it. How about the additional change listed below for ppc64 ? One part of the patch is to make sure that we don't hit the WARN_ON in set_pte and set_pmd because we find the _PAGE_PRESENT bit set in case of numa fault. I ended up relaxing the check there. Second part of the change is to add a WARN_ON to make sure we are not depending on DSISR_PROTFAULT for anything else. We ideally should not get a DSISR_PROTFAULT for PROT_NONE or NUMA fault. hash_page_mm do check whether the access is allowed by pte before inserting a pte into hash page table. Hence we will never find a PROT_NONE or PROT_NONE_NUMA ptes in hash page table. But it is good to run with VM_WARN_ON ? I also added a similar change to handle CAPI. This will also need an ack from Ben and Paul . (added them to Cc:) With the below patch you can add Acked-by: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com for the respective patches. diff --git a/arch/powerpc/mm/copro_fault.c b/arch/powerpc/mm/copro_fault.c index 5a236f082c78..2e208afb7f4c 100644 --- a/arch/powerpc/mm/copro_fault.c +++ b/arch/powerpc/mm/copro_fault.c @@ -64,10 +64,14 @@ int copro_handle_mm_fault(struct mm_struct *mm, unsigned long ea, if (!(vma-vm_flags VM_WRITE)) goto out_unlock; } else { - if (dsisr DSISR_PROTFAULT) - goto out_unlock; if (!(vma-vm_flags (VM_READ | VM_EXEC))) goto out_unlock; + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + VM_WARN_ON(dsisr DSISR_PROTFAULT); } ret = 0; diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 50074972d555..6df9483e316f 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -396,17 +396,6 @@ good_area: #endif /* CONFIG_8xx */ if (is_exec) { -#ifdef CONFIG_PPC_STD_MMU - /* Protection fault on exec go straight to failure on -* Hash based MMUs as they either don't support per-page -* execute permission, or if they do, it's handled already -* at the hash level. This test would probably have to -* be removed if we change the way this works to make hash -* processors use the same I/D cache coherency mechanism -* as embedded. -*/ -#endif /* CONFIG_PPC_STD_MMU */ - /* * Allow execution from readable areas if the MMU does not * provide separate controls over reading and executing. @@ -421,6 +410,14 @@ good_area: (cpu_has_feature(CPU_FTR_NOEXECUTE) || !(vma-vm_flags (VM_READ | VM_WRITE goto bad_area; +#ifdef CONFIG_PPC_STD_MMU + /* +* protfault should only happen due to us +* mapping a region readonly temporarily. PROT_NONE +* is also covered by the VMA check above. +*/ + VM_WARN_ON(error_code DSISR_PROTFAULT); +#endif /* CONFIG_PPC_STD_MMU */ /* a write */ } else if (is_write) { if (!(vma-vm_flags VM_WRITE)) @@ -430,6 +427,7 @@ good_area: } else { if (!(vma-vm_flags (VM_READ | VM_EXEC | VM_WRITE))) goto bad_area; + VM_WARN_ON(error_code DSISR_PROTFAULT); } /* diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c index c90e602677c9..75b08098fcf5 100644 --- a/arch/powerpc/mm/pgtable.c +++ b/arch/powerpc/mm/pgtable.c @@ -172,9 +172,13 @@ static pte_t set_access_flags_filter(pte_t pte, struct vm_area_struct *vma, void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { -#ifdef CONFIG_DEBUG_VM -
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/14/2014 08:32 AM, Mel Gorman wrote:> This is follow up from the "pipe/page fault oddness" thread. Hi Mel, Applying this patch series I've started seeing the following straight away: [ 367.547848] page:ea0003fb7db0 count:1007 mapcount:1005 mapping:8800691f2f58 index:0x37 [ 367.551481] flags: 0x5001aa8030202d(locked|referenced|uptodate|lru|writeback|unevictable|mlocked) [ 367.555382] page dumped because: VM_BUG_ON_PAGE(!v9inode->writeback_fid) [ 367.558262] page->mem_cgroup:88006d8a1bd8 [ 367.560403] [ cut here ] [ 367.562343] kernel BUG at fs/9p/vfs_addr.c:190! [ 367.564239] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [ 367.566991] Dumping ftrace buffer: [ 367.568481](ftrace buffer empty) [ 367.569914] Modules linked in: [ 367.570254] CPU: 3 PID: 8234 Comm: kworker/u52:1 Not tainted 3.18.0-rc4-next-20141114-sasha-00054-ga9ff95e-dirty #1459 [ 367.570254] Workqueue: writeback bdi_writeback_workfn (flush-9p-1) [ 367.570254] task: 8801e21d8000 ti: 8801e1f34000 task.ti: 8801e1f34000 [ 367.570254] RIP: v9fs_vfs_writepage_locked (fs/9p/vfs_addr.c:190 (discriminator 1)) [ 367.570254] RSP: 0018:8801e1f376c8 EFLAGS: 00010286 [ 367.570254] RAX: 0021 RBX: ea0003fb7db0 RCX: [ 367.570254] RDX: 0021 RSI: 9208b2e6 RDI: 8801e21d8d0c [ 367.570254] RBP: 8801e1f37728 R08: 0001 R09: [ 367.570254] R10: 0001 R11: 0001 R12: 8800691f2d48 [ 367.570254] R13: 1000 R14: 8800691f2c30 R15: 8800691f2c98 [ 367.570254] FS: () GS:8801e5c0() knlGS: [ 367.570254] CS: 0010 DS: ES: CR0: 8005003b [ 367.570254] CR2: CR3: ca00c000 CR4: 06a0 [ 367.570254] DR0: 8100 DR1: DR2: [ 367.570254] DR3: DR6: 0ff0 DR7: 0600 [ 367.570254] Stack: [ 367.570254] da003c43b1a1 da003c43b1a1 0002 [ 367.570254] 0002 00037000 8801e1f37758 ea0003fb7db0 [ 367.570254] 8801e1f37a60 8801e1f37a60 ea0003fb7db0 [ 367.570254] Call Trace: [ 367.570254] v9fs_vfs_writepage (fs/9p/vfs_addr.c:212) [ 367.570254] __writepage (include/linux/pagemap.h:32 mm/page-writeback.c:2006) [ 367.570254] write_cache_pages (mm/page-writeback.c:1943) [ 367.570254] ? bdi_set_max_ratio (mm/page-writeback.c:2003) [ 367.570254] ? sched_clock_local (kernel/sched/clock.c:202) [ 367.570254] generic_writepages (mm/page-writeback.c:2030) [ 367.570254] do_writepages (mm/page-writeback.c:2047) [ 367.570254] __writeback_single_inode (fs/fs-writeback.c:461 (discriminator 3)) [ 367.570254] writeback_sb_inodes (fs/fs-writeback.c:706) [ 367.570254] __writeback_inodes_wb (fs/fs-writeback.c:749) [ 367.570254] wb_writeback (fs/fs-writeback.c:880) [ 367.570254] ? __lock_is_held (kernel/locking/lockdep.c:3518) [ 367.570254] bdi_writeback_workfn (fs/fs-writeback.c:1015 fs/fs-writeback.c:1060) [ 367.570254] process_one_work (kernel/workqueue.c:2023 include/linux/jump_label.h:114 include/trace/events/workqueue.h:111 kernel/workqueue.c:2028) [ 367.570254] ? process_one_work (kernel/workqueue.c:2020) [ 367.570254] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 367.570254] worker_thread (include/linux/list.h:189 kernel/workqueue.c:2156) [ 367.570254] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2939 kernel/sched/core.c:2848) [ 367.570254] ? rescuer_thread (kernel/workqueue.c:2100) [ 367.570254] kthread (kernel/kthread.c:207) [ 367.570254] ? flush_kthread_work (kernel/kthread.c:176) [ 367.570254] ret_from_fork (arch/x86/kernel/entry_64.S:348) [ 367.570254] ? flush_kthread_work (kernel/kthread.c:176) [ 367.570254] Code: 48 83 c4 38 44 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 f8 18 37 93 48 89 df e8 e1 8b 93 fe <0f> 0b 48 89 de 48 c7 c7 30 bd 9f 95 48 89 4d b8 e8 10 5f 02 0f All code 0: 48 83 c4 38 add$0x38,%rsp 4: 44 89 f0mov%r14d,%eax 7: 5b pop%rbx 8: 41 5c pop%r12 a: 41 5d pop%r13 c: 41 5e pop%r14 e: 41 5f pop%r15 10: 5d pop%rbp 11: c3 retq 12: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 19: 00 00 00 1c: 48 c7 c6 f8 18 37 93mov$0x933718f8,%rsi 23: 48 89 dfmov%rbx,%rdi 26: e8 e1 8b 93 fe callq 0xfe938c0c 2b:* 0f 0b ud2 <-- trapping instruction 2d: 48 89 demov%rbx,%rsi 30: 48 c7
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Fri, Nov 14, 2014 at 5:32 AM, Mel Gorman wrote: > > This series is very heavily based on patches from Linus and Aneesh to > replace the existing PTE/PMD NUMA helper functions with normal change > protections. I did alter and add parts of it but I consider them relatively > minor contributions. Note that the signed-offs here need addressing. I > couldn't use "From" or Signed-off-by from the original authors as the > patches had to be broken up and they were never signed off. I expect the > two people involved will just stick their signed-off-by on it. Feel free to just take authorship of my parts, and make my "Needs-sign-off's" be just "Acked-by:" Or alternatively keep them as "Signed-off-by:", even when it looks a bit odd if it doesn't have a "From:" me, when the actual patch won't then actually go through me - I'm assuming this will come in through the -mm tree. As to the ppc parts, obviously it would be good to have Aneesh re-test the series.. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On Fri, Nov 14, 2014 at 5:32 AM, Mel Gorman mgor...@suse.de wrote: This series is very heavily based on patches from Linus and Aneesh to replace the existing PTE/PMD NUMA helper functions with normal change protections. I did alter and add parts of it but I consider them relatively minor contributions. Note that the signed-offs here need addressing. I couldn't use From or Signed-off-by from the original authors as the patches had to be broken up and they were never signed off. I expect the two people involved will just stick their signed-off-by on it. Feel free to just take authorship of my parts, and make my Needs-sign-off's be just Acked-by: Or alternatively keep them as Signed-off-by:, even when it looks a bit odd if it doesn't have a From: me, when the actual patch won't then actually go through me - I'm assuming this will come in through the -mm tree. As to the ppc parts, obviously it would be good to have Aneesh re-test the series.. Linus -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 0/7] Replace _PAGE_NUMA with PAGE_NONE protections
On 11/14/2014 08:32 AM, Mel Gorman wrote: This is follow up from the pipe/page fault oddness thread. Hi Mel, Applying this patch series I've started seeing the following straight away: [ 367.547848] page:ea0003fb7db0 count:1007 mapcount:1005 mapping:8800691f2f58 index:0x37 [ 367.551481] flags: 0x5001aa8030202d(locked|referenced|uptodate|lru|writeback|unevictable|mlocked) [ 367.555382] page dumped because: VM_BUG_ON_PAGE(!v9inode-writeback_fid) [ 367.558262] page-mem_cgroup:88006d8a1bd8 [ 367.560403] [ cut here ] [ 367.562343] kernel BUG at fs/9p/vfs_addr.c:190! [ 367.564239] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [ 367.566991] Dumping ftrace buffer: [ 367.568481](ftrace buffer empty) [ 367.569914] Modules linked in: [ 367.570254] CPU: 3 PID: 8234 Comm: kworker/u52:1 Not tainted 3.18.0-rc4-next-20141114-sasha-00054-ga9ff95e-dirty #1459 [ 367.570254] Workqueue: writeback bdi_writeback_workfn (flush-9p-1) [ 367.570254] task: 8801e21d8000 ti: 8801e1f34000 task.ti: 8801e1f34000 [ 367.570254] RIP: v9fs_vfs_writepage_locked (fs/9p/vfs_addr.c:190 (discriminator 1)) [ 367.570254] RSP: 0018:8801e1f376c8 EFLAGS: 00010286 [ 367.570254] RAX: 0021 RBX: ea0003fb7db0 RCX: [ 367.570254] RDX: 0021 RSI: 9208b2e6 RDI: 8801e21d8d0c [ 367.570254] RBP: 8801e1f37728 R08: 0001 R09: [ 367.570254] R10: 0001 R11: 0001 R12: 8800691f2d48 [ 367.570254] R13: 1000 R14: 8800691f2c30 R15: 8800691f2c98 [ 367.570254] FS: () GS:8801e5c0() knlGS: [ 367.570254] CS: 0010 DS: ES: CR0: 8005003b [ 367.570254] CR2: CR3: ca00c000 CR4: 06a0 [ 367.570254] DR0: 8100 DR1: DR2: [ 367.570254] DR3: DR6: 0ff0 DR7: 0600 [ 367.570254] Stack: [ 367.570254] da003c43b1a1 da003c43b1a1 0002 [ 367.570254] 0002 00037000 8801e1f37758 ea0003fb7db0 [ 367.570254] 8801e1f37a60 8801e1f37a60 ea0003fb7db0 [ 367.570254] Call Trace: [ 367.570254] v9fs_vfs_writepage (fs/9p/vfs_addr.c:212) [ 367.570254] __writepage (include/linux/pagemap.h:32 mm/page-writeback.c:2006) [ 367.570254] write_cache_pages (mm/page-writeback.c:1943) [ 367.570254] ? bdi_set_max_ratio (mm/page-writeback.c:2003) [ 367.570254] ? sched_clock_local (kernel/sched/clock.c:202) [ 367.570254] generic_writepages (mm/page-writeback.c:2030) [ 367.570254] do_writepages (mm/page-writeback.c:2047) [ 367.570254] __writeback_single_inode (fs/fs-writeback.c:461 (discriminator 3)) [ 367.570254] writeback_sb_inodes (fs/fs-writeback.c:706) [ 367.570254] __writeback_inodes_wb (fs/fs-writeback.c:749) [ 367.570254] wb_writeback (fs/fs-writeback.c:880) [ 367.570254] ? __lock_is_held (kernel/locking/lockdep.c:3518) [ 367.570254] bdi_writeback_workfn (fs/fs-writeback.c:1015 fs/fs-writeback.c:1060) [ 367.570254] process_one_work (kernel/workqueue.c:2023 include/linux/jump_label.h:114 include/trace/events/workqueue.h:111 kernel/workqueue.c:2028) [ 367.570254] ? process_one_work (kernel/workqueue.c:2020) [ 367.570254] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 367.570254] worker_thread (include/linux/list.h:189 kernel/workqueue.c:2156) [ 367.570254] ? __schedule (./arch/x86/include/asm/bitops.h:311 include/linux/thread_info.h:91 include/linux/sched.h:2939 kernel/sched/core.c:2848) [ 367.570254] ? rescuer_thread (kernel/workqueue.c:2100) [ 367.570254] kthread (kernel/kthread.c:207) [ 367.570254] ? flush_kthread_work (kernel/kthread.c:176) [ 367.570254] ret_from_fork (arch/x86/kernel/entry_64.S:348) [ 367.570254] ? flush_kthread_work (kernel/kthread.c:176) [ 367.570254] Code: 48 83 c4 38 44 89 f0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c6 f8 18 37 93 48 89 df e8 e1 8b 93 fe 0f 0b 48 89 de 48 c7 c7 30 bd 9f 95 48 89 4d b8 e8 10 5f 02 0f All code 0: 48 83 c4 38 add$0x38,%rsp 4: 44 89 f0mov%r14d,%eax 7: 5b pop%rbx 8: 41 5c pop%r12 a: 41 5d pop%r13 c: 41 5e pop%r14 e: 41 5f pop%r15 10: 5d pop%rbp 11: c3 retq 12: 66 2e 0f 1f 84 00 00nopw %cs:0x0(%rax,%rax,1) 19: 00 00 00 1c: 48 c7 c6 f8 18 37 93mov$0x933718f8,%rsi 23: 48 89 dfmov%rbx,%rdi 26: e8 e1 8b 93 fe callq 0xfe938c0c 2b:* 0f 0b ud2 -- trapping instruction 2d: 48 89 demov%rbx,%rsi 30: 48 c7 c7 30 bd