Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-05-18 Thread Mel Gorman
On Fri, May 01, 2009 at 07:48:46AM +1000, Benjamin Herrenschmidt wrote:
 On Thu, 2009-04-30 at 21:59 +0100, Mel Gorman wrote:
 
  This patch fixes the problem by not asserting that the PTE is locked for VMAs
  backed by huge pages.
 
 Thanks, will apply.
 

What's the story with this patch? I'm still hearing of failures with huge pages
that this patch fixes, but I'm not seeing it upstream. Was the patch
rejected or did it just slip through the cracks?

To refresh, an assertion is being made on ppc64 that only makes sense for
base pages. Huge pages throw a wobbly every time. For convenience, here is
the patch again.

Thanks.

 CUT HERE 
powerpc: Do not assert pte_locked for hugepage PTE entries

With DEBUG_VM enabled, changing the protection flags of a PTE asserts
that the PTE is locked. Huge pages use a different pagetable format, so
the assertion is bogus and will always trigger with a bug looking
something like

 Unable to handle kernel paging request for data at address 0xf1a0023586f8
 Faulting instruction address: 0xc0034a80
 Oops: Kernel access of bad area, sig: 11 [#1]
 SMP NR_CPUS=32 NUMA Maple
 Modules linked in: dm_snapshot dm_mirror dm_region_hash
  dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
  pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
  windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
  windfarm_cpufreq_clamp windfarm_core i2c_powermac
 NIP: c0034a80 LR: c0034b18 CTR: 0003
 REGS: c3037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
 MSR: 90009032 EE,ME,IR,DR  CR: 28002484  XER: 200f
 DAR: f1a0023586f8, DSISR: 4001
 TASK = c002e54cc740[2960] 'map_high_trunca' THREAD: c3034000 CPU: 2
 GPR00: 4000 c3037880 c0895d30 c002e5a2e500
 GPR04: a000 c002edc40880 00570393 0001
 GPR08: f00011ac 01a0023586e8 00f5 f1a0023586e8
 GPR12: 28000484 c08dd780 1000 
 GPR16: f000  a000 c3037a20
 GPR20: c002e5f4ece8 1000 c002edc40880 
 GPR24: c002e5f4ece8  a000 c002e5f4ece8
 GPR28: 00570393 c002e5a2e500 a000 c3037880
 NIP [c0034a80] .assert_pte_locked+0xa4/0xd0
 LR [c0034b18] .ptep_set_access_flags+0x6c/0xb4
 Call Trace:
 [c3037880] [c3037990] 0xc3037990 (unreliable)
 [c3037910] [c0034b18] .ptep_set_access_flags+0x6c/0xb4
 [c30379b0] [c014bef8] .hugetlb_cow+0x124/0x674
 [c3037b00] [c014c930] .hugetlb_fault+0x4e8/0x6f8
 [c3037c00] [c013443c] .handle_mm_fault+0xac/0x828
 [c3037cf0] [c00340a8] .do_page_fault+0x39c/0x584
 [c3037e30] [c00057b0] handle_page_fault+0x20/0x5c
 Instruction dump:
 7d29582a 7d200074 7800d182 0b00 3c004000 3960 780007c6 796b00c4
 7d290214 7929a302 1d290068 7d6b4a14 800b0010 7c74 7800d182 0b00

This patch fixes the problem by not asserting that the PTE is locked for VMAs
backed by huge pages.

Signed-off-by: Mel Gorman m...@csn.ul.ie
--- 
 arch/powerpc/mm/pgtable.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index f5c6fd4..ae1d67c 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -219,7 +219,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
 		entry = do_dcache_icache_coherency(entry);
 	changed = !pte_same(*(ptep), entry);
 	if (changed) {
-		assert_pte_locked(vma->vm_mm, address);
+		if (!(vma->vm_flags & VM_HUGETLB))
+			assert_pte_locked(vma->vm_mm, address);
 		__ptep_set_access_flags(ptep, entry);
 		flush_tlb_page_nohash(vma, address);
 	}
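For readers outside ppc64, the effect of the one-line change in the patch above can be modeled in plain C. This is a minimal userspace sketch, not kernel code: struct toy_vma, toy_set_access_flags(), toy_assert_pte_locked() and the toy_assert_calls counter are hypothetical names, and TOY_VM_HUGETLB is an illustrative flag value rather than the kernel's own definition. The TLB flush is left out.

```c
#include <stdbool.h>

/* Illustrative flag bit for this sketch only; the kernel defines the
 * real VM_HUGETLB in include/linux/mm.h. */
#define TOY_VM_HUGETLB 0x00400000UL

/* Hypothetical stand-in for struct vm_area_struct: only the flags matter. */
struct toy_vma {
	unsigned long vm_flags;
};

/* Counts how often the debug-only lock check would have run. */
int toy_assert_calls;

/* Stand-in for assert_pte_locked(): records the call instead of walking
 * the page tables and calling BUG_ON(). */
void toy_assert_pte_locked(void)
{
	toy_assert_calls++;
}

/* Models the patched branch of ptep_set_access_flags(): when the entry
 * changes, the lock check runs only for VMAs that are not hugetlb-backed,
 * then the new entry is stored (TLB flush omitted). Returns whether the
 * entry changed. */
bool toy_set_access_flags(const struct toy_vma *vma, unsigned long *ptep,
			  unsigned long entry)
{
	bool changed = (*ptep != entry);

	if (changed) {
		if (!(vma->vm_flags & TOY_VM_HUGETLB))
			toy_assert_pte_locked();
		*ptep = entry; /* stands in for __ptep_set_access_flags() */
	}
	return changed;
}
```

With a normal VMA the check runs once per change; with a hugetlb VMA it is skipped; an unchanged entry triggers neither, matching the if (changed) structure of the real function.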
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-05-18 Thread Linus Torvalds


On Mon, 18 May 2009, Mel Gorman wrote:
 
 What's the story with this patch? I'm still hearing of failures with huge pages
 that this patch fixes, but I'm not seeing it upstream. Was the patch
 rejected or did it just slip through the cracks?

It didn't slip through the cracks, it was apparently just delayed. It's 
part of the merge requests I've gotten today (well, strictly speaking it 
seems to have hit my inbox just before midnight yesterday, but that's 
because those silly aussies stand upside down and sleep at odd hours).

In fact, I just merged it, I haven't even had time to push that out. 

Linus


Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-04-30 Thread Benjamin Herrenschmidt
On Thu, 2009-04-30 at 21:59 +0100, Mel Gorman wrote:

 This patch fixes the problem by not asserting that the PTE is locked for VMAs
 backed by huge pages.

Thanks, will apply.

Cheers,
Ben.




Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-04-27 Thread Benjamin Herrenschmidt
On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
 I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
 although no clear pattern has emerged yet as to what exactly is
 causing it. However, the libhugetlbfs regression tests (make && make
 func) are triggering the following oops when calling mlock() and so are
 likely related.

This would be a spurious WARN_ON(). The test I added in there should
not apply to huge pages. However, I don't see that causing a functional
problem with sysbench+postgres.

Ben.




[BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-04-24 Thread Mel Gorman
On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
 Another week, another -rc.
 

I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
although no clear pattern has emerged yet as to what exactly is
causing it. However, the libhugetlbfs regression tests (make && make
func) are triggering the following oops when calling mlock() and so are
likely related.

[ cut here ]
kernel BUG at arch/powerpc/mm/pgtable.c:243!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=128 NUMA pSeries
Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
NIP: c002becc LR: c002c02c CTR: 
REGS: c000ea92b4c0 TRAP: 0700   Not tainted  (2.6.30-rc3-autokern1)
MSR: 80029032 EE,ME,CE,IR,DR  CR: 28000484  XER: 2020
TASK = c395b660[7611] 'mlock' THREAD: c000ea928000 CPU: 3
GPR00: 0001 c000ea92b740 c08ea170 c000ec7d4980 
GPR04: 3f00 c001e2278cf8 00190393 0001 
GPR08: f2bc  0113 c001e2278c81 
GPR12: 44000482 c093b880 28004422  
GPR16: c000ea92bbf0 c09f06f0 00190113 c000ec7d4980 
GPR20:  f2bc 3f00 c001e2278cf8 
GPR24: c000eaa90bb0  c000eaa90bb0 c000ea928000 
GPR28: f2bc 00190393 0001 c001e2278cf8 
NIP [c002becc] .assert_pte_locked+0x54/0x8c
LR [c002c02c] .ptep_set_access_flags+0x50/0x8c
Call Trace:
[c000ea92b740] [c000eaa90bb0] 0xc000eaa90bb0 (unreliable)
[c000ea92b7d0] [c00ed1b0] .hugetlb_cow+0xd4/0x654
[c000ea92b900] [c00edbf0] .hugetlb_fault+0x4c0/0x708
[c000ea92b9f0] [c00ee890] .follow_hugetlb_page+0x174/0x364
[c000ea92bae0] [c00d8d30] .__get_user_pages+0x288/0x4c0
[c000ea92bbb0] [c00da10c] .make_pages_present+0xa0/0xe0
[c000ea92bc40] [c00db758] .mlock_fixup+0x90/0x228
[c000ea92bd00] [c00dbb38] .do_mlock+0xc4/0x128
[c000ea92bda0] [c00dbccc] .SyS_mlock+0xb0/0xec
[c000ea92be30] [c000852c] syscall_exit+0x0/0x40
Instruction dump:
0b00 78892662 79291f24 7d69582a 7d600074 7800d182 0b00 78895e62 
79291f24 7d29582a 7d200074 7800d182 0b00 3c004000 3960 780007c6 
---[ end trace 36a7faa04fa9452b ]---

This corresponds to

#ifdef CONFIG_DEBUG_VM
void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
{
	pgd_t *pgd;
	pud_t *pud;
	pmd_t *pmd;

	if (mm == &init_mm)
		return;
	pgd = mm->pgd + pgd_index(addr);
	BUG_ON(pgd_none(*pgd));
	pud = pud_offset(pgd, addr);
	BUG_ON(pud_none(*pud));
	pmd = pmd_offset(pud, addr);
	BUG_ON(!pmd_present(*pmd));	<- THIS LINE
	BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
}
#endif /* CONFIG_DEBUG_VM */
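The order of checks in assert_pte_locked() can be mirrored in a toy model to show why a huge page mapping fails the walk regardless of locking. This is a hypothetical simplification, not the ppc64 page-table format: each level is reduced to a present bit, and it assumes (consistent with the BUG firing on the pmd_present() line) that the PMD slot for a huge page address does not lead to a normal PTE page. All toy_* names are invented for the sketch, and it reports the first failing check instead of calling BUG_ON().

```c
#include <stdbool.h>

/* Toy page-table entry: only "is something here?" is modeled.
 * Hypothetical simplification, NOT the real ppc64 layout. */
struct toy_entry {
	bool present; /* models !pgd_none() / !pud_none() / pmd_present() */
};

/* The entries visited for one address, plus the PTE lock state. */
struct toy_mapping {
	struct toy_entry pgd, pud, pmd;
	bool pte_lock_held; /* models spin_is_locked(pte_lockptr(mm, pmd)) */
};

/* Which check in the toy walk would have tripped a BUG_ON(). */
enum toy_check {
	TOY_OK,
	TOY_PGD_NONE,
	TOY_PUD_NONE,
	TOY_PMD_NOT_PRESENT,
	TOY_PTE_NOT_LOCKED,
};

/* Same order of checks as assert_pte_locked(). */
enum toy_check toy_check_pte_locked(const struct toy_mapping *m)
{
	if (!m->pgd.present)
		return TOY_PGD_NONE;
	if (!m->pud.present)
		return TOY_PUD_NONE;
	if (!m->pmd.present)
		return TOY_PMD_NOT_PRESENT; /* where the reported BUG fired */
	if (!m->pte_lock_held)
		return TOY_PTE_NOT_LOCKED;
	return TOY_OK;
}
```

In the toy, the huge page case stops at the PMD step even when the caller's locking is correct, which is why skipping the check for VM_HUGETLB VMAs makes more sense than trying to make it pass.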

This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
problem with this patch but I can't remember what it was. Full dmesg is


 dmesg 
Using pSeries machine description
Page orders: linear mapping = 24, virtual = 12, io = 12, vmemmap = 24
Found initrd at 0xc330:0xc4b67000
console [udbg0] enabled
Partition configured for 8 cpus.
CPU maps initialized for 2 threads per core
 (thread shift is 1)
Starting Linux PPC64 #1 SMP Fri Apr 24 09:08:10 UTC 2009
-
ppc64_pft_size= 0x1b
physicalMemorySize= 0x1e800
htab_hash_mask= 0xf
-
Initializing cgroup subsys cpuset
Linux version 2.6.30-rc3-autokern1 (r...@elm3a121) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Apr 24 09:08:10 UTC 2009
[boot]0012 Setup Arch
Node 0 Memory: 0x0-0xee00
Node 1 Memory: 0xee00-0x1e800
PCI host bridge /p...@8002001  ranges:
  IO 0x03fe0010..0x03fe001f - 0x
 MEM 0x04008000..0x0400bfff - 0xc000 
PCI host bridge /p...@8002002  ranges:
  IO 0x03fe0060..0x03fe006f - 0x
 MEM 0x0401..0x04017fff - 0x8000 
PCI host bridge /p...@8002003  ranges:
  IO 0x03fe0030..0x03fe003f - 0x
 MEM 0x0400c000..0x0400 - 0xc000 
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 7168 bytes
Using dedicated idle loop
Zone PFN ranges:
  DMA  0x - 0x001e8000
  Normal   0x001e8000 - 0x001e8000

Re: [BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

2009-04-24 Thread Michael Ellerman
On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
 On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
  Another week, another -rc.
  
 
 I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
 although no clear pattern has emerged yet as to what exactly is
 causing it. However, the libhugetlbfs regression tests (make && make
 func) are triggering the following oops when calling mlock() and so are
 likely related.
 
 This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
 in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
 problem with this patch but I can't remember what it was.

It broke modules, but I don't remember anything hugepage-related.

So the code changed from:

-#define  ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({\
-   int __changed = !pte_same(*(__ptep), __entry); \
-   if (__changed) {   \
-   __ptep_set_access_flags(__ptep, __entry, __dirty); \
-   flush_tlb_page_nohash(__vma, __address);   \
-   }  \
-   __changed; \
-})

to:

+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, pte_t entry, int dirty)
+{
+   int changed;
+   if (!dirty && pte_need_exec_flush(entry, 0))
+       entry = do_dcache_icache_coherency(entry);
+   changed = !pte_same(*(ptep), entry);
+   if (changed) {
+   assert_pte_locked(vma->vm_mm, address);
+   __ptep_set_access_flags(ptep, entry);
+   flush_tlb_page_nohash(vma, address);
+   }
+   return changed;
+}

So the call to assert_pte_locked() is new. And it's never going to work
for huge pages; the page table structure is different, right? Notice
pte_update() checks (arch/powerpc/include/asm/pgtable-ppc64.h):
