Re: sparc64, mm BUG in 3.9-rc8

2013-04-25 Thread Meelis Roos
> This fixes it, I'll push this to Linus immediately.
> 
> Thanks for your report!
> 
> 
> sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.

Thank you, this patch fixes it on my 220R and 420R too.

-- 
Meelis Roos (mr...@linux.ee)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: sparc64, mm BUG in 3.9-rc8

2013-04-24 Thread David Miller
From: David Miller 
Date: Wed, 24 Apr 2013 14:36:15 -0400 (EDT)

> From: David Miller 
> Date: Tue, 23 Apr 2013 16:17:51 -0400 (EDT)
> 
>> From: Meelis Roos 
>> Date: Tue, 23 Apr 2013 00:19:49 +0300 (EEST)
>> 
>>>> > Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
>>>> > mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
>>>> > other things break and panic comes from trying to kill init). This is 
>>>> > reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
>>>> > Configuration is below.
>>>> 
>>>> It's certainly a bug in the TLB shootdown fix, please verify that
>>>> reverting the following fixes things:
>>>> 
>>>> >From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
>>>> From: "David S. Miller" 
>>>> Date: Fri, 19 Apr 2013 17:26:26 -0400
>>>> Subject: [PATCH] sparc64: Fix race in TLB batch processing.
>>> 
>>> Yes, reverting that makes it work again.
>> 
>> Just an update, using your config I was able to make one that boots on my
>> machine and reproduces the problem.
> 
> Ok, I've narrowed it down to CONFIG_DEBUG_ATOMIC_SLEEP, as the exact config
> option which starts the crashes happening when it is enabled.
> 
> I should have this fixed by the end of today.

This fixes it, I'll push this to Linus immediately.

Thanks for your report!


sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.

Reported-by: Meelis Roos 
Signed-off-by: David S. Miller 
---
 arch/sparc/mm/tlb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 272aa4f..83d89bc 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -87,7 +87,7 @@ static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
 	if (!tb->active) {
 		global_flush_tlb_page(mm, vaddr);
 		flush_tsb_user_page(mm, vaddr);
-		return;
+		goto out;
 	}
 
 	if (nr == 0)
@@ -98,6 +98,7 @@ static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
 	if (nr >= TLB_BATCH_NR)
 		flush_tlb_pending();
 
+out:
 	put_cpu_var(tlb_batch);
 }
 
-- 
1.8.1.2




Re: sparc64, mm BUG in 3.9-rc8

2013-04-24 Thread David Miller
From: David Miller 
Date: Tue, 23 Apr 2013 16:17:51 -0400 (EDT)

> From: Meelis Roos 
> Date: Tue, 23 Apr 2013 00:19:49 +0300 (EEST)
> 
>>> > Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
>>> > mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
>>> > other things break and panic comes from trying to kill init). This is 
>>> > reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
>>> > Configuration is below.
>>> 
>>> It's certainly a bug in the TLB shootdown fix, please verify that
>>> reverting the following fixes things:
>>> 
>>> >From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
>>> From: "David S. Miller" 
>>> Date: Fri, 19 Apr 2013 17:26:26 -0400
>>> Subject: [PATCH] sparc64: Fix race in TLB batch processing.
>> 
>> Yes, reverting that makes it work again.
> 
> Just an update, using your config I was able to make one that boots on my
> machine and reproduces the problem.

Ok, I've narrowed it down to CONFIG_DEBUG_ATOMIC_SLEEP, as the exact config
option which starts the crashes happening when it is enabled.

I should have this fixed by the end of today.


Re: sparc64, mm BUG in 3.9-rc8

2013-04-23 Thread David Miller
From: Meelis Roos 
Date: Tue, 23 Apr 2013 00:19:49 +0300 (EEST)

>> > Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
>> > mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
>> > other things break and panic comes from trying to kill init). This is 
>> > reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
>> > Configuration is below.
>> 
>> It's certainly a bug in the TLB shootdown fix, please verify that
>> reverting the following fixes things:
>> 
>> >From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
>> From: "David S. Miller" 
>> Date: Fri, 19 Apr 2013 17:26:26 -0400
>> Subject: [PATCH] sparc64: Fix race in TLB batch processing.
> 
> Yes, reverting that makes it work again.

Just an update, using your config I was able to make one that boots on my
machine and reproduces the problem.


Re: sparc64, mm BUG in 3.9-rc8

2013-04-23 Thread David Miller
From: Meelis Roos 
Date: Tue, 23 Apr 2013 10:21:31 +0300 (EEST)

>> From: Meelis Roos 
>> Date: Tue, 23 Apr 2013 09:47:54 +0300 (EEST)
>> 
>> >> Thanks, could you post a prtconf dump from this machine?  I'll
>> >> add it to the prtconf GIT repo as well.
>> > 
>> > Attached.
>> 
>> Thanks a lot.
> 
> Actually, it was Fire 480R, not Enterprise, so the e in e420r filename 
> is wrong...

I've renamed the file to just plain '480r', thanks.


Re: sparc64, mm BUG in 3.9-rc8

2013-04-23 Thread Meelis Roos
> From: Meelis Roos 
> Date: Tue, 23 Apr 2013 09:47:54 +0300 (EEST)
> 
> >> Thanks, could you post a prtconf dump from this machine?  I'll
> >> add it to the prtconf GIT repo as well.
> > 
> > Attached.
> 
> Thanks a lot.

Actually, it was Fire 480R, not Enterprise, so the e in e420r filename 
is wrong...

-- 
Meelis Roos (mr...@linux.ee)


Re: sparc64, mm BUG in 3.9-rc8

2013-04-23 Thread David Miller
From: Meelis Roos 
Date: Tue, 23 Apr 2013 09:47:54 +0300 (EEST)

>> Thanks, could you post a prtconf dump from this machine?  I'll
>> add it to the prtconf GIT repo as well.
> 
> Attached.

Thanks a lot.


Re: sparc64, mm BUG in 3.9-rc8

2013-04-23 Thread Meelis Roos
> Thanks, could you post a prtconf dump from this machine?  I'll
> add it to the prtconf GIT repo as well.

Attached.

-- 
Meelis Roos (mr...@linux.ee)

e420r
Description: Binary data


Re: sparc64, mm BUG in 3.9-rc8

2013-04-22 Thread David Miller
From: Meelis Roos 
Date: Tue, 23 Apr 2013 00:19:49 +0300 (EEST)

>> > Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
>> > mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
>> > other things break and panic comes from trying to kill init). This is 
>> > reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
>> > Configuration is below.
>> 
>> It's certainly a bug in the TLB shootdown fix, please verify that
>> reverting the following fixes things:
>> 
>> >From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
>> From: "David S. Miller" 
>> Date: Fri, 19 Apr 2013 17:26:26 -0400
>> Subject: [PATCH] sparc64: Fix race in TLB batch processing.
> 
> Yes, reverting that makes it work again.

Thanks, could you post a prtconf dump from this machine?  I'll
add it to the prtconf GIT repo as well.


Re: sparc64, mm BUG in 3.9-rc8

2013-04-22 Thread Meelis Roos
> > Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
> > mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
> > other things break and panic comes from trying to kill init). This is 
> > reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
> > Configuration is below.
> 
> It's certainly a bug in the TLB shootdown fix, please verify that
> reverting the following fixes things:
> 
> >From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
> From: "David S. Miller" 
> Date: Fri, 19 Apr 2013 17:26:26 -0400
> Subject: [PATCH] sparc64: Fix race in TLB batch processing.

Yes, reverting that makes it work again.

-- 
Meelis Roos (mr...@linux.ee)


Re: sparc64, mm BUG in 3.9-rc8

2013-04-22 Thread David Miller
From: Meelis Roos 
Date: Mon, 22 Apr 2013 16:57:22 +0300 (EEST)

> Hello, I got a non-booting Sun E420R (sparc64) with 3.9-rc8: BUG-s in 
> mm/slub.c:925 and mm/memory.c:1267 (the latter keeps scrolling until 
> other things break and panic comes from trying to kill init). This is 
> reproducible. Same machine runs 3.9.0-rc7-4-gbb33db7 successfully.
> Configuration is below.

It's certainly a bug in the TLB shootdown fix, please verify that
reverting the following fixes things:

>From f36391d2790d04993f48da6a45810033a2cdf847 Mon Sep 17 00:00:00 2001
From: "David S. Miller" 
Date: Fri, 19 Apr 2013 17:26:26 -0400
Subject: [PATCH] sparc64: Fix race in TLB batch processing.

As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchronize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
   flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

a) Implement __flush_tlb_page and per-cpu variants, patch
   as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
   upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.

Reported-by: Dave Kleikamp 
Signed-off-by: David S. Miller 
Acked-by: Dave Kleikamp 
---
 arch/sparc/include/asm/pgtable_64.h   |   1 +
 arch/sparc/include/asm/switch_to_64.h |   3 +-
 arch/sparc/include/asm/tlbflush_64.h  |  37 +--
 arch/sparc/kernel/smp_64.c            |  41 ++--
 arch/sparc/mm/tlb.c                   |  38 +--
 arch/sparc/mm/tsb.c                   |  57 +++-
 arch/sparc/mm/ultra.S                 | 119 +++---
 7 files changed, 241 insertions(+), 55 deletions(-)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 08fcce9..7619f2f 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -915,6 +915,7 @@ static inline int io_remap_pfn_range(struct vm_area_struct *vma,
 	return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
 }
 
+#include <asm/tlbflush.h>
 #include <asm-generic/pgtable.h>
 
 /* We provide our own get_unmapped_area to cope with VA holes and
diff --git a/arch/sparc/include/asm/switch_to_64.h b/arch/sparc/include/asm/switch_to_64.h
index cad36f5..c7de332 100644
--- a/arch/sparc/include/asm/switch_to_64.h
+++ b/arch/sparc/include/asm/switch_to_64.h
@@ -18,8 +18,7 @@ do {  \
 * and 2 stores in this critical code path.  -DaveM
 */
 #define switch_to(prev, next, last)\
-do {   flush_tlb_pending();\
-   save_and_clear_fpu();   \
+do {   save_and_clear_fpu();   \
/* If you are tempted to conditionalize the following */\
/* so that ASI is only written if it changes, think again. */   \
__asm__ __volatile__("wr %%g0, %0, %%asi"   \
diff --git 
