Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Benjamin Herrenschmidt

On Mon, 2007-09-24 at 14:42 -0700, Christoph Lameter wrote:
> On Tue, 25 Sep 2007, Benjamin Herrenschmidt wrote:
> 
> > I'd suggest just reverting the patch for now (well, I see from the
> > commit list that you did just that) and I'll try to come up with
> > something better.
> 
> That would be great. Note that the reversal of the x86_64 quicklist 
> support patch does not address the issue that is (as pointed out by 
> Siddha) in core code and not in x86_64 arch dode. The fix that I posted 
> fixes the core issue.
> 
> > Christoph, I'd be happy if you didn't start butchering mmu_gather just
> > right now since I'm doing just that and it will collide all over the
> > place :-) Or if you want something specific done, please throw
> > ideas/patches at me and I'll integrate that in my serie.
> 
> Any "butchering" was done for 2.6.22. 2.6.23 only added arch support for 
> x86_64 since Andi forgot to merge it for .22. I'd be glad if you could 
> look into this and make quicklists interoperate nicely withh mmu gather.

I'll try, I'll let you know. Give me a few days to finish catching up
with backlog of things before i get back to it though.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Christoph Lameter
On Tue, 25 Sep 2007, Benjamin Herrenschmidt wrote:

> I'd suggest just reverting the patch for now (well, I see from the
> commit list that you did just that) and I'll try to come up with
> something better.

That would be great. Note that the reversal of the x86_64 quicklist 
support patch does not address the issue that is (as pointed out by 
Siddha) in core code and not in x86_64 arch dode. The fix that I posted 
fixes the core issue.

> Christoph, I'd be happy if you didn't start butchering mmu_gather just
> right now since I'm doing just that and it will collide all over the
> place :-) Or if you want something specific done, please throw
> ideas/patches at me and I'll integrate that in my serie.

Any "butchering" was done for 2.6.22. 2.6.23 only added arch support for 
x86_64 since Andi forgot to merge it for .22. I'd be glad if you could 
look into this and make quicklists interoperate nicely withh mmu gather.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Benjamin Herrenschmidt

On Fri, 2007-09-21 at 12:21 -0700, Linus Torvalds wrote:

> Yeah, and the whole thing seems totally bogus. It totally depends on 
> mmu_gather doing everything right (which very much includes the dependency 
> on mmu gathering disabling preempt).
> 
> For exmaple, if we were to go back to the original small tlb_gather with a 
> simple quicklist on the stack, rather than the per-cpu datastructure, the 
> quicklists would immediately break horribly - simply because they are 
> incorrectly now depending on the internal semantics of that tlb-gather.
> 
> As it is, the quicklists try to be something separate, but by virtue of 
> being separate, they will always be buggy.
> 
> The only way to fix it would be to integrate the quicklist stuff *with* 
> the mmu_gather stuff, so that these kinds of implementation issues are 
> explicitly shown in the relationship, instead of havign two "independent" 
> pieces of code where one piece very subtly depends on the exact 
> implementation of the other.

I've been mostly offline since KS (since a bit before in fact), so I
missed some of those discussions but so that you know, I'm toying a bit
with mmu gather and page table accessors at the moment, and one of the
things I've been contemplating is just that ... integrating a quicklist
in the gather to handle just that (and possibly other issues that have
been overlooked on some archs).

I'd suggest just reverting the patch for now (well, I see from the
commit list that you did just that) and I'll try to come up with
something better.

Christoph, I'd be happy if you didn't start butchering mmu_gather just
right now since I'm doing just that and it will collide all over the
place :-) Or if you want something specific done, please throw
ideas/patches at me and I'll integrate that in my serie.

Cheers,
Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Benjamin Herrenschmidt

On Fri, 2007-09-21 at 12:21 -0700, Linus Torvalds wrote:

 Yeah, and the whole thing seems totally bogus. It totally depends on 
 mmu_gather doing everything right (which very much includes the dependency 
 on mmu gathering disabling preempt).
 
 For exmaple, if we were to go back to the original small tlb_gather with a 
 simple quicklist on the stack, rather than the per-cpu datastructure, the 
 quicklists would immediately break horribly - simply because they are 
 incorrectly now depending on the internal semantics of that tlb-gather.
 
 As it is, the quicklists try to be something separate, but by virtue of 
 being separate, they will always be buggy.
 
 The only way to fix it would be to integrate the quicklist stuff *with* 
 the mmu_gather stuff, so that these kinds of implementation issues are 
 explicitly shown in the relationship, instead of havign two independent 
 pieces of code where one piece very subtly depends on the exact 
 implementation of the other.

I've been mostly offline since KS (since a bit before in fact), so I
missed some of those discussions but so that you know, I'm toying a bit
with mmu gather and page table accessors at the moment, and one of the
things I've been contemplating is just that ... integrating a quicklist
in the gather to handle just that (and possibly other issues that have
been overlooked on some archs).

I'd suggest just reverting the patch for now (well, I see from the
commit list that you did just that) and I'll try to come up with
something better.

Christoph, I'd be happy if you didn't start butchering mmu_gather just
right now since I'm doing just that and it will collide all over the
place :-) Or if you want something specific done, please throw
ideas/patches at me and I'll integrate that in my serie.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Christoph Lameter
On Tue, 25 Sep 2007, Benjamin Herrenschmidt wrote:

 I'd suggest just reverting the patch for now (well, I see from the
 commit list that you did just that) and I'll try to come up with
 something better.

That would be great. Note that the reversal of the x86_64 quicklist 
support patch does not address the issue that is (as pointed out by 
Siddha) in core code and not in x86_64 arch dode. The fix that I posted 
fixes the core issue.

 Christoph, I'd be happy if you didn't start butchering mmu_gather just
 right now since I'm doing just that and it will collide all over the
 place :-) Or if you want something specific done, please throw
 ideas/patches at me and I'll integrate that in my serie.

Any butchering was done for 2.6.22. 2.6.23 only added arch support for 
x86_64 since Andi forgot to merge it for .22. I'd be glad if you could 
look into this and make quicklists interoperate nicely withh mmu gather.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-24 Thread Benjamin Herrenschmidt

On Mon, 2007-09-24 at 14:42 -0700, Christoph Lameter wrote:
 On Tue, 25 Sep 2007, Benjamin Herrenschmidt wrote:
 
  I'd suggest just reverting the patch for now (well, I see from the
  commit list that you did just that) and I'll try to come up with
  something better.
 
 That would be great. Note that the reversal of the x86_64 quicklist 
 support patch does not address the issue that is (as pointed out by 
 Siddha) in core code and not in x86_64 arch dode. The fix that I posted 
 fixes the core issue.
 
  Christoph, I'd be happy if you didn't start butchering mmu_gather just
  right now since I'm doing just that and it will collide all over the
  place :-) Or if you want something specific done, please throw
  ideas/patches at me and I'll integrate that in my serie.
 
 Any butchering was done for 2.6.22. 2.6.23 only added arch support for 
 x86_64 since Andi forgot to merge it for .22. I'd be glad if you could 
 look into this and make quicklists interoperate nicely withh mmu gather.

I'll try, I'll let you know. Give me a few days to finish catching up
with backlog of things before i get back to it though.

Cheers,
Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Luck, Tony wrote:

> > The quicklists have a long history and this bug has therefore also been in 
> > IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
> > need the patch that I posted to fix the other platforms. And with this fix 
> > there would be nothing amiss on x86_64 either.
> 
> IA64 doesn't do a pgd>pud>pmd>pte table walk in h/w.  The VHPT leaps
> straight to the pte page (through the virtual mapping of the VHPT). So
> this wasn't a problem for IA64.

But this is concerning a page being freed before the TLB flush has been 
performed. Another process may then reuse the page and may rely on the 
fact that there is no TLB entry installed for the page that was just 
allocated. But there is still one there. The issues that result from this 
may be depend on the nature of each MMU.

Granted we usually install another TLB entry for the page mapping the page 
into a different address space and never use the old TLB. Which is likely 
the reason why we have never seen an issue before and also why this is 
probably difficult to cause any issues in the first place.

Plus this can only occur for off node pages, meaning this must involve a 
remote processor and the other processor is likely to have a completely 
different set of TLB entries.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Luck, Tony
> The quicklists have a long history and this bug has therefore also been in 
> IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
> need the patch that I posted to fix the other platforms. And with this fix 
> there would be nothing amiss on x86_64 either.

IA64 doesn't do a pgd>pud>pmd>pte table walk in h/w.  The VHPT leaps
straight to the pte page (through the virtual mapping of the VHPT). So
this wasn't a problem for IA64.

-Tony
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

> The problem with that commit that I just reverted was that it mixed the 
> two, but not completely. It still kept them separate.

The quicklists have a long history and this bug has therefore also been in 
IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
need the patch that I posted to fix the other platforms. And with this fix 
there would be nothing amiss on x86_64 either.

---
 include/linux/quicklist.h |8 
 1 file changed, 8 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:55:01.0 -0700
@@ -56,14 +56,6 @@ static inline void __quicklist_free(int 
struct page *page)
 {
struct quicklist *q;
-   int nid = page_to_nid(page);
-
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
 
q = _cpu_var(quicklist)[nr];
*(void **)p = q->page;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Linus Torvalds wrote:
> 
> So for example, if you actually want to use quicklists in the "generic" 
> TLB-gather implementation, you should replace the
> 
>   struct page *pages[FREE_PTE_NR];
> 
> entry in the "struct mmu_gather" with a set of quicklists instead. 

Side note: this would obviously require that the interfaces to the 
"quicklists" would have to be changed, and you'd have an actual head 
pointer rather than using explicitly numbered per-cpu variables, but that 
seems to be a good idea in itself.

Another option is to just not use quicklists AT ALL, but just have a 
simple list of pages that is purely internal to mmu_gather, and just teach 
pgd/pmd/pud_alloc about that list. That actually sounds like the most 
straightforward and least confusing approach. The quicklists code is 
pretty damn ugly in its mixing of "struct page" and the virtual address.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
> > The only way to fix it would be to integrate the quicklist stuff *with* 
> > the mmu_gather stuff, so that these kinds of implementation issues are 
> > explicitly shown in the relationship, instead of havign two "independent" 
> > pieces of code where one piece very subtly depends on the exact 
> > implementation of the other.
> 
> Right. But will that not mean that quicklists would have to be used on all 
> platforms in a generic way?

No. Each architecture can already choose how it does its own "mmu_gather".

*IF* the architecture uses a per-cpu variable for this, *THEN* the 
architecture could also embed a "quicklist" in that per-cpu variable, and 
make use of that. If you were to do it that way, it would all work fine. 

So for example, if you actually want to use quicklists in the "generic" 
TLB-gather implementation, you should replace the

struct page *pages[FREE_PTE_NR];

entry in the "struct mmu_gather" with a set of quicklists instead. 

Or, alternatively, you could decide to not use the generic version at all, 
and instead do a per-architecture one that uses the quicklists.

The problem with that commit that I just reverted was that it mixed the 
two, but not completely. It still kept them separate.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

> Yeah, and the whole thing seems totally bogus. It totally depends on 
> mmu_gather doing everything right (which very much includes the dependency 
> on mmu gathering disabling preempt).

The quicklists have been existing for a long time in this relationship 
and were actually developed initially (by Dave Miller I believe) to work 
this way. The generic TLB flushing thing may have developed in parallel on 
other platforms that did not use quicklists.

> For exmaple, if we were to go back to the original small tlb_gather with a 
> simple quicklist on the stack, rather than the per-cpu datastructure, the 
> quicklists would immediately break horribly - simply because they are 
> incorrectly now depending on the internal semantics of that tlb-gather.

H.. Right the integration of the approaches that have now diverged on 
various platforms could be better.
 
> As it is, the quicklists try to be something separate, but by virtue of 
> being separate, they will always be buggy.

I guess we need to re-join what was separated by developments on different 
platforms.

> The only way to fix it would be to integrate the quicklist stuff *with* 
> the mmu_gather stuff, so that these kinds of implementation issues are 
> explicitly shown in the relationship, instead of havign two "independent" 
> pieces of code where one piece very subtly depends on the exact 
> implementation of the other.

Right. But will that not mean that quicklists would have to be used on all 
platforms in a generic way?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
> 
> The final flush does only check if there are too many pages on the
> quicklists. Otherwise the quicklist is not dumped/freed (unlike the
> mmu_gather list) but kept for the following page table page allocations
> since we have cache hot cpu objects there.

Yeah, and the whole thing seems totally bogus. It totally depends on 
mmu_gather doing everything right (which very much includes the dependency 
on mmu gathering disabling preempt).

For exmaple, if we were to go back to the original small tlb_gather with a 
simple quicklist on the stack, rather than the per-cpu datastructure, the 
quicklists would immediately break horribly - simply because they are 
incorrectly now depending on the internal semantics of that tlb-gather.

As it is, the quicklists try to be something separate, but by virtue of 
being separate, they will always be buggy.

The only way to fix it would be to integrate the quicklist stuff *with* 
the mmu_gather stuff, so that these kinds of implementation issues are 
explicitly shown in the relationship, instead of havign two "independent" 
pieces of code where one piece very subtly depends on the exact 
implementation of the other.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

> On Fri, 21 Sep 2007, Linus Torvalds wrote:
> > 
> > On Fri, 21 Sep 2007, Christoph Lameter wrote:
> > > 
> > > The quicklists collect pages while we gather pages for TLB flushing. 
> > > These pages must be kept until the actual TLB flush has occurred. The 
> > > optimization of releasing off node pages early is therefore not valid.
> > 
> > That should be the "mmu_gather" structure, not the quicklists.
> 
> Oh, and I see what's wrong: you not only switched "free_page()" to 
> "quicklist_free()", you *also* switched "tlb_remove_page()" to 
> "quicklist_free()".
> 
> Ok, that commit is totally and utterly broken. Will revert.

On x86_64 the quicklists partially replace the role of the mmu_gather
list because we would otherwise create more overhead by having to move
pages between the mmu gather and the quicklists lists.
lists.

The final flush does only check if there are too many pages on the
quicklists. Otherwise the quicklist is not dumped/freed (unlike the
mmu_gather list) but kept for the following page table page allocations
since we have cache hot cpu objects there.

The only problem is that we cannot release any pages before the TLB flush 
has happened. The optimization of releasing off node pages in order to 
keep only node local pages is therefore not valid.



---
 include/linux/quicklist.h |8 
 1 file changed, 8 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:55:01.0 -0700
@@ -56,14 +56,6 @@ static inline void __quicklist_free(int 
struct page *page)
 {
struct quicklist *q;
-   int nid = page_to_nid(page);
-
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
 
q = _cpu_var(quicklist)[nr];
*(void **)p = q->page;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Linus Torvalds wrote:
> 
> On Fri, 21 Sep 2007, Christoph Lameter wrote:
> > 
> > The quicklists collect pages while we gather pages for TLB flushing. 
> > These pages must be kept until the actual TLB flush has occurred. The 
> > optimization of releasing off node pages early is therefore not valid.
> 
> That should be the "mmu_gather" structure, not the quicklists.

Oh, and I see what's wrong: you not only switched "free_page()" to 
"quicklist_free()", you *also* switched "tlb_remove_page()" to 
"quicklist_free()".

Ok, that commit is totally and utterly broken. Will revert.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
> 
> The quicklists collect pages while we gather pages for TLB flushing. 
> These pages must be kept until the actual TLB flush has occurred. The 
> optimization of releasing off node pages early is therefore not valid.

That should be the "mmu_gather" structure, not the quicklists.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

> > Can we revert this commit for 2.6.23 and look at this code post 2.6.23?
> 
> I'll happily revert it, but I want to understand this better, so more of 
> an explanation of the codepath that actually does something wrong, please.

The quicklists collect pages while we gather pages for TLB flushing. 
These pages must be kept until the actual TLB flush has occurred. The 
optimization of releasing off node pages early is therefore not valid.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Siddha, Suresh B wrote:

> Essentially quicklist free routines are doing something like
> __quicklist_free()
>   ...
>   if (unlikely(nid != numa_node_id())) {
>   __free_page(page);
>   ...
>   }
> 
>   
> 
> 
> Now this will potentially cause a problem, if a cpu in someother node starts
> using this page, while the corresponding TLB entries are still alive
> in the original cpu which is still freeing the page table pages.

Hmmm... Yes could be.

> This potentially can cause SW failures and hard to debug issues like
> http://www.ussg.iu.edu/hypermail/linux/kernel/0205.2/1254.html

Cannot get to this page:

Not Found

The requested URL /hypermail/linux/kernel/0205.2/1254.htm was not found on 
this server.


> Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

We can remove this piece alone since it was an optimization.

---
 include/linux/quicklist.h |7 ---
 1 file changed, 7 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:47:01.0 -0700
@@ -58,13 +58,6 @@ static inline void __quicklist_free(int 
struct quicklist *q;
int nid = page_to_nid(page);
 
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
-
q = _cpu_var(quicklist)[nr];
*(void **)p = q->page;
q->page = p;
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Siddha, Suresh B wrote:
>
> git commit 34feb2c83beb3bdf13535a36770f7e50b47ef299 started using quicklists
> for freeing page table pages and removed the usage of tlb_remove_page()
> 
> And looking at quicklist_free() and quicklist_free_page(), on a NUMA platform,
> this can potentially free the page before the corresponding TLB caches
> are flushed.

Hmm? We only add them to the quicklists in the exact same places where we 
*used* to just free them entirely. So I don't see why semantics would have 
changed..

> Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

I'll happily revert it, but I want to understand this better, so more of 
an explanation of the codepath that actually does something wrong, please.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Andi Kleen

> Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

Fine by me to revert it. I must admit I never 100% understood how it was
supposed to work anyways, but eventually gave up on it.

-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Siddha, Suresh B
git commit 34feb2c83beb3bdf13535a36770f7e50b47ef299 started using quicklists
for freeing page table pages and removed the usage of tlb_remove_page()

And looking at quicklist_free() and quicklist_free_page(), on a NUMA platform,
this can potentially free the page before the corresponding TLB caches
are flushed.

Essentially quicklist free routines are doing something like
__quicklist_free()
...
if (unlikely(nid != numa_node_id())) {
__free_page(page);
...
}




Now this will potentially cause a problem, if a cpu in someother node starts
using this page, while the corresponding TLB entries are still alive
in the original cpu which is still freeing the page table pages.

This violates the guideline documented in
http://developer.intel.com/design/processor/applnots/317080.pdf

This potentially can cause SW failures and hard to debug issues like
http://www.ussg.iu.edu/hypermail/linux/kernel/0205.2/1254.html

Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

thanks,
suresh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Siddha, Suresh B
git commit 34feb2c83beb3bdf13535a36770f7e50b47ef299 started using quicklists
for freeing page table pages and removed the usage of tlb_remove_page()

And looking at quicklist_free() and quicklist_free_page(), on a NUMA platform,
this can potentially free the page before the corresponding TLB caches
are flushed.

Essentially quicklist free routines are doing something like
__quicklist_free()
...
if (unlikely(nid != numa_node_id())) {
__free_page(page);
...
}




Now this will potentially cause a problem, if a cpu in someother node starts
using this page, while the corresponding TLB entries are still alive
in the original cpu which is still freeing the page table pages.

This violates the guideline documented in
http://developer.intel.com/design/processor/applnots/317080.pdf

This potentially can cause SW failures and hard to debug issues like
http://www.ussg.iu.edu/hypermail/linux/kernel/0205.2/1254.html

Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

thanks,
suresh
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Andi Kleen

 Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

Fine by me to revert it. I must admit I never 100% understood how it was
supposed to work anyways, but eventually gave up on it.

-Andi
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
 
 The quicklists collect pages while we gather pages for TLB flushing. 
 These pages must be kept until the actual TLB flush has occurred. The 
 optimization of releasing off node pages early is therefore not valid.

That should be the mmu_gather structure, not the quicklists.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Siddha, Suresh B wrote:

 Essentially quicklist free routines are doing something like
 __quicklist_free()
   ...
   if (unlikely(nid != numa_node_id())) {
   __free_page(page);
   ...
   }
 
   
 
 
 Now this will potentially cause a problem, if a cpu in someother node starts
 using this page, while the corresponding TLB entries are still alive
 in the original cpu which is still freeing the page table pages.

Hmmm... Yes could be.

 This potentially can cause SW failures and hard to debug issues like
 http://www.ussg.iu.edu/hypermail/linux/kernel/0205.2/1254.html

Cannot get to this page:

Not Found

The requested URL /hypermail/linux/kernel/0205.2/1254.htm was not found on 
this server.


 Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

We can remove this piece alone since it was an optimization.

---
 include/linux/quicklist.h |7 ---
 1 file changed, 7 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:47:01.0 -0700
@@ -58,13 +58,6 @@ static inline void __quicklist_free(int 
struct quicklist *q;
int nid = page_to_nid(page);
 
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
-
q = get_cpu_var(quicklist)[nr];
*(void **)p = q-page;
q-page = p;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Siddha, Suresh B wrote:

 git commit 34feb2c83beb3bdf13535a36770f7e50b47ef299 started using quicklists
 for freeing page table pages and removed the usage of tlb_remove_page()
 
 And looking at quicklist_free() and quicklist_free_page(), on a NUMA platform,
 this can potentially free the page before the corresponding TLB caches
 are flushed.

Hmm? We only add them to the quicklists in the exact same places where we 
*used* to just free them entirely. So I don't see why semantics would have 
changed..

 Can we revert this commit for 2.6.23 and look at this code post 2.6.23?

I'll happily revert it, but I want to understand this better, so more of 
an explanation of the codepath that actually does something wrong, please.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

  Can we revert this commit for 2.6.23 and look at this code post 2.6.23?
 
 I'll happily revert it, but I want to understand this better, so more of 
 an explanation of the codepath that actually does something wrong, please.

The quicklists collect pages while we gather pages for TLB flushing. 
These pages must be kept until the actual TLB flush has occurred. The 
optimization of releasing off node pages early is therefore not valid.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Linus Torvalds wrote:
 
 On Fri, 21 Sep 2007, Christoph Lameter wrote:
  
  The quicklists collect pages while we gather pages for TLB flushing. 
  These pages must be kept until the actual TLB flush has occurred. The 
  optimization of releasing off node pages early is therefore not valid.
 
 That should be the mmu_gather structure, not the quicklists.

Oh, and I see what's wrong: you not only switched free_page() to 
quicklist_free(), you *also* switched tlb_remove_page() to 
quicklist_free().

Ok, that commit is totally and utterly broken. Will revert.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

 On Fri, 21 Sep 2007, Linus Torvalds wrote:
  
  On Fri, 21 Sep 2007, Christoph Lameter wrote:
   
   The quicklists collect pages while we gather pages for TLB flushing. 
   These pages must be kept until the actual TLB flush has occurred. The 
   optimization of releasing off node pages early is therefore not valid.
  
  That should be the mmu_gather structure, not the quicklists.
 
 Oh, and I see what's wrong: you not only switched free_page() to 
 quicklist_free(), you *also* switched tlb_remove_page() to 
 quicklist_free().
 
 Ok, that commit is totally and utterly broken. Will revert.

On x86_64 the quicklists partially replace the role of the mmu_gather
list because we would otherwise create more overhead by having to move
pages between the mmu gather and the quicklists lists.
lists.

The final flush does only check if there are too many pages on the
quicklists. Otherwise the quicklist is not dumped/freed (unlike the
mmu_gather list) but kept for the following page table page allocations
since we have cache hot cpu objects there.

The only problem is that we cannot release any pages before the TLB flush 
has happened. The optimization of releasing off node pages in order to 
keep only node local pages is therefore not valid.



---
 include/linux/quicklist.h |8 
 1 file changed, 8 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:55:01.0 -0700
@@ -56,14 +56,6 @@ static inline void __quicklist_free(int 
struct page *page)
 {
struct quicklist *q;
-   int nid = page_to_nid(page);
-
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
 
q = get_cpu_var(quicklist)[nr];
*(void **)p = q-page;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
 
 The final flush does only check if there are too many pages on the
 quicklists. Otherwise the quicklist is not dumped/freed (unlike the
 mmu_gather list) but kept for the following page table page allocations
 since we have cache hot cpu objects there.

Yeah, and the whole thing seems totally bogus. It totally depends on 
mmu_gather doing everything right (which very much includes the dependency 
on mmu gathering disabling preempt).

For exmaple, if we were to go back to the original small tlb_gather with a 
simple quicklist on the stack, rather than the per-cpu datastructure, the 
quicklists would immediately break horribly - simply because they are 
incorrectly now depending on the internal semantics of that tlb-gather.

As it is, the quicklists try to be something separate, but by virtue of 
being separate, they will always be buggy.

The only way to fix it would be to integrate the quicklist stuff *with* 
the mmu_gather stuff, so that these kinds of implementation issues are 
explicitly shown in the relationship, instead of havign two independent 
pieces of code where one piece very subtly depends on the exact 
implementation of the other.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

 Yeah, and the whole thing seems totally bogus. It totally depends on 
 mmu_gather doing everything right (which very much includes the dependency 
 on mmu gathering disabling preempt).

The quicklists have been existing for a long time in this relationship 
and were actually developed initially (by Dave Miller I believe) to work 
this way. The generic TLB flushing thing may have developed in parallel on 
other platforms that did not use quicklists.

 For exmaple, if we were to go back to the original small tlb_gather with a 
 simple quicklist on the stack, rather than the per-cpu datastructure, the 
 quicklists would immediately break horribly - simply because they are 
 incorrectly now depending on the internal semantics of that tlb-gather.

H.. Right the integration of the approaches that have now diverged on 
various platforms could be better.
 
 As it is, the quicklists try to be something separate, but by virtue of 
 being separate, they will always be buggy.

I guess we need to re-join what was separated by developments on different 
platforms.

 The only way to fix it would be to integrate the quicklist stuff *with* 
 the mmu_gather stuff, so that these kinds of implementation issues are 
 explicitly shown in the relationship, instead of havign two independent 
 pieces of code where one piece very subtly depends on the exact 
 implementation of the other.

Right. But will that not mean that quicklists would have to be used on all 
platforms in a generic way?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Christoph Lameter wrote:
  The only way to fix it would be to integrate the quicklist stuff *with* 
  the mmu_gather stuff, so that these kinds of implementation issues are 
  explicitly shown in the relationship, instead of havign two independent 
  pieces of code where one piece very subtly depends on the exact 
  implementation of the other.
 
 Right. But will that not mean that quicklists would have to be used on all 
 platforms in a generic way?

No. Each architecture can already choose how it does its own mmu_gather.

*IF* the architecture uses a per-cpu variable for this, *THEN* the 
architecture could also embed a quicklist in that per-cpu variable, and 
make use of that. If you were to do it that way, it would all work fine. 

So for example, if you actually want to use quicklists in the generic 
TLB-gather implementation, you should replace the

struct page *pages[FREE_PTE_NR];

entry in the struct mmu_gather with a set of quicklists instead. 

Or, alternatively, you could decide to not use the generic version at all, 
and instead do a per-architecture one that uses the quicklists.

The problem with that commit that I just reverted was that it mixed the 
two, but not completely. It still kept them separate.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Linus Torvalds


On Fri, 21 Sep 2007, Linus Torvalds wrote:
 
 So for example, if you actually want to use quicklists in the generic 
 TLB-gather implementation, you should replace the
 
   struct page *pages[FREE_PTE_NR];
 
 entry in the struct mmu_gather with a set of quicklists instead. 

Side note: this would obviously require that the interfaces to the 
quicklists would have to be changed, and you'd have an actual head 
pointer rather than using explicitly numbered per-cpu variables, but that 
seems to be a good idea in itself.

Another option is to just not use quicklists AT ALL, but just have a 
simple list of pages that is purely internal to mmu_gather, and just teach 
pgd/pmd/pud_alloc about that list. That actually sounds like the most 
straightforward and least confusing approach. The quicklists code is 
pretty damn ugly in its mixing of struct page and the virtual address.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Linus Torvalds wrote:

 The problem with that commit that I just reverted was that it mixed the 
 two, but not completely. It still kept them separate.

The quicklists have a long history and this bug has therefore also been in 
IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
need the patch that I posted to fix the other platforms. And with this fix 
there would be nothing amiss on x86_64 either.

---
 include/linux/quicklist.h |8 
 1 file changed, 8 deletions(-)

Index: linux-2.6/include/linux/quicklist.h
===
--- linux-2.6.orig/include/linux/quicklist.h2007-09-21 11:46:44.0 
-0700
+++ linux-2.6/include/linux/quicklist.h 2007-09-21 11:55:01.0 -0700
@@ -56,14 +56,6 @@ static inline void __quicklist_free(int 
struct page *page)
 {
struct quicklist *q;
-   int nid = page_to_nid(page);
-
-   if (unlikely(nid != numa_node_id())) {
-   if (dtor)
-   dtor(p);
-   __free_page(page);
-   return;
-   }
 
q = get_cpu_var(quicklist)[nr];
*(void **)p = q-page;
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Luck, Tony
 The quicklists have a long history and this bug has therefore also been in 
 IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
 need the patch that I posted to fix the other platforms. And with this fix 
 there would be nothing amiss on x86_64 either.

IA64 doesn't do a pgdpudpmdpte table walk in h/w.  The VHPT leaps
straight to the pte page (through the virtual mapping of the VHPT). So
this wasn't a problem for IA64.

-Tony
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: x86_64: potential critical issue with quicklists and page table pages

2007-09-21 Thread Christoph Lameter
On Fri, 21 Sep 2007, Luck, Tony wrote:

  The quicklists have a long history and this bug has therefore also been in 
  IA64 for a long time and it also likely exists on sparc64, sh and sh64. We 
  need the patch that I posted to fix the other platforms. And with this fix 
  there would be nothing amiss on x86_64 either.
 
 IA64 doesn't do a pgdpudpmdpte table walk in h/w.  The VHPT leaps
 straight to the pte page (through the virtual mapping of the VHPT). So
 this wasn't a problem for IA64.

But this is concerning a page being freed before the TLB flush has been 
performed. Another process may then reuse the page and may rely on the 
fact that there is no TLB entry installed for the page that was just 
allocated. But there is still one there. The issues that result from this 
may be depend on the nature of each MMU.

Granted we usually install another TLB entry for the page mapping the page 
into a different address space and never use the old TLB. Which is likely 
the reason why we have never seen an issue before and also why this is 
probably difficult to cause any issues in the first place.

Plus this can only occur for off node pages, meaning this must involve a 
remote processor and the other processor is likely to have a completely 
different set of TLB entries.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/