[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, Nov 07, 2005 at 07:37:45PM +0100, Joakim Tjernlund wrote:
>
> > On Mon, Nov 07, 2005 at 07:14:15PM +0100, Joakim Tjernlund wrote:
> > > > -Original Message-
> > > > From: Tom Rini [mailto:trini at kernel.crashing.org]
> > > > Sent: 07 November 2005 16:52
> > > > To: Marcelo Tosatti
> > > > Cc: Joakim Tjernlund; Pantelis Antoniou; Dan Malek;
> > > > linuxppc-embedded at ozlabs.org; gtolstolytkin at ru.mvista.com
> > > > Subject: Re: [PATCH 2.6.14] mm: 8xx MM fix for
> > > >
> > > > On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> > > > > Joakim!
> > > > >
> > > > > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > > > > Hi Marcelo
> > > > > >
> > > > > > [SNIP]
> > > > > > > The root of the problem is the changes against the 8xx TLB
> > > > > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > > > > handlers load the zeroed pte into the TLB, causing the
> > > > > > > TLBError handler to be invoked (that's two TLB faults per
> > > > > > > pagefault), which then jumps to the generic MM code to set
> > > > > > > up the pte.
> > > > > > >
> > > > > > > The bug is that the zeroed TLB is not invalidated (the same
> > > > > > > reason for the "dcbst" misbehaviour), resulting in infinite
> > > > > > > TLBError faults.
> > > > > > >
> > > > > > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> > > > > >
> > > > > > This is one reason why it is the way it is:
> > > > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > > > > The details are a little fuzzy ATM, but I think the reason for
> > > > > > the current impl. was only that it was less intrusive to impl.
> > > > >
> > > > > Ah, I see. I wonder if the bug is processor specific: we don't
> > > > > have such changes in our v2.4 tree and never experienced such a
> > > > > problem.
> > > > >
> > > > > It should be pretty easy to hit it, right? (instruction
> > > > > pagefaults should fail).
> > > > >
> > > > > Grigori, Tom, can you enlighten us about the issue on the URL
> > > > > above? How can it be triggered?
> > > >
> > > > So after looking at the code in 2.6.14 and current git, I think
> > > > the above URL isn't relevant, unless there was a change I missed
> > > > (which could totally be possible) that reverted the patch there
> > > > and fixed that issue in a different manner. But since I didn't
> > > > figure that out until I had finished researching it again:
> > >
> > > I wasn't clear enough. What I meant was that the above patch made me
> > > think and the result was that I came up with a simpler fix, the
> > > "two exception" fix that is in current kernels. See
> > > http://linux.bkbits.net:8080/linux-2.6/diffs/arch/ppc/kernel/head_8xx.S@
> > > 1.19?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/kernel|hist
> > > /arch/ppc/kernel/head_8xx.S
> > > It appears this fix has some other issues :(
> > >
> > > How do the other ppc arches do? I am guessing that they don't double
> > > fault, but bail out to do_page_fault from the TLB Miss handler, like
> > > 8xx used to do.
> >
> > Assuming Dan doesn't come up with a more simple & better fix, maybe we
> > should go back to the original patch I made?
>
> That was what I was thinking too (or some variation of your patch).
> I wonder if that would solve the misbehaving dcbst problem Marcelo found
> some time ago too?

Hi Joakim,

Yes, it would fix the "dcbst" issue. That problem was triggered by a
zeroed TLB entry.

In practice it seems that the "three exception" approach does not impose
a significant overhead in comparison with the "two exception" version
(as can be seen by the results of the latency tests).

Anyway, if decided upon, the "two exception" version (no zeroed TLB
entry state) needs the TLBMiss handler to check the present bit, as Dan
mentioned.

I don't know what Dan is up to; he meant to be making significant
changes.

I'll be playing with TLB preloading next week... how's your TLB handler
shrinkage idea?
[PATCH 2.6.14] mm: 8xx MM fix for
On Thursday 10 November 2005 08:48, David Jander wrote:
>[...]
> Hmmm. This is a lot in the line of the tests I did with (the more generic
> benchmark) nbench. After looking at those results (see my other post in
> this thread) I already suspected something like this.

Sorry, I obviously did not mean this thread, but the following post on
another thread:
http://ozlabs.org/pipermail/linuxppc-embedded/2005-November/020775.html

Regards,

--
David Jander
[PATCH 2.6.14] mm: 8xx MM fix for
On Wednesday 09 November 2005 13:04, Marcelo Tosatti wrote:
>[...]
>
> ** 2.6.14 DataTLBHandler jump direct ("two exceptions"):
>
> first batch:
> avg: 287ms
> avg: 287ms
> avg: 287ms
> avg: 287ms
> avg: 287ms
>
> second batch:
> avg: 287ms
> avg: 287ms
> avg: 287ms
> avg: 287ms
> avg: 287ms
>
> ** 2.6.14 vanilla ("three exceptions"):
>
> first batch:
> avg: 288ms
> avg: 285ms
> avg: 287ms
> avg: 287ms
> avg: 288ms
>
> second batch:
> avg: 288ms
> avg: 288ms
> avg: 287ms
> avg: 287ms
> avg: 287ms
>
> ** 2.4.17 (root on RAMDISK):
>
> avg: 309ms
> avg: 313ms
> avg: 312ms
> avg: 311ms
> avg: 310ms

Hmmm. This is a lot in the line of the tests I did with (the more generic
benchmark) nbench. After looking at those results (see my other post in
this thread) I already suspected something like this.

> The v2.6.14's kernel jump-direct is more consistent at 287ms,
> while vanilla 2.6.14 oscillates between 285 and 288ms, but
> no significant difference between the two.
>
> v2.6's fault handling is clearly faster than 2.4's (note that the compiler
> is also different, 2.4 uses gcc 2.95 and 2.6 gcc 3.3).

I don't think the compiler makes much difference here though. In my test
the exact same compiler was used for both kernels, and the same rootfs and
binary of nbench: gcc-3.3.3.
I did also use oprofile to get an idea of where the code spent most of its
CPU time during nbench, and AFAIR flush_dcache_icache() took quite a chunk
of it, so I assume page fault latency is of importance there too, and might
account for the huge difference between 2.4 and 2.6.

Greetings,

--
David Jander
[PATCH 2.6.14] mm: 8xx MM fix for
On Tue, Nov 08, 2005 at 07:39:59AM +1100, Benjamin Herrenschmidt wrote:
> I think the current code, even with your fix, is sub-optimal. But of
> course, the only way to be sure is to do real measurements

Hi folks,

I've written a simple app to estimate pagefault latency using
gettimeofday(). It can be found at
http://hera.kernel.org/~marcelo/measurefault/

/* This simple program attempts to estimate how long a pagefault takes.
 * It does that by mmap()ing /tmp/latency-test, and touching a page.
 * Time measurement is done with gettimeofday() before and after the
 * data touch.
 *
 * In the hope to have a more precise measurement two values are subtracted
 * from the pagefault time delta:
 *
 * - Estimated time between two subsequent gettimeofday() calls, average
 *   of 100 runs (this average is around 8ms on 48MHz PPC 8xx,
 *   0ms on 1GHz Pegasos G4)
 *
 * - Time taken to touch the data after it's TLB cached, aka second run.
 *   This takes 1 and 2ms on 8xx (it varies) and 0ms on 1GHz Pegasos.
 */

And results with 48MHz 855T, comparing internal v2.4.17, vanilla v2.6.14
and v2.6.14-jump-direct (jumping directly to handle_page_fault if the pte
is zeroed).

Each "avg:" entry is an average of 100 "measure-fault-latency.c" runs.

2.6's root is mounted on NFS.

** 2.6.14 DataTLBHandler jump direct ("two exceptions"):

first batch:
avg: 287ms
avg: 287ms
avg: 287ms
avg: 287ms
avg: 287ms

second batch:
avg: 287ms
avg: 287ms
avg: 287ms
avg: 287ms
avg: 287ms

** 2.6.14 vanilla ("three exceptions"):

first batch:
avg: 288ms
avg: 285ms
avg: 287ms
avg: 287ms
avg: 288ms

second batch:
avg: 288ms
avg: 288ms
avg: 287ms
avg: 287ms
avg: 287ms

** 2.4.17 (root on RAMDISK):

avg: 309ms
avg: 313ms
avg: 312ms
avg: 311ms
avg: 310ms

The v2.6.14 jump-direct kernel is more consistent at 287ms,
while vanilla 2.6.14 oscillates between 285 and 288ms, but
there is no significant difference between the two.

v2.6's fault handling is clearly faster than 2.4's (note that the compiler
is also different, 2.4 uses gcc 2.95 and 2.6 gcc 3.3).
[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, 2005-11-07 at 06:44 -0200, Marcelo Tosatti wrote:
>
> The bug is that the zeroed TLB is not invalidated (the same reason
> for the "dcbst" misbehaviour), resulting in infinite TLBError faults.

I see, so you are in the same situation as ia64 which has valid but
unmapped TLBs ?

> Dan, I wonder why we just don't go back to v2.4 behaviour. It is not very
> clear to me that "two exception" speedup offsets the additional code required
> for "one exception" version. Have you actually done any measurements?

What do you mean by "one exception" version ? You probably get 3 in fact
since after you have serviced the fault in the common code, you take
another fault to fill the PTE.

In fact, you could even go back to one exception by pre-filling the TLB
in update_mmu_cache :)

> There is chance that the additional code ends up in the same cacheline,
> which would mean no huge gain by the "two exception" approach. Might be
> even harmful for performance (you need two exceptions instead of one
> after all).
>
> The "two exception" approach requires a TLB flush (to nuke the zeroed)
> at each PTE update for correct behaviour (which BTW is another slowdown):

I think the current code, even with your fix, is sub-optimal. But of
course, the only way to be sure is to do real measurements

Ben.
[PATCH 2.6.14] mm: 8xx MM fix for
On Monday 07 November 2005 22:39, Benjamin Herrenschmidt wrote:
> On Mon, 2005-11-07 at 06:44 -0200, Marcelo Tosatti wrote:
> >
> > The bug is that the zeroed TLB is not invalidated (the same reason
> > for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
>
> I see, so you are in the same situation as ia64 which has valid but
> unmapped TLBs ?
>
> > Dan, I wonder why we just don't go back to v2.4 behaviour. It is not very
> > clear to me that "two exception" speedup offsets the additional code
> > required for "one exception" version. Have you actually done any
> > measurements?
>
> What do you mean by "one exception" version ? You probably get 3 in fact
> since after you have serviced the fault in the common code, you take
> another fault to fill the PTE.
>
> In fact, you could even go back to one exception by pre-filling the TLB
> in update_mmu_cache :)
>

Yep. That should be the target. Remember the poor 8xx is not exactly a
speed demon :).

> > There is chance that the additional code ends up in the same cacheline,
> > which would mean no huge gain by the "two exception" approach. Might be
> > even harmful for performance (you need two exceptions instead of one
> > after all).
> >
> > The "two exception" approach requires a TLB flush (to nuke the zeroed)
> > at each PTE update for correct behaviour (which BTW is another slowdown):
>
> I think the current code, even with your fix, is sub-optimal. But of
> course, the only way to be sure is to do real measurements
>
> Ben.
>

The TLB flush is bogus IMO.

I'm going to try the last patch by Marcelo to see if it works for me.

Pantelis.
[PATCH 2.6.14] mm: 8xx MM fix for
On Nov 7, 2005, at 1:22 PM, Tom Rini wrote:

> Assuming Dan doesn't come up with a more simple & better fix, maybe we
> should go back to the original patch I made?

I'm working on it. It'll look more like 2.4.

Thanks.

-- Dan
[PATCH 2.6.14] mm: 8xx MM fix for
On Nov 7, 2005, at 3:50 PM, Pantelis Antoniou wrote:

> Yep. That should be the target. Remember the poor 8xx is not exactly a
> speed demon :).

It really isn't a big speed difference. The context save/restore
is minimal. The original thought was "...well, I'm already here,
I know we will take another exception, so may as well fake the error
case and call do_page_fault." However, I really do like a minimal
TLB miss case for valid PTEs, and pushing everything else to the
heavyweight functions.

Thanks.

-- Dan
[PATCH 2.6.14] mm: 8xx MM fix for
>
> On Mon, Nov 07, 2005 at 07:14:15PM +0100, Joakim Tjernlund wrote:
> > > -Original Message-
> > > From: Tom Rini [mailto:trini at kernel.crashing.org]
> > > Sent: 07 November 2005 16:52
> > > To: Marcelo Tosatti
> > > Cc: Joakim Tjernlund; Pantelis Antoniou; Dan Malek;
> > > linuxppc-embedded at ozlabs.org; gtolstolytkin at ru.mvista.com
> > > Subject: Re: [PATCH 2.6.14] mm: 8xx MM fix for
> > >
> > > On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> > > > Joakim!
> > > >
> > > > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > > > Hi Marcelo
> > > > >
> > > > > [SNIP]
> > > > > > The root of the problem is the changes against the 8xx TLB
> > > > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > > > handlers load the zeroed pte into the TLB, causing the
> > > > > > TLBError handler to be invoked (that's two TLB faults per
> > > > > > pagefault), which then jumps to the generic MM code to set
> > > > > > up the pte.
> > > > > >
> > > > > > The bug is that the zeroed TLB is not invalidated (the same
> > > > > > reason for the "dcbst" misbehaviour), resulting in infinite
> > > > > > TLBError faults.
> > > > > >
> > > > > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> > > > >
> > > > > This is one reason why it is the way it is:
> > > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > > > The details are a little fuzzy ATM, but I think the reason for
> > > > > the current impl. was only that it was less intrusive to impl.
> > > >
> > > > Ah, I see. I wonder if the bug is processor specific: we don't
> > > > have such changes in our v2.4 tree and never experienced such a
> > > > problem.
> > > >
> > > > It should be pretty easy to hit it, right? (instruction
> > > > pagefaults should fail).
> > > >
> > > > Grigori, Tom, can you enlighten us about the issue on the URL
> > > > above? How can it be triggered?
> > >
> > > So after looking at the code in 2.6.14 and current git, I think
> > > the above URL isn't relevant, unless there was a change I missed
> > > (which could totally be possible) that reverted the patch there
> > > and fixed that issue in a different manner. But since I didn't
> > > figure that out until I had finished researching it again:
> >
> > I wasn't clear enough. What I meant was that the above patch made me
> > think and the result was that I came up with a simpler fix, the
> > "two exception" fix that is in current kernels. See
> > http://linux.bkbits.net:8080/linux-2.6/diffs/arch/ppc/kernel/head_8xx.S@
> > 1.19?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/kernel|hist
> > /arch/ppc/kernel/head_8xx.S
> > It appears this fix has some other issues :(
> >
> > How do the other ppc arches do? I am guessing that they don't double
> > fault, but bail out to do_page_fault from the TLB Miss handler, like
> > 8xx used to do.
>
> Assuming Dan doesn't come up with a more simple & better fix, maybe we
> should go back to the original patch I made?

That was what I was thinking too (or some variation of your patch).
I wonder if that would solve the misbehaving dcbst problem Marcelo found
some time ago too?

 Jocke
[PATCH 2.6.14] mm: 8xx MM fix for
> -Original Message-
> From: Tom Rini [mailto:trini at kernel.crashing.org]
> Sent: 07 November 2005 16:52
> To: Marcelo Tosatti
> Cc: Joakim Tjernlund; Pantelis Antoniou; Dan Malek;
> linuxppc-embedded at ozlabs.org; gtolstolytkin at ru.mvista.com
> Subject: Re: [PATCH 2.6.14] mm: 8xx MM fix for
>
> On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> > Joakim!
> >
> > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > Hi Marcelo
> > >
> > > [SNIP]
> > > > The root of the problem is the changes against the 8xx TLB
> > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > > handler to be invoked (that's two TLB faults per pagefault),
> > > > which then jumps to the generic MM code to set up the pte.
> > > >
> > > > The bug is that the zeroed TLB is not invalidated (the same
> > > > reason for the "dcbst" misbehaviour), resulting in infinite
> > > > TLBError faults.
> > > >
> > > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> > >
> > > This is one reason why it is the way it is:
> > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > The details are a little fuzzy ATM, but I think the reason for the
> > > current impl. was only that it was less intrusive to impl.
> >
> > Ah, I see. I wonder if the bug is processor specific: we don't have
> > such changes in our v2.4 tree and never experienced such a problem.
> >
> > It should be pretty easy to hit it, right? (instruction pagefaults
> > should fail).
> >
> > Grigori, Tom, can you enlighten us about the issue on the URL above?
> > How can it be triggered?
>
> So after looking at the code in 2.6.14 and current git, I think the
> above URL isn't relevant, unless there was a change I missed (which
> could totally be possible) that reverted the patch there and fixed that
> issue in a different manner. But since I didn't figure that out until I
> had finished researching it again:

I wasn't clear enough. What I meant was that the above patch made me
think and the result was that I came up with a simpler fix, the
"two exception" fix that is in current kernels. See
http://linux.bkbits.net:8080/linux-2.6/diffs/arch/ppc/kernel/head_8xx.S@
1.19?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/kernel|hist
/arch/ppc/kernel/head_8xx.S
It appears this fix has some other issues :(

How do the other ppc arches do? I am guessing that they don't double
fault, but bail out to do_page_fault from the TLB Miss handler, like
8xx used to do.

>
> Switching hats for a minute, this came from a bug a customer of
> MontaVista found, so I can't give out the testcase :(
>
> To repeat what Joakim said back then:
> "I think I have figured this out. The first TLB misses that happen at
> app startup are Data TLB misses. These will then hit the NULL L1 entry
> and end up in do_page_fault() which will populate the L1 entry. But when
> you have a very large app that spans more than one L1 entry (16 MB I
> think) it may happen that you will have an I-TLB Miss first on one of
> the L1 entries, which will make the I-TLB handler bail out to
> do_page_fault() and the app crashes (SEGV)."

This still stands I think.

>
> Looking at the patch again, what I don't see is why I talk about fudging
> I-TLB Miss at 0x400 when it's I-TLB Error we fudge at being there, but
> then get hung up that there can be a slight diff between the two ("This
> is because we check bit 4 of SRR1 in both cases, but in the case of an
> I-TLB Miss, this bit is always set, and it only indicates a protection
> fault on an I-TLB Error.") so instead of 0x1300 jumping to the handler
> at 0x400, we treat it like a regular exception so we know where we came
> from, and perhaps missed fixing a case somewhere?

Didn't look into this part of your patch, sorry.

 Jocke
[PATCH 2.6.14] mm: 8xx MM fix for
On Monday 07 November 2005 09:44, Marcelo Tosatti wrote:
> Seems the bug is exposed by the change which avoids flushing the
> TLB when not necessary (in case the pte has not changed), introduced
> recently:
>[...]

Brilliant!
I just checked, and now it boots again.
Btw, it did boot before this patch, but taking about 2 or 3 hours to get
halfway through the init scripts ;-)

Thanks for the good work!

--
David Jander
[PATCH 2.6.14] mm: 8xx MM fix for
> Joakim!
>
> On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > Hi Marcelo
> >
> > [SNIP]
> > > The root of the problem is the changes against the 8xx TLB
> > > handlers introduced during v2.6. What happens is the TLBMiss
> > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > handler to be invoked (that's two TLB faults per pagefault),
> > > which then jumps to the generic MM code to set up the pte.
> > >
> > > The bug is that the zeroed TLB is not invalidated (the same reason
> > > for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
> > >
> > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> >
> > This is one reason why it is the way it is:
> > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > The details are a little fuzzy ATM, but I think the reason for the
> > current impl. was only that it was less intrusive to impl.
>
> Ah, I see. I wonder if the bug is processor specific: we don't have
> such changes in our v2.4 tree and never experienced such a problem.
>
> It should be pretty easy to hit it, right? (instruction pagefaults
> should fail).

No, it's pretty hard to trigger it. Read all the mails on the subject
to see why.
The one or two exception approach doesn't matter performance-wise (at
least for ITLB exceptions) I think.

>
> Grigori, Tom, can you enlighten us about the issue on the URL above?
> How can it be triggered?
[PATCH 2.6.14] mm: 8xx MM fix for
Marcelo Tosatti wrote:
> Hi folks,
>
> Seems the bug is exposed by the change which avoids flushing the
> TLB when not necessary (in case the pte has not changed), introduced
> recently:
>
> [snip]
>

Good job Marcelo! :)

FWIW I'd rather have the single exception version if at all possible.

Regards

Pantelis
[PATCH 2.6.14] mm: 8xx MM fix for
Hi Marcelo

[SNIP]
> The root of the problem are the changes against the 8xx TLB
> handlers introduced during v2.6. What happens is the TLBMiss
> handlers load the zeroed pte into the TLB, causing the TLBError
> handler to be invoked (thats two TLB faults per pagefault), which
> then jumps to the generic MM code to setup the pte.
>
> The bug is that the zeroed TLB is not invalidated (the same reason
> for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
>
> Dan, I wonder why we just don't go back to v2.4 behaviour.

This is one reason why it is the way it is:
http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
The details are a little fuzzy ATM, but I think the reason for the current
impl. was only that it was less intrusive to implement.

 Jocke
[PATCH 2.6.14] mm: 8xx MM fix for
On Tue, Nov 08, 2005 at 07:39:59AM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2005-11-07 at 06:44 -0200, Marcelo Tosatti wrote:
>
> > The bug is that the zeroed TLB is not invalidated (the same reason
> > for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
>
> I see, so you are in the same situation as ia64 which has valid but
> unmapped TLBs ?
>
> > Dan, I wonder why we just don't go back to v2.4 behaviour. It is not very
> > clear to me that "two exception" speedup offsets the additional code
> > required for "one exception" version. Have you actually done any
> > measurements?
>
> What do you mean by "one exception" version ? You probably get 3 in fact
> since after you have serviced the fault in the common code, you take
> another fault to fill the PTE.

Yep, that would be 3!

> In fact, you could even go back to one exception by pre-filling the TLB
> in update_mmu_cache :)

OK, that's a good idea, as we discussed on IRC. Working on that.

> > There is chance that the additional code ends up in the same cacheline,
> > which would mean no huge gain by the "two exception" approach. Might be
> > even harmful for performance (you need two exceptions instead of one
> > after all).
> >
> > The "two exception" approach requires a TLB flush (to nuke the zeroed)
> > at each PTE update for correct behaviour (which BTW is another slowdown):
>
> I think the current code, even with your fix, is sub-optimal. But of
> course, the only way to be sure is to do real measurements

Indeed. Thanks!
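
The exception counts traded in this exchange can be sketched with a small user-space toy model in C. This is not kernel code: every name below (toy_strategy, toy_state, toy_count_exceptions) is invented for illustration, and the model tracks a single page only. The count of 3 for the current code and 1 for the pre-fill idea follow the messages above; the count of 2 for a TLBMiss handler that bails out directly to the MM code is an assumption derived from the same reasoning, not a figure anyone in the thread measured.

```c
#include <stdbool.h>

/* Hypothetical strategies, named after the wording used in the thread. */
enum toy_strategy {
    TOY_TWO_EXCEPTION,  /* TLBMiss loads even a zeroed pte; TLBError calls the MM code */
    TOY_MISS_BAILS_OUT, /* TLBMiss sees a zeroed pte and calls the MM code itself */
    TOY_PREFILL         /* as above, and the MM code also pre-loads the TLB */
};

struct toy_state {
    bool pte_valid;   /* is the page table entry populated? */
    bool tlb_present; /* does the TLB hold an entry for the page? */
    bool tlb_valid;   /* ...and is that entry a usable translation? */
};

/* Stand-in for the generic MM fault path plus update_mmu_cache(). */
static void toy_mm_fault(struct toy_state *s, enum toy_strategy how)
{
    s->pte_valid = true;
    /* Drop any stale entry, as the fix discussed in the thread does. */
    s->tlb_present = false;
    s->tlb_valid = false;
    if (how == TOY_PREFILL) {
        s->tlb_present = true;
        s->tlb_valid = true;
    }
}

/* Count the exceptions taken until the first access to the page succeeds. */
int toy_count_exceptions(enum toy_strategy how)
{
    struct toy_state s = { false, false, false };
    int exceptions = 0;

    while (!(s.tlb_present && s.tlb_valid)) {
        exceptions++;
        if (!s.tlb_present) {
            /* TLBMiss */
            if (!s.pte_valid && how != TOY_TWO_EXCEPTION) {
                toy_mm_fault(&s, how);
            } else {
                s.tlb_present = true;
                s.tlb_valid = s.pte_valid;
            }
        } else {
            /* TLBError: an entry is present but not valid */
            toy_mm_fault(&s, how);
        }
    }
    return exceptions;
}
```

In this model toy_count_exceptions(TOY_TWO_EXCEPTION) yields 3, TOY_MISS_BAILS_OUT yields 2 and TOY_PREFILL yields 1. The model says nothing about the cost of each exception, which is why real measurements are still needed.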
[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, Nov 07, 2005 at 07:14:15PM +0100, Joakim Tjernlund wrote:
> > -----Original Message-----
> > From: Tom Rini [mailto:trini at kernel.crashing.org]
> > Sent: 07 November 2005 16:52
> > To: Marcelo Tosatti
> > Cc: Joakim Tjernlund; Pantelis Antoniou; Dan Malek;
> > linuxppc-embedded at ozlabs.org; gtolstolytkin at ru.mvista.com
> > Subject: Re: [PATCH 2.6.14] mm: 8xx MM fix for
> >
> > On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> > > Joakim!
> > >
> > > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > > Hi Marcelo
> > > >
> > > > [SNIP]
> > > > > The root of the problem are the changes against the 8xx TLB
> > > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > > > handler to be invoked (thats two TLB faults per pagefault),
> > > > > which then jumps to the generic MM code to setup the pte.
> > > > >
> > > > > The bug is that the zeroed TLB is not invalidated (the same
> > > > > reason for the "dcbst" misbehaviour), resulting in infinite
> > > > > TLBError faults.
> > > > >
> > > > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> > > >
> > > > This is one reason why it is the way it is:
> > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > > This details are little fuzzy ATM, but I think the reason for the
> > > > current impl. was only that it was less intrusive to impl.
> > >
> > > Ah, I see. I wonder if the bug is processor specific: we don't
> > > have such changes in our v2.4 tree and never experienced such
> > > problem.
> > >
> > > It should be pretty easy to hit it right? (instruction pagefaults
> > > should fail).
> > >
> > > Grigori, Tom, can you enlight us about the issue on the URL above.
> > > How can it be triggered?
> >
> > So after looking at the code in 2.6.14 and current git, I think the
> > above URL isn't relevant, unless there was a change I missed (which
> > could totally be possible) that reverted the patch there and fixed
> > that issue in a different manner. But since I didn't figure that out
> > until I had finished researching it again:
>
> I wasn't clear enough. What I meant was that the above patch made me
> think and the result was that I came up with a simpler fix, the "two
> exception" fix that is in current kernels. See
> http://linux.bkbits.net:8080/linux-2.6/diffs/arch/ppc/kernel/head_8xx.S@1.19?nav=index.html|src/.|src/arch|src/arch/ppc|src/arch/ppc/kernel|hist/arch/ppc/kernel/head_8xx.S
> It appears this fix has some other issues :(
>
> How do the other ppc arches do? I am guessing that they don't double
> fault, but bails out to do_page_fault from the TLB Miss handler, like
> 8xx used to do.

Assuming Dan doesn't come up with a more simple & better fix, maybe we
should go back to the original patch I made?

-- 
Tom Rini
http://gate.crashing.org/~trini/
[PATCH 2.6.14] mm: 8xx MM fix for
On Nov 7, 2005, at 9:32 AM, Joakim Tjernlund wrote:

> This is one reason why it is the way it is:

Oh geeze, what a hack! :-) This could have been fixed with
a line of assembler code in the TLB miss exception. I'll take a look
at all of this and fix it up.

Thanks.

 -- Dan
[PATCH 2.6.14] mm: 8xx MM fix for
On Nov 7, 2005, at 3:44 AM, Marcelo Tosatti wrote:

> Dan, I wonder why we just don't go back to v2.4 behaviour. It is not
> very clear to me that "two exception" speedup offsets the additional
> code required for "one exception" version. Have you actually done any
> measurements?

No, and I didn't actually make these changes, either :-)

I'm working on some 8xx debugging right now, so let's experiment
with some changes. I don't understand why other processors, especially
G2 cores like 82xx, aren't finding the same problems we are having
with 8xx. Logically, we are all doing the same thing, unless there are
some tlb invalidates on these other processors that I'm forgetting
about.

We just seem to be running into stale entries, and we have to fix it.

Thanks.

 -- Dan
[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, Nov 07, 2005 at 04:44:32PM +0100, Joakim Tjernlund wrote:
>
> > Joakim!
> >
> > On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > > Hi Marcelo
> > >
> > > [SNIP]
> > > > The root of the problem are the changes against the 8xx TLB
> > > > handlers introduced during v2.6. What happens is the TLBMiss
> > > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > > handler to be invoked (thats two TLB faults per pagefault), which
> > > > then jumps to the generic MM code to setup the pte.
> > > >
> > > > The bug is that the zeroed TLB is not invalidated (the same reason
> > > > for the "dcbst" misbehaviour), resulting in infinite TLBError
> > > > faults.
> > > >
> > > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> > >
> > > This is one reason why it is the way it is:
> > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > > This details are little fuzzy ATM, but I think the reason for the
> > > current impl. was only that it was less intrusive to impl.
> >
> > Ah, I see. I wonder if the bug is processor specific: we don't have
> > such changes in our v2.4 tree and never experienced such problem.
> >
> > It should be pretty easy to hit it right? (instruction pagefaults
> > should fail).
>
> No, its pretty hard to trigger it. Read the all mails on the subject to
> see why.
> The one or two exception approach doesn't matter performancewise(at
> least for ITLB exceptions) I think.

Fine, let it continue the way it is then.
[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, Nov 07, 2005 at 08:16:18AM -0200, Marcelo Tosatti wrote:
> Joakim!
>
> On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> > Hi Marcelo
> >
> > [SNIP]
> > > The root of the problem are the changes against the 8xx TLB
> > > handlers introduced during v2.6. What happens is the TLBMiss
> > > handlers load the zeroed pte into the TLB, causing the TLBError
> > > handler to be invoked (thats two TLB faults per pagefault), which
> > > then jumps to the generic MM code to setup the pte.
> > >
> > > The bug is that the zeroed TLB is not invalidated (the same reason
> > > for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
> > >
> > > Dan, I wonder why we just don't go back to v2.4 behaviour.
> >
> > This is one reason why it is the way it is:
> > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> > This details are little fuzzy ATM, but I think the reason for the
> > current impl. was only that it was less intrusive to impl.
>
> Ah, I see. I wonder if the bug is processor specific: we don't have such
> changes in our v2.4 tree and never experienced such problem.
>
> It should be pretty easy to hit it right? (instruction pagefaults should
> fail).
>
> Grigori, Tom, can you enlight us about the issue on the URL above. How
> can it be triggered?

So after looking at the code in 2.6.14 and current git, I think the
above URL isn't relevant, unless there was a change I missed (which
could totally be possible) that reverted the patch there and fixed that
issue in a different manner. But since I didn't figure that out until I
had finished researching it again:

Switching hats for a minute, this came from a bug a customer of
MontaVista found, so I can't give out the testcase :(

To repeat what Joakim said back then:
"I think I have figured this out. The first TLB misses that happen at
app startup is Data TLB misses. These will then hit the NULL L1 entry
and end up in do_page_fault() which will populate the L1 entry. But
when you have a very large app that spans more than one L1 entry (16 MB
I think) it may happen that you will have I-TLB Miss first one of the
L1 entrys which will make the I-TLB handler bail out to do_page_fault()
and the app craches(SEGV)."

Looking at the patch again, what I don't see is why I talk about
fudging I-TLB Miss at 0x400 when it's I-TLB Error we fudge at being
there, but then get hung up that there can be a slight diff between the
two ("This is because we check bit 4 of SRR1 in both cases, but in the
case of an I-TLB Miss, this bit is always set, and it only indicates a
protection fault on an I-TLB Error.") so instead of 0x1300 jumping to
the handler at 0x400, we treat it like a regular exception so we know
where we came from, and perhaps missed fixing a case somewhere?

-- 
Tom Rini
http://gate.crashing.org/~trini/
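
The scenario in Joakim's quoted explanation can be illustrated with a small user-space toy model in C. It is a sketch under stated assumptions, not the 8xx handlers: all names (toy_pgd, toy_start_app and so on) are hypothetical, the amount of address space covered by one L1 entry is passed in as a parameter because the quote itself is unsure of the figure ("16 MB I think"), and the instruction-fetch path simply reports a SEGV whenever it meets a NULL L1 entry, which is the symptom described rather than the actual mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_L1_ENTRIES 64u /* arbitrary table size for the toy model */

enum toy_result { TOY_OK, TOY_SEGV };

struct toy_pgd {
    bool populated[TOY_L1_ENTRIES]; /* false models a NULL L1 entry */
};

/* Index of the L1 entry covering addr; l1_span must be non-zero. */
static uint32_t toy_l1_index(uint32_t addr, uint32_t l1_span)
{
    return (addr / l1_span) % TOY_L1_ENTRIES;
}

/* Data access: a NULL L1 entry ends up in the fault path, which populates it. */
static enum toy_result toy_data_access(struct toy_pgd *pgd, uint32_t addr,
                                       uint32_t l1_span)
{
    pgd->populated[toy_l1_index(addr, l1_span)] = true;
    return TOY_OK;
}

/* Instruction fetch, modelling the buggy behaviour described in the quote:
 * hitting a NULL L1 entry first kills the application. */
static enum toy_result toy_insn_fetch(const struct toy_pgd *pgd, uint32_t addr,
                                      uint32_t l1_span)
{
    return pgd->populated[toy_l1_index(addr, l1_span)] ? TOY_OK : TOY_SEGV;
}

/* App startup as described: data misses come first, then an instruction fetch. */
enum toy_result toy_start_app(uint32_t data_addr, uint32_t text_addr,
                              uint32_t l1_span)
{
    struct toy_pgd pgd = { { false } };

    toy_data_access(&pgd, data_addr, l1_span);
    return toy_insn_fetch(&pgd, text_addr, l1_span);
}
```

With a span of 16 MB, toy_start_app(0x00100000, 0x00200000, 16u << 20) stays inside one L1 entry and returns TOY_OK, while a fetch at 0x01800000 lands in a second, still-NULL entry and returns TOY_SEGV. That is why only very large applications would hit the problem.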
[PATCH 2.6.14] mm: 8xx MM fix for
On Mon, Nov 07, 2005 at 09:35:59AM -0500, Dan Malek wrote:
>
> On Nov 7, 2005, at 3:44 AM, Marcelo Tosatti wrote:
>
> >Dan, I wonder why we just don't go back to v2.4 behaviour. It is not
> >very clear to me that "two exception" speedup offsets the additional
> >code required for "one exception" version. Have you actually done any
> >measurements?
>
> No, and I didn't actually make these changes, either :-)

Ahh, OK, sorry. I remember you arguing that it was faster this
way (less code).

> I'm working on some 8xx debugging right now, so let's experiment
> with some changes. I don't understand why other processors, especially
> G2 cores like 82xx, aren't finding the same problems we are having
> with 8xx. Logically, we are all doing the same thing, unless there are
> some tlb invalidates on these other processors that I'm forgetting
> about.

I really don't know how the 82xx TLB works, so...

> We just seem to be running into stale entries, and we have to fix it.

Right - the issue Joakim noted would be one reason for the "two exception"
approach.
[PATCH 2.6.14] mm: 8xx MM fix for
Joakim!

On Mon, Nov 07, 2005 at 03:32:52PM +0100, Joakim Tjernlund wrote:
> Hi Marcelo
>
> [SNIP]
> > The root of the problem are the changes against the 8xx TLB
> > handlers introduced during v2.6. What happens is the TLBMiss
> > handlers load the zeroed pte into the TLB, causing the TLBError
> > handler to be invoked (thats two TLB faults per pagefault), which
> > then jumps to the generic MM code to setup the pte.
> >
> > The bug is that the zeroed TLB is not invalidated (the same reason
> > for the "dcbst" misbehaviour), resulting in infinite TLBError faults.
> >
> > Dan, I wonder why we just don't go back to v2.4 behaviour.
>
> This is one reason why it is the way it is:
> http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016382.html
> This details are little fuzzy ATM, but I think the reason for the
> current impl. was only that it was less intrusive to impl.

Ah, I see. I wonder if the bug is processor specific: we don't have such
changes in our v2.4 tree and never experienced such a problem.

It should be pretty easy to hit it, right? (Instruction pagefaults should
fail.)

Grigori, Tom, can you enlighten us about the issue at the URL above? How
can it be triggered?
[PATCH 2.6.14] mm: 8xx MM fix for
Hi folks,

Seems the bug is exposed by the change which avoids flushing the
TLB when not necessary (in case the pte has not changed), introduced
recently:

__handle_mm_fault():

        entry = pte_mkyoung(entry);
        if (!pte_same(old_entry, entry)) {
                ptep_set_access_flags(vma, address, pte, entry, write_access);
                update_mmu_cache(vma, address, entry);
                lazy_mmu_prot_update(entry);
        } else {
                /*
                 * This is needed only for protection faults but the arch code
                 * is not yet telling us if this is a protection fault or not.
                 * This still avoids useless tlb flushes for .text page faults
                 * with threads.
                 */
                if (write_access)
                        flush_tlb_page(vma, address);
        }

The "update_mmu_cache()" call was unconditional before, which caused the TLB
to be flushed by:

        if (pfn_valid(pfn)) {
                struct page *page = pfn_to_page(pfn);
                if (!PageReserved(page)
                    && !test_bit(PG_arch_1, &page->flags)) {
                        if (vma->vm_mm == current->active_mm) {
#ifdef CONFIG_8xx
                        /* On 8xx, cache control instructions (particularly
                         * "dcbst" from flush_dcache_icache) fault as write
                         * operation if there is an unpopulated TLB entry
                         * for the address in question. To workaround that,
                         * we invalidate the TLB here, thus avoiding dcbst
                         * misbehaviour.
                         */
                                _tlbie(address);
#endif
                                __flush_dcache_icache((void *) address);
                        } else
                                flush_dcache_icache_page(page);
                        set_bit(PG_arch_1, &page->flags);
                }

Which worked due to pure luck: PG_arch_1 was always unset before, but
now it isn't.

The root of the problem are the changes against the 8xx TLB handlers introduced
during v2.6. What happens is the TLBMiss handlers load the zeroed pte into
the TLB, causing the TLBError handler to be invoked (that's two TLB faults per
pagefault), which then jumps to the generic MM code to setup the pte.

The bug is that the zeroed TLB is not invalidated (the same reason
for the "dcbst" misbehaviour), resulting in infinite TLBError faults.

Dan, I wonder why we just don't go back to v2.4 behaviour. It is not very
clear to me that "two exception" speedup offsets the additional code required
for "one exception" version. Have you actually done any measurements?

There is a chance that the additional code ends up in the same cacheline,
which would mean no huge gain by the "two exception" approach. Might be
even harmful for performance (you need two exceptions instead of one
after all).

The "two exception" approach requires a TLB flush (to nuke the zeroed entry)
at each PTE update for correct behaviour (which BTW is another slowdown):

--- ../git/linux-2.6/arch/ppc/mm/init.c	2005-11-01 07:58:12.0 -0600
+++ linux-2.6-git-wednov02/arch/ppc/mm/init.c	2005-11-07 06:13:58.0 -0600
@@ -597,19 +597,12 @@
 	if (pfn_valid(pfn)) {
 		struct page *page = pfn_to_page(pfn);
+#ifdef CONFIG_8xx
+		_tlbie(address);
+#endif
 		if (!PageReserved(page)
 		    && !test_bit(PG_arch_1, &page->flags)) {
 			if (vma->vm_mm == current->active_mm) {
-#ifdef CONFIG_8xx
-			/* On 8xx, cache control instructions (particularly
-			 * "dcbst" from flush_dcache_icache) fault as write
-			 * operation if there is an unpopulated TLB entry
-			 * for the address in question. To workaround that,
-			 * we invalidate the TLB here, thus avoiding dcbst
-			 * misbehaviour.
-			 */
-				_tlbie(address);
-#endif
 				__flush_dcache_icache((void *) address);
 			} else
 				flush_dcache_icache_page(page);

On Sun, Oct 30, 2005 at 11:03:24PM +0300, Pantelis Antoniou wrote:
> Latest MMU changes caused 8xx to stop working. Flushing tlb of the faulting
> address fixes the problem.
>
> ---
> commit 978e2f36b1ae53e37ba27b3ab8f1c5ddbb8c8a10
> tree 7dd0e403c240162b1925db0834d694f4b4a0e95e
> parent ca02ea5aebcda886d1552c6af73ca96c02bf9fed
> author Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200
> committer Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200
>
>  arch/ppc/mm/fault.c |   13 +
>  1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/arch/ppc/mm/fault.c b/arch/ppc/mm/fault.c
> --- a/arch/ppc/mm/fault.c
> +++ b/arch/ppc/mm/fault.c
> @@ -240,6 +240,19 @@ good_area:
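
The failure mode described in this message, a zeroed entry that stays in the TLB and keeps raising TLBError, can be reproduced in a user-space toy model. The C sketch below is illustrative only: the names (toy_mmu, toy_touch_page, TOY_PTE_PRESENT) are made up, a single page with a single TLB slot is modelled, and the "invalidate" flag stands in for whether something like _tlbie(address) runs after the pte is set up.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTE_PRESENT 0x1u /* hypothetical "valid" bit for the toy pte */

struct toy_mmu {
    uint32_t pte;        /* entry in the page table; 0 means not populated */
    bool     tlb_loaded; /* true once TLBMiss has loaded an entry */
    uint32_t tlb_pte;    /* the copy of the pte held in the TLB */
};

/* TLBMiss: copies the pte into the TLB even when it is still zero. */
static void toy_tlb_miss(struct toy_mmu *m)
{
    m->tlb_pte = m->pte;
    m->tlb_loaded = true;
}

/* TLBError: stand-in for the jump to the generic MM code. */
static void toy_tlb_error(struct toy_mmu *m, bool invalidate)
{
    m->pte = TOY_PTE_PRESENT;  /* the MM code sets up the real pte */
    if (invalidate)
        m->tlb_loaded = false; /* the effect of invalidating the entry */
}

/*
 * Touch the page once. Returns the number of exceptions taken before the
 * access succeeds, or -1 if it still has not succeeded after "budget"
 * exceptions (the toy equivalent of faulting forever).
 */
int toy_touch_page(bool invalidate, int budget)
{
    struct toy_mmu m = { 0, false, 0 };
    int taken = 0;

    for (;;) {
        if (m.tlb_loaded && (m.tlb_pte & TOY_PTE_PRESENT))
            return taken;
        if (taken >= budget)
            return -1;
        taken++;
        if (!m.tlb_loaded)
            toy_tlb_miss(&m);
        else
            toy_tlb_error(&m, invalidate);
    }
}
```

With invalidation the access completes after three exceptions (miss, error, miss); without it the stale zeroed copy is never replaced and the budget runs out, which mirrors the endless TLBError faults reported above.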
[PATCH 2.6.14] mm: 8xx MM fix for
On Wed, Nov 02, 2005 at 12:55:57AM +0200, Pantelis Antoniou wrote:
> On Tuesday 01 November 2005 19:25, Marcelo Tosatti wrote:
> > On Sun, Oct 30, 2005 at 11:03:24PM +0300, Pantelis Antoniou wrote:
> > > Latest MMU changes caused 8xx to stop working. Flushing tlb of the
> > > faulting address fixes the problem.
> >
> > Hi Panto,
> >
> > Its working fine around here. How much of a vanilla 2.6.14 your is?
> >
> > [root at CAS root]# cat /proc/cpuinfo
> > processor       : 0
> > cpu             : 8xx
> > clock           : 48MHz
> > bus clock       : 48MHz
> > revision        : 0.0 (pvr 0050 )
> > bogomips        : 47.82
> > [root at CAS root]# uname -a
> > Linux CAS 2.6.14 #2 Tue Nov 1 16:20:28 CST 2005 ppc unknown
> >
>
> Vanila 2.6.14 worked fine too.
>
> It's the mm patches that started coming in later.
> Unfortunately the version did not change, so I can't provide it.
> Did you used a current git tree?

No I did not - will do and chase the bug.

Thanks.
[PATCH 2.6.14] mm: 8xx MM fix for
On Tuesday 01 November 2005 19:25, Marcelo Tosatti wrote:
> On Sun, Oct 30, 2005 at 11:03:24PM +0300, Pantelis Antoniou wrote:
> > Latest MMU changes caused 8xx to stop working. Flushing tlb of the
> > faulting address fixes the problem.
>
> Hi Panto,
>
> Its working fine around here. How much of a vanilla 2.6.14 your is?
>
> [root at CAS root]# cat /proc/cpuinfo
> processor       : 0
> cpu             : 8xx
> clock           : 48MHz
> bus clock       : 48MHz
> revision        : 0.0 (pvr 0050 )
> bogomips        : 47.82
> [root at CAS root]# uname -a
> Linux CAS 2.6.14 #2 Tue Nov 1 16:20:28 CST 2005 ppc unknown
>

Vanilla 2.6.14 worked fine too.

It's the mm patches that started coming in later.
Unfortunately the version did not change, so I can't provide it.
Did you use a current git tree?

Regards

Pantelis
[PATCH 2.6.14] mm: 8xx MM fix for
On Sun, Oct 30, 2005 at 11:03:24PM +0300, Pantelis Antoniou wrote:
> Latest MMU changes caused 8xx to stop working. Flushing tlb of the faulting
> address fixes the problem.

Hi Panto,

It's working fine around here. How close to vanilla 2.6.14 is yours?

[root at CAS root]# cat /proc/cpuinfo
processor       : 0
cpu             : 8xx
clock           : 48MHz
bus clock       : 48MHz
revision        : 0.0 (pvr 0050 )
bogomips        : 47.82
[root at CAS root]# uname -a
Linux CAS 2.6.14 #2 Tue Nov 1 16:20:28 CST 2005 ppc unknown
[PATCH 2.6.14] mm: 8xx MM fix for
On Sun, 2005-10-30 at 23:03 +0300, Pantelis Antoniou wrote:
> Latest MMU changes caused 8xx to stop working. Flushing tlb of the faulting
> address fixes the problem.

Ugh ? What is the problem precisely ? This is just a dodgy workaround
for an unexplained problem. Normally, the kernel _WILL_ cause a tlb
flush after manipulating a PTE.

Ben.

> ---
> commit 978e2f36b1ae53e37ba27b3ab8f1c5ddbb8c8a10
> tree 7dd0e403c240162b1925db0834d694f4b4a0e95e
> parent ca02ea5aebcda886d1552c6af73ca96c02bf9fed
> author Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200
> committer Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200
>
>  arch/ppc/mm/fault.c |   13 +
>  1 files changed, 13 insertions(+), 0 deletions(-)
>
> diff --git a/arch/ppc/mm/fault.c b/arch/ppc/mm/fault.c
> --- a/arch/ppc/mm/fault.c
> +++ b/arch/ppc/mm/fault.c
> @@ -240,6 +240,19 @@ good_area:
>  			goto bad_area;
>  		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
>  			goto bad_area;
> +
> +#ifdef CONFIG_8xx
> +		{
> +			/* 8xx is retarded; news at 11 */
> +			pte_t *ptep = NULL;
> +
> +			if (get_pteptr(mm, address, &ptep) && pte_present(*ptep))
> +				_tlbie(address);
> +
> +			if (ptep != NULL)
> +				pte_unmap(ptep);
> +		}
> +#endif
>  	}
>
>  	/*
[PATCH 2.6.14] mm: 8xx MM fix for
Latest MMU changes caused 8xx to stop working. Flushing tlb of the faulting
address fixes the problem.

---
commit 978e2f36b1ae53e37ba27b3ab8f1c5ddbb8c8a10
tree 7dd0e403c240162b1925db0834d694f4b4a0e95e
parent ca02ea5aebcda886d1552c6af73ca96c02bf9fed
author Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200
committer Pantelis Antoniou Sun, 30 Oct 2005 21:53:48 +0200

 arch/ppc/mm/fault.c |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/ppc/mm/fault.c b/arch/ppc/mm/fault.c
--- a/arch/ppc/mm/fault.c
+++ b/arch/ppc/mm/fault.c
@@ -240,6 +240,19 @@ good_area:
 			goto bad_area;
 		if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
 			goto bad_area;
+
+#ifdef CONFIG_8xx
+		{
+			/* 8xx is retarded; news at 11 */
+			pte_t *ptep = NULL;
+
+			if (get_pteptr(mm, address, &ptep) && pte_present(*ptep))
+				_tlbie(address);
+
+			if (ptep != NULL)
+				pte_unmap(ptep);
+		}
+#endif
 	}
 
 	/*