Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Sep 2, 2008, at 4:21 PM, Benjamin Herrenschmidt wrote: The reason I did that was to distinguish between the size of a software PTE entry, and the actual size of the hardware entry. In the case of 36b physical, the hardware PTE size is the same as it always is; but we've grown the Linux PTE. We can call it something else, or I can change as you suggest; I was worried that the next person to come along might get confused. I suppose there aren't too many of us that are crazy enough to muck around in this code, so maybe it's not that big of a deal. We already called PTE the linux entry everywhere so hopefully there should be no confusion. We call HPTE the hash table ones. Actually, PTE_SIZE is already defined in hash_low_32.S, and it's the size of the HPTE. I'll rename that to HPTE_SIZE and use PTE_SIZE for the linux PTE. -Becky ___ Linuxppc-dev mailing list Linuxppc-dev@ozlabs.org https://ozlabs.org/mailman/listinfo/linuxppc-dev
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Wed, 2008-09-03 at 10:10 -0500, Becky Bruce wrote: Actually, PTE_SIZE is already defined in hash_low_32.S, and it's the size of the HPTE. I'll rename that to HPTE_SIZE and use PTE_SIZE for the linux PTE. Good idea. Thanks. Cheers, Ben.
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Sep 1, 2008, at 12:19 AM, Benjamin Herrenschmidt wrote: Thanks for taking the time to dig through this :)

+#ifdef CONFIG_PTE_64BIT
+#define PTE_FLAGS_OFFSET 4 /* offset of PTE flags, in bytes */
+#define LNX_PTE_SIZE 8 /* size of a linux PTE, in bytes */
+#else
+#define PTE_FLAGS_OFFSET 0
+#define LNX_PTE_SIZE 4
+#endif

s/LNX_PTE_SIZE/PTE_BYTES or PTE_SIZE, no need for that horrible LNX prefix. In fact, if it's only used by the asm, then ditch it and have asm-offsets.c create something based on sizeof(pte_t).

The reason I did that was to distinguish between the size of a software PTE entry and the actual size of the hardware entry. In the case of 36b physical, the hardware PTE size is the same as it always is, but we've grown the Linux PTE. We can call it something else, or I can change it as you suggest; I was worried that the next person to come along might get confused. I suppose there aren't too many of us that are crazy enough to muck around in this code, so maybe it's not that big of a deal.

#define pte_none(pte) ((pte_val(pte) & ~_PTE_NONE_MASK) == 0)
#define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT)
-#define pte_clear(mm,addr,ptep) do { pte_update(ptep, ~0, 0); } while (0)
+#define pte_clear(mm, addr, ptep) \
+	do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)

Where does this previous definition of pte_clear come from? It's bogus (and it's not like that upstream). Your updated one looks ok, though. But whatever tree has the previous one would break hash-based ppc32 if merged - or is there some other related change in Kumar's tree that makes the above safe?

That's from Kumar's tree of changes for BookE - he'll need to answer that. I think he put out a patchset with that work; not sure if it was correct in this particular or not.
#define pmd_none(pmd) (!pmd_val(pmd))
#define pmd_bad(pmd) (pmd_val(pmd) & _PMD_BAD)

@@ -664,8 +670,30 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
{
#if _PAGE_HASHPTE != 0
+#ifndef CONFIG_PTE_64BIT
	pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte) & ~_PAGE_HASHPTE);
#else
+	/*
+	 * We have to do the write of the 64b pte as 2 stores. This
+	 * code assumes that the entry we're storing to is currently
+	 * not valid and that all callers have the page table lock.
+	 * Having the entry be not valid protects readers who might read
+	 * between the first and second stores.
+	 */
+	unsigned int tmp;

Do you know for sure the entry isn't valid? On ppc64, we explicitly test for that, and if it was valid, we clear and flush it. The generic code has been going on and off about calling set_pte_at() on an already valid PTE; I wouldn't rely too much on it guaranteeing that will not happen. The 32b PTE code is safe because it preserves _PAGE_HASHPTE.

IIRC, we discussed this at some point, months ago, and decided this was the case. If it *can* be valid, and it's possible to have a reader on another cpu, I think we end up having to go through a very gross sequence where we have to mark it invalid before we start mucking around with it; otherwise a reader can get half-and-half. Perhaps I'm missing a key restriction here?

Note also that once you have (which you don't now) the guarantee that your previous PTE is invalid, then you can safely do two normal stores instead of a lwarx/stwcx. loop. In any case, having the stw in the middle of the loop doesn't sound very useful.

If the pte is invalid, and we're SMP, we need the store to the high word to occur before the stwcx that sets the valid bit, unless you're certain we can't have a reader on another cpu. If we're not SMP, then I agree, let's move that store. I thought we discussed at some point the possibility that there might be an updater on another CPU? Is this not possible? If not, I'll change it.
The existing code is also lwarx/stwcx'ing via pte_update(); is there no reason for that? We should probably change that as well, if that's the case.

+	__asm__ __volatile__("\
+1:	lwarx	%0,0,%4\n\
+	rlwimi	%L2,%0,0,30,30\n\
+	stw	%2,0(%3)\n\
+	eieio\n\
+	stwcx.	%L2,0,%4\n\
+	bne-	1b"
+	: "=&r" (tmp), "=m" (*ptep)
+	: "r" (pte), "r" (ptep), "r" ((unsigned long)(ptep) + 4), "m" (*ptep)
+	: "cc");
+#endif /* CONFIG_PTE_64BIT */
+#else /* _PAGE_HASHPTE == 0 */
#if defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
	__asm__ __volatile__("\
		stw%U0%X0 %2,%0\n\

diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
index b9ba7d9..d63e20a 100644
--- a/arch/powerpc/mm/hash_low_32.S
+++ b/arch/powerpc/mm/hash_low_32.S
@@ -75,7 +75,7 @@ _GLOBAL(hash_page_sync)
 * Returns to the caller if the access is illegal or there is no *
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
The reason I did that was to distinguish between the size of a software PTE entry, and the actual size of the hardware entry. In the case of 36b physical, the hardware PTE size is the same as it always is; but we've grown the Linux PTE. We can call it something else, or I can change as you suggest; I was worried that the next person to come along might get confused. I suppose there aren't too many of us that are crazy enough to muck around in this code, so maybe it's not that big of a deal.

We already called PTE the linux entry everywhere so hopefully there should be no confusion. We call HPTE the hash table ones.

That's from Kumar's tree of changes for BookE - he'll need to answer that. I think he put out a patchset with that work; not sure if it was correct in this particular or not.

Well, unless he did some changes in the area of hash flushing, linux code must -never- clear _PAGE_HASHPTE on 32 bits. Only the assembly hash flushing code will do it while also flushing the hash table.

IIRC, we discussed this at some point, months ago, and decided this was the case. If it *can* be valid, and it's possible to have a reader on another cpu, I think we end up having to go through a very gross sequence where we have to mark it invalid before we start mucking around with it; otherwise a reader can get half-and-half. Perhaps I'm missing a key restriction here?

No, that's correct. You can go out of line with something like:

if (unlikely(pte_valid(*ptep))) {
	pte_clear(ptep);
	eieio();	/* make sure above is visible before
			 * further stw to high word is */
}
if (pte_val(*ptep) & _PAGE_HASHPTE)
	flush_hash_entry(ptep, addr);	/* or whatever proto */

I'd rather be safe than sorry in that area. ppc64 does something akin to the above.

If the pte is invalid, and we're SMP, we need the store to the high word to occur before the stwcx that sets the valid bit, unless you're certain we can't have a reader on another cpu. If we're not SMP, then I agree, let's move that store.
Yes, the stores must occur in order, but none of them need to be a stwcx. if your PTE cannot be modified from underneath you, and in any case, the high word, which isn't atomic, doesn't have to be in the loop; it should just be before the loop.

I thought we discussed at some point the possibility that there might be an updater on another CPU? Is this not possible? If not, I'll change it. The existing code is also lwarx/stwcxing via pte_update(); is there no reason for that? We should probably change that as well, if that's the case.

You are holding the PTE lock, so the only code paths that might eventually perform an update are the hash miss and the hash flush. The former will never happen if your PTE is invalid. The latter will never happen if your PTE has _PAGE_HASHPTE clear, which it should have if you do the sequence above properly and if I didn't miss anything (which is always possible :-)

There is indeed the chance that your PTE -will- be concurrently modified without a lock by an invalidation if you have _PAGE_HASHPTE set and you don't first ensure the hash has been flushed. In that case, you do need an atomic operation. Just move the stw to the top word to be before the loop then (and still keep the pte_clear bit).

Cheers, Ben.
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
+#ifdef CONFIG_PTE_64BIT
+#define PTE_FLAGS_OFFSET 4 /* offset of PTE flags, in bytes */
+#define LNX_PTE_SIZE 8 /* size of a linux PTE, in bytes */
+#else
+#define PTE_FLAGS_OFFSET 0
+#define LNX_PTE_SIZE 4
+#endif

s/LNX_PTE_SIZE/PTE_BYTES or PTE_SIZE, no need for that horrible LNX prefix. In fact, if it's only used by the asm, then ditch it and have asm-offsets.c create something based on sizeof(pte_t).

#define pte_none(pte) ((pte_val(pte) & ~_PTE_NONE_MASK) == 0)
#define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT)
-#define pte_clear(mm,addr,ptep) do { pte_update(ptep, ~0, 0); } while (0)
+#define pte_clear(mm, addr, ptep) \
+	do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)

Where does this previous definition of pte_clear come from? It's bogus (and it's not like that upstream). Your updated one looks ok, though. But whatever tree has the previous one would break hash-based ppc32 if merged - or is there some other related change in Kumar's tree that makes the above safe?

#define pmd_none(pmd) (!pmd_val(pmd))
#define pmd_bad(pmd) (pmd_val(pmd) & _PMD_BAD)

@@ -664,8 +670,30 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
{
#if _PAGE_HASHPTE != 0
+#ifndef CONFIG_PTE_64BIT
	pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte) & ~_PAGE_HASHPTE);
#else
+	/*
+	 * We have to do the write of the 64b pte as 2 stores. This
+	 * code assumes that the entry we're storing to is currently
+	 * not valid and that all callers have the page table lock.
+	 * Having the entry be not valid protects readers who might read
+	 * between the first and second stores.
+	 */
+	unsigned int tmp;

Do you know for sure the entry isn't valid? On ppc64, we explicitly test for that, and if it was valid, we clear and flush it. The generic code has been going on and off about calling set_pte_at() on an already valid PTE; I wouldn't rely too much on it guaranteeing that will not happen. The 32b PTE code is safe because it preserves _PAGE_HASHPTE.
Note also that once you have (which you don't now) the guarantee that your previous PTE is invalid, then you can safely do two normal stores instead of a lwarx/stwcx. loop. In any case, having the stw in the middle of the loop doesn't sound very useful.

+	__asm__ __volatile__("\
+1:	lwarx	%0,0,%4\n\
+	rlwimi	%L2,%0,0,30,30\n\
+	stw	%2,0(%3)\n\
+	eieio\n\
+	stwcx.	%L2,0,%4\n\
+	bne-	1b"
+	: "=&r" (tmp), "=m" (*ptep)
+	: "r" (pte), "r" (ptep), "r" ((unsigned long)(ptep) + 4), "m" (*ptep)
+	: "cc");
+#endif /* CONFIG_PTE_64BIT */
+#else /* _PAGE_HASHPTE == 0 */
#if defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
	__asm__ __volatile__("\
		stw%U0%X0 %2,%0\n\

diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
index b9ba7d9..d63e20a 100644
--- a/arch/powerpc/mm/hash_low_32.S
+++ b/arch/powerpc/mm/hash_low_32.S
@@ -75,7 +75,7 @@ _GLOBAL(hash_page_sync)
 * Returns to the caller if the access is illegal or there is no
 * mapping for the address. Otherwise it places an appropriate PTE
 * in the hash table and returns from the exception.
- * Uses r0, r3 - r8, ctr, lr.
+ * Uses r0, r3 - r8, r10, ctr, lr.
 */
	.text
_GLOBAL(hash_page)
@@ -106,9 +106,15 @@ _GLOBAL(hash_page)
	addi	r5,r5,[EMAIL PROTECTED]	/* kernel page table */
	rlwimi	r3,r9,32-12,29,29	/* MSR_PR -> _PAGE_USER */
112:	add	r5,r5,r7	/* convert to phys addr */
+#ifndef CONFIG_PTE_64BIT
	rlwimi	r5,r4,12,20,29	/* insert top 10 bits of address */
	lwz	r8,0(r5)	/* get pmd entry */
	rlwinm.	r8,r8,0,0,19	/* extract address of pte page */
+#else
+	rlwinm	r8,r4,13,19,29	/* Compute pgdir/pmd offset */
+	lwzx	r8,r8,r5	/* Get L1 entry */
+	rlwinm.	r8,r8,0,0,20	/* extract pt base address */
+#endif

Any reason you wrote the above using a different technique? If you believe that rlwinm/lwzx is going to be more efficient than rlwimi/lwz, maybe we should fix the old one ... or am I missing something totally obvious? :-)

Cheers, Ben.
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Fri, Aug 29, 2008 at 08:42:01AM +1000, Benjamin Herrenschmidt wrote: For the non-SMP case, I think it should be possible to optimize it. The only thing that can happen at interrupt time is hashing of kernel or vmalloc/ioremap pages, which shouldn't compete with set_pte on those pages, so there would be no access races there, but I may be missing something as it's the morning and I about just woke up :-) Is that still true with preemptible kernels? -Scott
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Aug 27, 2008, at 6:43 PM, Scott Wood wrote: Becky Bruce wrote:

#if _PAGE_HASHPTE != 0
+#ifndef CONFIG_PTE_64BIT
	pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte) & ~_PAGE_HASHPTE);
#else
+	/*
+	 * We have to do the write of the 64b pte as 2 stores. This
+	 * code assumes that the entry we're storing to is currently
+	 * not valid and that all callers have the page table lock.
+	 * Having the entry be not valid protects readers who might read
+	 * between the first and second stores.
+	 */
+	unsigned int tmp;
+
+	__asm__ __volatile__("\
+1:	lwarx	%0,0,%4\n\
+	rlwimi	%L2,%0,0,30,30\n\
+	stw	%2,0(%3)\n\
+	eieio\n\
+	stwcx.	%L2,0,%4\n\
+	bne-	1b"
+	: "=&r" (tmp), "=m" (*ptep)
+	: "r" (pte), "r" (ptep), "r" ((unsigned long)(ptep) + 4), "m" (*ptep)
+	: "cc");
+#endif /* CONFIG_PTE_64BIT */

Could the stw to the same reservation granule as the stwcx cancel the reservation on some implementations?

Not on the same processor.

Plus, if you're assuming that the entry is currently invalid and all callers have the page table lock, do we need the lwarx/stwcx at all? At the least, it should check PTE_ATOMIC_UPDATES.

I'm pretty sure I went through this in great detail at one point and concluded that I did in fact need the lwarx/stwcx. IIRC, it has to do with other non-set_pte_at writers not necessarily holding the page table lock. FYI, the existing 32-bit PTE code is doing atomic updates as well.

About PTE_ATOMIC_UPDATES, I didn't add that in because hashed page table implementations require atomic updates. Adding it in would create another clause in that code, because I would still need to order the operations with a 64-bit PTE, and I can't call pte_update as it only changes the low word of the pte. I wasn't feeling too keen on adding untested pagetable code into the kernel :) I can add it if the peanut gallery wants it, but I'll be marking it with a big fat BUYER BEWARE.

%2 should be +r, not r.
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7f65127..ca5b58b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -128,18 +128,22 @@ config FSL_EMB_PERFMON
 config PTE_64BIT
 	bool
-	depends on 44x || E500
+	depends on 44x || E500 || 6xx
 	default y if 44x
-	default y if E500 && PHYS_64BIT
+	default y if PHYS_64BIT

How is this different from PHYS_64BIT?

One is the width of the PTE and one is the width of a physical address. It's entirely plausible to have a 64-bit PTE because you have a bunch of status bits, and only have 32-bit physical addressing. That's why there are 2 options.

 config PHYS_64BIT
-	bool 'Large physical address support' if E500
-	depends on 44x || E500
+	bool 'Large physical address support' if E500 || 6xx

Maybe if !44x, or have 44x select this, rather than listing all arches where it's optional.

Not sure exactly what you're suggesting here.

+	depends on 44x || E500 || 6xx
 	select RESOURCES_64BIT
 	default y if 44x
 	---help---
 	  This option enables kernel support for larger than 32-bit physical
-	  addresses. This features is not be available on all e500 cores.
+	  addresses. This features is not be available on all cores.

This features is not be?

Heh, I didn't type that :) But I can fix it.

Thanks, B
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
Becky Bruce wrote: I'm pretty sure I went through this in great detail at one point and concluded that I did in fact need the lwarx/stwcx. IIRC, it has to do with other non-set_pte_at writers not necessarily holding the page table lock. FYI, the existing 32-bit PTE code is doing atomic updates as well.

But will those updates happen if there isn't already a valid PTE?

About PTE_ATOMIC_UPDATES, I didn't add that in because hashed page table implementations require atomic updates.

Right, I misread it and thought it was being used for non-hashed implementations as well. What happens if you enable 64-bit PTEs on a 603-ish CPU? The kconfig seems to allow it.

Adding it in would create another clause in that code, because I would still need to order the operations with a 64-bit PTE and I can't call pte_update as it only changes the low word of the pte. I wasn't feeling too keen on adding untested pagetable code into the kernel :)

Wimp. :-)

I can add it if the peanut gallery wants it, but I'll be marking it with a big fat BUYER BEWARE.

No, there's little point if nothing selects it (or is planned to in the near future).

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7f65127..ca5b58b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -128,18 +128,22 @@ config FSL_EMB_PERFMON
 config PTE_64BIT
 	bool
-	depends on 44x || E500
+	depends on 44x || E500 || 6xx
 	default y if 44x
-	default y if E500 && PHYS_64BIT
+	default y if PHYS_64BIT

How is this different from PHYS_64BIT?

One is the width of the PTE and one is the width of a physical address. It's entirely plausible to have a 64-bit PTE because you have a bunch of status bits, and only have 32-bit physical addressing. That's why there are 2 options.

Right, I just didn't see anything that actually selects it independently of PHYS_64BIT. Is this something that's expected to actually happen in the future?
The default y if 44x line is redundant with the default y if PHYS_64BIT.

 config PHYS_64BIT
-	bool 'Large physical address support' if E500
-	depends on 44x || E500
+	bool 'Large physical address support' if E500 || 6xx

Maybe if !44x, or have 44x select this, rather than listing all arches where it's optional.

Not sure exactly what you're suggesting here.

It just seems simpler to not conditionalize the bool, but instead have CONFIG_44x do select PHYS_64BIT. I'd rather avoid another list of platforms accumulating in a kconfig dependency.

+	depends on 44x || E500 || 6xx
 	select RESOURCES_64BIT
 	default y if 44x
 	---help---
 	  This option enables kernel support for larger than 32-bit physical
-	  addresses. This features is not be available on all e500 cores.
+	  addresses. This features is not be available on all cores.

This features is not be?

Heh, I didn't type that :) But I can fix it.

You didn't type it, but you touched it. Tag, you're it. :-)

-Scott
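[Editor's note] For concreteness, one possible reading of Scott's select suggestion, sketched against the quoted Kconfig (hypothetical - the surrounding symbols are abbreviated and this was never posted as a patch):

```kconfig
config 44x
	bool
	# 44x always supports 36-bit physical addressing, so pull in
	# PHYS_64BIT here instead of listing 44x in its prompt condition.
	select PHYS_64BIT

config PHYS_64BIT
	bool "Large physical address support" if E500 || 6xx
	depends on 44x || E500 || 6xx
	select RESOURCES_64BIT
```

With the select in place, PHYS_64BIT's own "default y if 44x" line could presumably be dropped as well.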
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
Becky Bruce wrote: On Aug 28, 2008, at 11:07 AM, Scott Wood wrote: Becky Bruce wrote:

I'm pretty sure I went through this in great detail at one point and concluded that I did in fact need the lwarx/stwcx. IIRC, it has to do with other non-set_pte_at writers not necessarily holding the page table lock. FYI, the existing 32-bit PTE code is doing atomic updates as well.

But will those updates happen if there isn't already a valid PTE?

I understand what you're saying, I've been here before :) However, I was never able to convince myself that it's safe without the lwarx/stwcx. There's hashing code that wanks around with the HASHPTE bit doing a RMW without holding any lock (other than lwarx/stwcx-ing the PTE itself).

OK. I was concerned not just about efficiency, but about the safety of the stw write if there were other modifications going on (even if the set_pte_at stwcx fails, the other updater could have lwarxed and successfully stwcxed after the stw and ended up with a mixed PTE), but it may not be an issue depending on the nature of the updates.

About PTE_ATOMIC_UPDATES, I didn't add that in because hashed page table implementations require atomic updates.

Right, I misread it and thought it was being used for non-hashed implementations as well. What happens if you enable 64-bit PTEs on a 603-ish CPU? The kconfig seems to allow it.

Don't do that :) That's why the help is there in the Kconfig.

People will do it anyway -- and there's multiplatform to consider.

Otherwise, I have to list out every 74xx part that supports 36-bit physical addressing. In any event, nothing interesting will happen other than that you'll waste some space. The kernel boots fine with a non-36b physical u-boot and small amounts of RAM.

My concern was the generic code trying to use 64-bit PTEs, and the 603 TLB miss handlers continuing to assume that the PTEs are 32-bit, and loading the wrong thing. Wasted space alone is an acceptable consequence of turning on things you don't need.
:-) I'm still not sure where you're going with this - I can remove 44x from the conditional part, but we still have to deal with e500 and 6xx.

You still need it in depends (in the absence of a PHYS_64BIT_CAPABLE or some such), but not bool '...' if. It's not a big deal, just a pet peeve.

In which case you're now setting this in different places for difft plats, making it potentially harder to read. Unless you're suggesting allowing the selection of PHYS_64BIT on any platform

No, unless the code for all platforms makes it safe to do so. -Scott
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
Great, so *you* got my email, and I did not. I love our mailserver!

On Aug 28, 2008, at 3:28 PM, Scott Wood wrote: Becky Bruce wrote: On Aug 28, 2008, at 11:07 AM, Scott Wood wrote: Becky Bruce wrote:

I'm pretty sure I went through this in great detail at one point and concluded that I did in fact need the lwarx/stwcx. IIRC, it has to do with other non-set_pte_at writers not necessarily holding the page table lock. FYI, the existing 32-bit PTE code is doing atomic updates as well.

But will those updates happen if there isn't already a valid PTE?

I understand what you're saying, I've been here before :) However, I was never able to convince myself that it's safe without the lwarx/stwcx. There's hashing code that wanks around with the HASHPTE bit doing a RMW without holding any lock (other than lwarx/stwcx-ing the PTE itself).

OK. I was concerned not just about efficiency, but about the safety of the stw write if there were other modifications going on (even if the set_pte_at stwcx fails, the other updater could have lwarxed and successfully stwcxed after the stw and ended up with a mixed PTE), but it may not be an issue depending on the nature of the updates.

It shouldn't be an issue - set_pte_at() is the only writer of the high bits of the PTE, the pte is invalid upon entry to set_pte_at(), and none of the potential interfering writers should be turning on the valid bit.

About PTE_ATOMIC_UPDATES, I didn't add that in because hashed page table implementations require atomic updates.

Right, I misread it and thought it was being used for non-hashed implementations as well. What happens if you enable 64-bit PTEs on a 603-ish CPU? The kconfig seems to allow it.

Don't do that :) That's why the help is there in the Kconfig.

People will do it anyway -- and there's multiplatform to consider.

Otherwise, I have to list out every 74xx part that supports 36-bit physical addressing. In any event, nothing interesting will happen other than that you'll waste some space.
The kernel boots fine with a non-36b physical u-boot and small amounts of RAM.

My concern was the generic code trying to use 64-bit PTEs, and the 603 TLB miss handlers continuing to assume that the PTEs are 32-bit, and loading the wrong thing. Wasted space alone is an acceptable consequence of turning on things you don't need. :-)

Actually, I'm lying to you - I forgot about the old parts with DTLB/ITLB handlers. That code will actually break, and I'd rather not hack it up to pointlessly accommodate large PTEs. I do need to fix this in the Kconfig, even though it's going to be gross. Thanks for pointing this out.

I'm still not sure where you're going with this - I can remove 44x from the conditional part, but we still have to deal with e500 and 6xx.

You still need it in depends (in the absence of a PHYS_64BIT_CAPABLE or some such), but not bool '...' if. It's not a big deal, just a pet peeve.

I'll look at making it less peevy :)

In which case you're now setting this in different places for difft plats, making it potentially harder to read. Unless you're suggesting allowing the selection of PHYS_64BIT on any platform

No, unless the code for all platforms makes it safe to do so.

Thanks! -Becky
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
On Aug 28, 2008, at 11:07 AM, Scott Wood wrote: Becky Bruce wrote:

I'm pretty sure I went through this in great detail at one point and concluded that I did in fact need the lwarx/stwcx. IIRC, it has to do with other non-set_pte_at writers not necessarily holding the page table lock. FYI, the existing 32-bit PTE code is doing atomic updates as well.

But will those updates happen if there isn't already a valid PTE?

I understand what you're saying, I've been here before :) However, I was never able to convince myself that it's safe without the lwarx/stwcx. There's hashing code that wanks around with the HASHPTE bit doing a RMW without holding any lock (other than lwarx/stwcx-ing the PTE itself). And there's definitely code that's fairly far removed from the last time you checked that an entry was valid. I've CC'd Ben on this mail - perhaps he can shed some light. If we don't need the atomics, I'm happy to yank them. Now, it *does* seem like set_pte_at could be optimized for the non-SMP case - I'll have to chew on that one a bit.

About PTE_ATOMIC_UPDATES, I didn't add that in because hashed page table implementations require atomic updates.

Right, I misread it and thought it was being used for non-hashed implementations as well. What happens if you enable 64-bit PTEs on a 603-ish CPU? The kconfig seems to allow it.

Don't do that :) That's why the help is there in the Kconfig. Otherwise, I have to list out every 74xx part that supports 36-bit physical addressing. In any event, nothing interesting will happen other than that you'll waste some space. The kernel boots fine with a non-36b physical u-boot and small amounts of RAM.

Adding it in would create another clause in that code, because I would still need to order the operations with a 64-bit PTE and I can't call pte_update as it only changes the low word of the pte. I wasn't feeling too keen on adding untested pagetable code into the kernel :)

Wimp.
:-) Pansy :)

I can add it if the peanut gallery wants it, but I'll be marking it with a big fat BUYER BEWARE.

No, there's little point if nothing selects it (or is planned to in the near future).

Good :)

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7f65127..ca5b58b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -128,18 +128,22 @@ config FSL_EMB_PERFMON
 config PTE_64BIT
 	bool
-	depends on 44x || E500
+	depends on 44x || E500 || 6xx
 	default y if 44x
-	default y if E500 && PHYS_64BIT
+	default y if PHYS_64BIT

How is this different from PHYS_64BIT?

One is the width of the PTE and one is the width of a physical address. It's entirely plausible to have a 64-bit PTE because you have a bunch of status bits, and only have 32-bit physical addressing. That's why there are 2 options.

Right, I just didn't see anything that actually selects it independently of PHYS_64BIT. Is this something that's expected to actually happen in the future?

There's been talk of making the PTEs always 64-bit even on 32-bit platforms. So it's entirely plausible.

The default y if 44x line is redundant with the default y if PHYS_64BIT.

You're right, I'll clean that up.

 config PHYS_64BIT
-	bool 'Large physical address support' if E500
-	depends on 44x || E500
+	bool 'Large physical address support' if E500 || 6xx

Maybe if !44x, or have 44x select this, rather than listing all arches where it's optional.

Not sure exactly what you're suggesting here.

It just seems simpler to not conditionalize the bool, but instead have CONFIG_44x do select PHYS_64BIT. I'd rather avoid another list of platforms accumulating in a kconfig dependency.

I'm still not sure where you're going with this - I can remove 44x from the conditional part, but we still have to deal with e500 and 6xx.

In which case you're now setting this in different places for difft plats, making it potentially harder to read.
Unless you're suggesting allowing the selection of PHYS_64BIT on any platform (in which case your comment about what happens if I select this on 603 doesn't make much sense).

+	depends on 44x || E500 || 6xx
 	select RESOURCES_64BIT
 	default y if 44x
 	---help---
 	  This option enables kernel support for larger than 32-bit physical
-	  addresses. This features is not be available on all e500 cores.
+	  addresses. This features is not be available on all cores.

This features is not be?

Heh, I didn't type that :) But I can fix it.

You didn't type it, but you touched it. Tag, you're it. :-)

It's already fixed in my tree :)

-B
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
I understand what you're saying, I've been here before :) However, I was never able to convince myself that it's safe without the lwarx/stwcx. There's hashing code that wanks around with the HASHPTE bit doing a RMW without holding any lock (other than lwarx/stwcx-ing the PTE itself). And there's definitely code that's fairly far removed from the last time you checked that an entry was valid. I've CC'd Ben on this mail - perhaps he can shed some light. If we don't need the atomics, I'm happy to yank them. Now, it *does* seem like set_pte_at could be optimized for the non-SMP case - I'll have to chew on that one a bit.

I haven't read the whole discussion nor reviewed the patches yet, so I'm just answering to the above sentence before I head off for the week-end (and yes, Becky, I _DO_ have reviewing your stuff high on my todo list, but I've been really swamped these last 2 weeks).

So all generic code always accesses PTEs with the PTE lock held (that lock can be in various places ... typically for us it's one per PTE page). 44x and FSL BookE no longer write to the PTEs without that lock anymore and thus don't require atomic access in set_pte and friends.

Hash-based platforms still do because of -one- thing: the hashing code proper, which writes back using lwarx/stwcx. to update _PAGE_ACCESSED, _PAGE_HASHPTE and _PAGE_DIRTY. The hash code does take a global lock to avoid double-hashing of the same page, but this isn't something we should use elsewhere. So on hash-based platforms, updates of the PTEs are expected to be done atomically. Now if you extend the size of the PTE, hopefully those atomic updates are still necessary but only for the -low- part of the PTE that contains those bits.

For the non-SMP case, I think it should be possible to optimize it.
The only thing that can happen at interrupt time is hashing of kernel or vmalloc/ioremap pages, which shouldn't compete with set_pte on those pages, so there would be no access races there, but I may be missing something as it's the morning and I about just woke up :-) Cheers, Ben.
Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical
Becky Bruce wrote:

#if _PAGE_HASHPTE != 0
+#ifndef CONFIG_PTE_64BIT
	pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte) & ~_PAGE_HASHPTE);
#else
+	/*
+	 * We have to do the write of the 64b pte as 2 stores. This
+	 * code assumes that the entry we're storing to is currently
+	 * not valid and that all callers have the page table lock.
+	 * Having the entry be not valid protects readers who might read
+	 * between the first and second stores.
+	 */
+	unsigned int tmp;
+
+	__asm__ __volatile__("\
+1:	lwarx	%0,0,%4\n\
+	rlwimi	%L2,%0,0,30,30\n\
+	stw	%2,0(%3)\n\
+	eieio\n\
+	stwcx.	%L2,0,%4\n\
+	bne-	1b"
+	: "=&r" (tmp), "=m" (*ptep)
+	: "r" (pte), "r" (ptep), "r" ((unsigned long)(ptep) + 4), "m" (*ptep)
+	: "cc");
+#endif /* CONFIG_PTE_64BIT */

Could the stw to the same reservation granule as the stwcx cancel the reservation on some implementations? Plus, if you're assuming that the entry is currently invalid and all callers have the page table lock, do we need the lwarx/stwcx at all? At the least, it should check PTE_ATOMIC_UPDATES.

%2 should be +r, not r.

diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 7f65127..ca5b58b 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -128,18 +128,22 @@ config FSL_EMB_PERFMON
 config PTE_64BIT
 	bool
-	depends on 44x || E500
+	depends on 44x || E500 || 6xx
 	default y if 44x
-	default y if E500 && PHYS_64BIT
+	default y if PHYS_64BIT

How is this different from PHYS_64BIT?

 config PHYS_64BIT
-	bool 'Large physical address support' if E500
-	depends on 44x || E500
+	bool 'Large physical address support' if E500 || 6xx

Maybe if !44x, or have 44x select this, rather than listing all arches where it's optional.

+	depends on 44x || E500 || 6xx
 	select RESOURCES_64BIT
 	default y if 44x
 	---help---
 	  This option enables kernel support for larger than 32-bit physical
-	  addresses. This features is not be available on all e500 cores.
+	  addresses. This features is not be available on all cores.
This features is not be?

-Scott