Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tuesday 20 March 2007 07:00, Nick Piggin wrote:
> On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> > On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > > > Yes, I believe that is the case, however I wonder if that is going
> > > > > to be a problem for you to distinguish between write faults for
> > > > > clean writable ptes, and write faults for readonly ptes?
> > > >
> > > > I wouldn't be able to distinguish them, but am I going to get write
> > > > faults for clean ptes when vma_wants_writenotify() is false (as seems
> > > > to be for tmpfs)? I guess not.
> > > >
> > > > For tmpfs pages, clean writable PTEs are mapped as writable so they
> > > > won't give any problem, since vma_wants_writenotify() is false for
> > > > tmpfs. Correct?
> > >
> > > Yes, that should be the case. So would this mean that nonlinear
> > > protections don't work on regular files?
> >
> > They still work in most cases (including for UML), but if the initial
> > mmap() specified PROT_WRITE, that is ignored, for pages which are not
> > remapped via remap_file_pages(). UML uses PROT_NONE for the initial mmap,
> > so that's no problem.
>
> But how are you going to distinguish a write fault on a readonly pte for
> dirty page accounting vs a read-only nonlinear protection?

Hmm... I was only thinking of PTEs which hadn't been remapped via
remap_file_pages(), but just faulted in with the initial mmap() protection.
For the other PTEs, however, I overlooked that the current code ignores
vma_wants_writenotify(), i.e. breaks dirty page accounting for them, and I
refused to even consider this opportunity, even without knowing the purposes
of dirty page accounting (I found the commits explaining this however).

> You can't store any more data in a present pte AFAIK, so you'd have to
> have some out of band data. At which point, you may as well just forget
> about vma_wants_writenotify vmas, considering that everybody is using
> shmem/ramfs.

I was going to do that anyway.

I'd guess that I should just disallow in remap_file_pages() the VM_MANYPROTS
(i.e. MAP_CHGPROT in flags) && vma_wants_writenotify() combination, right?

Ok, trivial (shouldn't even have pointed this out).
--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
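A rough illustration of the restriction proposed in the message above, written as a toy user-space model in C rather than kernel code. VM_MANYPROTS, MAP_CHGPROT and vma_wants_writenotify() are the names used in the thread; everything prefixed toy_ or TOY_ (the struct, the flag values, the backed_by_shmem field) is a hypothetical stand-in, and the write-notify predicate is reduced to the rule of thumb stated in the thread (true for regular file mappings, false for tmpfs/shm/ramfs).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; they do not match any kernel header. */
#define TOY_VM_SHARED    0x1u  /* shared mapping */
#define TOY_VM_MANYPROTS 0x2u  /* per-PTE protections requested (MAP_CHGPROT) */

/* Hypothetical, heavily reduced stand-in for a VMA. */
struct toy_vma {
    unsigned int flags;
    bool backed_by_shmem;  /* true for tmpfs/shm/ramfs-like backing */
};

/*
 * Simplified model of vma_wants_writenotify(): per the thread, true for
 * regular file-backed shared mappings and false for tmpfs-like ones.
 */
bool toy_wants_writenotify(const struct toy_vma *vma)
{
    return (vma->flags & TOY_VM_SHARED) && !vma->backed_by_shmem;
}

/*
 * Hypothetical guard for the remap_file_pages() path: refuse per-PTE
 * protections on mappings that depend on write-notify faults.
 * Returns 0 if the request may proceed, -EINVAL otherwise.
 */
int toy_check_remap(const struct toy_vma *vma)
{
    assert(vma != NULL);
    if ((vma->flags & TOY_VM_MANYPROTS) && toy_wants_writenotify(vma))
        return -EINVAL;
    return 0;
}
```

Under this model, a MAP_CHGPROT request on a regular-file mapping would get -EINVAL, while the tmpfs/shm/ramfs cases that UML and Oracle are said to rely on pass through unchanged.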
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > > Yes, I believe that is the case, however I wonder if that is going to
> > > > be a problem for you to distinguish between write faults for clean
> > > > writable ptes, and write faults for readonly ptes?
> > >
> > > I wouldn't be able to distinguish them, but am I going to get write
> > > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > > be for tmpfs)? I guess not.
> > >
> > > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > > Correct?
> >
> > Yes, that should be the case. So would this mean that nonlinear protections
> > don't work on regular files?
>
> They still work in most cases (including for UML), but if the initial mmap()
> specified PROT_WRITE, that is ignored, for pages which are not remapped via
> remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no
> problem.

But how are you going to distinguish a write fault on a readonly pte for
dirty page accounting vs a read-only nonlinear protection?

You can't store any more data in a present pte AFAIK, so you'd have to
have some out of band data. At which point, you may as well just forget
about vma_wants_writenotify vmas, considering that everybody is using
shmem/ramfs.
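The ambiguity raised in this message can be sketched with a toy model in C. This is only an illustration under stated assumptions, not the kernel's fault path: the struct, the enum and the function below are all hypothetical, and a present PTE is reduced to a single write bit, which is the point being made (there is no spare room to record why the bit is clear).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: a present PTE exposes one write bit and nothing else. */
struct toy_pte {
    bool present;
    bool writable;
};

/* What a write access to the page can mean in this model. */
enum toy_write_fault {
    TOY_NOT_PRESENT,     /* not a protection fault at all */
    TOY_NO_FAULT,        /* PTE is writable, the write simply succeeds */
    TOY_WRITE_NOTIFY,    /* clean page kept read-only for dirty accounting */
    TOY_PROT_VIOLATION,  /* page is read-only by protection */
    TOY_AMBIGUOUS        /* both schemes active, the PTE cannot say which */
};

/*
 * Classify a write access using only what the PTE and the two per-VMA
 * properties provide. With both write-notify and per-PTE protections
 * in play, a read-only PTE has two indistinguishable explanations.
 */
enum toy_write_fault toy_classify_write(const struct toy_pte *pte,
                                        bool wants_writenotify,
                                        bool manyprots)
{
    assert(pte != NULL);
    if (!pte->present)
        return TOY_NOT_PRESENT;
    if (pte->writable)
        return TOY_NO_FAULT;
    if (wants_writenotify && manyprots)
        return TOY_AMBIGUOUS;
    if (wants_writenotify)
        return TOY_WRITE_NOTIFY;
    return TOY_PROT_VIOLATION;
}
```

Resolving the TOY_AMBIGUOUS case would need the out-of-band data mentioned above, which is why dropping write-notify mappings from the feature is the cheaper option.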
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> On Sat, Mar 17, 2007 at 01:17:00PM +0100, Blaisorblade wrote:
> > On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> > > On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > > > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can
> > > > > > live with that as well, then I think it might be a good option.
> > > > >
> > > > > Oh, hmm if you can truncate these things then you still need to
> > > > > force unmap so you still need i_mmap_nonlinear.
> > > >
> > > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > > > which is way similar I guess.
> > > >
> > > > About the restriction to tmpfs, I have just discovered
> > > > '[PATCH] mm: tracking shared dirty pages' (commit
> > > > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > > > conflicts with remap_file_pages for file-based mmaps (and that's
> > > > fully fine, for now).
> > > >
> > > > Even if UML does not need it, till now if there is a VMA protection
> > > > and a page hasn't been remapped with remap_file_pages, the VMA
> > > > protection is used (just because it makes sense).
> > > >
> > > > However, it is only used when the PTE is first created - we can never
> > > > change protections on a VMA - so if vma_wants_writenotify() is true
> > > > (on all file-based and on no shmfs based mapping, right?), and we
> > > > write-protect the VMA, it will always be write-protected.
> > >
> > > Yes, I believe that is the case, however I wonder if that is going to
> > > be a problem for you to distinguish between write faults for clean
> > > writable ptes, and write faults for readonly ptes?
> >
> > I wouldn't be able to distinguish them, but am I going to get write
> > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > be for tmpfs)? I guess not.
> >
> > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > Correct?
>
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files?

They still work in most cases (including for UML), but if the initial mmap()
specified PROT_WRITE, that is ignored, for pages which are not remapped via
remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no
problem.

> I guess that's OK if Oracle and UML both use
> tmpfs/shm?

--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
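To make the "PROT_WRITE is ignored" case concrete, here is a minimal sketch in C of which protection a PTE would start with, under the behaviour described in this thread. It is a toy model, not kernel code: the function name, the TOY_PROT_* values and the boolean inputs are assumptions made for illustration. The remapped branch deliberately ignores write-notify, matching what the author says elsewhere in the thread about the state of his patch at the time.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative protection bits; values are not taken from any header. */
#define TOY_PROT_NONE  0x0u
#define TOY_PROT_READ  0x1u
#define TOY_PROT_WRITE 0x2u

/*
 * Protection installed when a PTE is first created, in a simplified
 * model of the behaviour discussed in the thread.
 *
 *  vma_prot          protection given to the initial mmap()
 *  remapped          page was set up through remap_file_pages()
 *  page_prot         per-page protection passed when remapping
 *  wants_writenotify model of vma_wants_writenotify() for the mapping
 */
unsigned int toy_initial_pte_prot(unsigned int vma_prot, bool remapped,
                                  unsigned int page_prot,
                                  bool wants_writenotify)
{
    /* Only the read and write bits are modelled here. */
    assert((vma_prot & ~(TOY_PROT_READ | TOY_PROT_WRITE)) == 0);
    assert((page_prot & ~(TOY_PROT_READ | TOY_PROT_WRITE)) == 0);

    if (remapped)
        return page_prot;  /* per-page protection wins */

    /* Not remapped: fall back to the protection of the initial mmap(). */
    if (wants_writenotify)
        return vma_prot & ~TOY_PROT_WRITE;  /* write bit dropped for good */
    return vma_prot;
}
```

With a PROT_NONE initial mapping, as UML is said to use, the fallback branch yields no access either way, so the dropped write bit never matters; it only changes the outcome for a regular-file mapping created with PROT_WRITE.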
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Sun, Mar 18, 2007 at 03:50:10AM +0100, Nick Piggin wrote:
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files? I guess that's OK if Oracle and UML both use
> tmpfs/shm?

Sometimes ramfs is also used in the Oracle case. I presume that's even
simpler than tmpfs. (Hugetlb, while also used for the same general
buffer pool, is never used in conjunction with remap_file_pages() etc.)

-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Sun, Mar 18, 2007 at 03:50:10AM +0100, Nick Piggin wrote:
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files? I guess that's OK if Oracle and UML both use
> tmpfs/shm?

It's OK for UML.

Jeff

--
Work email - jdike at linux dot intel dot com
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Sat, Mar 17, 2007 at 01:17:00PM +0100, Blaisorblade wrote:
> On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> > On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live
> > > > > with that as well, then I think it might be a good option.
> > > >
> > > > Oh, hmm if you can truncate these things then you still need to
> > > > force unmap so you still need i_mmap_nonlinear.
> > >
> > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > > which is way similar I guess.
> > >
> > > About the restriction to tmpfs, I have just discovered
> > > '[PATCH] mm: tracking shared dirty pages' (commit
> > > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > > conflicts with remap_file_pages for file-based mmaps (and that's fully
> > > fine, for now).
> > >
> > > Even if UML does not need it, till now if there is a VMA protection and a
> > > page hasn't been remapped with remap_file_pages, the VMA protection is
> > > used (just because it makes sense).
> > >
> > > However, it is only used when the PTE is first created - we can never
> > > change protections on a VMA - so if vma_wants_writenotify() is true (on
> > > all file-based and on no shmfs based mapping, right?), and we
> > > write-protect the VMA, it will always be write-protected.
> >
> > Yes, I believe that is the case, however I wonder if that is going to be
> > a problem for you to distinguish between write faults for clean writable
> > ptes, and write faults for readonly ptes?
>
> I wouldn't be able to distinguish them, but am I going to get write faults
> for clean ptes when vma_wants_writenotify() is false (as seems to be for
> tmpfs)? I guess not.
>
> For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> give any problem, since vma_wants_writenotify() is false for tmpfs. Correct?

Yes, that should be the case. So would this mean that nonlinear protections
don't work on regular files? I guess that's OK if Oracle and UML both use
tmpfs/shm?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live
> > > > with that as well, then I think it might be a good option.
> > >
> > > Oh, hmm if you can truncate these things then you still need to
> > > force unmap so you still need i_mmap_nonlinear.
> >
> > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > which is way similar I guess.
> >
> > About the restriction to tmpfs, I have just discovered
> > '[PATCH] mm: tracking shared dirty pages' (commit
> > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > conflicts with remap_file_pages for file-based mmaps (and that's fully
> > fine, for now).
> >
> > Even if UML does not need it, till now if there is a VMA protection and a
> > page hasn't been remapped with remap_file_pages, the VMA protection is
> > used (just because it makes sense).
> >
> > However, it is only used when the PTE is first created - we can never
> > change protections on a VMA - so if vma_wants_writenotify() is true (on
> > all file-based and on no shmfs based mapping, right?), and we
> > write-protect the VMA, it will always be write-protected.
>
> Yes, I believe that is the case, however I wonder if that is going to be
> a problem for you to distinguish between write faults for clean writable
> ptes, and write faults for readonly ptes?

I wouldn't be able to distinguish them, but am I going to get write faults
for clean ptes when vma_wants_writenotify() is false (as seems to be for
tmpfs)? I guess not.

For tmpfs pages, clean writable PTEs are mapped as writable so they won't
give any problem, since vma_wants_writenotify() is false for tmpfs. Correct?

> > Also, I'm curious. Since my patches are already changing
> > remap_file_pages() code, should they be absolutely merged after yours?
> > Is there a big clash?
>
> I don't think I did a great deal to fremap.c (mainly
> just removing stuff)...

Hopefully, we just both modify sys_remap_file_pages(), I'll see soon.
--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > > that as well, then I think it might be a good option.
> >
> > Oh, hmm if you can truncate these things then you still need to
> > force unmap so you still need i_mmap_nonlinear.
>
> Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which
> is way similar I guess.
>
> About the restriction to tmpfs, I have just discovered
> '[PATCH] mm: tracking shared dirty pages' (commit
> d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts
> with remap_file_pages for file-based mmaps (and that's fully fine, for now).
>
> Even if UML does not need it, till now if there is a VMA protection and a
> page hasn't been remapped with remap_file_pages, the VMA protection is used
> (just because it makes sense).
>
> However, it is only used when the PTE is first created - we can never change
> protections on a VMA - so if vma_wants_writenotify() is true (on all
> file-based and on no shmfs based mapping, right?), and we write-protect the
> VMA, it will always be write-protected.

Yes, I believe that is the case, however I wonder if that is going to be
a problem for you to distinguish between write faults for clean writable
ptes, and write faults for readonly ptes?

> That's no problem for UML, but for any other user (I guess I'll have to
> prevent callers from trying such stuff - I started from a pretty generic
> patch).
>
> > But come to think of it, I still don't think nonlinear mappings are
> > too bad as they are ;)
>
> Btw, I really like removing ->populate and merging the common code together.
> filemap_populate and shmem_populate are so obnoxiously different that I
> already wanted to do that (after merging remap_file_pages() core).

Yeah they are also frustratingly similar to filemap_nopage and shmem_nopage,
and duplicate a lot of the same code ;)

> Also, I'm curious. Since my patches are already changing remap_file_pages()
> code, should they be absolutely merged after yours? Is there a big clash?

I don't think I did a great deal to fremap.c (mainly just removing stuff)...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> > > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > > > Depending on whether anyone wants it, and what features they want, we
> > > > could emulate the old syscall, and make a new restricted one which is
> > > > much less intrusive.
> > > > For example, if we can operate only on MAP_ANONYMOUS memory and
> > > > specify that nonlinear mappings effectively mlock the pages, then we
> > > > can get rid of all the objrmap and unmap_mapping_range handling,
> > > > forget about the writeout and msync problems...
> > >
> > > Anonymous-only would make it a doorstop for Oracle, since its entire
> > > motive for using it is to window into objects larger than user virtual
> >
> > Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
> > inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
> > have a file descriptor to get a pgoff, then remap_file_pages is a
> > doorstop for everyone ;)
> >
> > > address spaces (this likely also applies to UML, though they should
> > > really chime in to confirm). Restrictions to tmpfs and/or ramfs would
> > > likely be liveable, though I suspect some things might want to do it to
> > > shm segments (I'll ask about that one). There's definitely no need for
> > > a persistent backing store for the object to be remapped in Oracle's
> > > case, in any event. It's largely the in-core destination and source of
> > > IO, not something saved on-disk itself.
> >
> > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > that as well, then I think it might be a good option.
>
> Oh, hmm if you can truncate these things then you still need to
> force unmap so you still need i_mmap_nonlinear.

Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which
is way similar I guess.

About the restriction to tmpfs, I have just discovered
'[PATCH] mm: tracking shared dirty pages' (commit
d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts
with remap_file_pages for file-based mmaps (and that's fully fine, for now).

Even if UML does not need it, till now if there is a VMA protection and a
page hasn't been remapped with remap_file_pages, the VMA protection is used
(just because it makes sense).

However, it is only used when the PTE is first created - we can never change
protections on a VMA - so if vma_wants_writenotify() is true (on all
file-based and on no shmfs based mapping, right?), and we write-protect the
VMA, it will always be write-protected.

That's no problem for UML, but for any other user (I guess I'll have to
prevent callers from trying such stuff - I started from a pretty generic
patch).

> But come to think of it, I still don't think nonlinear mappings are
> too bad as they are ;)

Btw, I really like removing ->populate and merging the common code together.
filemap_populate and shmem_populate are so obnoxiously different that I
already wanted to do that (after merging remap_file_pages() core).

Also, I'm curious. Since my patches are already changing remap_file_pages()
code, should they be absolutely merged after yours?
--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > > that as well, then I think it might be a good option.
> >
> > Oh, hmm if you can truncate these things then you still need to
> > force unmap so you still need i_mmap_nonlinear.
>
> Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> which is way similar I guess.
>
> About the restriction to tmpfs, I have just discovered
> '[PATCH] mm: tracking shared dirty pages' (commit
> d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> conflicts with remap_file_pages for file-based mmaps (and that's fully
> fine, for now).
>
> Even if UML does not need it, till now if there is a VMA protection and
> a page hasn't been remapped with remap_file_pages, the VMA protection is
> used (just because it makes sense).
>
> However, it is only used when the PTE is first created - we can never
> change protections on a VMA - so if vma_wants_writenotify() is true (on
> all file-based and on no shmfs based mapping, right?), and we
> write-protect the VMA, it will always be write-protected.

Yes, I believe that is the case, however I wonder if that is going to be
a problem for you to distinguish between write faults for clean writable
ptes, and write faults for readonly ptes?

> That's no problem for UML, but it is for any other user (I guess I'll
> have to prevent callers from trying such stuff - I started from a pretty
> generic patch).
>
> > But come to think of it, I still don't think nonlinear mappings are
> > too bad as they are ;)
>
> Btw, I really like removing ->populate and merging the common code
> together. filemap_populate and shmem_populate are so obnoxiously
> different that I already wanted to do that (after merging
> remap_file_pages() core).

Yeah they are also frustratingly similar to filemap_nopage and
shmem_nopage, and duplicate a lot of the same code ;)

> Also, I'm curious. Since my patches are already changing
> remap_file_pages() code, should they be absolutely merged after yours?

Is there a big clash? I don't think I did a great deal to fremap.c
(mainly just removing stuff)...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wednesday 07 March 2007 10:44, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > Depending on whether anyone wants it, and what features they want, we
> > could emulate the old syscall, and make a new restricted one which is
> > much less intrusive.
> > For example, if we can operate only on MAP_ANONYMOUS memory and specify
> > that nonlinear mappings effectively mlock the pages, then we can get
> > rid of all the objrmap and unmap_mapping_range handling, forget about
> > the writeout and msync problems...
>
> Anonymous-only would make it a doorstop for Oracle, since its entire
> motive for using it is to window into objects larger than user virtual
> address spaces (this likely also applies to UML, though they should
> really chime in to confirm).

We need it for shared file mappings (for tmpfs only).

Our scenario is:
RAM is implemented through a shared mapped file, kept on tmpfs (except
by dumb users); various processes share an fd for this file (it's opened
and immediately deleted).

We maintain page tables in x86 style, and TLB flush is implemented
through mmap()/munmap()/mprotect().

Having a VMA per each 4K is not the intended VMA usage: for instance,
the default /proc/sys/vm/max_map_count (64K) is saturated by a UML
process with 64K * 4K = 256M of resident memory.

> Restrictions to tmpfs and/or ramfs would
> likely be liveable, though I suspect some things might want to do it to
> shm segments (I'll ask about that one).
> There's definitely no need for a
> persistent backing store for the object to be remapped in Oracle's case,
> in any event. It's largely the in-core destination and source of IO, not
> something saved on-disk itself.
>
>
> -- wli

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
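The max_map_count arithmetic in the message above can be checked with a few lines of Python. This is only an illustrative sketch and not part of the thread: it assumes one VMA per 4 KiB guest page and takes the "64K" limit at face value as 64 * 1024, and the name `max_resident_bytes` is made up for the example.

```python
# Illustrative arithmetic only: how much memory a process can map when
# every page needs its own VMA, as in the UML scenario described above.

PAGE_SIZE = 4 * 1024        # 4 KiB pages, as in the message
MAX_MAP_COUNT = 64 * 1024   # the "64K" limit quoted in the message (assumed exact)


def max_resident_bytes(max_map_count: int, page_size: int) -> int:
    """Upper bound on mapped memory if each page consumes one VMA."""
    return max_map_count * page_size


if __name__ == "__main__":
    limit = max_resident_bytes(MAX_MAP_COUNT, PAGE_SIZE)
    print(f"{limit // (1024 * 1024)} MiB")  # prints: 256 MiB
```

With remap_file_pages() the same range can live in a single VMA, which is why the per-page VMA limit stops being the bottleneck.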
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 02:52:12PM +0100, Peter Zijlstra wrote:
> > Well I don't think UML uses nonlinear yet anyway, does it? Can they
> > make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> > not.
>
> I think it does, but lets ask, Jeff?

Nope, UML needs to be able to change permissions as well as locations.

Would be nice, though, there are apparently nice UML speedups with it.

Jeff

-- 
Work email - jdike at linux dot intel dot com
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 03:34:27PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 14:52 +0100, Peter Zijlstra wrote:
>
> > True. We could even guesstimate the nonlinear dirty pages by subtracting
> > the result of page_mkclean() from page_mapcount() and force an
> > msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
> > related file) when some threshold gets exceeded.
>
> Almost, but not quite, we'd need to extract another value from the
> page_mkclean() run, the actual number of mappings encountered. The
> return value only sums the number of dirty mappings encountered.
>
> s390 would already work I guess.
>
> Certainly doable.

But if we restrict it to root only, and have a note in the man page
about it, then it really isn't worth cluttering up the kernel.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 02:53:07PM +0100, Miklos Szeredi wrote:
> > > msync() might never get called and then we're back with the old
> > > behaviour where we can surprise the VM with a ton of dirty pages.
> >
> > But we're root. With your patch, root *can't* do nonlinear writeback
> > well. Ever. With msync, at least you give them enough rope.
>
> Restricting to root doesn't buy you much, nobody wants to be root.
> Restricting to mlock is similarly pointless. UML _will_ want to get
> swapped out if there's no activity.

They could always not use nonlinear, or we could add a ulimit to the
size of nonlinear vaddr allowed.

> Restricting to tmpfs makes sense, but it's probably not what UML
> wants.

I think it is OK. They might want some persistent storage to migrate or
something, but that can always be done by copying from tmpfs to a block
based filesystem.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 14:52 +0100, Peter Zijlstra wrote:

> True. We could even guesstimate the nonlinear dirty pages by subtracting
> the result of page_mkclean() from page_mapcount() and force an
> msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
> related file) when some threshold gets exceeded.

Almost, but not quite, we'd need to extract another value from the
page_mkclean() run, the actual number of mappings encountered. The
return value only sums the number of dirty mappings encountered.

s390 would already work I guess.

Certainly doable.
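The correction in the message above can be illustrated with a toy model. This is a sketch of one reading of the proposal, not kernel code: `mapcount` stands in for page_mapcount(), and the hypothetical `RmapWalk` tuple carries the two values the message distinguishes, namely how many mappings the page_mkclean() walk encountered and how many of those were dirty.

```python
from typing import NamedTuple


class RmapWalk(NamedTuple):
    """Hypothetical result of a page_mkclean()-style walk over one page."""
    encountered: int  # mappings the walk actually visited
    dirty: int        # visited mappings that were dirty


def naive_estimate(mapcount: int, walk: RmapWalk) -> int:
    """The first proposal: mapcount minus the dirty count."""
    return mapcount - walk.dirty


def unvisited_mappings(mapcount: int, walk: RmapWalk) -> int:
    """The corrected value: mappings the walk never saw, such as those
    sitting in nonlinear VMAs, which may or may not be dirty."""
    return mapcount - walk.encountered


# A page mapped 5 times; the walk reaches 3 linear mappings, 1 of them dirty.
walk = RmapWalk(encountered=3, dirty=1)
estimate_naive = naive_estimate(5, walk)      # also counts 2 clean linear mappings
estimate_fixed = unvisited_mappings(5, walk)  # only the 2 mappings not visited
```

In this model the naive figure is inflated by every clean mapping the walk did visit, which is why the second value is needed.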
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > Well I don't think UML uses nonlinear yet anyway, does it? Can they
> > make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> > not.
>
> I think it does, but lets ask, Jeff?

Looks like it doesn't:

$ grep -r remap_file_pages arch/um/
$

Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> > On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
> >
> > > > > The thing is, I don't think anybody who uses these things cares
> > > > > about any of the 'problems' you want to fix, do they? We are
> > > > > interested in dirty pages only for the correctness issue, rather
> > > > > than performance. Same as reclaim.
> > > >
> > > > If so, we can just stick to the dead slow but correct 'scan the full
> > > > vma' page_mkclean() and nobody would ever trigger it.
> > >
> > > Not if we restricted it to root and mlocked tmpfs. But then why
> > > wouldn't you just do it with the much more efficient msync walk,
> > > so that if root does want to do writeout via these things, it does
> > > not blow up?
> >
> > This is all used on ram based filesystems right, they all have
> > BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> > anyway. Mlock doesn't avoid getting page_mkclean called.
> >
> > Those who use this on a 'real' filesystem will get hit in the face by a
> > linear scanning page_mkclean(), but AFAIK nobody does this anyway.
>
> But somebody might do it. I just don't know why you'd want to make
> this _worse_ when the msync option would work?
>
> > Restricting it to root for such filesystems is unwanted, that'd severely
> > handicap both UML and Oracle as I understand it (are there other users
> > of this feature around?)
>
> Why? I think they all use tmpfs backings, don't they?
>
> > msync() might never get called and then we're back with the old
> > behaviour where we can surprise the VM with a ton of dirty pages.
>
> But we're root. With your patch, root *can't* do nonlinear writeback
> well. Ever. With msync, at least you give them enough rope.

Restricting to root doesn't buy you much, nobody wants to be root.
Restricting to mlock is similarly pointless. UML _will_ want to get
swapped out if there's no activity.

Restricting to tmpfs makes sense, but it's probably not what UML
wants.

Conclusion: there's no good solution for UML in kernel-space.

Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 14:36 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> > On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
> >
> > > > > The thing is, I don't think anybody who uses these things cares
> > > > > about any of the 'problems' you want to fix, do they? We are
> > > > > interested in dirty pages only for the correctness issue, rather
> > > > > than performance. Same as reclaim.
> > > >
> > > > If so, we can just stick to the dead slow but correct 'scan the full
> > > > vma' page_mkclean() and nobody would ever trigger it.
> > >
> > > Not if we restricted it to root and mlocked tmpfs. But then why
> > > wouldn't you just do it with the much more efficient msync walk,
> > > so that if root does want to do writeout via these things, it does
> > > not blow up?
> >
> > This is all used on ram based filesystems right, they all have
> > BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> > anyway. Mlock doesn't avoid getting page_mkclean called.
> >
> > Those who use this on a 'real' filesystem will get hit in the face by a
> > linear scanning page_mkclean(), but AFAIK nobody does this anyway.
>
> But somebody might do it. I just don't know why you'd want to make
> this _worse_ when the msync option would work?
>
> > Restricting it to root for such filesystems is unwanted, that'd severely
> > handicap both UML and Oracle as I understand it (are there other users
> > of this feature around?)
>
> Why? I think they all use tmpfs backings, don't they?

Ooh, you only want to restrict remap_file_pages on mappings from bdi's
without BDI_CAP_NO_WRITEBACK. Sure, I can live with that, and I suspect
others can as well.

> > msync() might never get called and then we're back with the old
> > behaviour where we can surprise the VM with a ton of dirty pages.
>
> But we're root. With your patch, root *can't* do nonlinear writeback
> well. Ever. With msync, at least you give them enough rope.

True. We could even guesstimate the nonlinear dirty pages by subtracting
the result of page_mkclean() from page_mapcount() and force an
msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
related file) when some threshold gets exceeded.

> > > > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > > > real, those UML farms run on nothing but nonlinear reclaim I'd think.
> > >
> > > I guess you can just increase the computational complexity of
> > > reclaim quite easily.
> >
> > Right, on first glance it doesn't look to be too bad, but I should take
> > a closer look.
>
> Well I don't think UML uses nonlinear yet anyway, does it? Can they
> make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> not.

I think it does, but lets ask, Jeff?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
>
> > > > The thing is, I don't think anybody who uses these things cares
> > > > about any of the 'problems' you want to fix, do they? We are
> > > > interested in dirty pages only for the correctness issue, rather
> > > > than performance. Same as reclaim.
> > >
> > > If so, we can just stick to the dead slow but correct 'scan the full
> > > vma' page_mkclean() and nobody would ever trigger it.
> >
> > Not if we restricted it to root and mlocked tmpfs. But then why
> > wouldn't you just do it with the much more efficient msync walk,
> > so that if root does want to do writeout via these things, it does
> > not blow up?
>
> This is all used on ram based filesystems right, they all have
> BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> anyway. Mlock doesn't avoid getting page_mkclean called.
>
> Those who use this on a 'real' filesystem will get hit in the face by a
> linear scanning page_mkclean(), but AFAIK nobody does this anyway.

But somebody might do it. I just don't know why you'd want to make
this _worse_ when the msync option would work?

> Restricting it to root for such filesystems is unwanted, that'd severely
> handicap both UML and Oracle as I understand it (are there other users
> of this feature around?)

Why? I think they all use tmpfs backings, don't they?

> msync() might never get called and then we're back with the old
> behaviour where we can surprise the VM with a ton of dirty pages.

But we're root. With your patch, root *can't* do nonlinear writeback
well. Ever. With msync, at least you give them enough rope.

> > > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > > real, those UML farms run on nothing but nonlinear reclaim I'd think.
> >
> > I guess you can just increase the computational complexity of
> > reclaim quite easily.
>
> Right, on first glance it doesn't look to be too bad, but I should take
> a closer look.

Well I don't think UML uses nonlinear yet anyway, does it? Can they
make do with restricting nonlinear to mlocked vmas, I wonder? Probably
not.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:

> > > The thing is, I don't think anybody who uses these things cares
> > > about any of the 'problems' you want to fix, do they? We are
> > > interested in dirty pages only for the correctness issue, rather
> > > than performance. Same as reclaim.
> >
> > If so, we can just stick to the dead slow but correct 'scan the full
> > vma' page_mkclean() and nobody would ever trigger it.
>
> Not if we restricted it to root and mlocked tmpfs. But then why
> wouldn't you just do it with the much more efficient msync walk,
> so that if root does want to do writeout via these things, it does
> not blow up?

This is all used on ram based filesystems right, they all have
BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
anyway. Mlock doesn't avoid getting page_mkclean called.

Those who use this on a 'real' filesystem will get hit in the face by a
linear scanning page_mkclean(), but AFAIK nobody does this anyway.

Restricting it to root for such filesystems is unwanted, that'd severely
handicap both UML and Oracle as I understand it (are there other users
of this feature around?)

msync() might never get called and then we're back with the old
behaviour where we can surprise the VM with a ton of dirty pages.

> > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > real, those UML farms run on nothing but nonlinear reclaim I'd think.
>
> I guess you can just increase the computational complexity of
> reclaim quite easily.

Right, on first glance it doesn't look to be too bad, but I should take
a closer look.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:41:26PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 13:17 +0100, Nick Piggin wrote:
>
> > > Tracking these ranges on a per-vma basis would avoid taking the mm wide
> > > mmap_sem and so would be cheaper than regular vmas.
> > >
> > > Would that still be too expensive?
> >
> > Well you can today remap N pages in a file, arbitrarily for
> > sizeof(pte_t)*tiny bit for the upper page tables + small constant
> > for the vma.
> >
> > At best, you need an extra pointer to pte / vaddr, so you'd basically
> > double memory overhead.
>
> I was hoping some form of range compression would gain something, but if
> its a fully random mapping, then yes a shadow page table would be needed
> (still looking into what a pte_chain is)
>
> > > > > Well, now they don't, but it could be done or even exploited as a DoS.
> > > >
> > > > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > > > mappings to root if we're worried about that.
> > >
> > > Can't we just 'fix' it?
> >
> > The thing is, I don't think anybody who uses these things cares
> > about any of the 'problems' you want to fix, do they? We are
> > interested in dirty pages only for the correctness issue, rather
> > than performance. Same as reclaim.
>
> If so, we can just stick to the dead slow but correct 'scan the full
> vma' page_mkclean() and nobody would ever trigger it.

Not if we restricted it to root and mlocked tmpfs. But then why
wouldn't you just do it with the much more efficient msync walk,
so that if root does want to do writeout via these things, it does
not blow up?

> What is the DoS scenario wrt reclaim? We really ought to fix that if
> real, those UML farms run on nothing but nonlinear reclaim I'd think.

I guess you can just increase the computational complexity of
reclaim quite easily.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 13:17 +0100, Nick Piggin wrote:

> > Tracking these ranges on a per-vma basis would avoid taking the mm wide
> > mmap_sem and so would be cheaper than regular vmas.
> >
> > Would that still be too expensive?
>
> Well you can today remap N pages in a file, arbitrarily for
> sizeof(pte_t)*tiny bit for the upper page tables + small constant
> for the vma.
>
> At best, you need an extra pointer to pte / vaddr, so you'd basically
> double memory overhead.

I was hoping some form of range compression would gain something, but if
its a fully random mapping, then yes a shadow page table would be needed
(still looking into what a pte_chain is)

> > > > Well, now they don't, but it could be done or even exploited as a DoS.
> > >
> > > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > > mappings to root if we're worried about that.
> >
> > Can't we just 'fix' it?
>
> The thing is, I don't think anybody who uses these things cares
> about any of the 'problems' you want to fix, do they? We are
> interested in dirty pages only for the correctness issue, rather
> than performance. Same as reclaim.

If so, we can just stick to the dead slow but correct 'scan the full
vma' page_mkclean() and nobody would ever trigger it.

What is the DoS scenario wrt reclaim? We really ought to fix that if
real, those UML farms run on nothing but nonlinear reclaim I'd think.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 04:22:24AM -0800, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
> >> Well, now they don't, but it could be done or even exploited as a DoS.
>
> On Wed, Mar 07, 2007 at 12:00:36PM +0100, Nick Piggin wrote:
> > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > mappings to root if we're worried about that.
>
> Please not root. The users really don't want to be privileged. UML
> itself is at least partly for use as privilege isolation of the guest
> workload. Oracle has some of the same concerns itself, which is part of
> why it uses separate processes heavily, even: to isolate instances from
> each other.

Well non-root users could be allowed to work on mlocked regions on
tmpfs/shm. That way they avoid the pathological nonlinear problems, and
can work within the mlock ulimit.

That is, if we are worried about such a DoS.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
>> Well, now they don't, but it could be done or even exploited as a DoS.

On Wed, Mar 07, 2007 at 12:00:36PM +0100, Nick Piggin wrote:
> But so could nonlinear page reclaim. I think we need to restrict nonlinear
> mappings to root if we're worried about that.

Please not root. The users really don't want to be privileged. UML
itself is at least partly for use as privilege isolation of the guest
workload. Oracle has some of the same concerns itself, which is part of
why it uses separate processes heavily, even: to isolate instances from
each other.

-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 12:48:06PM +0100, Peter Zijlstra wrote: > On Wed, 2007-03-07 at 12:00 +0100, Nick Piggin wrote: > > On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote: > > > On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote: > > > > > > > > > There are real users who want these fast, though. > > > > > > > > > > Yeah, why don't we have a tree per nonlinear vma to find these pages? > > > > > > > > > > wli mentions shadow page tables.. > > > > > > > > We could do something more efficient, but I thought that half the point > > > > was that they didn't carry any of this extra memory, and they could be > > > > really fast to set up at the expense of efficiency elsewhere. > > > > > > I'm failing to understand this :-( > > > > > > That extra memory, and apparently they don't want the inefficiency > > s/T/W/ > > > > either. > > > > Sorry, I didn't understand your misunderstandings ;) > > Bah, my brain is thick and foggy today. Let us try again; > > Nonlinear vmas exist because many vmas are expensive somehow, right? > Nonlinear vmas keep the page mapping in the page tables and screw rmaps. > > This 'extra memory' you mentioned would be the overhead of tracking the > actual ranges? > > And apparently now we want it to not suck on the rmap case :-( Do we? I think just "work" is the way we've been handling them up until now. Making them suck less for rmap makes them suck more for what they're good at. > Anyway, if used on a non writeback capable backing store (ramfs) > page_mkclean will never be called. If also mlocked (I think oracle does > this) then page reclaim will pass over too. > > So we're only interested in the bdi_cap_accounting_dirty and VM_SHARED > case, right? > > Tracking these ranges on a per-vma basis would avoid taking the mm wide > mmap_sem and so would be cheaper than regular vmas. > > Would that still be too expensive? 
Well you can today remap N pages in a file, arbitrarily for sizeof(pte_t)*tiny bit for the upper page tables + small constant for the vma. At best, you need an extra pointer to pte / vaddr, so you'd basically double memory overhead. > > > Well, now they don't, but it could be done or even exploited as a DoS. > > > > But so could nonlinear page reclaim. I think we need to restrict nonlinear > > mappings to root if we're worried about that. > > Can't we just 'fix' it? The thing is, I don't think anybody who uses these things cares about any of the 'problems' you want to fix, do they? We are interested in dirty pages only for the correctness issue, rather than performance. Same as reclaim. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
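[Editor's sketch, not part of the thread.] The memory argument above can be put into rough numbers. This reads the estimate as roughly one pte per remapped page and ignores the upper page tables and the per-vma constant. The 8-byte pte and 8-byte back-pointer are assumptions for a 64-bit machine, and the function names are made up for illustration; none of this comes from kernel headers.

```c
#include <stddef.h>

/* Assumed sizes for a 64-bit machine; illustrative only. */
enum { PTE_BYTES = 8, BACKPTR_BYTES = 8 };

/* Approximate cost today: about one pte per remapped page. */
size_t nonlinear_bytes(size_t npages)
{
    return npages * PTE_BYTES;
}

/* Approximate cost if each page also carried a pointer to its pte/vaddr. */
size_t tracked_bytes(size_t npages)
{
    return npages * (PTE_BYTES + BACKPTR_BYTES);
}
```

Under these assumptions the tracked variant costs twice as much per page, which is the "double memory overhead" mentioned above.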
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 12:00 +0100, Nick Piggin wrote: > On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote: > > On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote: > > > > > > > There are real users who want these fast, though. > > > > > > > > Yeah, why don't we have a tree per nonlinear vma to find these pages? > > > > > > > > wli mentions shadow page tables.. > > > > > > We could do something more efficient, but I thought that half the point > > > was that they didn't carry any of this extra memory, and they could be > > > really fast to set up at the expense of efficiency elsewhere. > > > > I'm failing to understand this :-( > > > > That extra memory, and apparently they don't want the inefficiency s/T/W/ > > either. > > Sorry, I didn't understand your misunderstandings ;) Bah, my brain is thick and foggy today. Let us try again; Nonlinear vmas exist because many vmas are expensive somehow, right? Nonlinear vmas keep the page mapping in the page tables and screw rmaps. This 'extra memory' you mentioned would be the overhead of tracking the actual ranges? And apparently now we want it to not suck on the rmap case :-( Anyway, if used on a non writeback capable backing store (ramfs) page_mkclean will never be called. If also mlocked (I think oracle does this) then page reclaim will pass over too. So we're only interested in the bdi_cap_accounting_dirty and VM_SHARED case, right? Tracking these ranges on a per-vma basis would avoid taking the mm wide mmap_sem and so would be cheaper than regular vmas. Would that still be too expensive? > > Well, now they don't, but it could be done or even exploited as a DoS. > > But so could nonlinear page reclaim. I think we need to restrict nonlinear > mappings to root if we're worried about that. Can't we just 'fix' it? 
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote: > On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote: > > > > > There are real users who want these fast, though. > > > > > > Yeah, why don't we have a tree per nonlinear vma to find these pages? > > > > > > wli mentions shadow page tables.. > > > > We could do something more efficient, but I thought that half the point > > was that they didn't carry any of this extra memory, and they could be > > really fast to set up at the expense of efficiency elsewhere. > > I'm failing to understand this :-( > > That extra memory, and apparently they don't want the inefficiency > either. Sorry, I didn't understand your misunderstandings ;) > > > I don't see it being a big deal. I doubt anybody is writing out huge > > amounts of data via nonlinear mappings. > > Well, now they don't, but it could be done or even exploited as a DoS. But so could nonlinear page reclaim. I think we need to restrict nonlinear mappings to root if we're worried about that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote: > > > There are real users who want these fast, though. > > > > Yeah, why don't we have a tree per nonlinear vma to find these pages? > > > > wli mentions shadow page tables.. > > We could do something more efficient, but I thought that half the point > was that they didn't carry any of this extra memory, and they could be > really fast to set up at the expense of efficiency elsewhere. I'm failing to understand this :-( That extra memory, and apparently they don't want the inefficiency either. > I don't see it being a big deal. I doubt anybody is writing out huge > amounts of data via nonlinear mappings. Well, now they don't, but it could be done or even exploited as a DoS. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 11:17 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
> >
> > > > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > > > no users have hit mainline yet.
> > >
> > > Did benh agree with that?
> >
> > I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
> > mainline. I will switch to ->fault when I have time to adapt the code,
> > in the meantime, NOPFN_REFAULT should stay.
>
> I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
> adapted the code for you ;) it is in patch 5/6, sent a while ago.

Ok, I need to look. I've been travelling, having meetings etc... for the
last couple of weeks and I'm taking a week off next week :-)

Ben.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 11:24:45AM +0100, Peter Zijlstra wrote: > On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote: > > On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote: > > > > *sigh* yes was looking at all that code, thats gonna be darn slow > > > > though, but I'll whip up a patch. > > > > > > Well, if it's going to be darn slow, maybe it's better to go with > > > mingo's plan on emulating nonlinear vmas with linear ones. That'll be > > > > There are real users who want these fast, though. > > Yeah, why don't we have a tree per nonlinear vma to find these pages? > > wli mentions shadow page tables.. We could do something more efficient, but I thought that half the point was that they didn't carry any of this extra memory, and they could be really fast to set up at the expense of efficiency elsewhere. > > > darn slow as well, but at least it will be much less complicated. > > > > IMO, the best thing to do is just restore msync behaviour, and comment > > the fact that we ignore nonlinears. We need to restore msync behaviour > > to fix races in regular mappings anyway, at least for now. > > Seems to be the best quick solution indeed. If we fix the race in the linear mappings, then we can just do the full msync for nonlinear vmas, and the fast noop version for everyone else. I don't see it being a big deal. I doubt anybody is writing out huge amounts of data via nonlinear mappings. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote: > On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote: > > > *sigh* yes was looking at all that code, thats gonna be darn slow > > > though, but I'll whip up a patch. > > > > Well, if it's going to be darn slow, maybe it's better to go with > > mingo's plan on emulating nonlinear vmas with linear ones. That'll be > > There are real users who want these fast, though. Yeah, why don't we have a tree per nonlinear vma to find these pages? wli mentions shadow page tables.. > > darn slow as well, but at least it will be much less complicated. > > IMO, the best thing to do is just restore msync behaviour, and comment > the fact that we ignore nonlinears. We need to restore msync behaviour > to fix races in regular mappings anyway, at least for now. Seems to be the best quick solution indeed. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
> > *sigh* yes was looking at all that code, that's gonna be darn slow
> > though, but I'll whip up a patch.
>
> Well, if it's going to be darn slow, maybe it's better to go with
> mingo's plan on emulating nonlinear vmas with linear ones. That'll be

There are real users who want these fast, though.

> darn slow as well, but at least it will be much less complicated.

IMO, the best thing to do is just restore msync behaviour, and comment
the fact that we ignore nonlinears. We need to restore msync behaviour
to fix races in regular mappings anyway, at least for now.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
>
> > > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > > no users have hit mainline yet.
> >
> > Did benh agree with that?
>
> I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
> mainline. I will switch to ->fault when I have time to adapt the code,
> in the meantime, NOPFN_REFAULT should stay.

I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
adapted the code for you ;) it is in patch 5/6, sent a while ago.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> *sigh* yes was looking at all that code, that's gonna be darn slow
> though, but I'll whip up a patch.

Well, if it's going to be darn slow, maybe it's better to go with
mingo's plan on emulating nonlinear vmas with linear ones. That'll be
darn slow as well, but at least it will be much less complicated.

Miklos
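[Editor's sketch, not part of the thread.] From userspace, "emulating nonlinear vmas with linear ones" can be pictured as placing one ordinary MAP_SHARED mapping per page over a reserved window with MAP_FIXED, rather than calling remap_file_pages(). The sketch below is only an illustration of that idea, not the kernel-side emulation under discussion. The helper names, the temporary file path, and the 4-page layout are hypothetical.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill file page i with the byte 'A' + i. */
static int fill_file(int fd, size_t npages, size_t pagesz)
{
    char *buf = malloc(pagesz);
    if (!buf)
        return -1;
    for (size_t i = 0; i < npages; i++) {
        memset(buf, 'A' + (int)i, pagesz);
        if (pwrite(fd, buf, pagesz, (off_t)(i * pagesz)) != (ssize_t)pagesz) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}

/*
 * Hypothetical helper: make window page `slot` show file page `pgoff`
 * by placing a one-page linear shared mapping on top of it. Each call
 * typically leaves a separate vma behind, which is the cost of this scheme.
 */
static int window_remap(char *window, size_t slot, int fd,
                        size_t pgoff, size_t pagesz)
{
    void *p = mmap(window + slot * pagesz, pagesz, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, (off_t)(pgoff * pagesz));
    return p == MAP_FAILED ? -1 : 0;
}

/* Map a 4-page file in reverse order; out[i] gets the first byte of slot i. */
int demo_reverse_window(char out[4])
{
    const size_t npages = 4;
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/nonlinear-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */
    if (fill_file(fd, npages, pagesz) != 0) {
        close(fd);
        return -1;
    }

    /* Reserve a contiguous virtual window to place the pages into. */
    char *window = mmap(NULL, npages * pagesz, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (window == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int rc = 0;
    for (size_t slot = 0; slot < npages && rc == 0; slot++)
        rc = window_remap(window, slot, fd, npages - 1 - slot, pagesz);
    for (size_t slot = 0; slot < npages && rc == 0; slot++)
        out[slot] = window[slot * pagesz];

    munmap(window, npages * pagesz);
    close(fd);
    return rc;
}
```

Calling demo_reverse_window() fills the output with the first byte of file pages 3, 2, 1, 0 in that order. Every remapped page costs a full mmap() call and usually its own vma, which is the sense in which this approach is simpler but slow.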
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > no users have hit mainline yet.
>
> Did benh agree with that?

I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
mainline. I will switch to ->fault when I have time to adapt the code,
in the meantime, NOPFN_REFAULT should stay.

Note that one thing we really want with the new ->fault (though I
haven't looked at the patches lately to see if it's available) is to be
able to differentiate faults coming from userspace from faults coming
from the kernel. The major difference is that the former can be
re-executed to handle signals, the latter can't. Thus waiting in the
fault handler can be made interruptible in the former case, not in the
latter case.

Ben.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote: >> Guess what major real-life application not only uses nonlinear daily >> but would even be very happy to see it extended with non-vma-creating >> protections and more? On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote: > uh-oh. SQL server? Close enough. ;) On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote: >> It's not terribly typical for things to be >> truncated while remap_file_pages() is doing its work, though it's been >> proposed as a method of dynamism. It won't stress remap_file_pages() vs. >> truncate() in any meaningful way, though, as userspace will be rather >> diligent about clearing in-use data out of the file offset range to be >> truncated away anyway, and all that via O_DIRECT. On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote: > The problem here isn't related to truncate or direct-IO. It's just > plain-old MAP_SHARED. nonlinear VMAs are now using the old-style > dirty-memory management. msync() is basically a no-op and the code is > wildly tricky and pretty much untested. The chances that we broke it are > considerable. This would be of concern for swapping out tmpfs-backed nonlinearly- mapped files under extreme stress in Oracle's case, though it's rather typical for it all to be mlock()'d in-core and cases where that's necessary to be considered grossly underprovisioned. As far as I know, msync() is not used to manage the nonlinearly-mapped objects, which are most typically expected to be memory-backed, rendering writeback to disk of questionable value. Also quite happily, I'm not aware of any data integrity issues it would explain. Bug though it may be, it requires a usage model very rarely used by Oracle to trigger, so we've not run into it. 
-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 11:04 +0100, Nick Piggin wrote: > On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote: > > On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote: > > > > > > Can recollect as much, I modelled it after page_referenced() and can't > > > find any VM_NONLINEAR specific code in there either. > > > > > > Will have a hard look, but if its broken, then page_referenced if > > > equally broken it seems, which would make page reclaim funny in the > > > light of nonlinear mappings. > > > > page_referenced is just an heuristic, and it ignores nonlinear mappings > > and the page which will get filtered down to try_to_unmap. > > > > Page reclaim is already "funny" for nonlinear mappings, page_referenced > > is the least of its worries ;) It works, though. > > Or, to be more helpful, unmap_mapping_range is what it should be > modelled on. *sigh* yes was looking at all that code, thats gonna be darn slow though, but I'll whip up a patch. /me feels terribly bad about having missed this.. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote: > On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote: > > > > Can recollect as much, I modelled it after page_referenced() and can't > > find any VM_NONLINEAR specific code in there either. > > > > Will have a hard look, but if its broken, then page_referenced if > > equally broken it seems, which would make page reclaim funny in the > > light of nonlinear mappings. > > page_referenced is just an heuristic, and it ignores nonlinear mappings > and the page which will get filtered down to try_to_unmap. > > Page reclaim is already "funny" for nonlinear mappings, page_referenced > is the least of its worries ;) It works, though. Or, to be more helpful, unmap_mapping_range is what it should be modelled on. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote: > On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote: > > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote: > > > Depending on whether anyone wants it, and what features they want, we > > > could emulate the old syscall, and make a new restricted one which is > > > much less intrusive. > > > For example, if we can operate only on MAP_ANONYMOUS memory and specify > > > that nonlinear mappings effectively mlock the pages, then we can get > > > rid of all the objrmap and unmap_mapping_range handling, forget about > > > the writeout and msync problems... > > > > Anonymous-only would make it a doorstop for Oracle, since its entire > > motive for using it is to window into objects larger than user virtual > > Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem > inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't > have a file descriptor to get a pgoff, then remap_file_pages is a doorstop > for everyone ;) > > > address spaces (this likely also applies to UML, though they should > > really chime in to confirm). Restrictions to tmpfs and/or ramfs would > > likely be liveable, though I suspect some things might want to do it to > > shm segments (I'll ask about that one). There's definitely no need for a > > persistent backing store for the object to be remapped in Oracle's case, > > in any event. It's largely the in-core destination and source of IO, not > > something saved on-disk itself. > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with > that as well, then I think it might be a good option. Oh, hmm if you can truncate these things then you still need to force unmap so you still need i_mmap_nonlinear. 
But come to think of it, I still don't think nonlinear mappings are too bad as they are ;)
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote: >>> ok. What do you think about the sys_remap_file_pages_prot() thing that >>> Paolo has done in a nicely split up form - does that complicate things >>> in any fundamental way? That is what is useful to UML. * Bill Irwin <[EMAIL PROTECTED]> wrote: >> Oracle would love it. You don't want to know how far back I've been >> asked to backport that. On Wed, Mar 07, 2007 at 10:35:18AM +0100, Ingo Molnar wrote: > ok, cool! Then the first step would be for you to talk to Paolo and to > pick up the patches, review them, nurse it in -mm, etc. Suffering in > silence is just a pointless act of masochism, not an efficient > upstream-merge tactic ;-) It was intended for use in a debugging mode for the database, so given the general mood where fighting backouts was an issue, I was relatively loath to bring it up. With UML behind it I don't feel that's as much of a concern. -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote: > > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > After these patches, I don't think there is too much burden. The main > > thing left really is just the objrmap stuff, but that is just handled > > with a minimal 'dumb' algorithm that doesn't cost much. > > ok. What do you think about the sys_remap_file_pages_prot() thing that > Paolo has done in a nicely split up form - does that complicate things > in any fundamental way? That is what is useful to UML. Last time I looked (a while ago), the only issue I had was that he was doing a weird special case rather than using another !present pte bit for his "nonlinear protection" ptes. I think he fixed that now and so it should be quite good now. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote: > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote: > > Depending on whether anyone wants it, and what features they want, we > > could emulate the old syscall, and make a new restricted one which is > > much less intrusive. > > For example, if we can operate only on MAP_ANONYMOUS memory and specify > > that nonlinear mappings effectively mlock the pages, then we can get > > rid of all the objrmap and unmap_mapping_range handling, forget about > > the writeout and msync problems... > > Anonymous-only would make it a doorstop for Oracle, since its entire > motive for using it is to window into objects larger than user virtual Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't have a file descriptor to get a pgoff, then remap_file_pages is a doorstop for everyone ;) > address spaces (this likely also applies to UML, though they should > really chime in to confirm). Restrictions to tmpfs and/or ramfs would > likely be liveable, though I suspect some things might want to do it to > shm segments (I'll ask about that one). There's definitely no need for a > persistent backing store for the object to be remapped in Oracle's case, > in any event. It's largely the in-core destination and source of IO, not > something saved on-disk itself. Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with that as well, then I think it might be a good option. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote: > Depending on whether anyone wants it, and what features they want, we > could emulate the old syscall, and make a new restricted one which is > much less intrusive. > For example, if we can operate only on MAP_ANONYMOUS memory and specify > that nonlinear mappings effectively mlock the pages, then we can get > rid of all the objrmap and unmap_mapping_range handling, forget about > the writeout and msync problems... Anonymous-only would make it a doorstop for Oracle, since its entire motive for using it is to window into objects larger than user virtual address spaces (this likely also applies to UML, though they should really chime in to confirm). Restrictions to tmpfs and/or ramfs would likely be liveable, though I suspect some things might want to do it to shm segments (I'll ask about that one). There's definitely no need for a persistent backing store for the object to be remapped in Oracle's case, in any event. It's largely the in-core destination and source of IO, not something saved on-disk itself. -- wli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote: > On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote: > > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > > > > Dirty page accounting doesn't work either on > > > > > non-linear mappings > > > > > > > > It doesn't? Confused - these things don't have anything to do with each > > > > other do they? > > > > > > Look in page_mkclean(). Where does it handle non-linear mappings? > > > > > > > OK, I'd forgotten about that. It won't break dirty memory accounting, > > but it'll potentially break dirty memory balancing. > > > > If we have the wrong page (due to nonlinear), page_check_address() will > > fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 > > algorithms and I guess it'll break the msync guarantees. > > > > Peter, I thought we went through the nonlinear problem ages ago and decided > > it was OK? > > Can recollect as much, I modelled it after page_referenced() and can't > find any VM_NONLINEAR specific code in there either. > > Will have a hard look, but if its broken, then page_referenced if > equally broken it seems, which would make page reclaim funny in the > light of nonlinear mappings. page_referenced is just an heuristic, and it ignores nonlinear mappings and the page which will get filtered down to try_to_unmap. Page reclaim is already "funny" for nonlinear mappings, page_referenced is the least of its worries ;) It works, though. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:26:38AM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> >
> > msync breakage is bad, but otherwise I don't know that we care about
> > dirty page writeout efficiency.
>
> Well. We made so many changes to support the synchronous
> dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
> the old-style approach still works. It might seem to, most of the time.
> But if it _is_ subtly broken, boy it's going to take a long time for us to
> find out.

I can't think of anything that should have caused breakage (except for
the msync thing). We're still careful about not dropping pte dirty bits.

> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
>
> Dunno, I don't recall that. We dirty the page before the pte...

I don't think it is really that simple. There is a big comment in
clear_page_dirty_for_io.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote: > On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote: > >> btw., if we decide that nonlinear isnt worth the continuing maintainance > >> pain, we could internally implement/emulate sys_remap_file_pages() via a > >> call to mremap() and essentially deprecate it, without breaking the ABI > >> - and remove all the nonlinear code. (This would split fremap areas into > >> separate vmas) > > On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote: > > I'm rather regretting having merged it - I don't think it has been used for > > much. > > Paolo's UML speedup patches might use nonlinear though. > > Guess what major real-life application not only uses nonlinear daily > but would even be very happy to see it extended with non-vma-creating > protections and more? uh-oh. SQL server? > It's not terribly typical for things to be > truncated while remap_file_pages() is doing its work, though it's been > proposed as a method of dynamism. It won't stress remap_file_pages() vs. > truncate() in any meaningful way, though, as userspace will be rather > diligent about clearing in-use data out of the file offset range to be > truncated away anyway, and all that via O_DIRECT. The problem here isn't related to truncate or direct-IO. It's just plain-old MAP_SHARED. nonlinear VMAs are now using the old-style dirty-memory management. msync() is basically a no-op and the code is wildly tricky and pretty much untested. The chances that we broke it are considerable. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
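[Editor's sketch, not part of the thread.] For readers unfamiliar with the userspace side of the "plain-old MAP_SHARED" case: a store through a shared mapping dirties the pte, and msync(MS_SYNC) is the call that is supposed to guarantee the dirty data has been written back. The sketch below only shows the call sequence. It cannot observe whether writeback to the backing store really happened, since on Linux the read-back is served from the same page cache either way. The function name and temporary path are hypothetical.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store a message through a MAP_SHARED mapping, msync() it, then read the
 * first `len` bytes back through the file descriptor into `readback`.
 * Returns 0 on success, -1 on any failure.
 */
int demo_msync(char *readback, size_t len)
{
    static const char msg[] = "dirty via pte";
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/msync-demo-XXXXXX";

    if (len > pagesz)
        return -1;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives until fd is closed */
    if (ftruncate(fd, (off_t)pagesz) != 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(map, msg, sizeof msg);       /* the store marks the pte dirty */

    /* Ask the kernel to write the dirty range back and wait for it. */
    int rc = msync(map, pagesz, MS_SYNC);
    ssize_t n = (rc == 0) ? pread(fd, readback, len, 0) : -1;

    munmap(map, pagesz);
    close(fd);
    return (rc == 0 && n == (ssize_t)len) ? 0 : -1;
}
```

An application relying on this pattern expects that, once msync() returns 0, its stores are safely on the backing store. That is the guarantee said above to be at risk for nonlinear vmas.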
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Bill Irwin <[EMAIL PROTECTED]> wrote: > * Nick Piggin <[EMAIL PROTECTED]> wrote: > >> After these patches, I don't think there is too much burden. The main > >> thing left really is just the objrmap stuff, but that is just handled > >> with a minimal 'dumb' algorithm that doesn't cost much. > > On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote: > > ok. What do you think about the sys_remap_file_pages_prot() thing that > > Paolo has done in a nicely split up form - does that complicate things > > in any fundamental way? That is what is useful to UML. > > Oracle would love it. You don't want to know how far back I've been > asked to backport that. ok, cool! Then the first step would be for you to talk to Paolo and to pick up the patches, review them, nurse it in -mm, etc. Suffering in silence is just a pointless act of masochism, not an efficient upstream-merge tactic ;-) Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote: > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > > > Dirty page accounting doesn't work either on > > > > non-linear mappings > > > > > > It doesn't? Confused - these things don't have anything to do with each > > > other do they? > > > > Look in page_mkclean(). Where does it handle non-linear mappings? > > > > OK, I'd forgotten about that. It won't break dirty memory accounting, > but it'll potentially break dirty memory balancing. > > If we have the wrong page (due to nonlinear), page_check_address() will > fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 > algorithms and I guess it'll break the msync guarantees. > > Peter, I thought we went through the nonlinear problem ages ago and decided > it was OK? Can recollect as much, I modelled it after page_referenced() and can't find any VM_NONLINEAR specific code in there either. Will have a hard look, but if its broken, then page_referenced if equally broken it seems, which would make page reclaim funny in the light of nonlinear mappings. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin <[EMAIL PROTECTED]> wrote:
>> After these patches, I don't think there is too much burden. The main
>> thing left really is just the objrmap stuff, but that is just handled
>> with a minimal 'dumb' algorithm that doesn't cost much.

On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> ok. What do you think about the sys_remap_file_pages_prot() thing that
> Paolo has done in a nicely split up form - does that complicate things
> in any fundamental way? That is what is useful to UML.

Oracle would love it. You don't want to know how far back I've been
asked to backport that.

-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
>
> Dunno, I don't recall that. We dirty the page before the pte...

That's the one I just submitted a fix for ;)

http://lkml.org/lkml/2007/3/6/308

Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> btw., if we decide that nonlinear isn't worth the continuing maintenance
>> pain, we could internally implement/emulate sys_remap_file_pages() via a
>> call to mremap() and essentially deprecate it, without breaking the ABI
>> - and remove all the nonlinear code. (This would split fremap areas into
>> separate vmas)

On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> I'm rather regretting having merged it - I don't think it has been used for
> much.
> Paolo's UML speedup patches might use nonlinear though.

Guess what major real-life application not only uses nonlinear daily
but would even be very happy to see it extended with non-vma-creating
protections and more? It's not terribly typical for things to be
truncated while remap_file_pages() is doing its work, though it's been
proposed as a method of dynamism. It won't stress remap_file_pages() vs.
truncate() in any meaningful way, though, as userspace will be rather
diligent about clearing in-use data out of the file offset range to be
truncated away anyway, and all that via O_DIRECT.

-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> >
> > > > > Dirty page accounting doesn't work either on
> > > > > non-linear mappings
> > > >
> > > > It doesn't? Confused - these things don't have anything to do with each
> > > > other do they?
> > >
> > > Look in page_mkclean(). Where does it handle non-linear mappings?
> > >
> >
> > OK, I'd forgotten about that. It won't break dirty memory accounting,
> > but it'll potentially break dirty memory balancing.
> >
> > If we have the wrong page (due to nonlinear), page_check_address() will
> > fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17
> > algorithms and I guess it'll break the msync guarantees.
> >
> > Peter, I thought we went through the nonlinear problem ages ago and decided
> > it was OK?
>
> msync breakage is bad, but otherwise I don't know that we care about
> dirty page writeout efficiency.

Well. We made so many changes to support the synchronous
dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
the old-style approach still works. It might seem to, most of the time. But
if it _is_ subtly broken, boy it's going to take a long time for us to find
out.

> But I think we discovered that those msync changes are bogus anyway
> because there is a small race window where pte could be dirtied without
> page being set dirty?

Dunno, I don't recall that. We dirty the page before the pte...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:53:23AM +0100, Ingo Molnar wrote:
>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > btw., if we decide that nonlinear isn't worth the continuing
> > > maintenance pain, we could internally implement/emulate
> > > sys_remap_file_pages() via a call to mremap() and essentially
> > > deprecate it, without breaking the ABI - and remove all the
> > > nonlinear code. (This would split fremap areas into separate vmas)
> > >
> >
> > I'm rather regretting having merged it - I don't think it has been
> > used for much.
> >
> > Paolo's UML speedup patches might use nonlinear though.
>
> yes, i wrote the first, prototype version of that for UML, it needs an
> extended version of the syscall, sys_remap_file_pages_prot():
>
>   http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
>
> i also wrote an x86 hypervisor kind of thing for UML, called
> 'sys_vcpu()', which allows UML to execute guest user-mode in a box,
> which also relies on sys_remap_file_pages_prot():
>
>   http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
>
> which reduced the UML guest syscall overhead from 30 usecs to 4 usecs
> (with native syscalls taking 2 usecs, on the box i tested, years ago).
>
> So it certainly looked useful to me - but wasn't really picked up widely.
>
> We'll always have the option to get rid of it (and hence completely
> reverse the decision to merge it) without breaking the ABI, by emulating
> the API via mremap(). That eliminates the UML speedup though. So no need
> to feel sorry about having merged it, we can easily revisit that
> years-old 'do we want it' decision, without any ABI worries.

Depending on whether anyone wants it, and what features they want, we
could emulate the old syscall, and make a new restricted one which is
much less intrusive.

For example, if we can operate only on MAP_ANONYMOUS memory and specify
that nonlinear mappings effectively mlock the pages, then we can get
rid of all the objrmap and unmap_mapping_range handling, forget about
the writeout and msync problems...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > Look in page_mkclean(). Where does it handle non-linear mappings?
> >
>
> OK, I'd forgotten about that. It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
>
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.

It won't even get that far, because it only looks at vmas on
mapping->i_mmap, and not on i_mmap_nonlinear.

Miklos
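The point about the two lists can be sketched with a toy userspace model. This is not kernel code: the types toy_vma and toy_mapping and the function toy_mkclean() are hypothetical names invented for illustration, and each vma is reduced to a single dirty bit. It only shows that a walk restricted to the linear list never reaches ptes that hang off the nonlinear list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vma mapping one file page; not a kernel type. */
struct toy_vma {
    bool pte_dirty;           /* state of the single pte in this vma */
    struct toy_vma *next;     /* next vma on the same list */
};

/* Toy stand-in for an address_space with its two vma lists. */
struct toy_mapping {
    struct toy_vma *i_mmap;            /* linear mappings */
    struct toy_vma *i_mmap_nonlinear;  /* nonlinear mappings */
};

/*
 * Clean every dirty pte reachable from the linear list only, which is
 * the behaviour described in the message above. Returns the number of
 * ptes that were cleaned.
 */
static int toy_mkclean(struct toy_mapping *mapping)
{
    int cleaned = 0;

    for (struct toy_vma *vma = mapping->i_mmap; vma; vma = vma->next) {
        if (vma->pte_dirty) {
            vma->pte_dirty = false;
            cleaned++;
        }
    }
    /* i_mmap_nonlinear is never walked, so those ptes stay dirty. */
    return cleaned;
}
```

With one dirty vma on each list, toy_mkclean() reports one cleaned pte and the nonlinear one is left dirty.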
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> After these patches, I don't think there is too much burden. The main
> thing left really is just the objrmap stuff, but that is just handled
> with a minimal 'dumb' algorithm that doesn't cost much.

ok. What do you think about the sys_remap_file_pages_prot() thing that
Paolo has done in a nicely split up form - does that complicate things
in any fundamental way? That is what is useful to UML.

	Ingo
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> > > > Dirty page accounting doesn't work either on
> > > > non-linear mappings
> > >
> > > It doesn't? Confused - these things don't have anything to do with each
> > > other do they?
> >
> > Look in page_mkclean(). Where does it handle non-linear mappings?
> >
>
> OK, I'd forgotten about that. It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
>
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17
> algorithms and I guess it'll break the msync guarantees.
>
> Peter, I thought we went through the nonlinear problem ages ago and decided
> it was OK?

msync breakage is bad, but otherwise I don't know that we care about
dirty page writeout efficiency.

But I think we discovered that those msync changes are bogus anyway
because there is a small race window where pte could be dirtied without
page being set dirty?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:59:44AM +0100, Nick Piggin wrote:
> Apart from a handful of trivial if (pte_file()) cases throughout mm/,
> our maintenance burden basically now amounts to the following patch.
> Even the rmap.c change looks bigger than it is because I split out
> the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

Oh, there is a bit more nonlinear mmap list manipulation I'd forgotten
about too... makes things a little bit worse, but not too much.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> > > Dirty page accounting doesn't work either on
> > > non-linear mappings
> >
> > It doesn't? Confused - these things don't have anything to do with each
> > other do they?
>
> Look in page_mkclean(). Where does it handle non-linear mappings?
>

OK, I'd forgotten about that. It won't break dirty memory accounting,
but it'll potentially break dirty memory balancing.

If we have the wrong page (due to nonlinear), page_check_address() will
fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17
algorithms and I guess it'll break the msync guarantees.

Peter, I thought we went through the nonlinear problem ages ago and decided
it was OK?
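Why the address check would fail for a nonlinear vma can be sketched in userspace as follows. The code is an illustration under stated assumptions, not the kernel's page_check_address(): struct lin_vma, toy_linear_address() and toy_check_address() are hypothetical names, 4 KiB pages are assumed, and the page table is reduced to an array recording which file page each slot of the vma really maps. The linear rule assumes that file offset and virtual address move in lockstep; once pages have been rearranged, the slot computed by that rule holds a different file page.

```c
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Toy description of a file mapping; not a kernel type. */
struct lin_vma {
    uint64_t vm_start;   /* first virtual address of the mapping */
    uint64_t vm_end;     /* one past the last mapped address */
    uint64_t vm_pgoff;   /* file page offset mapped at vm_start */
};

/*
 * Address at which a linear mapping would hold file page pgoff,
 * or 0 if that page lies outside the vma (vm_start is assumed nonzero).
 */
static uint64_t toy_linear_address(const struct lin_vma *vma, uint64_t pgoff)
{
    uint64_t addr;

    if (pgoff < vma->vm_pgoff)
        return 0;
    addr = vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
    return addr < vma->vm_end ? addr : 0;
}

/*
 * pte_pgoff[i] is the file page actually mapped at slot i of the vma.
 * Returns 1 if the slot chosen by the linear rule maps pgoff, else 0.
 */
static int toy_check_address(const struct lin_vma *vma,
                             const uint64_t *pte_pgoff, uint64_t pgoff)
{
    uint64_t addr = toy_linear_address(vma, pgoff);
    uint64_t slot;

    if (!addr)
        return 0;
    slot = (addr - vma->vm_start) >> TOY_PAGE_SHIFT;
    return pte_pgoff[slot] == pgoff;
}
```

For a four-page vma whose slots hold file pages 3, 1, 2, 0, the check for file page 3 looks at the last slot, finds page 0 there and fails, while the check for page 1 still succeeds because that page happens to sit at its linear position.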
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:27:55AM +0100, Ingo Molnar wrote:
>
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)

Well I think it has a few possible uses outside the PAE database
workloads. UML for one seems to be interested... as much as I don't use
them, I think nonlinear mappings are kinda cool ;)

After these patches, I don't think there is too much burden. The main
thing left really is just the objrmap stuff, but that is just handled
with a minimal 'dumb' algorithm that doesn't cost much.

Then the core of it is just the file pte handling, which really doesn't
seem to be much problem.

Apart from a handful of trivial if (pte_file()) cases throughout mm/,
our maintenance burden basically now amounts to the following patch.
Even the rmap.c change looks bigger than it is because I split out
the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

--

 include/asm-powerpc/pgtable.h |   12
 mm/Kconfig                    |    6 ++
 mm/Makefile                   |    6 +-
 mm/rmap.c                     |  101 +-
 4 files changed, 83 insertions(+), 42 deletions(-)

Index: linux-2.6/include/asm-powerpc/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/pgtable.h
+++ linux-2.6/include/asm-powerpc/pgtable.h
@@ -243,7 +243,12 @@ static inline int pte_write(pte_t pte) {
 static inline int pte_exec(pte_t pte)  { return pte_val(pte) & _PAGE_EXEC;}
 static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
 static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
+
+#ifdef CONFIG_NONLINEAR
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+#else
+static inline int pte_file(pte_t pte) { return 0; }
+#endif
 
 static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
 static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -483,9 +488,16 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t){((type)<< 1)|((offset)<<8)})
 #define __pte_to_swp_entry(pte)	((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
 #define __swp_entry_to_pte(x)	((pte_t) { (x).val << PTE_RPN_SHIFT })
+
+#ifdef CONFIG_NONLINEAR
 #define pte_to_pgoff(pte)	(pte_val(pte) >> PTE_RPN_SHIFT)
 #define pgoff_to_pte(off)	((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS	(BITS_PER_LONG - PTE_RPN_SHIFT)
+#else
+#define pte_to_pgoff(pte)	({BUG(); -1;})
+#define pgoff_to_pte(off)	({BUG(); (pte_t){-1};})
+#define PTE_FILE_MAX_BITS	0
+#endif
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -142,6 +142,12 @@ config SPLIT_PTLOCK_CPUS
 #
 # support for page migration
 #
+config NONLINEAR
+	bool "Non linear mappings"
+	def_bool y
+	help
+	  Provides support for the remap_file_pages syscall.
+
 config MIGRATION
 	bool "Page migration"
 	def_bool y
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -3,9 +3,8 @@
 #
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
-			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o
+mmu-$(CONFIG_MMU)	:= highmem.o madvise.o memory.o mincore.o mlock.o \
+			   mmap.o mprotect.o mremap.o msync.o rmap.o vmalloc.o
 
 obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
 			   page_alloc.o page-writeback.o pdflush.o \
@@ -27,5 +26,6 @@ obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_NONLINEAR) += fremap.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -756,6 +756,7 @@ out:
 	return ret;
 }
 
+#ifdef CONFIG_NONLINEAR
 /*
  * objrmap doesn't work for nonlinear VMAs because the assumption that
  * offset-into-file correlates with offset-into-virtual-addresses does not
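What the pte_to_pgoff()/pgoff_to_pte() pair in the pgtable.h hunk does, and what the compiled-out variant of pte_file() looks like, can be tried in userspace with the sketch below. It is not the powerpc header: the toy_ names are hypothetical, and the shift of 17 and the flag bit 0x2 are illustrative assumptions standing in for PTE_RPN_SHIFT and _PAGE_FILE. A not-present "file pte" stores the file page offset in the upper bits and a marker in the low bits; defining TOY_NO_NONLINEAR leaves only a stub that always reports "not a file pte".

```c
#include <stdint.h>

typedef struct { uint64_t val; } toy_pte_t;

#define TOY_RPN_SHIFT 17        /* assumed; stands in for PTE_RPN_SHIFT */
#define TOY_PAGE_FILE 0x2ULL    /* assumed flag bit; stands in for _PAGE_FILE */

#ifndef TOY_NO_NONLINEAR
/* True if the pte carries the file marker. */
static inline int toy_pte_file(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_FILE) != 0;
}

/* Recover the file page offset stored above the flag bits. */
static inline uint64_t toy_pte_to_pgoff(toy_pte_t pte)
{
    return pte.val >> TOY_RPN_SHIFT;
}

/* Build a file pte: offset in the upper bits, marker in the low bits. */
static inline toy_pte_t toy_pgoff_to_pte(uint64_t off)
{
    return (toy_pte_t){ (off << TOY_RPN_SHIFT) | TOY_PAGE_FILE };
}

#define TOY_FILE_MAX_BITS (64 - TOY_RPN_SHIFT)
#else
/* Nonlinear support compiled out: no pte is ever a file pte. */
static inline int toy_pte_file(toy_pte_t pte)
{
    (void)pte;
    return 0;
}

#define TOY_FILE_MAX_BITS 0
#endif
```

With the default build, toy_pte_to_pgoff(toy_pgoff_to_pte(n)) gives back n for any n that fits in TOY_FILE_MAX_BITS bits.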
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > Dirty page accounting doesn't work either on
> > non-linear mappings
>
> It doesn't? Confused - these things don't have anything to do with each
> other do they?

Look in page_mkclean(). Where does it handle non-linear mappings?

Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > btw., if we decide that nonlinear isn't worth the continuing
> > maintenance pain, we could internally implement/emulate
> > sys_remap_file_pages() via a call to mremap() and essentially
> > deprecate it, without breaking the ABI - and remove all the
> > nonlinear code. (This would split fremap areas into separate vmas)
> >
>
> I'm rather regretting having merged it - I don't think it has been
> used for much.
>
> Paolo's UML speedup patches might use nonlinear though.

yes, i wrote the first, prototype version of that for UML, it needs an
extended version of the syscall, sys_remap_file_pages_prot():

  http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1

i also wrote an x86 hypervisor kind of thing for UML, called
'sys_vcpu()', which allows UML to execute guest user-mode in a box,
which also relies on sys_remap_file_pages_prot():

  http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2

which reduced the UML guest syscall overhead from 30 usecs to 4 usecs
(with native syscalls taking 2 usecs, on the box i tested, years ago).

So it certainly looked useful to me - but wasn't really picked up widely.

We'll always have the option to get rid of it (and hence completely
reverse the decision to merge it) without breaking the ABI, by emulating
the API via mremap(). That eliminates the UML speedup though. So no need
to feel sorry about having merged it, we can easily revisit that
years-old 'do we want it' decision, without any ABI worries.

	Ingo
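The emulation idea can be approximated from userspace. The sketch below is an assumption-laden illustration, not the proposed kernel change: the message talks about mremap(), whereas this helper uses mmap() with MAP_FIXED over part of an existing shared file mapping, which has the side effect the message mentions (the area is split into separate vmas). The name emulated_remap_file_pages() and its signature are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the `size` bytes at `addr` show the file
 * pages of `fd` starting at page offset `pgoff`, by mapping over that
 * range with MAP_FIXED. `addr` and `size` must be page aligned and lie
 * inside a mapping the caller already owns. Each call may split that
 * mapping into separate vmas. Returns 0 on success, -1 on failure.
 */
static int emulated_remap_file_pages(void *addr, size_t size, int prot,
                                     int fd, size_t pgoff)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p;

    if (page <= 0)
        return -1;
    p = mmap(addr, size, prot, MAP_SHARED | MAP_FIXED, fd,
             (off_t)pgoff * (off_t)page);
    return p == MAP_FAILED ? -1 : 0;
}
```

A caller would first mmap() the whole file window and then call the helper once per rearranged range; the cost is one vma per range, which is the overhead a nonlinear mapping avoids.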
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 07 Mar 2007 09:38:34 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> Dirty page accounting doesn't work either on
> non-linear mappings

It doesn't? Confused - these things don't have anything to do with each
other do they?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > If it doesn't look very impressive, it could be because it leaves all
> > the old crud around for backwards compatibility (the worst offenders
> > are removed in patch 6/6).
> >
> > If you look at the patchset as a whole, it removes about 250 lines,
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere
> > near the testing that the linear fault path does.
> >
> > A minimal fix for nonlinear pages would have required changing all
> > ->populate handlers, which I simply thought was not very productive
> > considering the testing and coverage issues, and that I was going to
> > rewrite the nonlinear path anyway.
> >
> > If you like, you can consider patches 1,2,3 as the fix, and ignore
> > nonlinear (hey, it doesn't even bother checking truncate_count
> > today!).
> >
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)

That would make sense. Dirty page accounting doesn't work either on
non-linear mappings, and I can't see how that could be fixed in any
other way.

Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > If it doesn't look very impressive, it could be because it leaves all
> > the old crud around for backwards compatibility (the worst offenders
> > are removed in patch 6/6).
> >
> > If you look at the patchset as a whole, it removes about 250 lines,
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere
> > near the testing that the linear fault path does.
> >
> > A minimal fix for nonlinear pages would have required changing all
> > ->populate handlers, which I simply thought was not very productive
> > considering the testing and coverage issues, and that I was going to
> > rewrite the nonlinear path anyway.
> >
> > If you like, you can consider patches 1,2,3 as the fix, and ignore
> > nonlinear (hey, it doesn't even bother checking truncate_count
> > today!).
> >
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)
>

I'm rather regretting having merged it - I don't think it has been used for
much.

Paolo's UML speedup patches might use nonlinear though.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin <[EMAIL PROTECTED]> wrote:

> If it doesn't look very impressive, it could be because it leaves all
> the old crud around for backwards compatibility (the worst offenders
> are removed in patch 6/6).
>
> If you look at the patchset as a whole, it removes about 250 lines,
> mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
> fremap.c, that is nonlinear pages specific and doesn't get anywhere
> near the testing that the linear fault path does.
>
> A minimal fix for nonlinear pages would have required changing all
> ->populate handlers, which I simply thought was not very productive
> considering the testing and coverage issues, and that I was going to
> rewrite the nonlinear path anyway.
>
> If you like, you can consider patches 1,2,3 as the fix, and ignore
> nonlinear (hey, it doesn't even bother checking truncate_count
> today!).
>
> Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> thought you would have liked the patches...

btw., if we decide that nonlinear isn't worth the continuing maintenance
pain, we could internally implement/emulate sys_remap_file_pages() via a
call to mremap() and essentially deprecate it, without breaking the ABI
- and remove all the nonlinear code. (This would split fremap areas into
separate vmas)

	Ingo
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 08:08:53AM +0100, Nick Piggin wrote:
> On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote:
> >
> > This patch seems to churn things around an awful lot for minimal benefit.
>
> Well it fixes the whole design of the nonlinear fault path.

If it doesn't look very impressive, it could be because it leaves all
the old crud around for backwards compatibility (the worst offenders
are removed in patch 6/6).

If you look at the patchset as a whole, it removes about 250 lines,
mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
fremap.c, that is nonlinear pages specific and doesn't get anywhere
near the testing that the linear fault path does.

A minimal fix for nonlinear pages would have required changing all
->populate handlers, which I simply thought was not very productive
considering the testing and coverage issues, and that I was going to
rewrite the nonlinear path anyway.

If you like, you can consider patches 1,2,3 as the fix, and ignore
nonlinear (hey, it doesn't even bother checking truncate_count
today!).

Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
thought you would have liked the patches...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 08:08:53AM +0100, Nick Piggin wrote: On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote: This patch seems to churn things around an awful lot for minimal benefit. Well it fixes the whole design of the nonlinear fault path. If it doesn't look very impressive, it could be because it leaves all the old crud around for backwards compatibility (the worst offenders are removed in patch 6/6). If you look at the patchset as a whole, it removes about 250 lines, mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c fremap.c, that is nonlinear pages specific and doesn't get anywhere near the testing that the linear fault path does. A minimal fix for nonlinear pages would have required changing all -populate handlers, which I simply thought was not very productive considering the testing and coverage issues, and that I was going to rewrite the nonlinear path anyway. If you like, you can consider patches 1,2,3 as the fix, and ignore nonlinear (hey, it doesn't even bother checking truncate_count today!). Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought you would have liked the patches... - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin [EMAIL PROTECTED] wrote: If it doesn't look very impressive, it could be because it leaves all the old crud around for backwards compatibility (the worst offenders are removed in patch 6/6). If you look at the patchset as a whole, it removes about 250 lines, mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c fremap.c, that is nonlinear pages specific and doesn't get anywhere near the testing that the linear fault path does. A minimal fix for nonlinear pages would have required changing all -populate handlers, which I simply thought was not very productive considering the testing and coverage issues, and that I was going to rewrite the nonlinear path anyway. If you like, you can consider patches 1,2,3 as the fix, and ignore nonlinear (hey, it doesn't even bother checking truncate_count today!). Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought you would have liked the patches... btw., if we decide that nonlinear isnt worth the continuing maintainance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas) Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote: * Nick Piggin [EMAIL PROTECTED] wrote: If it doesn't look very impressive, it could be because it leaves all the old crud around for backwards compatibility (the worst offenders are removed in patch 6/6). If you look at the patchset as a whole, it removes about 250 lines, mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c fremap.c, that is nonlinear pages specific and doesn't get anywhere near the testing that the linear fault path does. A minimal fix for nonlinear pages would have required changing all -populate handlers, which I simply thought was not very productive considering the testing and coverage issues, and that I was going to rewrite the nonlinear path anyway. If you like, you can consider patches 1,2,3 as the fix, and ignore nonlinear (hey, it doesn't even bother checking truncate_count today!). Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought you would have liked the patches... btw., if we decide that nonlinear isnt worth the continuing maintainance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas) I'm rather regretting having merged it - I don't think it has been used for much. Paolo's UML speedup patches might use nonlinear though. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
If it doesn't look very impressive, it could be because it leaves all the old crud around for backwards compatibility (the worst offenders are removed in patch 6/6). If you look at the patchset as a whole, it removes about 250 lines, mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c fremap.c, that is nonlinear pages specific and doesn't get anywhere near the testing that the linear fault path does. A minimal fix for nonlinear pages would have required changing all -populate handlers, which I simply thought was not very productive considering the testing and coverage issues, and that I was going to rewrite the nonlinear path anyway. If you like, you can consider patches 1,2,3 as the fix, and ignore nonlinear (hey, it doesn't even bother checking truncate_count today!). Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought you would have liked the patches... btw., if we decide that nonlinear isnt worth the continuing maintainance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas) That would make sense. Dirty page accounting doesn't work either on non-linear mappings, and I can't see how that could be fixed in any other way. Miklos - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 07 Mar 2007 09:38:34 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> Dirty page accounting doesn't work either on non-linear mappings
It doesn't? Confused - these things don't have anything to do with each other do they?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > Dirty page accounting doesn't work either on non-linear mappings
> It doesn't? Confused - these things don't have anything to do with each other do they?
Look in page_mkclean(). Where does it handle non-linear mappings?
Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Andrew Morton [EMAIL PROTECTED] wrote:
> > btw., if we decide that nonlinear isn't worth the continuing maintenance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas)
> I'm rather regretting having merged it - I don't think it has been used for much. Paolo's UML speedup patches might use nonlinear though.
yes, i wrote the first, prototype version of that for UML, it needs an extended version of the syscall, sys_remap_file_pages_prot():
http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
i also wrote an x86 hypervisor kind of thing for UML, called 'sys_vcpu()', which allows UML to execute guest user-mode in a box, which also relies on sys_remap_file_pages_prot():
http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
which reduced the UML guest syscall overhead from 30 usecs to 4 usecs (with native syscalls taking 2 usecs, on the box i tested, years ago).
So it certainly looked useful to me - but wasn't really picked up widely. We'll always have the option to get rid of it (and hence completely reverse the decision to merge it) without breaking the ABI, by emulating the API via mremap(). That eliminates the UML speedup though. So no need to feel sorry about having merged it, we can easily revisit that years-old 'do we want it' decision, without any ABI worries.
Ingo
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:27:55AM +0100, Ingo Molnar wrote:
> * Nick Piggin [EMAIL PROTECTED] wrote:
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought you would have liked the patches...
> btw., if we decide that nonlinear isn't worth the continuing maintenance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas)
Well I think it has a few possible uses outside the PAE database workloads. UML for one seems to be interested... as much as I don't use them, I think nonlinear mappings are kinda cool ;)
After these patches, I don't think there is too much burden. The main thing left really is just the objrmap stuff, but that is just handled with a minimal 'dumb' algorithm that doesn't cost much.
Then the core of it is just the file pte handling, which really doesn't seem to be much problem.
Apart from a handful of trivial if (pte_file()) cases throughout mm/, our maintenance burden basically now amounts to the following patch. Even the rmap.c change looks bigger than it is because I split out the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh?
:)

--
 include/asm-powerpc/pgtable.h |   12
 mm/Kconfig                    |    6 ++
 mm/Makefile                   |    6 +-
 mm/rmap.c                     |  101 +-
 4 files changed, 83 insertions(+), 42 deletions(-)

Index: linux-2.6/include/asm-powerpc/pgtable.h
===================================================================
--- linux-2.6.orig/include/asm-powerpc/pgtable.h
+++ linux-2.6/include/asm-powerpc/pgtable.h
@@ -243,7 +243,12 @@ static inline int pte_write(pte_t pte) {
 static inline int pte_exec(pte_t pte)  { return pte_val(pte) & _PAGE_EXEC;}
 static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
 static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
+
+#ifdef CONFIG_NONLINEAR
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+#else
+static inline int pte_file(pte_t pte) { return 0; }
+#endif
 
 static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
 static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -483,9 +488,16 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t){((type) << 1)|((offset) << 8)})
 #define __pte_to_swp_entry(pte)   ((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
 #define __swp_entry_to_pte(x)     ((pte_t) { (x).val << PTE_RPN_SHIFT })
+
+#ifdef CONFIG_NONLINEAR
 #define pte_to_pgoff(pte)         (pte_val(pte) >> PTE_RPN_SHIFT)
 #define pgoff_to_pte(off)         ((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS         (BITS_PER_LONG - PTE_RPN_SHIFT)
+#else
+#define pte_to_pgoff(pte)         ({BUG(); -1;})
+#define pgoff_to_pte(off)         ({BUG(); (pte_t){-1};})
+#define PTE_FILE_MAX_BITS         0
+#endif
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -142,6 +142,12 @@ config SPLIT_PTLOCK_CPUS
 #
 # support for page migration
 #
+config NONLINEAR
+	bool "Non linear mappings"
+	def_bool y
+	help
+	  Provides support for the remap_file_pages syscall.
+
 config MIGRATION
 	bool "Page migration"
 	def_bool y
Index: linux-2.6/mm/Makefile
===================================================================
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -3,9 +3,8 @@
 #
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= fremap.o highmem.o madvise.o memory.o mincore.o \
-			   mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-			   vmalloc.o
+mmu-$(CONFIG_MMU)	:= highmem.o madvise.o memory.o mincore.o mlock.o \
+			   mmap.o mprotect.o mremap.o msync.o rmap.o vmalloc.o
 
 obj-y			:= bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
 			   page_alloc.o page-writeback.o pdflush.o \
@@ -27,5 +26,6 @@ obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_NONLINEAR) += fremap.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
Index: linux-2.6/mm/rmap.c
===================================================================
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -756,6 +756,7 @@ out:
 	return ret;
 }
 
+#ifdef CONFIG_NONLINEAR
 /*
  * objrmap doesn't work for nonlinear VMAs because the assumption that
  * offset-into-file correlates with offset-into-virtual-addresses does not hold.
@@ -845,53 +846,18 @@ static void
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> > > Dirty page accounting doesn't work either on non-linear mappings
> > It doesn't? Confused - these things don't have anything to do with each other do they?
> Look in page_mkclean(). Where does it handle non-linear mappings?
OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 algorithms and I guess it'll break the msync guarantees.
Peter, I thought we went through the nonlinear problem ages ago and decided it was OK?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:59:44AM +0100, Nick Piggin wrote:
> Apart from a handful of trivial if (pte_file()) cases throughout mm/, our maintenance burden basically now amounts to the following patch. Even the rmap.c change looks bigger than it is because I split out the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)
Oh, there is a bit more nonlinear mmap list manipulation I'd forgotten about too... makes things a little bit worse, but not too much.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> > > > Dirty page accounting doesn't work either on non-linear mappings
> > > It doesn't? Confused - these things don't have anything to do with each other do they?
> > Look in page_mkclean(). Where does it handle non-linear mappings?
> OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 algorithms and I guess it'll break the msync guarantees.
> Peter, I thought we went through the nonlinear problem ages ago and decided it was OK?
msync breakage is bad, but otherwise I don't know that we care about dirty page writeout efficiency.
But I think we discovered that those msync changes are bogus anyway because there is a small race window where pte could be dirtied without page being set dirty?
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin [EMAIL PROTECTED] wrote:
> After these patches, I don't think there is too much burden. The main thing left really is just the objrmap stuff, but that is just handled with a minimal 'dumb' algorithm that doesn't cost much.
ok. What do you think about the sys_remap_file_pages_prot() thing that Paolo has done in a nicely split up form - does that complicate things in any fundamental way? That is what is useful to UML.
Ingo
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > Look in page_mkclean(). Where does it handle non-linear mappings?
> OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty.
It won't even get that far, because it only looks at vmas on mapping->i_mmap, and not on i_mmap_nonlinear.
Miklos
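A toy userspace model of the point above, in C. It is only a sketch: toy_vma, toy_mapping and toy_page_mkclean() are made-up names, and both lists are plain singly linked lists, whereas in the kernel of that era mapping->i_mmap is a prio tree and i_mmap_nonlinear is a list_head. It only illustrates that a cleaner which walks i_mmap alone never reaches a pte that lives in a nonlinear vma.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel structures (assumption for illustration). */
struct toy_vma {
    int pte_dirty;              /* 1 if this vma's pte for the page is dirty */
    struct toy_vma *next;
};

struct toy_mapping {
    struct toy_vma *i_mmap;            /* linear vmas */
    struct toy_vma *i_mmap_nonlinear;  /* nonlinear vmas */
};

/* Models a cleaner that only walks the linear list.
 * Returns the number of ptes it cleaned. */
static int toy_page_mkclean(struct toy_mapping *m)
{
    int cleaned = 0;

    for (struct toy_vma *v = m->i_mmap; v != NULL; v = v->next) {
        if (v->pte_dirty) {
            v->pte_dirty = 0;
            cleaned++;
        }
    }
    /* m->i_mmap_nonlinear is never visited. */
    return cleaned;
}

/* Counts dirty ptes remaining on both lists. */
static int toy_count_dirty(const struct toy_mapping *m)
{
    int dirty = 0;

    for (const struct toy_vma *v = m->i_mmap; v != NULL; v = v->next)
        dirty += v->pte_dirty;
    for (const struct toy_vma *v = m->i_mmap_nonlinear; v != NULL; v = v->next)
        dirty += v->pte_dirty;
    return dirty;
}

/* One linear and one nonlinear vma, both with a dirty pte.
 * Returns how many dirty ptes survive the cleaner. */
int toy_demo(void)
{
    struct toy_vma linear = { 1, NULL };
    struct toy_vma nonlinear = { 1, NULL };
    struct toy_mapping m = { &linear, &nonlinear };

    int cleaned = toy_page_mkclean(&m);
    assert(cleaned == 1);       /* only the linear pte was reached */
    return toy_count_dirty(&m);
}
```

Calling toy_demo() from a main() leaves one dirty pte behind, the one on the nonlinear list.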
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin [EMAIL PROTECTED] wrote:
> On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> > > > > Dirty page accounting doesn't work either on non-linear mappings
> > > > It doesn't? Confused - these things don't have anything to do with each other do they?
> > > Look in page_mkclean(). Where does it handle non-linear mappings?
> > OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 algorithms and I guess it'll break the msync guarantees.
> > Peter, I thought we went through the nonlinear problem ages ago and decided it was OK?
> msync breakage is bad, but otherwise I don't know that we care about dirty page writeout efficiency.
Well. We made so many changes to support the synchronous dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that the old-style approach still works. It might seem to, most of the time. But if it _is_ subtly broken, boy it's going to take a long time for us to find out.
> But I think we discovered that those msync changes are bogus anyway because there is a small race window where pte could be dirtied without page being set dirty?
Dunno, I don't recall that. We dirty the page before the pte...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 09:53:23AM +0100, Ingo Molnar wrote:
> * Andrew Morton [EMAIL PROTECTED] wrote:
> > > btw., if we decide that nonlinear isn't worth the continuing maintenance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas)
> > I'm rather regretting having merged it - I don't think it has been used for much. Paolo's UML speedup patches might use nonlinear though.
> yes, i wrote the first, prototype version of that for UML, it needs an extended version of the syscall, sys_remap_file_pages_prot():
> http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
> i also wrote an x86 hypervisor kind of thing for UML, called 'sys_vcpu()', which allows UML to execute guest user-mode in a box, which also relies on sys_remap_file_pages_prot():
> http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
> which reduced the UML guest syscall overhead from 30 usecs to 4 usecs (with native syscalls taking 2 usecs, on the box i tested, years ago).
> So it certainly looked useful to me - but wasn't really picked up widely. We'll always have the option to get rid of it (and hence completely reverse the decision to merge it) without breaking the ABI, by emulating the API via mremap(). That eliminates the UML speedup though. So no need to feel sorry about having merged it, we can easily revisit that years-old 'do we want it' decision, without any ABI worries.
Depending on whether anyone wants it, and what features they want, we could emulate the old syscall, and make a new restricted one which is much less intrusive.
For example, if we can operate only on MAP_ANONYMOUS memory and specify that nonlinear mappings effectively mlock the pages, then we can get rid of all the objrmap and unmap_mapping_range handling, forget about the writeout and msync problems...
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
> > But I think we discovered that those msync changes are bogus anyway because there is a small race window where pte could be dirtied without page being set dirty?
> Dunno, I don't recall that. We dirty the page before the pte...
That's the one I just submitted a fix for ;)
http://lkml.org/lkml/2007/3/6/308
Miklos
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:
> > btw., if we decide that nonlinear isn't worth the continuing maintenance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas)
On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> I'm rather regretting having merged it - I don't think it has been used for much. Paolo's UML speedup patches might use nonlinear though.
Guess what major real-life application not only uses nonlinear daily but would even be very happy to see it extended with non-vma-creating protections and more?
It's not terribly typical for things to be truncated while remap_file_pages() is doing its work, though it's been proposed as a method of dynamism. It won't stress remap_file_pages() vs. truncate() in any meaningful way, though, as userspace will be rather diligent about clearing in-use data out of the file offset range to be truncated away anyway, and all that via O_DIRECT.
-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Nick Piggin [EMAIL PROTECTED] wrote:
> > After these patches, I don't think there is too much burden. The main thing left really is just the objrmap stuff, but that is just handled with a minimal 'dumb' algorithm that doesn't cost much.
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> ok. What do you think about the sys_remap_file_pages_prot() thing that Paolo has done in a nicely split up form - does that complicate things in any fundamental way? That is what is useful to UML.
Oracle would love it. You don't want to know how far back I've been asked to backport that.
-- wli
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> > > > Dirty page accounting doesn't work either on non-linear mappings
> > > It doesn't? Confused - these things don't have anything to do with each other do they?
> > Look in page_mkclean(). Where does it handle non-linear mappings?
> OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 algorithms and I guess it'll break the msync guarantees.
> Peter, I thought we went through the nonlinear problem ages ago and decided it was OK?
Can recollect as much, I modelled it after page_referenced() and can't find any VM_NONLINEAR specific code in there either.
Will have a hard look, but if it's broken, then page_referenced is equally broken it seems, which would make page reclaim funny in the light of nonlinear mappings.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
* Bill Irwin [EMAIL PROTECTED] wrote:
> * Nick Piggin [EMAIL PROTECTED] wrote:
> > > After these patches, I don't think there is too much burden. The main thing left really is just the objrmap stuff, but that is just handled with a minimal 'dumb' algorithm that doesn't cost much.
> On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> > ok. What do you think about the sys_remap_file_pages_prot() thing that Paolo has done in a nicely split up form - does that complicate things in any fundamental way? That is what is useful to UML.
> Oracle would love it. You don't want to know how far back I've been asked to backport that.
ok, cool! Then the first step would be for you to talk to Paolo and to pick up the patches, review them, nurse it in -mm, etc. Suffering in silence is just a pointless act of masochism, not an efficient upstream-merge tactic ;-)
Ingo
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:26:38AM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin [EMAIL PROTECTED] wrote:
> > msync breakage is bad, but otherwise I don't know that we care about dirty page writeout efficiency.
> Well. We made so many changes to support the synchronous dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that the old-style approach still works. It might seem to, most of the time. But if it _is_ subtly broken, boy it's going to take a long time for us to find out.
I can't think of anything that should have caused breakage (except for the msync thing). We're still careful about not dropping pte dirty bits.
> > But I think we discovered that those msync changes are bogus anyway because there is a small race window where pte could be dirtied without page being set dirty?
> Dunno, I don't recall that. We dirty the page before the pte...
I don't think it is really that simple. There is a big comment in clear_page_dirty_for_io.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin [EMAIL PROTECTED] wrote:
> On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:
> > > btw., if we decide that nonlinear isn't worth the continuing maintenance pain, we could internally implement/emulate sys_remap_file_pages() via a call to mremap() and essentially deprecate it, without breaking the ABI - and remove all the nonlinear code. (This would split fremap areas into separate vmas)
> On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> > I'm rather regretting having merged it - I don't think it has been used for much. Paolo's UML speedup patches might use nonlinear though.
> Guess what major real-life application not only uses nonlinear daily but would even be very happy to see it extended with non-vma-creating protections and more?
uh-oh. SQL server?
> It's not terribly typical for things to be truncated while remap_file_pages() is doing its work, though it's been proposed as a method of dynamism. It won't stress remap_file_pages() vs. truncate() in any meaningful way, though, as userspace will be rather diligent about clearing in-use data out of the file offset range to be truncated away anyway, and all that via O_DIRECT.
The problem here isn't related to truncate or direct-IO. It's just plain-old MAP_SHARED. nonlinear VMAs are now using the old-style dirty-memory management. msync() is basically a no-op and the code is wildly tricky and pretty much untested. The chances that we broke it are considerable.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> Depending on whether anyone wants it, and what features they want, we could emulate the old syscall, and make a new restricted one which is much less intrusive. For example, if we can operate only on MAP_ANONYMOUS memory and specify that nonlinear mappings effectively mlock the pages, then we can get rid of all the objrmap and unmap_mapping_range handling, forget about the writeout and msync problems...
Anonymous-only would make it a doorstop for Oracle, since its entire motive for using it is to window into objects larger than user virtual address spaces (this likely also applies to UML, though they should really chime in to confirm). Restrictions to tmpfs and/or ramfs would likely be liveable, though I suspect some things might want to do it to shm segments (I'll ask about that one). There's definitely no need for a persistent backing store for the object to be remapped in Oracle's case, in any event. It's largely the in-core destination and source of IO, not something saved on-disk itself.
-- wli
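To make the "window into a larger object" use concrete, here is a minimal userspace sketch in C. It does not call remap_file_pages(); as an assumption for illustration it shows the emulated form, where each window slot is its own mmap(MAP_FIXED) of a shared file mapping and therefore its own vma. window_set() and window_demo() are made-up names, the 8-page object and 2-slot window are arbitrary, and tmpfile() merely stands in for a tmpfs or shm object.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: point window slot `slot` at page `pgoff` of `fd`.
 * Each call creates a separate vma, unlike remap_file_pages(). */
static int window_set(char *window, size_t slot, int fd, size_t pgoff,
                      size_t page)
{
    assert(window != NULL);
    void *p = mmap(window + slot * page, page, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, (off_t)(pgoff * page));
    return p == MAP_FAILED ? -1 : 0;
}

/* Returns 0 if file pages 7 and 2 show up, in that order, in a
 * two-slot window; returns -1 on any failure. */
int window_demo(void)
{
    const size_t npages = 8, nslots = 2;
    char *window = MAP_FAILED;
    int rc = -1;

    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t)ps;

    FILE *f = tmpfile();                /* stand-in for a tmpfs/shm object */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)(npages * page)) != 0)
        goto out_file;

    /* Tag the first byte of each file page with 'A' + page index. */
    for (size_t i = 0; i < npages; i++) {
        char tag = (char)('A' + i);
        if (pwrite(fd, &tag, 1, (off_t)(i * page)) != 1)
            goto out_file;
    }

    /* Reserve address space for the window, then fill slots out of order. */
    window = mmap(NULL, nslots * page, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (window == MAP_FAILED)
        goto out_file;

    if (window_set(window, 0, fd, 7, page) == 0 &&
        window_set(window, 1, fd, 2, page) == 0 &&
        window[0] == 'H' && window[page] == 'C')
        rc = 0;

    munmap(window, nslots * page);
out_file:
    fclose(f);
    return rc;
}
```

With a window much smaller than the object, repointing a slot costs one mmap() and one vma per slot in this form; keeping the whole window in a single vma is what the nonlinear syscall is for.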
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
> > > > > Dirty page accounting doesn't work either on non-linear mappings
> > > > It doesn't? Confused - these things don't have anything to do with each other do they?
> > > Look in page_mkclean(). Where does it handle non-linear mappings?
> > OK, I'd forgotten about that. It won't break dirty memory accounting, but it'll potentially break dirty memory balancing. If we have the wrong page (due to nonlinear), page_check_address() will fail and we'll leave the pte dirty. That puts us back to the pre-2.6.17 algorithms and I guess it'll break the msync guarantees.
> > Peter, I thought we went through the nonlinear problem ages ago and decided it was OK?
> Can recollect as much, I modelled it after page_referenced() and can't find any VM_NONLINEAR specific code in there either.
> Will have a hard look, but if it's broken, then page_referenced is equally broken it seems, which would make page reclaim funny in the light of nonlinear mappings.
page_referenced is just a heuristic, and it ignores nonlinear mappings and the page which will get filtered down to try_to_unmap.
Page reclaim is already funny for nonlinear mappings, page_referenced is the least of its worries ;) It works, though.
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)
On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > Depending on whether anyone wants it, and what features they want, we could emulate the old syscall, and make a new restricted one which is much less intrusive. For example, if we can operate only on MAP_ANONYMOUS memory and specify that nonlinear mappings effectively mlock the pages, then we can get rid of all the objrmap and unmap_mapping_range handling, forget about the writeout and msync problems...
> Anonymous-only would make it a doorstop for Oracle, since its entire motive for using it is to window into objects larger than user virtual
Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't have a file descriptor to get a pgoff, then remap_file_pages is a doorstop for everyone ;)
> address spaces (this likely also applies to UML, though they should really chime in to confirm). Restrictions to tmpfs and/or ramfs would likely be liveable, though I suspect some things might want to do it to shm segments (I'll ask about that one). There's definitely no need for a persistent backing store for the object to be remapped in Oracle's case, in any event. It's largely the in-core destination and source of IO, not something saved on-disk itself.
Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with that as well, then I think it might be a good option.