Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-21 Blaisorblade
On Tuesday 20 March 2007 07:00, Nick Piggin wrote:
> On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> > On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > > > Yes, I believe that is the case, however I wonder if that is going
> > > > > to be a problem for you to distinguish between write faults for
> > > > > clean writable ptes, and write faults for readonly ptes?

> > > > I wouldn't be able to distinguish them, but am I going to get write
> > > > faults for clean ptes when vma_wants_writenotify() is false (as seems
> > > > to be for tmpfs)? I guess not.

> > > > For tmpfs pages, clean writable PTEs are mapped as writable so they
> > > > won't give any problem, since vma_wants_writenotify() is false for
> > > > tmpfs. Correct?

> > > Yes, that should be the case. So would this mean that nonlinear
> > > protections don't work on regular files?

> > They still work in most cases (including for UML), but if the initial
> > mmap() specified PROT_WRITE, that is ignored for pages which are not
> > remapped via remap_file_pages(). UML uses PROT_NONE for the initial mmap,
> > so that's no problem.

> But how are you going to distinguish a write fault on a readonly pte for
> dirty page accounting vs a read-only nonlinear protection?

Hmm... I was only thinking of PTEs which hadn't been remapped via
remap_file_pages(), but had just been faulted in with the initial mmap()
protection.

For the other PTEs, however, I overlooked that the current code ignores
vma_wants_writenotify(), i.e. it breaks dirty page accounting for them, and I
refused to even consider that option, even before knowing the purpose of
dirty page accounting (I have since found the commits explaining it).

> You can't store any more data in a present pte AFAIK, so you'd have to
> have some out of band data. At which point, you may as well just forget
> about vma_wants_writenotify vmas, considering that everybody is using
> shmem/ramfs.

I was going to do that anyway. I guess I should just disallow the
VM_MANYPROTS (i.e. MAP_CHGPROT in flags) && vma_wants_writenotify()
combination in remap_file_pages(), right? OK, that's trivial (I shouldn't
even have pointed it out).
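
A minimal sketch of that guard, assuming a simplified VMA that only records
the result of vma_wants_writenotify(). MAP_CHGPROT_SKETCH, struct vma_sketch,
remap_prot_check_sketch() and the choice of -EINVAL are all hypothetical
names and choices for illustration, not code from the actual patches.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MAP_CHGPROT flag of the pending
 * remap_file_pages() protection patches; the real value may differ. */
#define MAP_CHGPROT_SKETCH 0x1u

/* Simplified VMA: only the one property the guard looks at. */
struct vma_sketch {
	bool wants_writenotify;	/* what vma_wants_writenotify(vma) returned */
};

/*
 * Guard sketch for the start of sys_remap_file_pages(): refuse per-page
 * protections on VMAs whose clean pages are write-protected for dirty
 * page accounting, since a write fault there could not be told apart
 * from a read-only nonlinear protection.
 */
static int remap_prot_check_sketch(const struct vma_sketch *vma,
				   unsigned int flags)
{
	if ((flags & MAP_CHGPROT_SKETCH) && vma->wants_writenotify)
		return -EINVAL;	/* errno choice is an assumption */
	return 0;
}
```

With a check like this, tmpfs/shm users such as UML are unaffected, while
asking for per-page protections on a dirty-accounted regular-file mapping
fails up front instead of silently breaking dirty page accounting.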
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Chat with your friends in real time!
 http://it.yahoo.com/mail_it/foot/*http://it.messenger.yahoo.com 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-19 Nick Piggin
On Mon, Mar 19, 2007 at 09:44:28PM +0100, Blaisorblade wrote:
> On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> > > >
> > > > Yes, I believe that is the case, however I wonder if that is going to
> > > > be a problem for you to distinguish between write faults for clean
> > > > writable ptes, and write faults for readonly ptes?
> > >
> > > I wouldn't be able to distinguish them, but am I going to get write
> > > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > > be for tmpfs)? I guess not.
> > >
> > > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > > Correct?
> >
> > Yes, that should be the case. So would this mean that nonlinear protections
> > don't work on regular files?
> 
> They still work in most cases (including for UML), but if the initial mmap()
> specified PROT_WRITE, that is ignored for pages which are not remapped via
> remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no
> problem.

But how are you going to distinguish a write fault on a readonly pte for
dirty page accounting vs a read-only nonlinear protection?

You can't store any more data in a present pte AFAIK, so you'd have to
have some out of band data. At which point, you may as well just forget
about vma_wants_writenotify vmas, considering that everybody is using
shmem/ramfs.
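
To illustrate the ambiguity, a toy model (not kernel code; every name here is
made up): a clean page write-protected for dirty accounting and a page given a
read-only nonlinear protection both end up as the same present, non-writable
PTE, so the write-fault handler needs some hypothetical out-of-band record to
choose between marking the page dirty and raising SIGSEGV.

```c
#include <stdbool.h>

/* Toy model of a present PTE: only the bits that matter here. */
struct pte_sketch {
	bool present;
	bool writable;
};

/* Clean page in a dirty-accounted VMA: mapped read-only so the first
 * write can be noticed. */
static struct pte_sketch pte_clean_for_dirty_accounting(void)
{
	return (struct pte_sketch){ .present = true, .writable = false };
}

/* Page given a read-only per-page (nonlinear) protection. */
static struct pte_sketch pte_nonlinear_readonly(void)
{
	return (struct pte_sketch){ .present = true, .writable = false };
}

enum wp_fault_action { MAKE_WRITABLE_AND_DIRTY, DELIVER_SIGSEGV };

/*
 * Write fault on a present, read-only PTE. Since both cases above
 * produce identical PTE bits, the decision has to come from somewhere
 * else: here a hypothetical out-of-band flag saying whether a read-only
 * protection was explicitly requested for this page.
 */
static enum wp_fault_action handle_wp_fault(bool oob_readonly_requested)
{
	return oob_readonly_requested ? DELIVER_SIGSEGV
				      : MAKE_WRITABLE_AND_DIRTY;
}
```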


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-19 Blaisorblade
On Sunday 18 March 2007 03:50, Nick Piggin wrote:
> On Sat, Mar 17, 2007 at 01:17:00PM +0100, Blaisorblade wrote:
> > On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> > > On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > > > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can
> > > > > > live with that as well, then I think it might be a good option.
> > > > >
> > > > > Oh, hmm if you can truncate these things then you still need to
> > > > > force unmap so you still need i_mmap_nonlinear.
> > > >
> > > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > > > which is quite similar, I guess.
> > > >
> > > > About the restriction to tmpfs, I have just discovered
> > > > '[PATCH] mm: tracking shared dirty pages' (commit
> > > > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > > > conflicts with remap_file_pages for file-based mmaps (and that's
> > > > fully fine, for now).
> > > >
> > > > Even if UML does not need it, until now, if there is a VMA protection
> > > > and a page hasn't been remapped with remap_file_pages, the VMA
> > > > protection is used (just because it makes sense).
> > > >
> > > > However, it is only used when the PTE is first created; we can never
> > > > change protections on a VMA. So if vma_wants_writenotify() is true
> > > > (on all file-based mappings and on no shmfs-based one, right?) and we
> > > > write-protect the VMA, it will always be write-protected.
> > >
> > > Yes, I believe that is the case, however I wonder if that is going to
> > > be a problem for you to distinguish between write faults for clean
> > > writable ptes, and write faults for readonly ptes?
> >
> > I wouldn't be able to distinguish them, but am I going to get write
> > faults for clean ptes when vma_wants_writenotify() is false (as seems to
> > be for tmpfs)? I guess not.
> >
> > For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> > give any problem, since vma_wants_writenotify() is false for tmpfs.
> > Correct?
>
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files?

They still work in most cases (including for UML), but if the initial mmap()
specified PROT_WRITE, that is ignored for pages which are not remapped via
remap_file_pages(). UML uses PROT_NONE for the initial mmap, so that's no
problem.
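
A simplified model of the protection a newly created PTE gets in this scheme,
as described above: remapped pages use their own protection, while other pages
inherit the VMA protection with write permission dropped when write
notification is wanted. The bit values, struct page_sketch and
initial_pte_prot() are illustrative assumptions, not the kernel's.

```c
#include <stdbool.h>

/* Illustrative protection bits, not the <sys/mman.h> values. */
#define SK_PROT_R 0x1u
#define SK_PROT_W 0x2u

struct page_sketch {
	bool remapped;		/* given its own protection by remap_file_pages() */
	unsigned int prot;	/* meaningful only when remapped */
};

/*
 * Protection a newly created PTE receives in this simplified model.
 * Remapped pages use their own protection (ignoring write notification,
 * as the current code does). Other pages inherit the VMA protection,
 * minus write permission when the VMA wants write notification, so an
 * initial PROT_WRITE does not reach them.
 */
static unsigned int initial_pte_prot(unsigned int vma_prot,
				     bool wants_writenotify,
				     const struct page_sketch *pg)
{
	if (pg->remapped)
		return pg->prot;
	if (wants_writenotify)
		return vma_prot & ~SK_PROT_W;
	return vma_prot;
}
```

The remapped case with a zero VMA protection is the UML pattern: a PROT_NONE
initial mmap() whose pages all get explicit protections, so the stripped write
bit never matters.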

> I guess that's OK if Oracle and UML both use 
> tmpfs/shm?

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-19 Bill Irwin
On Sun, Mar 18, 2007 at 03:50:10AM +0100, Nick Piggin wrote:
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files? I guess that's OK if Oracle and UML both use
> tmpfs/shm?

Sometimes ramfs is also used in the Oracle case. I presume that's even
simpler than tmpfs. (Hugetlb, while also used for the same general
buffer pool, is never used in conjunction with remap_file_pages() etc.)


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-18 Jeff Dike
On Sun, Mar 18, 2007 at 03:50:10AM +0100, Nick Piggin wrote:
> Yes, that should be the case. So would this mean that nonlinear protections
> don't work on regular files? I guess that's OK if Oracle and UML both use
> tmpfs/shm?

It's OK for UML.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-17 Nick Piggin
On Sat, Mar 17, 2007 at 01:17:00PM +0100, Blaisorblade wrote:
> On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> > On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live
> > > > > with that as well, then I think it might be a good option.
> > > >
> > > > Oh, hmm if you can truncate these things then you still need to
> > > > force unmap so you still need i_mmap_nonlinear.
> > >
> > > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > > which is quite similar, I guess.
> > >
> > > About the restriction to tmpfs, I have just discovered
> > > '[PATCH] mm: tracking shared dirty pages' (commit
> > > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > > conflicts with remap_file_pages for file-based mmaps (and that's fully
> > > fine, for now).
> > >
> > > Even if UML does not need it, until now, if there is a VMA protection and
> > > a page hasn't been remapped with remap_file_pages, the VMA protection is
> > > used (just because it makes sense).
> > >
> > > However, it is only used when the PTE is first created; we can never
> > > change protections on a VMA. So if vma_wants_writenotify() is true (on
> > > all file-based mappings and on no shmfs-based one, right?) and we
> > > write-protect the VMA, it will always be write-protected.
> >
> > Yes, I believe that is the case, however I wonder if that is going to be
> > a problem for you to distinguish between write faults for clean writable
> > ptes, and write faults for readonly ptes?
> I wouldn't be able to distinguish them, but am I going to get write faults
> for clean ptes when vma_wants_writenotify() is false (as seems to be for
> tmpfs)? I guess not.
> 
> For tmpfs pages, clean writable PTEs are mapped as writable so they won't
> give any problem, since vma_wants_writenotify() is false for tmpfs. Correct?

Yes, that should be the case. So would this mean that nonlinear protections
don't work on regular files? I guess that's OK if Oracle and UML both use
tmpfs/shm?



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-17 Blaisorblade
On Tuesday 13 March 2007 02:19, Nick Piggin wrote:
> On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> > On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live
> > > > with that as well, then I think it might be a good option.
> > >
> > > Oh, hmm if you can truncate these things then you still need to
> > > force unmap so you still need i_mmap_nonlinear.
> >
> > Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug,
> > which is quite similar, I guess.
> >
> > About the restriction to tmpfs, I have just discovered
> > '[PATCH] mm: tracking shared dirty pages' (commit
> > d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially
> > conflicts with remap_file_pages for file-based mmaps (and that's fully
> > fine, for now).
> >
> > Even if UML does not need it, until now, if there is a VMA protection and a
> > page hasn't been remapped with remap_file_pages, the VMA protection is
> > used (just because it makes sense).
> >
> > However, it is only used when the PTE is first created; we can never
> > change protections on a VMA. So if vma_wants_writenotify() is true (on
> > all file-based mappings and on no shmfs-based one, right?) and we
> > write-protect the VMA, it will always be write-protected.
>
> Yes, I believe that is the case, however I wonder if that is going to be
> a problem for you to distinguish between write faults for clean writable
> ptes, and write faults for readonly ptes?
I wouldn't be able to distinguish them, but am I going to get write faults for 
clean ptes when vma_wants_writenotify() is false (as seems to be for tmpfs)? 
I guess not.

For tmpfs pages, clean writable PTEs are mapped as writable so they won't give 
any problem, since vma_wants_writenotify() is false for tmpfs. Correct?
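
For reference, a rough userspace paraphrase of the main checks in
vma_wants_writenotify() (mm/mmap.c) as I understand them; it omits the
vm_page_prot and VM_PFNMAP/VM_INSERTPAGE tests, and struct vma_model and
wants_writenotify_model() are illustrative names. It shows why a shared
writable tmpfs mapping, which does no dirty accounting, keeps writable PTEs,
while a shared writable mapping of a dirty-accounted regular file does not.

```c
#include <stdbool.h>

/* Same values as the kernel's VM_WRITE and VM_SHARED. */
#define SK_VM_WRITE  0x00000002u
#define SK_VM_SHARED 0x00000008u

struct vma_model {
	unsigned int vm_flags;
	bool has_page_mkwrite;		/* vm_ops->page_mkwrite is set */
	bool mapping_accounts_dirty;	/* false for tmpfs/shmem */
};

/*
 * Rough paraphrase of vma_wants_writenotify(): only shared writable
 * mappings qualify, and then only if the backer wants page_mkwrite
 * callbacks or the mapping takes part in dirty page accounting.
 */
static bool wants_writenotify_model(const struct vma_model *vma)
{
	/* Private or read-only: the write bit is already clear anyway. */
	if ((vma->vm_flags & (SK_VM_WRITE | SK_VM_SHARED)) !=
	    (SK_VM_WRITE | SK_VM_SHARED))
		return false;
	if (vma->has_page_mkwrite)
		return true;
	return vma->mapping_accounts_dirty;
}
```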

> > Also, I'm curious. Since my patches are already changing
> > remap_file_pages() code, should they be absolutely merged after yours?
>
> Is there a big clash? I don't think I did a great deal to fremap.c (mainly
> just removing stuff)...
Hopefully we both just modify sys_remap_file_pages(); I'll see soon.
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-12 Nick Piggin
On Tue, Mar 13, 2007 at 12:01:13AM +0100, Blaisorblade wrote:
> On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> > >
> > > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > > that as well, then I think it might be a good option.
> >
> > Oh, hmm if you can truncate these things then you still need to
> > force unmap so you still need i_mmap_nonlinear.
> 
> Well, we don't need truncate(), but MADV_REMOVE for memory hotunplug, which
> is quite similar, I guess.
> 
> About the restriction to tmpfs, I have just discovered
> '[PATCH] mm: tracking shared dirty pages' (commit
> d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts
> with remap_file_pages for file-based mmaps (and that's fully fine, for now).
> 
> Even if UML does not need it, until now, if there is a VMA protection and a
> page hasn't been remapped with remap_file_pages, the VMA protection is used
> (just because it makes sense).
> 
> However, it is only used when the PTE is first created; we can never change
> protections on a VMA. So if vma_wants_writenotify() is true (on all
> file-based mappings and on no shmfs-based one, right?) and we write-protect
> the VMA, it will always be write-protected.

Yes, I believe that is the case, however I wonder if that is going to be
a problem for you to distinguish between write faults for clean writable
ptes, and write faults for readonly ptes?

> That's no problem for UML, but it is for any other user (I guess I'll have
> to prevent callers from trying such stuff; I started from a pretty generic
> patch).
> 
> > But come to think of it, I still don't think nonlinear mappings are
> > too bad as they are ;)
> 
> Btw, I really like removing ->populate and merging the common code together. 
> filemap_populate and shmem_populate are so obnoxiously different that I 
> already wanted to do that (after merging remap_file_pages() core).

Yeah they are also frustratingly similar to filemap_nopage and shmem_nopage,
and duplicate a lot of the same code ;)
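
Much of that duplication is about locating the file page. A sketch, assuming
4 KiB pages and hypothetical names (struct vma_lin, linear_pgoff(),
nonlinear_pgoff()): in a linear VMA the offset follows from the faulting
address, while in a nonlinear VMA it has to be supplied from elsewhere (in the
kernel, the file PTE left by remap_file_pages()). That appears to be why a
single ->fault handler that is handed the offset can serve both cases.

```c
#define SK_PAGE_SHIFT 12	/* assume 4 KiB pages */

struct vma_lin {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* file page mapped at vm_start */
};

/* Linear VMA: the file page follows from the address alone, which is
 * the computation a nopage-style handler can do by itself. */
static unsigned long linear_pgoff(const struct vma_lin *vma,
				  unsigned long address)
{
	return ((address - vma->vm_start) >> SK_PAGE_SHIFT) + vma->vm_pgoff;
}

/* Nonlinear VMA: the offset must come from outside the VMA. A
 * hypothetical per-slot table stands in for the kernel's file PTEs. */
static unsigned long nonlinear_pgoff(const unsigned long *file_pte_pgoff,
				     const struct vma_lin *vma,
				     unsigned long address)
{
	return file_pte_pgoff[(address - vma->vm_start) >> SK_PAGE_SHIFT];
}
```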

> Also, I'm curious. Since my patches are already changing remap_file_pages() 
> code, should they be absolutely merged after yours?

Is there a big clash? I don't think I did a great deal to fremap.c (mainly
just removing stuff)...


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-12 Blaisorblade
On Wednesday 07 March 2007 11:02, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> > > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > > > Depending on whether anyone wants it, and what features they want, we
> > > > could emulate the old syscall, and make a new restricted one which is
> > > > much less intrusive.
> > > > For example, if we can operate only on MAP_ANONYMOUS memory and
> > > > specify that nonlinear mappings effectively mlock the pages, then we
> > > > can get rid of all the objrmap and unmap_mapping_range handling,
> > > > forget about the writeout and msync problems...
> > >
> > > Anonymous-only would make it a doorstop for Oracle, since its entire
> > > motive for using it is to window into objects larger than user virtual
> >
> > Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
> > inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
> > have a file descriptor to get a pgoff, then remap_file_pages is a
> > doorstop for everyone ;)
> >
> > > address spaces (this likely also applies to UML, though they should
> > > really chime in to confirm). Restrictions to tmpfs and/or ramfs would
> > > likely be liveable, though I suspect some things might want to do it to
> > > shm segments (I'll ask about that one). There's definitely no need for
> > > a persistent backing store for the object to be remapped in Oracle's
> > > case, in any event. It's largely the in-core destination and source of
> > > IO, not something saved on-disk itself.
> >
> > Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> > that as well, then I think it might be a good option.
>
> Oh, hmm if you can truncate these things then you still need to
> force unmap so you still need i_mmap_nonlinear.

Well, we don't need truncate(), but we do need MADV_REMOVE for memory 
hot-unplug, which is quite similar, I guess.
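A minimal userspace sketch of the MADV_REMOVE operation mentioned above, assuming a shared, writable, tmpfs-backed mapping (shared anonymous memory is shmem-backed too). The helper name `unplug_page` is hypothetical. MADV_REMOVE frees the backing pages of the file range, so later reads through any mapping of that range see zeros; presumably that is why it raises the same "find and zap every pte" problem as truncate().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: "hot-unplug" one page of guest RAM by dropping
 * its backing store in the shared tmpfs file. After MADV_REMOVE the
 * page is freed in the file itself, so every mapping of that file
 * offset sees a zero-filled page on its next access.
 * Returns 0 on success, or -1 with errno set if the kernel refuses
 * (for instance, EACCES when the range is not a shared writable mapping).
 */
static int unplug_page(void *ram, size_t page_index)
{
	long page_size = sysconf(_SC_PAGESIZE);

	assert(page_size > 0);
	return madvise((char *)ram + page_index * (size_t)page_size,
		       (size_t)page_size, MADV_REMOVE);
}
```

On a MAP_SHARED|MAP_ANONYMOUS region, a successful `unplug_page(ram, 1)` should make the first byte of page 1 read back as 0.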

About the restriction to tmpfs, I have just discovered 
'[PATCH] mm: tracking shared dirty pages' (commit 
d08b3851da41d0ee60851f2c75b118e1f7a5fc89), which already partially conflicts 
with remap_file_pages for file-based mmaps (and that's fully fine, for now).

Even though UML does not need it, until now, if there is a VMA protection and a 
page hasn't been remapped with remap_file_pages(), the VMA protection is used 
(just because it makes sense).

However, it is only used when the PTE is first created (we can never change 
protections on a VMA), so if vma_wants_writenotify() is true (on all 
file-based mappings and on no shmfs-based ones, right?) and we write-protect 
the VMA, it will always be write-protected.

That's no problem for UML, but it is for any other user (I guess I'll have to 
prevent callers from trying such stuff; I started from a pretty generic 
patch).

> But come to think of it, I still don't think nonlinear mappings are
> too bad as they are ;)

Btw, I really like removing ->populate and merging the common code together. 
filemap_populate and shmem_populate are so obnoxiously different that I 
already wanted to do that (after merging remap_file_pages() core).

Also, I'm curious. Since my patches are already changing remap_file_pages() 
code, should they be absolutely merged after yours?
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
Chat with your friends in real time!
 http://it.yahoo.com/mail_it/foot/*http://it.messenger.yahoo.com 



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-08 Thread Blaisorblade
On Wednesday 07 March 2007 10:44, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > Depending on whether anyone wants it, and what features they want, we
> > could emulate the old syscall, and make a new restricted one which is
> > much less intrusive.
> > For example, if we can operate only on MAP_ANONYMOUS memory and specify
> > that nonlinear mappings effectively mlock the pages, then we can get
> > rid of all the objrmap and unmap_mapping_range handling, forget about
> > the writeout and msync problems...
>
> Anonymous-only would make it a doorstop for Oracle, since its entire
> motive for using it is to window into objects larger than user virtual
> address spaces (this likely also applies to UML, though they should
> really chime in to confirm).

We need it for shared file mappings (for tmpfs only).

Our scenario is this: guest RAM is implemented through a shared mapped file 
kept on tmpfs (except for dumb users); various processes share an fd for this 
file (it's opened and immediately deleted).

We maintain x86-style page tables, and TLB flushes are implemented through 
mmap()/munmap()/mprotect().
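A rough userspace sketch of this scheme, under stated assumptions: tmpfile() stands in for creating and immediately unlinking a file on tmpfs, `create_ram_file` and `map_guest_page` are hypothetical names, and each guest "pte" is installed with a single-page mmap(MAP_FIXED) of the RAM file. Every such call can create its own VMA, which is the source of the VMA-count problem.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical: create the backing file for guest RAM. tmpfile() opens a
 * file that is already unlinked, so only open fds keep it alive; a real
 * UML would create and unlink it in a tmpfs directory instead.
 */
static int create_ram_file(size_t bytes)
{
	FILE *f = tmpfile();
	int fd;

	if (!f)
		return -1;
	fd = dup(fileno(f));	/* keep a plain fd after closing the FILE */
	fclose(f);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)bytes) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical: "install" one guest pte by mapping guest physical page
 * `pfn` of the RAM file at host address `va` with protection `prot`.
 * Each call creates or replaces a single-page VMA.
 */
static void *map_guest_page(int ram_fd, void *va, size_t pfn, int prot)
{
	long ps = sysconf(_SC_PAGESIZE);

	assert(ps > 0);
	return mmap(va, (size_t)ps, prot, MAP_SHARED | MAP_FIXED,
		    ram_fd, (off_t)pfn * ps);
}
```

Mapping the same pfn at two addresses aliases one page through the shared file, so a write through one address is visible through the other.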

Having one VMA per 4K page is not how VMAs are meant to be used: for 
instance, the default /proc/sys/vm/max_map_count (64K) is exhausted by a UML 
process with 64K * 4K = 256M of resident memory.
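The arithmetic above as a small C sketch: if every guest page needs its own VMA, vm.max_map_count caps how much memory can be mapped. The function names are hypothetical, and the sketch ignores the VMAs a process also needs for its own text, heap and stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One VMA per mapped page: the mapping-count limit caps mapped memory. */
static uint64_t max_mapped_bytes(uint64_t max_map_count, uint64_t page_size)
{
	assert(page_size != 0);
	return max_map_count * page_size;
}

/* Read the limit currently in force; returns 0 if procfs is unavailable. */
static uint64_t read_max_map_count(void)
{
	unsigned long long value = 0;
	FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

	if (!f)
		return 0;
	if (fscanf(f, "%llu", &value) != 1)
		value = 0;
	fclose(f);
	return value;
}
```

`max_mapped_bytes(65536, 4096)` is 268435456 bytes, i.e. the 256M quoted above.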

> Restrictions to tmpfs and/or ramfs would 
> likely be liveable, though I suspect some things might want to do it to
> shm segments (I'll ask about that one).

> There's definitely no need for a 
> persistent backing store for the object to be remapped in Oracle's case,
> in any event. It's largely the in-core destination and source of IO, not
> something saved on-disk itself.
>
>
> -- wli

-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Jeff Dike
On Wed, Mar 07, 2007 at 02:52:12PM +0100, Peter Zijlstra wrote:
> > Well I don't think UML uses nonlinear yet anyway, does it? Can they
> > make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> > not.
> 
> I think it does, but lets ask, Jeff?

Nope, UML needs to be able to change permissions as well as locations.

It would be nice, though; there are apparently nice UML speedups with it.

Jeff

-- 
Work email - jdike at linux dot intel dot com


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 03:34:27PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 14:52 +0100, Peter Zijlstra wrote:
> 
> > True. We could even guesstimate the nonlinear dirty pages by subtracting
> > the result of page_mkclean() from page_mapcount() and force an
> > msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
> > related file) when some threshold gets exceeded.
> 
> Almost, but not quite, we'd need to extract another value from the
> page_mkclean() run, the actual number of mappings encountered. The
> return value only sums the number of dirty mappings encountered.
> 
> s390 would already work I guess.
> 
> Certainly doable.

But if we restrict it to root only, and have a note in the man page
about it, then it really isn't worth cluttering up the kernel.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 02:53:07PM +0100, Miklos Szeredi wrote:
> > > msync() might never get called and then we're back with the old
> > > behaviour where we can surprise the VM with a ton of dirty pages.
> > 
> > But we're root. With your patch, root *can't* do nonlinear writeback
> > well. Ever. With msync, at least you give them enough rope.
> 
> Restricting to root doesn't buy you much, nobody wants to be root.
> Restricting to mlock is similarly pointless.  UML _will_ want to get
> swapped out if there's no activity.

They could always not use nonlinear, or we could add a ulimit to the
size of nonlinear vaddr allowed. 

> Restricting to tmpfs makes sense, but it's probably not what UML
> wants.

I think it is OK. They might want some persistent storage to migrate
or something, but that can always be done by copying from tmpfs to
a block based filesystem.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 14:52 +0100, Peter Zijlstra wrote:

> True. We could even guesstimate the nonlinear dirty pages by subtracting
> the result of page_mkclean() from page_mapcount() and force an
> msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
> related file) when some threshold gets exceeded.

Almost, but not quite, we'd need to extract another value from the
page_mkclean() run, the actual number of mappings encountered. The
return value only sums the number of dirty mappings encountered.

s390 would already work I guess.

Certainly doable.
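A toy model of the estimate described here, assuming a hypothetical rmap walk that reports both how many mappings it reached and how many of those were dirty (the thread notes page_mkclean() currently returns only the latter). Mappings the walk never reached are treated as nonlinear ptes of unknown dirtiness; once enough accumulate, the caller would force msync(MS_ASYNC). All names and the threshold policy are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result of one page_mkclean()-style rmap walk. */
struct mkclean_result {
	int mappings_found;	/* linear mappings reached via rmap */
	int dirty_cleaned;	/* of those, how many were dirty */
};

/* Mappings rmap could not reach: assumed to be nonlinear ptes. */
static int unseen_mappings(int mapcount, struct mkclean_result r)
{
	assert(r.dirty_cleaned <= r.mappings_found);
	assert(r.mappings_found <= mapcount);
	return mapcount - r.mappings_found;
}

/*
 * Accumulate the guess; when it reaches `threshold`, reset it and tell
 * the caller to force msync(MS_ASYNC) on the nonlinear mapping(s).
 */
static bool should_force_msync(int *estimate, int mapcount,
			       struct mkclean_result r, int threshold)
{
	*estimate += unseen_mappings(mapcount, r);
	if (*estimate < threshold)
		return false;
	*estimate = 0;
	return true;
}
```

With a mapcount of 3 and a walk that reached 2 mappings (1 dirty), the model counts 1 unseen mapping; subtracting the dirty count instead would give 2, wrongly treating a clean linear mapping as nonlinear.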



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > Well I don't think UML uses nonlinear yet anyway, does it? Can they
> > make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> > not.
> 
> I think it does, but lets ask, Jeff?

Looks like it doesn't:

$ grep -r remap_file_pages arch/um/
$

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> > On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
> > 
> > > > > The thing is, I don't think anybody who uses these things cares
> > > > > about any of the 'problems' you want to fix, do they? We are
> > > > > interested in dirty pages only for the correctness issue, rather
> > > > > than performance. Same as reclaim.
> > > > 
> > > > If so, we can just stick to the dead slow but correct 'scan the full
> > > > vma' page_mkclean() and nobody would ever trigger it.
> > > 
> > > Not if we restricted it to root and mlocked tmpfs. But then why
> > > wouldn't you just do it with the much more efficient msync walk,
> > > so that if root does want to do writeout via these things, it does
> > > not blow up?
> > 
> > This is all used on ram based filesystems right, they all have
> > BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> > anyway. Mlock doesn't avoid getting page_mkclean called.
> > 
> > Those who use this on a 'real' filesystem will get hit in the face by a
> > linear scanning page_mkclean(), but AFAIK nobody does this anyway.
> 
> But somebody might do it. I just don't know why you'd want to make
> this _worse_ when the msync option would work?
> 
> > Restricting it to root for such filesystems is unwanted, that'd severely
> > handicap both UML and Oracle as I understand it (are there other users
> > of this feature around?)
> 
> Why? I think they all use tmpfs backings, don't they?
> 
> > msync() might never get called and then we're back with the old
> > behaviour where we can surprise the VM with a ton of dirty pages.
> 
> But we're root. With your patch, root *can't* do nonlinear writeback
> well. Ever. With msync, at least you give them enough rope.

Restricting to root doesn't buy you much, nobody wants to be root.
Restricting to mlock is similarly pointless.  UML _will_ want to get
swapped out if there's no activity.

Restricting to tmpfs makes sense, but it's probably not what UML
wants.

Conclusion: there's no good solution for UML in kernel-space.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 14:36 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> > On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
> > 
> > > > > The thing is, I don't think anybody who uses these things cares
> > > > > about any of the 'problems' you want to fix, do they? We are
> > > > > interested in dirty pages only for the correctness issue, rather
> > > > > than performance. Same as reclaim.
> > > > 
> > > > If so, we can just stick to the dead slow but correct 'scan the full
> > > > vma' page_mkclean() and nobody would ever trigger it.
> > > 
> > > Not if we restricted it to root and mlocked tmpfs. But then why
> > > wouldn't you just do it with the much more efficient msync walk,
> > > so that if root does want to do writeout via these things, it does
> > > not blow up?
> > 
> > This is all used on ram based filesystems right, they all have
> > BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> > anyway. Mlock doesn't avoid getting page_mkclean called.
> > 
> > Those who use this on a 'real' filesystem will get hit in the face by a
> > linear scanning page_mkclean(), but AFAIK nobody does this anyway.
> 
> But somebody might do it. I just don't know why you'd want to make
> this _worse_ when the msync option would work?
> 
> > Restricting it to root for such filesystems is unwanted, that'd severely
> > handicap both UML and Oracle as I understand it (are there other users
> > of this feature around?)
> 
> Why? I think they all use tmpfs backings, don't they?

Ooh, you only want to restrict remap_file_pages on mappings from BDIs
without BDI_CAP_NO_WRITEBACK. Sure, I can live with that, and I suspect
others can as well.
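The restriction being agreed on here, reduced to a predicate. This is only a sketch: `SKETCH_CAP_NO_WRITEBACK` is a made-up stand-in for the kernel's BDI_CAP_NO_WRITEBACK (its value is not the kernel's), and a real check would presumably look at the backing_dev_info of the file's mapping.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for BDI_CAP_NO_WRITEBACK; not the kernel's value. */
#define SKETCH_CAP_NO_WRITEBACK 0x1u

/*
 * Allow nonlinear remapping only when the backing device never writes
 * pages back (tmpfs, ramfs, shm in this discussion): there page_mkclean()
 * is never invoked, so dirty tracking is not confused by ptes rmap
 * cannot find.
 */
static bool nonlinear_allowed(unsigned int bdi_caps)
{
	return (bdi_caps & SKETCH_CAP_NO_WRITEBACK) != 0;
}
```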

> > msync() might never get called and then we're back with the old
> > behaviour where we can surprise the VM with a ton of dirty pages.
> 
> But we're root. With your patch, root *can't* do nonlinear writeback
> well. Ever. With msync, at least you give them enough rope.

True. We could even guesstimate the nonlinear dirty pages by subtracting
the result of page_mkclean() from page_mapcount() and force an
msync(MS_ASYNC) on said mapping (or all (nonlinear) mappings of the
related file) when some threshold gets exceeded.

> > > > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > > > real, those UML farms run on nothing but nonlinear reclaim I'd think.
> > > 
> > > I guess you can just increase the computational complexity of
> > > reclaim quite easily.
> > 
> > Right, on first glance it doesn't look to be too bad, but I should take
> > a closer look.
> 
> Well I don't think UML uses nonlinear yet anyway, does it? Can they
> make do with restricting nonlinear to mlocked vmas, I wonder? Probably
> not.

I think it does, but let's ask. Jeff?



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 02:19:22PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:
> 
> > > > The thing is, I don't think anybody who uses these things cares
> > > > about any of the 'problems' you want to fix, do they? We are
> > > > interested in dirty pages only for the correctness issue, rather
> > > > than performance. Same as reclaim.
> > > 
> > > If so, we can just stick to the dead slow but correct 'scan the full
> > > vma' page_mkclean() and nobody would ever trigger it.
> > 
> > Not if we restricted it to root and mlocked tmpfs. But then why
> > wouldn't you just do it with the much more efficient msync walk,
> > so that if root does want to do writeout via these things, it does
> > not blow up?
> 
> This is all used on ram based filesystems right, they all have
> BDI_CAP_NO_WRITEBACK afaik, so page_mkclean will never get called
> anyway. Mlock doesn't avoid getting page_mkclean called.
> 
> Those who use this on a 'real' filesystem will get hit in the face by a
> linear scanning page_mkclean(), but AFAIK nobody does this anyway.

But somebody might do it. I just don't know why you'd want to make
this _worse_ when the msync option would work?

> Restricting it to root for such filesystems is unwanted, that'd severely
> handicap both UML and Oracle as I understand it (are there other users
> of this feature around?)

Why? I think they all use tmpfs backings, don't they?

> msync() might never get called and then we're back with the old
> behaviour where we can surprise the VM with a ton of dirty pages.

But we're root. With your patch, root *can't* do nonlinear writeback
well. Ever. With msync, at least you give them enough rope.

> > > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > > real, those UML farms run on nothing but nonlinear reclaim I'd think.
> > 
> > I guess you can just increase the computational complexity of
> > reclaim quite easily.
> 
> Right, on first glance it doesn't look to be too bad, but I should take
> a closer look.

Well I don't think UML uses nonlinear yet anyway, does it? Can they
make do with restricting nonlinear to mlocked vmas, I wonder? Probably
not.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 14:08 +0100, Nick Piggin wrote:

> > > The thing is, I don't think anybody who uses these things cares
> > > about any of the 'problems' you want to fix, do they? We are
> > > interested in dirty pages only for the correctness issue, rather
> > > than performance. Same as reclaim.
> > 
> > If so, we can just stick to the dead slow but correct 'scan the full
> > vma' page_mkclean() and nobody would ever trigger it.
> 
> Not if we restricted it to root and mlocked tmpfs. But then why
> wouldn't you just do it with the much more efficient msync walk,
> so that if root does want to do writeout via these things, it does
> not blow up?

This is all used on RAM-based filesystems, right? They all have
BDI_CAP_NO_WRITEBACK AFAIK, so page_mkclean() will never get called
anyway. Mlock doesn't prevent page_mkclean() from being called.

Those who use this on a 'real' filesystem will get hit in the face by a
linear scanning page_mkclean(), but AFAIK nobody does this anyway.

Restricting it to root for such filesystems is unwanted; that would severely
handicap both UML and Oracle as I understand it (are there other users
of this feature around?)

msync() might never get called and then we're back with the old
behaviour where we can surprise the VM with a ton of dirty pages.

> > What is the DoS scenario wrt reclaim? We really ought to fix that if
> > real, those UML farms run on nothing but nonlinear reclaim I'd think.
> 
> I guess you can just increase the computational complexity of
> reclaim quite easily.

Right, at first glance it doesn't look too bad, but I should take
a closer look.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:41:26PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 13:17 +0100, Nick Piggin wrote:
> 
> > > Tracking these ranges on a per-vma basis would avoid taking the mm wide
> > > mmap_sem and so would be cheaper than regular vmas.
> > > 
> > > Would that still be too expensive?
> > 
> > Well you can today remap N pages in a file, arbitrarily for
> > sizeof(pte_t)*tiny bit for the upper page tables + small constant
> > for the vma.
> > 
> > At best, you need an extra pointer to pte / vaddr, so you'd basically
> > double memory overhead.
> 
> I was hoping some form of range compression would gain something, but if
> its a fully random mapping, then yes a shadow page table would be needed
> (still looking into what a pte_chain is)
> 
> > > > > Well, now they don't, but it could be done or even exploited as a DoS.
> > > > 
> > > > But so could nonlinear page reclaim. I think we need to restrict 
> > > > nonlinear
> > > > mappings to root if we're worried about that.
> > > 
> > > Can't we just 'fix' it?
> > 
> > The thing is, I don't think anybody who uses these things cares
> > about any of the 'problems' you want to fix, do they? We are
> > interested in dirty pages only for the correctness issue, rather
> > than performance. Same as reclaim.
> 
> If so, we can just stick to the dead slow but correct 'scan the full
> vma' page_mkclean() and nobody would ever trigger it.

Not if we restricted it to root and mlocked tmpfs. But then why
wouldn't you just do it with the much more efficient msync walk,
so that if root does want to do writeout via these things, it does
not blow up?

> What is the DoS scenario wrt reclaim? We really ought to fix that if
> real, those UML farms run on nothing but nonlinear reclaim I'd think.

I guess you can just increase the computational complexity of
reclaim quite easily.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 13:17 +0100, Nick Piggin wrote:

> > Tracking these ranges on a per-vma basis would avoid taking the mm wide
> > mmap_sem and so would be cheaper than regular vmas.
> > 
> > Would that still be too expensive?
> 
> Well you can today remap N pages in a file, arbitrarily for
> sizeof(pte_t)*tiny bit for the upper page tables + small constant
> for the vma.
> 
> At best, you need an extra pointer to pte / vaddr, so you'd basically
> double memory overhead.

I was hoping some form of range compression would gain something, but if
it's a fully random mapping, then yes, a shadow page table would be needed
(still looking into what a pte_chain is)
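The cost argument quoted above as arithmetic, assuming an x86-64-style 8-byte pte_t and 8-byte pointers; the upper page-table levels and the per-VMA constant are ignored, as in the "tiny bit" remark. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Today: roughly one pte per remapped page. */
static size_t nonlinear_bytes(size_t npages, size_t pte_size)
{
	assert(pte_size != 0);
	return npages * pte_size;
}

/* With reverse information: one extra pointer per pte on top of it. */
static size_t shadow_bytes(size_t npages, size_t pte_size, size_t ptr_size)
{
	return nonlinear_bytes(npages, pte_size) + npages * ptr_size;
}
```

For 65536 remapped pages that is 512K of ptes today versus 1M with a pointer per pte: when the pointer is as large as the pte, the overhead doubles.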

> > > > Well, now they don't, but it could be done or even exploited as a DoS.
> > > 
> > > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > > mappings to root if we're worried about that.
> > 
> > Can't we just 'fix' it?
> 
> The thing is, I don't think anybody who uses these things cares
> about any of the 'problems' you want to fix, do they? We are
> interested in dirty pages only for the correctness issue, rather
> than performance. Same as reclaim.

If so, we can just stick to the dead slow but correct 'scan the full
vma' page_mkclean() and nobody would ever trigger it.

What is the DoS scenario wrt reclaim? We really ought to fix that if
real, those UML farms run on nothing but nonlinear reclaim I'd think.




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 04:22:24AM -0800, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
> >> Well, now they don't, but it could be done or even exploited as a DoS.
> 
> On Wed, Mar 07, 2007 at 12:00:36PM +0100, Nick Piggin wrote:
> > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > mappings to root if we're worried about that.
> 
> Please not root. The users really don't want to be privileged. UML
> itself is at least partly for use as privilege isolation of the guest
> workload. Oracle has some of the same concerns itself, which is part of
> why it uses separate processes heavily, even: to isolate instances from
> each other.

Well non-root users could be allowed to work on mlocked regions on
tmpfs/shm. That way they avoid the pathological nonlinear problems,
and can work within the mlock ulimit.
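For the mlocked tmpfs/shm restriction, the budget that applies to an unprivileged process is RLIMIT_MEMLOCK (processes with CAP_IPC_LOCK are not bound by it). A Python sketch of a userspace pre-check; `fits_mlock_limit` is a hypothetical helper that only consults the soft limit and ignores memory the process has already locked, so it is optimistic.

```python
import mmap
import resource


def fits_mlock_limit(nbytes: int) -> bool:
    """Would locking nbytes (rounded up to whole pages) fit RLIMIT_MEMLOCK?"""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return True
    pages = -(-nbytes // mmap.PAGESIZE)  # ceiling division
    return pages * mmap.PAGESIZE <= soft


print(resource.getrlimit(resource.RLIMIT_MEMLOCK))
print(fits_mlock_limit(64 << 20))  # depends on the local "ulimit -l" setting
```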

That is, if we are worried about such a DoS.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
>> Well, now they don't, but it could be done or even exploited as a DoS.

On Wed, Mar 07, 2007 at 12:00:36PM +0100, Nick Piggin wrote:
> But so could nonlinear page reclaim. I think we need to restrict nonlinear
> mappings to root if we're worried about that.

Please not root. The users really don't want to be privileged. UML
itself is at least partly for use as privilege isolation of the guest
workload. Oracle has some of the same concerns itself, which is part of
why it uses separate processes heavily, even: to isolate instances from
each other.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 12:48:06PM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 12:00 +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
> > > On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:
> > > 
> > > > > > There are real users who want these fast, though.
> > > > > 
> > > > > Yeah, why don't we have a tree per nonlinear vma to find these pages?
> > > > > 
> > > > > wli mentions shadow page tables..
> > > > 
> > > > We could do something more efficient, but I thought that half the point
> > > > was that they didn't carry any of this extra memory, and they could be
> > > > really fast to set up at the expense of efficiency elsewhere.
> > > 
> > > I'm failing to understand this :-(
> > > 
> > > That extra memory, and apparently they don't want the inefficiency
> 
> s/T/W/
> 
> > > either.
> > 
> > Sorry, I didn't understand your misunderstandings ;)
> 
> Bah, my brain is thick and foggy today. Let us try again:
> 
> Nonlinear vmas exist because many vmas are expensive somehow, right?
> Nonlinear vmas keep the page mapping in the page tables and screw rmaps.
> 
> This 'extra memory' you mentioned would be the overhead of tracking the
> actual ranges?
> 
> And apparently now we want it to not suck on the rmap case :-(

Do we? I think just "work" is the way we've been handling them up until
now. Making them suck less for rmap makes them suck more for what they're
good at.

> Anyway, if used on a non writeback capable backing store (ramfs)
> page_mkclean will never be called. If also mlocked (I think oracle does
> this) then page reclaim will pass over too.
> 
> So we're only interested in the bdi_cap_accounting_dirty and VM_SHARED
> case, right?
> 
> Tracking these ranges on a per-vma basis would avoid taking the mm wide
> mmap_sem and so would be cheaper than regular vmas.
> 
> Would that still be too expensive?

Well you can today remap N pages in a file, arbitrarily for
sizeof(pte_t)*tiny bit for the upper page tables + small constant
for the vma.

At best, you need an extra pointer to pte / vaddr, so you'd basically
double memory overhead.
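Back-of-the-envelope arithmetic for the doubling estimate, as a Python sketch. It assumes 8-byte ptes and 8-byte pointers (a common 64-bit configuration) and ignores the upper-level page tables and the constant per-vma cost; `nonlinear_overhead` is a hypothetical helper.

```python
def nonlinear_overhead(n_pages: int, pte_size: int = 8, ptr_size: int = 8,
                       reverse_map: bool = False) -> int:
    """Approximate per-page metadata, in bytes, for n_pages of nonlinear window.

    Without a reverse map only the last-level ptes are paid for; with one,
    each page also needs a pointer back to its pte / virtual address.
    """
    per_page = pte_size + (ptr_size if reverse_map else 0)
    return n_pages * per_page


pages = (1 << 30) // 4096  # a 1 GiB window of 4 KiB pages
print(nonlinear_overhead(pages) >> 20)                    # 2 (MiB)
print(nonlinear_overhead(pages, reverse_map=True) >> 20)  # 4 (MiB)
```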

> > > Well, now they don't, but it could be done or even exploited as a DoS.
> > 
> > But so could nonlinear page reclaim. I think we need to restrict nonlinear
> > mappings to root if we're worried about that.
> 
> Can't we just 'fix' it?

The thing is, I don't think anybody who uses these things cares
about any of the 'problems' you want to fix, do they? We are
interested in dirty pages only for the correctness issue, rather
than performance. Same as reclaim.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 12:00 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
> > On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:
> > 
> > > > > There are real users who want these fast, though.
> > > > 
> > > > Yeah, why don't we have a tree per nonlinear vma to find these pages?
> > > > 
> > > > wli mentions shadow page tables..
> > > 
> > > We could do something more efficient, but I thought that half the point
> > > was that they didn't carry any of this extra memory, and they could be
> > > really fast to set up at the expense of efficiency elsewhere.
> > 
> > I'm failing to understand this :-(
> > 
> > That extra memory, and apparently they don't want the inefficiency

s/T/W/

> > either.
> 
> Sorry, I didn't understand your misunderstandings ;)

Bah, my brain is thick and foggy today. Let us try again:

Nonlinear vmas exist because many vmas are expensive somehow, right?
Nonlinear vmas keep the page mapping in the page tables and screw rmaps.

This 'extra memory' you mentioned would be the overhead of tracking the
actual ranges?

And apparently now we want it to not suck on the rmap case :-(

Anyway, if used on a non writeback capable backing store (ramfs)
page_mkclean will never be called. If also mlocked (I think oracle does
this) then page reclaim will pass over too.

So we're only interested in the bdi_cap_accounting_dirty and VM_SHARED
case, right?

Tracking these ranges on a per-vma basis would avoid taking the mm wide
mmap_sem and so would be cheaper than regular vmas.

Would that still be too expensive?

> > Well, now they don't, but it could be done or even exploited as a DoS.
> 
> But so could nonlinear page reclaim. I think we need to restrict nonlinear
> mappings to root if we're worried about that.

Can't we just 'fix' it?



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:47:42AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:
> 
> > > > There are real users who want these fast, though.
> > > 
> > > Yeah, why don't we have a tree per nonlinear vma to find these pages?
> > > 
> > > wli mentions shadow page tables..
> > 
> > We could do something more efficient, but I thought that half the point
> > was that they didn't carry any of this extra memory, and they could be
> > really fast to set up at the expense of efficiency elsewhere.
> 
> I'm failing to understand this :-(
> 
> That extra memory, and apparently they don't want the inefficiency
> either.

Sorry, I didn't understand your misunderstandings ;)

> 
> > I don't see it being a big deal. I doubt anybody is writing out huge
> > amounts of data via nonlinear mappings.
> 
> Well, now they don't, but it could be done or even exploited as a DoS.

But so could nonlinear page reclaim. I think we need to restrict nonlinear
mappings to root if we're worried about that.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:38 +0100, Nick Piggin wrote:

> > > There are real users who want these fast, though.
> > 
> > Yeah, why don't we have a tree per nonlinear vma to find these pages?
> > 
> > wli mentions shadow page tables..
> 
> We could do something more efficient, but I thought that half the point
> was that they didn't carry any of this extra memory, and they could be
> really fast to set up at the expense of efficiency elsewhere.

I'm failing to understand this :-(

That extra memory, and apparently they don't want the inefficiency
either.

> I don't see it being a big deal. I doubt anybody is writing out huge
> amounts of data via nonlinear mappings.

Well, now they don't, but it could be done or even exploited as a DoS.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Benjamin Herrenschmidt
On Wed, 2007-03-07 at 11:17 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
> > 
> > > > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > > > no users have hit mainline yet.
> > > 
> > > Did benh agree with that?
> > 
> > I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
> > mainline. I will switch to ->fault when I have time to adapt the code,
> > in the meantime, NOPFN_REFAULT should stay.
> 
> I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
> adapted the code for you ;) it is in patch 5/6, sent a while ago. 

Ok, I need to look. I've been travelling, having meetings, etc., for the
last couple of weeks and I'm taking a week off next week :-)

Ben.




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:24:45AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
> > > > *sigh* yes was looking at all that code, that's gonna be darn slow
> > > > though, but I'll whip up a patch.
> > > 
> > > Well, if it's going to be darn slow, maybe it's better to go with
> > > mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
> > 
> > There are real users who want these fast, though.
> 
> Yeah, why don't we have a tree per nonlinear vma to find these pages?
> 
> wli mentions shadow page tables..

We could do something more efficient, but I thought that half the point
was that they didn't carry any of this extra memory, and they could be
really fast to set up at the expense of efficiency elsewhere.

> > > darn slow as well, but at least it will be much less complicated.
> > 
> > IMO, the best thing to do is just restore msync behaviour, and comment
> > the fact that we ignore nonlinears. We need to restore msync behaviour
> > to fix races in regular mappings anyway, at least for now.
> 
> Seems to be the best quick solution indeed.

If we fix the race in the linear mappings, then we can just do the full
msync for nonlinear vmas, and the fast noop version for everyone else.

I don't see it being a big deal. I doubt anybody is writing out huge
amounts of data via nonlinear mappings.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:21 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
> > > *sigh* yes was looking at all that code, that's gonna be darn slow
> > > though, but I'll whip up a patch.
> > 
> > Well, if it's going to be darn slow, maybe it's better to go with
> > mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
> 
> There are real users who want these fast, though.

Yeah, why don't we have a tree per nonlinear vma to find these pages?

wli mentions shadow page tables..

> > darn slow as well, but at least it will be much less complicated.
> 
> IMO, the best thing to do is just restore msync behaviour, and comment
> the fact that we ignore nonlinears. We need to restore msync behaviour
> to fix races in regular mappings anyway, at least for now.

Seems to be the best quick solution indeed.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:13:20AM +0100, Miklos Szeredi wrote:
> > *sigh* yes was looking at all that code, that's gonna be darn slow
> > though, but I'll whip up a patch.
> 
> Well, if it's going to be darn slow, maybe it's better to go with
> mingo's plan on emulating nonlinear vmas with linear ones.  That'll be

There are real users who want these fast, though.

> darn slow as well, but at least it will be much less complicated.

IMO, the best thing to do is just restore msync behaviour, and comment
the fact that we ignore nonlinears. We need to restore msync behaviour
to fix races in regular mappings anyway, at least for now.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 11:05:48AM +0100, Benjamin Herrenschmidt wrote:
> 
> > > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > > no users have hit mainline yet.
> > 
> > Did benh agree with that?
> 
> I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
> mainline. I will switch to ->fault when I have time to adapt the code,
> in the meantime, NOPFN_REFAULT should stay.

I think I removed not only NOPFN_REFAULT, but also nopfn itself, *and*
adapted the code for you ;) it is in patch 5/6, sent a while ago. 



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> *sigh* yes was looking at all that code, that's gonna be darn slow
> though, but I'll whip up a patch.

Well, if it's going to be darn slow, maybe it's better to go with
mingo's plan on emulating nonlinear vmas with linear ones.  That'll be
darn slow as well, but at least it will be much less complicated.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Benjamin Herrenschmidt

> > NOPAGE_REFAULT is removed. This should be implemented with ->fault, and
> > no users have hit mainline yet.
> 
> Did benh agree with that?

I won't use NOPAGE_REFAULT, I use NOPFN_REFAULT and that has hit
mainline. I will switch to ->fault when I have time to adapt the code,
in the meantime, NOPFN_REFAULT should stay.

Note that one thing we really want with the new ->fault (though I
haven't looked at the patches lately to see if it's available) is to be
able to differentiate faults coming from userspace from faults coming
from the kernel. The major difference is that the former can be
re-executed to handle signals, the latter can't. Thus waiting in the
fault handler can be made interruptible in the former case, not in the
latter case.
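A toy Python model of the distinction described above. `handle_fault`, `FaultResult` and the callables are hypothetical, and the polling loop stands in for sleeping on a wait queue. A user-mode fault may give up when a signal is pending, because the faulting instruction is simply re-executed after the signal is handled; a fault taken from kernel mode has no such retry path in this model, so it keeps waiting and ignores the signal.

```python
from enum import Enum
from typing import Callable


class FaultResult(Enum):
    DONE = "done"    # page is available, the pte can be installed
    RETRY = "retry"  # give up for now; user space re-executes the access


def handle_fault(from_user: bool,
                 page_ready: Callable[[], bool],
                 signal_pending: Callable[[], bool]) -> FaultResult:
    """Wait for the backing page; only user-mode faults are interruptible."""
    while not page_ready():
        if from_user and signal_pending():
            return FaultResult.RETRY
        # A real handler would sleep here instead of polling.
    return FaultResult.DONE


# User fault with a signal pending: bail out so the signal can be delivered.
print(handle_fault(True, lambda: False, lambda: True))  # FaultResult.RETRY
```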

Ben.




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote:
>> Guess what major real-life application not only uses nonlinear daily
>> but would even be very happy to see it extended with non-vma-creating
>> protections and more?

On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote:
> uh-oh.  SQL server?

Close enough. ;)


On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote:
>> It's not terribly typical for things to be
>> truncated while remap_file_pages() is doing its work, though it's been
>> proposed as a method of dynamism. It won't stress remap_file_pages() vs.
>> truncate() in any meaningful way, though, as userspace will be rather
>> diligent about clearing in-use data out of the file offset range to be
>> truncated away anyway, and all that via O_DIRECT.

On Wed, Mar 07, 2007 at 01:39:42AM -0800, Andrew Morton wrote:
> The problem here isn't related to truncate or direct-IO.  It's just
> plain-old MAP_SHARED.  nonlinear VMAs are now using the old-style
> dirty-memory management.  msync() is basically a no-op and the code is
> wildly tricky and pretty much untested.  The chances that we broke it are
> considerable.

This would be of concern for swapping out tmpfs-backed nonlinearly-
mapped files under extreme stress in Oracle's case, though it's rather
typical for it all to be mlock()'d in-core and cases where that's
necessary to be considered grossly underprovisioned. As far as I know,
msync() is not used to manage the nonlinearly-mapped objects, which are
most typically expected to be memory-backed, rendering writeback to
disk of questionable value. Also quite happily, I'm not aware of any
data integrity issues it would explain. Bug though it may be, it
requires a usage model very rarely used by Oracle to trigger, so we've
not run into it.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 11:04 +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote:
> > On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
> > > 
> > > Can recollect as much, I modelled it after page_referenced() and can't
> > > find any VM_NONLINEAR specific code in there either.
> > > 
> > > Will have a hard look, but if it's broken, then page_referenced is
> > > equally broken it seems, which would make page reclaim funny in the
> > > light of nonlinear mappings.
> > 
> > page_referenced is just an heuristic, and it ignores nonlinear mappings
> > and the page which will get filtered down to try_to_unmap.
> > 
> > Page reclaim is already "funny" for nonlinear mappings, page_referenced
> > is the least of its worries ;) It works, though.
> 
> Or, to be more helpful, unmap_mapping_range is what it should be
> modelled on.

*sigh* yes was looking at all that code, that's gonna be darn slow
though, but I'll whip up a patch.

/me feels terribly bad about having missed this..



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:45:03AM +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
> > 
> > Can recollect as much, I modelled it after page_referenced() and can't
> > find any VM_NONLINEAR specific code in there either.
> > 
> > Will have a hard look, but if it's broken, then page_referenced is
> > equally broken it seems, which would make page reclaim funny in the
> > light of nonlinear mappings.
> 
> page_referenced is just an heuristic, and it ignores nonlinear mappings
> and the page which will get filtered down to try_to_unmap.
> 
> Page reclaim is already "funny" for nonlinear mappings, page_referenced
> is the least of its worries ;) It works, though.

Or, to be more helpful, unmap_mapping_range is what it should be
modelled on.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:49:47AM +0100, Nick Piggin wrote:
> On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> > On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > > Depending on whether anyone wants it, and what features they want, we
> > > could emulate the old syscall, and make a new restricted one which is
> > > much less intrusive.
> > > For example, if we can operate only on MAP_ANONYMOUS memory and specify
> > > that nonlinear mappings effectively mlock the pages, then we can get
> > > rid of all the objrmap and unmap_mapping_range handling, forget about
> > > the writeout and msync problems...
> > 
> > Anonymous-only would make it a doorstop for Oracle, since its entire
> > motive for using it is to window into objects larger than user virtual
> 
> Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
> inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
> have a file descriptor to get a pgoff, then remap_file_pages is a doorstop
> for everyone ;)
> 
> > address spaces (this likely also applies to UML, though they should
> > really chime in to confirm). Restrictions to tmpfs and/or ramfs would
> > likely be liveable, though I suspect some things might want to do it to
> > shm segments (I'll ask about that one). There's definitely no need for a
> > persistent backing store for the object to be remapped in Oracle's case,
> > in any event. It's largely the in-core destination and source of IO, not
> > something saved on-disk itself.
> 
> Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
> that as well, then I think it might be a good option.

Oh, hmm if you can truncate these things then you still need to
force unmap so you still need i_mmap_nonlinear.

But come to think of it, I still don't think nonlinear mappings are
too bad as they are ;)


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
>>> ok. What do you think about the sys_remap_file_pages_prot() thing that 
>>> Paolo has done in a nicely split up form - does that complicate things 
>>> in any fundamental way? That is what is useful to UML.

* Bill Irwin <[EMAIL PROTECTED]> wrote:
>> Oracle would love it. You don't want to know how far back I've been 
>> asked to backport that.

On Wed, Mar 07, 2007 at 10:35:18AM +0100, Ingo Molnar wrote:
> ok, cool! Then the first step would be for you to talk to Paolo and to 
> pick up the patches, review them, nurse it in -mm, etc. Suffering in 
> silence is just a pointless act of masochism, not an efficient 
> upstream-merge tactic ;-)

It was intended for use in a debugging mode for the database, so given
the general mood where fighting backouts was an issue, I was relatively
loath to bring it up. With UML behind it I don't feel that's as much of
a concern.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > After these patches, I don't think there is too much burden. The main 
> > thing left really is just the objrmap stuff, but that is just handled 
> > with a minimal 'dumb' algorithm that doesn't cost much.
> 
> ok. What do you think about the sys_remap_file_pages_prot() thing that 
> Paolo has done in a nicely split up form - does that complicate things 
> in any fundamental way? That is what is useful to UML.

Last time I looked (a while ago), the only issue I had was that he was
doing a weird special case rather than using another !present pte bit
for his "nonlinear protection" ptes.

I think he fixed that now and so it should be quite good now.
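A Python sketch of the general technique of keeping "nonlinear protection" in a non-present pte: the entry carries a file offset plus a few per-page protection bits, and a dedicated bit tells it apart from other non-present entries. The bit layout below is purely hypothetical and does not match any architecture's real pte format; it mainly shows that every bit spent on protections is taken away from the pgoff range a pte can encode.

```python
# Hypothetical layout, not any real architecture's pte format:
#   bit 0     present: must be 0 so any access faults
#   bit 1     file:    1 = holds a file offset (not e.g. a swap entry)
#   bits 2-3  nonlinear protection (1 = read, 2 = write)
#   bits 4+   file page offset (pgoff)
PTE_PRESENT = 1 << 0
PTE_FILE = 1 << 1
PROT_SHIFT, PROT_MASK = 2, 0b11
PGOFF_SHIFT = 4


def make_file_pte(pgoff: int, prot: int) -> int:
    return (pgoff << PGOFF_SHIFT) | ((prot & PROT_MASK) << PROT_SHIFT) | PTE_FILE


def decode_file_pte(pte: int) -> tuple:
    if (pte & PTE_PRESENT) or not (pte & PTE_FILE):
        raise ValueError("not a non-present file pte")
    return pte >> PGOFF_SHIFT, (pte >> PROT_SHIFT) & PROT_MASK


pte = make_file_pte(12345, 1)  # read-only protection for this page
print(decode_file_pte(pte))    # (12345, 1)
```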


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
> On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> > Depending on whether anyone wants it, and what features they want, we
> > could emulate the old syscall, and make a new restricted one which is
> > much less intrusive.
> > For example, if we can operate only on MAP_ANONYMOUS memory and specify
> > that nonlinear mappings effectively mlock the pages, then we can get
> > rid of all the objrmap and unmap_mapping_range handling, forget about
> > the writeout and msync problems...
> 
> Anonymous-only would make it a doorstop for Oracle, since its entire
> motive for using it is to window into objects larger than user virtual

Uh, duh yes I don't mean MAP_ANONYMOUS, I was just thinking of the shmem
inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
have a file descriptor to get a pgoff, then remap_file_pages is a doorstop
for everyone ;)

> address spaces (this likely also applies to UML, though they should
> really chime in to confirm). Restrictions to tmpfs and/or ramfs would
> likely be liveable, though I suspect some things might want to do it to
> shm segments (I'll ask about that one). There's definitely no need for a
> persistent backing store for the object to be remapped in Oracle's case,
> in any event. It's largely the in-core destination and source of IO, not
> something saved on-disk itself.

Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
that as well, then I think it might be a good option.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
> Depending on whether anyone wants it, and what features they want, we
> could emulate the old syscall, and make a new restricted one which is
> much less intrusive.
> For example, if we can operate only on MAP_ANONYMOUS memory and specify
> that nonlinear mappings effectively mlock the pages, then we can get
> rid of all the objrmap and unmap_mapping_range handling, forget about
> the writeout and msync problems...

Anonymous-only would make it a doorstop for Oracle, since its entire
motive for using it is to window into objects larger than user virtual
address spaces (this likely also applies to UML, though they should
really chime in to confirm). Restrictions to tmpfs and/or ramfs would
likely be liveable, though I suspect some things might want to do it to
shm segments (I'll ask about that one). There's definitely no need for a
persistent backing store for the object to be remapped in Oracle's case,
in any event. It's largely the in-core destination and source of IO, not
something saved on-disk itself.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > 
> > > > > Dirty page accounting doesn't work either on
> > > > > non-linear mappings
> > > > 
> > > > It doesn't?  Confused - these things don't have anything to do with each
> > > > other do they?
> > > 
> > > Look in page_mkclean().  Where does it handle non-linear mappings?
> > > 
> > 
> > OK, I'd forgotten about that.  It won't break dirty memory accounting,
> > but it'll potentially break dirty memory balancing.
> > 
> > If we have the wrong page (due to nonlinear), page_check_address() will
> > fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> > algorithms and I guess it'll break the msync guarantees.
> > 
> > Peter, I thought we went through the nonlinear problem ages ago and decided
> > it was OK?
> 
> Can recollect as much, I modelled it after page_referenced() and can't
> find any VM_NONLINEAR specific code in there either.
> 
> Will have a hard look, but if it's broken, then page_referenced is
> equally broken it seems, which would make page reclaim funny in the
> light of nonlinear mappings.

page_referenced is just an heuristic, and it ignores nonlinear mappings
and the page which will get filtered down to try_to_unmap.

Page reclaim is already "funny" for nonlinear mappings, page_referenced
is the least of its worries ;) It works, though.




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:26:38AM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > 
> > msync breakage is bad, but otherwise I don't know that we care about
> > dirty page writeout efficiency.
> 
> Well.  We made so many changes to support the synchronous
> dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
> the old-style approach still works.  It might seem to, most of the time. 
> But if it _is_ subtly broken, boy it's going to take a long time for us to
> find out.

I can't think of anything that should have caused breakage (except for
the msync thing). We're still careful about not dropping pte dirty bits.

> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
> 
> Dunno, I don't recall that.  We dirty the page before the pte...

I don't think it is really that simple. There is a big comment in
clear_page_dirty_for_io.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin <[EMAIL PROTECTED]> wrote:

> On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >> btw., if we decide that nonlinear isn't worth the continuing maintenance
> >> pain, we could internally implement/emulate sys_remap_file_pages() via a 
> >> call to mremap() and essentially deprecate it, without breaking the ABI 
> >> - and remove all the nonlinear code. (This would split fremap areas into 
> >> separate vmas)
> 
> On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> > I'm rather regretting having merged it - I don't think it has been used for
> > much.
> > Paolo's UML speedup patches might use nonlinear though.
> 
> Guess what major real-life application not only uses nonlinear daily
> but would even be very happy to see it extended with non-vma-creating
> protections and more?

uh-oh.  SQL server?

> It's not terribly typical for things to be
> truncated while remap_file_pages() is doing its work, though it's been
> proposed as a method of dynamism. It won't stress remap_file_pages() vs.
> truncate() in any meaningful way, though, as userspace will be rather
> diligent about clearing in-use data out of the file offset range to be
> truncated away anyway, and all that via O_DIRECT.

The problem here isn't related to truncate or direct-IO.  It's just
plain-old MAP_SHARED.  nonlinear VMAs are now using the old-style
dirty-memory management.  msync() is basically a no-op and the code is
wildly tricky and pretty much untested.  The chances that we broke it are
considerable.
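For readers less familiar with the guarantee at stake: userspace expects that after msync(MS_SYNC) on a MAP_SHARED mapping, stores made through that mapping have been written back to the file. A minimal POSIX sketch of that pattern follows; the helper names and the /tmp name template are hypothetical. Reading the file back only demonstrates page-cache coherence, which holds even without msync(); it cannot show that the data reached disk, which is the part Andrew worries is untested for nonlinear VMAs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create an unlinked, one-page scratch file.
 * Returns the fd, or -1 on failure. */
static int make_scratch_file(void)
{
    char path[] = "/tmp/msync-demo-XXXXXX";   /* hypothetical template */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* file lives on until close() */
    if (ftruncate(fd, sysconf(_SC_PAGESIZE)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: store msg at offset 0 through a MAP_SHARED
 * mapping, then msync(MS_SYNC) so the dirtied page is written back
 * before returning. */
static int write_through_mapping(int fd, const char *msg)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = strlen(msg);
    assert(len <= page);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    memcpy(p, msg, len);                 /* the store dirties the pte */
    int rc = msync(p, page, MS_SYNC);    /* synchronous writeback */
    munmap(p, page);
    return rc;
}
```

On a linear mapping the dirty pte is found and written back; the concern in the thread is that the equivalent path for nonlinear VMAs falls back to older, less exercised code.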



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Bill Irwin <[EMAIL PROTECTED]> wrote:

> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> >> After these patches, I don't think there is too much burden. The main 
> >> thing left really is just the objrmap stuff, but that is just handled 
> >> with a minimal 'dumb' algorithm that doesn't cost much.
> 
> On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> > ok. What do you think about the sys_remap_file_pages_prot() thing that 
> > Paolo has done in a nicely split up form - does that complicate things 
> > in any fundamental way? That is what is useful to UML.
> 
> Oracle would love it. You don't want to know how far back I've been 
> asked to backport that.

ok, cool! Then the first step would be for you to talk to Paolo and to 
pick up the patches, review them, nurse it in -mm, etc. Suffering in 
silence is just a pointless act of masochism, not an efficient 
upstream-merge tactic ;-)

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> 
> > > > Dirty page accounting doesn't work either on
> > > > non-linear mappings
> > > 
> > > It doesn't?  Confused - these things don't have anything to do with each
> > > other do they?
> > 
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> > 
> 
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
> 
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> algorithms and I guess it'll break the msync guarantees.
> 
> Peter, I thought we went through the nonlinear problem ages ago and decided
> it was OK?

Can recollect as much, I modelled it after page_referenced() and can't
find any VM_NONLINEAR specific code in there either.

Will have a hard look, but if it's broken, then page_referenced is
equally broken, it seems, which would make page reclaim funny in the
light of nonlinear mappings.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
* Nick Piggin <[EMAIL PROTECTED]> wrote:
>> After these patches, I don't think there is too much burden. The main 
>> thing left really is just the objrmap stuff, but that is just handled 
>> with a minimal 'dumb' algorithm that doesn't cost much.

On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> ok. What do you think about the sys_remap_file_pages_prot() thing that 
> Paolo has done in a nicely split up form - does that complicate things 
> in any fundamental way? That is what is useful to UML.

Oracle would love it. You don't want to know how far back I've been
asked to backport that.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
> 
> Dunno, I don't recall that.  We dirty the page before the pte...

That's the one I just submitted a fix for ;)

  http://lkml.org/lkml/2007/3/6/308

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
>> btw., if we decide that nonlinear isn't worth the continuing maintenance
>> pain, we could internally implement/emulate sys_remap_file_pages() via a 
>> call to mremap() and essentially deprecate it, without breaking the ABI 
>> - and remove all the nonlinear code. (This would split fremap areas into 
>> separate vmas)

On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> I'm rather regretting having merged it - I don't think it has been used for
> much.
> Paolo's UML speedup patches might use nonlinear though.

Guess what major real-life application not only uses nonlinear daily
but would even be very happy to see it extended with non-vma-creating
protections and more? It's not terribly typical for things to be
truncated while remap_file_pages() is doing its work, though it's been
proposed as a method of dynamism. It won't stress remap_file_pages() vs.
truncate() in any meaningful way, though, as userspace will be rather
diligent about clearing in-use data out of the file offset range to be
truncated away anyway, and all that via O_DIRECT.
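To illustrate what "non-vma-creating protections" would avoid: with plain mprotect(), giving alternate pages of one mapping different protections splits it into one VMA per page. The Linux-only sketch below counts the /proc/self/maps entries that overlap a region; count_vmas_in() and split_by_mprotect() are hypothetical helper names. It shows the VMA cost only and is not how sys_remap_file_pages_prot() works.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count /proc/self/maps entries overlapping [start, end). Linux-only. */
static int count_vmas_in(unsigned long start, unsigned long end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[4096];
    int n = 0;
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        unsigned long lo, hi;
        if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 && lo < end && hi > start)
            n++;
    }
    fclose(f);
    return n;
}

/* Map npages anonymous read-write pages, then make every odd page
 * read-only. Each protection change forces a VMA split. */
static char *split_by_mprotect(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, npages * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    for (size_t i = 1; i < npages; i += 2)
        if (mprotect(base + i * page, page, PROT_READ) != 0)
            return NULL;
    return base;
}
```

For eight pages this leaves eight VMAs covering the region, since no two neighbouring pages share a protection. A window with thousands of individually protected pages would pay that cost thousands of times.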


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> > 
> > > > > Dirty page accounting doesn't work either on
> > > > > non-linear mappings
> > > > 
> > > > It doesn't?  Confused - these things don't have anything to do with each
> > > > other do they?
> > > 
> > > Look in page_mkclean().  Where does it handle non-linear mappings?
> > > 
> > 
> > OK, I'd forgotten about that.  It won't break dirty memory accounting,
> > but it'll potentially break dirty memory balancing.
> > 
> > If we have the wrong page (due to nonlinear), page_check_address() will
> > fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> > algorithms and I guess it'll break the msync guarantees.
> > 
> > Peter, I thought we went through the nonlinear problem ages ago and decided
> > it was OK?
> 
> msync breakage is bad, but otherwise I don't know that we care about
> dirty page writeout efficiency.

Well.  We made so many changes to support the synchronous
dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
the old-style approach still works.  It might seem to, most of the time. 
But if it _is_ subtly broken, boy it's going to take a long time for us to
find out.

> But I think we discovered that those msync changes are bogus anyway
> because there is a small race window where pte could be dirtied without
> page being set dirty?

Dunno, I don't recall that.  We dirty the page before the pte...


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:53:23AM +0100, Ingo Molnar wrote:
> 
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
> 
> > > btw., if we decide that nonlinear isn't worth the continuing
> > > maintenance pain, we could internally implement/emulate
> > > sys_remap_file_pages() via a call to mremap() and essentially 
> > > deprecate it, without breaking the ABI - and remove all the 
> > > nonlinear code. (This would split fremap areas into separate vmas)
> > > 
> > 
> > I'm rather regretting having merged it - I don't think it has been 
> > used for much.
> > 
> > Paolo's UML speedup patches might use nonlinear though.
> 
> yes, i wrote the first, prototype version of that for UML, it needs an 
> extended version of the syscall, sys_remap_file_pages_prot():
> 
>  
> http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
> 
> i also wrote an x86 hypervisor kind of thing for UML, called 
> 'sys_vcpu()', which allows UML to execute guest user-mode in a box, 
> which also relies on sys_remap_file_pages_prot():
> 
>  http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
> 
> which reduced the UML guest syscall overhead from 30 usecs to 4 usecs 
> (with native syscalls taking 2 usecs, on the box i tested, years ago).
> 
> So it certainly looked useful to me - but wasn't really picked up widely.
> 
> We'll always have the option to get rid of it (and hence completely 
> reverse the decision to merge it) without breaking the ABI, by emulating 
> the API via mremap(). That eliminates the UML speedup though. So no need 
> to feel sorry about having merged it, we can easily revisit that 
> years-old 'do we want it' decision, without any ABI worries.

Depending on whether anyone wants it, and what features they want, we
could emulate the old syscall, and make a new restricted one which is
much less intrusive.

For example, if we can operate only on MAP_ANONYMOUS memory and specify
that nonlinear mappings effectively mlock the pages, then we can get
rid of all the objrmap and unmap_mapping_range handling, forget about
the writeout and msync problems...



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > 
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> > 
> 
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
> 
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.

It won't even get that far, because it only looks at vmas on
mapping->i_mmap, and not on i_mmap_nonlinear.
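A toy userspace model of this point: a cleaner that walks only the linear list leaves dirty bits set in nonlinear VMAs. The structures are hypothetical stand-ins for struct address_space and struct vm_area_struct; only the list names i_mmap and i_mmap_nonlinear come from the thread. This is not kernel code, and it omits locking, the prio tree and page_check_address().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Hypothetical stand-in for a VMA: one pte dirty bit per page slot. */
struct toy_vma {
    bool dirty[TOY_SLOTS];
    struct toy_vma *next;
};

/* Hypothetical stand-in for the two VMA lists on an address_space. */
struct toy_mapping {
    struct toy_vma *i_mmap;            /* linear VMAs */
    struct toy_vma *i_mmap_nonlinear;  /* VMAs touched by remap_file_pages */
};

/* Clear the dirty bit in slot idx of every VMA on one list and return
 * how many ptes were cleaned. Using the same slot in every VMA is a
 * simplification: in a real nonlinear VMA the page may sit anywhere. */
static int clean_list(struct toy_vma *v, size_t idx)
{
    int cleaned = 0;
    for (; v; v = v->next)
        if (v->dirty[idx]) {
            v->dirty[idx] = false;
            cleaned++;
        }
    return cleaned;
}

/* Models the behaviour described above: only i_mmap is walked, so
 * i_mmap_nonlinear is never visited. */
static int toy_page_mkclean(struct toy_mapping *m, size_t idx)
{
    assert(idx < TOY_SLOTS);
    return clean_list(m->i_mmap, idx);
}
```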

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> After these patches, I don't think there is too much burden. The main 
> thing left really is just the objrmap stuff, but that is just handled 
> with a minimal 'dumb' algorithm that doesn't cost much.

ok. What do you think about the sys_remap_file_pages_prot() thing that 
Paolo has done in a nicely split up form - does that complicate things 
in any fundamental way? That is what is useful to UML.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> 
> > > > Dirty page accounting doesn't work either on
> > > > non-linear mappings
> > > 
> > > It doesn't?  Confused - these things don't have anything to do with each
> > > other do they?
> > 
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> > 
> 
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
> 
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> algorithms and I guess it'll break the msync guarantees.
> 
> Peter, I thought we went through the nonlinear problem ages ago and decided
> it was OK?

msync breakage is bad, but otherwise I don't know that we care about
dirty page writeout efficiency.

But I think we discovered that those msync changes are bogus anyway
because there is a small race window where pte could be dirtied without
page being set dirty?




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:59:44AM +0100, Nick Piggin wrote:
> Apart from a handful of trivial if (pte_file()) cases throughout mm/,
> our maintenance burden basically now amounts to the following patch.
> Even the rmap.c change looks bigger than it is because I split out
> the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

Oh, there is a bit more nonlinear mmap list manipulation I'd forgotten
about too... makes things a little bit worse, but not too much.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> > > Dirty page accounting doesn't work either on
> > > non-linear mappings
> > 
> > It doesn't?  Confused - these things don't have anything to do with each
> > other do they?
> 
> Look in page_mkclean().  Where does it handle non-linear mappings?
> 

OK, I'd forgotten about that.  It won't break dirty memory accounting,
but it'll potentially break dirty memory balancing.

If we have the wrong page (due to nonlinear), page_check_address() will
fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
algorithms and I guess it'll break the msync guarantees.

Peter, I thought we went through the nonlinear problem ages ago and decided
it was OK?
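The objrmap lookup behind this computes a page's user address from its file offset with a linear formula, roughly address = vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT), as vma_address() in mm/rmap.c does. The toy model below uses hypothetical struct and function names and assumes 4 KiB pages. It shows why that formula lands on the wrong pte once remap_file_pages() has rearranged a window, so the page_check_address()-style test fails.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define TOY_NPAGES 4

/* Hypothetical stand-in for the vm_area_struct fields objrmap uses. */
struct toy_rmap_vma {
    uintptr_t vm_start;
    uintptr_t vm_pgoff;               /* file page mapped at vm_start */
    uintptr_t pte_pgoff[TOY_NPAGES];  /* file page each pte really maps */
};

/* Linear objrmap guess: where file page pgoff *would* be mapped.
 * No bounds check here; the kernel version rejects results outside
 * [vm_start, vm_end). */
static uintptr_t toy_vma_address(const struct toy_rmap_vma *vma,
                                 uintptr_t pgoff)
{
    return vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}

/* Models page_check_address(): does the pte at addr map file page pgoff? */
static int toy_check_address(const struct toy_rmap_vma *vma, uintptr_t addr,
                             uintptr_t pgoff)
{
    uintptr_t slot = (addr - vma->vm_start) >> TOY_PAGE_SHIFT;
    assert(slot < TOY_NPAGES);
    return vma->pte_pgoff[slot] == pgoff;
}
```

For a linear VMA the guess and the check agree. If file pages 1 and 3 have been swapped nonlinearly, the guess for page 3 still points at slot 3, which now maps page 1, so the check fails and that pte is left dirty.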


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:27:55AM +0100, Ingo Molnar wrote:
> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
> > thought you would have liked the patches...
> 
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a 
> call to mremap() and essentially deprecate it, without breaking the ABI 
> - and remove all the nonlinear code. (This would split fremap areas into 
> separate vmas)

Well I think it has a few possible uses outside the PAE database
workloads. UML for one seems to be interested... as much as I don't
use them, I think nonlinear mappings are kinda cool ;)

After these patches, I don't think there is too much burden. The main
thing left really is just the objrmap stuff, but that is just handled
with a minimal 'dumb' algorithm that doesn't cost much.

Then the core of it is just the file pte handling, which really doesn't
seem to be much problem.

Apart from a handful of trivial if (pte_file()) cases throughout mm/,
our maintenance burden basically now amounts to the following patch.
Even the rmap.c change looks bigger than it is because I split out
the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)
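The "file pte" handling mentioned here is the encoding the powerpc hunk below wraps in CONFIG_NONLINEAR. A non-present pte stores a file page offset above PTE_RPN_SHIFT, tagged with _PAGE_FILE so pte_file() can recognise it. The user-space model below shows the round trip. The shift and bit values are assumptions chosen for illustration, not the real powerpc constants, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; not the real powerpc constants. */
#define TOY_PTE_RPN_SHIFT     17
#define TOY_PAGE_FILE         UINT64_C(0x2)
#define TOY_BITS_PER_LONG     64
#define TOY_PTE_FILE_MAX_BITS (TOY_BITS_PER_LONG - TOY_PTE_RPN_SHIFT)

typedef uint64_t toy_pte_t;

/* Same shape as pgoff_to_pte(): offset above the shift, flag below it. */
static toy_pte_t toy_pgoff_to_pte(uint64_t off)
{
    assert(off < (UINT64_C(1) << TOY_PTE_FILE_MAX_BITS));
    return (off << TOY_PTE_RPN_SHIFT) | TOY_PAGE_FILE;
}

/* Same shape as pte_to_pgoff(): the right shift discards the flag bits. */
static uint64_t toy_pte_to_pgoff(toy_pte_t pte)
{
    return pte >> TOY_PTE_RPN_SHIFT;
}

/* Same shape as pte_file() with CONFIG_NONLINEAR enabled. */
static int toy_pte_file(toy_pte_t pte)
{
    return (pte & TOY_PAGE_FILE) != 0;
}
```

With CONFIG_NONLINEAR disabled, the patch turns pte_file() into a constant 0, so callers' "if (pte_file())" branches become dead code the compiler can drop.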

--

 include/asm-powerpc/pgtable.h |   12 
 mm/Kconfig|6 ++
 mm/Makefile   |6 +-
 mm/rmap.c |  101 +-
 4 files changed, 83 insertions(+), 42 deletions(-)

Index: linux-2.6/include/asm-powerpc/pgtable.h
===
--- linux-2.6.orig/include/asm-powerpc/pgtable.h
+++ linux-2.6/include/asm-powerpc/pgtable.h
@@ -243,7 +243,12 @@ static inline int pte_write(pte_t pte) {
 static inline int pte_exec(pte_t pte)  { return pte_val(pte) & _PAGE_EXEC;}
 static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
 static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
+
+#ifdef CONFIG_NONLINEAR
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+#else
+static inline int pte_file(pte_t pte) { return 0; }
+#endif
 
 static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
 static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -483,9 +488,16 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t){((type)<< 1)|((offset)<<8)})
 #define __pte_to_swp_entry(pte)((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val << PTE_RPN_SHIFT })
+
+#ifdef CONFIG_NONLINEAR
 #define pte_to_pgoff(pte)  (pte_val(pte) >> PTE_RPN_SHIFT)
 #define pgoff_to_pte(off)  ((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS  (BITS_PER_LONG - PTE_RPN_SHIFT)
+#else
+#define pte_to_pgoff(pte)  ({BUG(); -1;})
+#define pgoff_to_pte(off)  ({BUG(); (pte_t){-1};})
+#define PTE_FILE_MAX_BITS  0
+#endif
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
Index: linux-2.6/mm/Kconfig
===
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -142,6 +142,12 @@ config SPLIT_PTLOCK_CPUS
 #
 # support for page migration
 #
+config NONLINEAR
+   bool "Non linear mappings"
+   def_bool y
+   help
+ Provides support for the remap_file_pages syscall.
+
 config MIGRATION
bool "Page migration"
def_bool y
Index: linux-2.6/mm/Makefile
===
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -3,9 +3,8 @@
 #
 
 mmu-y  := nommu.o
-mmu-$(CONFIG_MMU)  := fremap.o highmem.o madvise.o memory.o mincore.o \
-  mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-  vmalloc.o
+mmu-$(CONFIG_MMU)  := highmem.o madvise.o memory.o mincore.o mlock.o \
+  mmap.o mprotect.o mremap.o msync.o rmap.o vmalloc.o
 
 obj-y  := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
   page_alloc.o page-writeback.o pdflush.o \
@@ -27,5 +26,6 @@ obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_NONLINEAR) += fremap.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -756,6 +756,7 @@ out:
return ret;
 }
 
+#ifdef CONFIG_NONLINEAR
 /*
  * objrmap doesn't work for nonlinear VMAs because the assumption that
  * offset-into-file correlates with offset-into-virtual-addresses does not 

Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > Dirty page accounting doesn't work either on
> > non-linear mappings
> 
> It doesn't?  Confused - these things don't have anything to do with each
> other do they?

Look in page_mkclean().  Where does it handle non-linear mappings?

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > btw., if we decide that nonlinear isn't worth the continuing
> > maintenance pain, we could internally implement/emulate
> > sys_remap_file_pages() via a call to mremap() and essentially 
> > deprecate it, without breaking the ABI - and remove all the 
> > nonlinear code. (This would split fremap areas into separate vmas)
> > 
> 
> I'm rather regretting having merged it - I don't think it has been 
> used for much.
> 
> Paolo's UML speedup patches might use nonlinear though.

yes, i wrote the first, prototype version of that for UML, it needs an 
extended version of the syscall, sys_remap_file_pages_prot():

 
http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1

i also wrote an x86 hypervisor kind of thing for UML, called 
'sys_vcpu()', which allows UML to execute guest user-mode in a box, 
which also relies on sys_remap_file_pages_prot():

 http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2

which reduced the UML guest syscall overhead from 30 usecs to 4 usecs 
(with native syscalls taking 2 usecs, on the box i tested, years ago).

So it certainly looked useful to me - but wasn't really picked up widely.

We'll always have the option to get rid of it (and hence completely 
reverse the decision to merge it) without breaking the ABI, by emulating 
the API via mremap(). That eliminates the UML speedup though. So no need 
to feel sorry about having merged it, we can easily revisit that 
years-old 'do we want it' decision, without any ABI worries.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:38:34 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> Dirty page accounting doesn't work either on
> non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > If it doesn't look very impressive, it could be because it leaves all 
> > the old crud around for backwards compatibility (the worst offenders 
> > are removed in patch 6/6).
> > 
> > If you look at the patchset as a whole, it removes about 250 lines, 
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere 
> > near the testing that the linear fault path does.
> > 
> > A minimal fix for nonlinear pages would have required changing all 
> > ->populate handlers, which I simply thought was not very productive 
> > considering the testing and coverage issues, and that I was going to 
> > rewrite the nonlinear path anyway.
> > 
> > If you like, you can consider patches 1,2,3 as the fix, and ignore 
> > nonlinear (hey, it doesn't even bother checking truncate_count 
> > today!).
> > 
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
> > thought you would have liked the patches...
> 
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a 
> call to mremap() and essentially deprecate it, without breaking the ABI 
> - and remove all the nonlinear code. (This would split fremap areas into 
> separate vmas)

That would make sense.  Dirty page accounting doesn't work either on
non-linear mappings, and I can't see how that could be fixed in any
other way.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

> 
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> 
> > If it doesn't look very impressive, it could be because it leaves all 
> > the old crud around for backwards compatibility (the worst offenders 
> > are removed in patch 6/6).
> > 
> > If you look at the patchset as a whole, it removes about 250 lines, 
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere 
> > near the testing that the linear fault path does.
> > 
> > A minimal fix for nonlinear pages would have required changing all 
> > ->populate handlers, which I simply thought was not very productive 
> > considering the testing and coverage issues, and that I was going to 
> > rewrite the nonlinear path anyway.
> > 
> > If you like, you can consider patches 1,2,3 as the fix, and ignore 
> > nonlinear (hey, it doesn't even bother checking truncate_count 
> > today!).
> > 
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
> > thought you would have liked the patches...
> 
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a 
> call to mremap() and essentially deprecate it, without breaking the ABI 
> - and remove all the nonlinear code. (This would split fremap areas into 
> separate vmas)
> 

I'm rather regretting having merged it - I don't think it has been used for
much.

Paolo's UML speedup patches might use nonlinear though.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> If it doesn't look very impressive, it could be because it leaves all 
> the old crud around for backwards compatibility (the worst offenders 
> are removed in patch 6/6).
> 
> If you look at the patchset as a whole, it removes about 250 lines, 
> mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c 
> fremap.c, that is nonlinear pages specific and doesn't get anywhere 
> near the testing that the linear fault path does.
> 
> A minimal fix for nonlinear pages would have required changing all 
> ->populate handlers, which I simply thought was not very productive 
> considering the testing and coverage issues, and that I was going to 
> rewrite the nonlinear path anyway.
> 
> If you like, you can consider patches 1,2,3 as the fix, and ignore 
> nonlinear (hey, it doesn't even bother checking truncate_count 
> today!).
> 
> Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I 
> thought you would have liked the patches...

btw., if we decide that nonlinear isn't worth the continuing maintenance
pain, we could internally implement/emulate sys_remap_file_pages() via a 
call to mremap() and essentially deprecate it, without breaking the ABI 
- and remove all the nonlinear code. (This would split fremap areas into 
separate vmas)

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 08:08:53AM +0100, Nick Piggin wrote:
> On Tue, Mar 06, 2007 at 10:51:01PM -0800, Andrew Morton wrote:
> 
> > This patch seems to churn things around an awful lot for minimal benefit.
> 
> Well it fixes the whole design of the nonlinear fault path.

If it doesn't look very impressive, it could be because it leaves all
the old crud around for backwards compatibility (the worst offenders
are removed in patch 6/6).

If you look at the patchset as a whole, it removes about 250 lines,
mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
fremap.c, that is nonlinear pages specific and doesn't get anywhere
near the testing that the linear fault path does.

A minimal fix for nonlinear pages would have required changing all
->populate handlers, which I simply thought was not very productive
considering the testing and coverage issues, and that I was going to
rewrite the nonlinear path anyway.

If you like, you can consider patches 1,2,3 as the fix, and ignore
nonlinear (hey, it doesn't even bother checking truncate_count today!).

Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I thought
you would have liked the patches...



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:

>
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > If it doesn't look very impressive, it could be because it leaves all
> > the old crud around for backwards compatibility (the worst offenders
> > are removed in patch 6/6).
> >
> > If you look at the patchset as a whole, it removes about 250 lines,
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere
> > near the testing that the linear fault path does.
> >
> > A minimal fix for nonlinear pages would have required changing all
> > ->populate handlers, which I simply thought was not very productive
> > considering the testing and coverage issues, and that I was going to
> > rewrite the nonlinear path anyway.
> >
> > If you like, you can consider patches 1,2,3 as the fix, and ignore
> > nonlinear (hey, it doesn't even bother checking truncate_count
> > today!).
> >
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)
>

I'm rather regretting having merged it - I don't think it has been used for
much.

Paolo's UML speedup patches might use nonlinear though.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > If it doesn't look very impressive, it could be because it leaves all
> > the old crud around for backwards compatibility (the worst offenders
> > are removed in patch 6/6).
> >
> > If you look at the patchset as a whole, it removes about 250 lines,
> > mostly of (non trivial) duplicated code in filemap.c memory.c shmem.c
> > fremap.c, that is nonlinear pages specific and doesn't get anywhere
> > near the testing that the linear fault path does.
> >
> > A minimal fix for nonlinear pages would have required changing all
> > ->populate handlers, which I simply thought was not very productive
> > considering the testing and coverage issues, and that I was going to
> > rewrite the nonlinear path anyway.
> >
> > If you like, you can consider patches 1,2,3 as the fix, and ignore
> > nonlinear (hey, it doesn't even bother checking truncate_count
> > today!).
> >
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)

That would make sense.  Dirty page accounting doesn't work either on
non-linear mappings, and I can't see how that could be fixed in any
other way.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:38:34 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> Dirty page accounting doesn't work either on
> non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > Dirty page accounting doesn't work either on
> > non-linear mappings
>
> It doesn't?  Confused - these things don't have anything to do with each
> other do they?

Look in page_mkclean().  Where does it handle non-linear mappings?

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Andrew Morton <[EMAIL PROTECTED]> wrote:

> > btw., if we decide that nonlinear isn't worth the continuing
> > maintenance pain, we could internally implement/emulate
> > sys_remap_file_pages() via a call to mremap() and essentially
> > deprecate it, without breaking the ABI - and remove all the
> > nonlinear code. (This would split fremap areas into separate vmas)
> >
>
> I'm rather regretting having merged it - I don't think it has been
> used for much.
>
> Paolo's UML speedup patches might use nonlinear though.

yes, i wrote the first, prototype version of that for UML, it needs an 
extended version of the syscall, sys_remap_file_pages_prot():

 http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1

i also wrote an x86 hypervisor kind of thing for UML, called 
'sys_vcpu()', which allows UML to execute guest user-mode in a box, 
which also relies on sys_remap_file_pages_prot():

 http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2

which reduced the UML guest syscall overhead from 30 usecs to 4 usecs 
(with native syscalls taking 2 usecs, on the box i tested, years ago).

So it certainly looked useful to me - but wasn't really picked up widely.

We'll always have the option to get rid of it (and hence completely 
reverse the decision to merge it) without breaking the ABI, by emulating 
the API via mremap(). That eliminates the UML speedup though. So no need 
to feel sorry about having merged it, we can easily revisit that 
years-old 'do we want it' decision, without any ABI worries.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:27:55AM +0100, Ingo Molnar wrote:
>
> * Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> > Then 4,5,6 is the fault/nonlinear rewrite, take it or leave it. I
> > thought you would have liked the patches...
>
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)

Well I think it has a few possible uses outside the PAE database
workloads. UML for one seems to be interested... as much as I don't
use them, I think nonlinear mappings are kinda cool ;)

After these patches, I don't think there is too much burden. The main
thing left really is just the objrmap stuff, but that is just handled
with a minimal 'dumb' algorithm that doesn't cost much.

Then the core of it is just the file pte handling, which really doesn't
seem to be much problem.

Apart from a handful of trivial if (pte_file()) cases throughout mm/,
our maintenance burden basically now amounts to the following patch.
Even the rmap.c change looks bigger than it is because I split out
the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

--

 include/asm-powerpc/pgtable.h |   12 
 mm/Kconfig                    |    6 ++
 mm/Makefile                   |    6 +-
 mm/rmap.c                     |  101 +-
 4 files changed, 83 insertions(+), 42 deletions(-)

Index: linux-2.6/include/asm-powerpc/pgtable.h
===
--- linux-2.6.orig/include/asm-powerpc/pgtable.h
+++ linux-2.6/include/asm-powerpc/pgtable.h
@@ -243,7 +243,12 @@ static inline int pte_write(pte_t pte) {
 static inline int pte_exec(pte_t pte)  { return pte_val(pte) & _PAGE_EXEC;}
 static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;}
 static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;}
+
+#ifdef CONFIG_NONLINEAR
 static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;}
+#else
+static inline int pte_file(pte_t pte) { return 0; }
+#endif
 
 static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; }
 static inline void pte_cache(pte_t pte)   { pte_val(pte) &= ~_PAGE_NO_CACHE; }
@@ -483,9 +488,16 @@ extern void update_mmu_cache(struct vm_a
 #define __swp_entry(type, offset) ((swp_entry_t){((type)<< 1)|((offset)<<8)})
 #define __pte_to_swp_entry(pte)((swp_entry_t){pte_val(pte) >> PTE_RPN_SHIFT})
 #define __swp_entry_to_pte(x)  ((pte_t) { (x).val << PTE_RPN_SHIFT })
+
+#ifdef CONFIG_NONLINEAR
 #define pte_to_pgoff(pte)  (pte_val(pte) >> PTE_RPN_SHIFT)
 #define pgoff_to_pte(off)  ((pte_t) {((off) << PTE_RPN_SHIFT)|_PAGE_FILE})
 #define PTE_FILE_MAX_BITS  (BITS_PER_LONG - PTE_RPN_SHIFT)
+#else
+#define pte_to_pgoff(pte)  ({BUG(); -1;})
+#define pgoff_to_pte(off)  ({BUG(); (pte_t){-1};})
+#define PTE_FILE_MAX_BITS  0
+#endif
 
 /*
  * kern_addr_valid is intended to indicate whether an address is a valid
Index: linux-2.6/mm/Kconfig
===
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -142,6 +142,12 @@ config SPLIT_PTLOCK_CPUS
 #
 # support for page migration
 #
+config NONLINEAR
+   bool "Non linear mappings"
+   def_bool y
+   help
+ Provides support for the remap_file_pages syscall.
+
 config MIGRATION
    bool "Page migration"
    def_bool y
Index: linux-2.6/mm/Makefile
===
--- linux-2.6.orig/mm/Makefile
+++ linux-2.6/mm/Makefile
@@ -3,9 +3,8 @@
 #
 
 mmu-y  := nommu.o
-mmu-$(CONFIG_MMU)  := fremap.o highmem.o madvise.o memory.o mincore.o \
-  mlock.o mmap.o mprotect.o mremap.o msync.o rmap.o \
-  vmalloc.o
+mmu-$(CONFIG_MMU)  := highmem.o madvise.o memory.o mincore.o mlock.o \
+  mmap.o mprotect.o mremap.o msync.o rmap.o vmalloc.o
 
 obj-y  := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
   page_alloc.o page-writeback.o pdflush.o \
@@ -27,5 +26,6 @@ obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_MEMORY_HOTPLUG) += memory_hotplug.o
 obj-$(CONFIG_FS_XIP) += filemap_xip.o
+obj-$(CONFIG_NONLINEAR) += fremap.o
 obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
Index: linux-2.6/mm/rmap.c
===
--- linux-2.6.orig/mm/rmap.c
+++ linux-2.6/mm/rmap.c
@@ -756,6 +756,7 @@ out:
return ret;
 }
 
+#ifdef CONFIG_NONLINEAR
 /*
  * objrmap doesn't work for nonlinear VMAs because the assumption that
  * offset-into-file correlates with offset-into-virtual-addresses does not hold.
@@ -845,53 +846,18 @@ static void 

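The pte_to_pgoff()/pgoff_to_pte() pair in the patch packs a file page offset into a non-present pte: the offset is shifted above PTE_RPN_SHIFT and tagged with _PAGE_FILE, so at most PTE_FILE_MAX_BITS bits of offset fit. The standalone sketch below shows that round trip. The toy_ names and every constant value (shift 17, present bit 0x1, file bit 0x2, 64-bit ptes) are placeholders chosen for illustration, not the real powerpc definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only; not the powerpc constants. */
#define TOY_PTE_RPN_SHIFT     17
#define TOY_PAGE_PRESENT      0x1UL
#define TOY_PAGE_FILE         0x2UL
#define TOY_BITS_PER_LONG     64
#define TOY_PTE_FILE_MAX_BITS (TOY_BITS_PER_LONG - TOY_PTE_RPN_SHIFT)

typedef struct { uint64_t val; } toy_pte_t;

/* Like pgoff_to_pte(): offset above the RPN shift, plus the file tag. */
static toy_pte_t toy_pgoff_to_pte(uint64_t off)
{
    toy_pte_t pte = { (off << TOY_PTE_RPN_SHIFT) | TOY_PAGE_FILE };
    return pte;
}

/* Like pte_to_pgoff(): shifting right drops the flag bits below the shift. */
static uint64_t toy_pte_to_pgoff(toy_pte_t pte)
{
    return pte.val >> TOY_PTE_RPN_SHIFT;
}

/*
 * Like pte_file(), which callers only consult for a non-present pte;
 * the toy folds that present check in here.
 */
static bool toy_pte_file(toy_pte_t pte)
{
    return !(pte.val & TOY_PAGE_PRESENT) && (pte.val & TOY_PAGE_FILE);
}

/* Largest offset that survives the round trip: PTE_FILE_MAX_BITS bits. */
static uint64_t toy_max_pgoff(void)
{
    return (UINT64_C(1) << TOY_PTE_FILE_MAX_BITS) - 1;
}
```

With CONFIG_NONLINEAR disabled, the patch instead turns these helpers into BUG() stubs and sets PTE_FILE_MAX_BITS to 0, so pte_file() is a constant 0 and the if (pte_file()) branches Nick mentions can be compiled away.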
Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:

> > > Dirty page accounting doesn't work either on
> > > non-linear mappings
> >
> > It doesn't?  Confused - these things don't have anything to do with each
> > other do they?
>
> Look in page_mkclean().  Where does it handle non-linear mappings?
>

OK, I'd forgotten about that.  It won't break dirty memory accounting,
but it'll potentially break dirty memory balancing.

If we have the wrong page (due to nonlinear), page_check_address() will
fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
algorithms and I guess it'll break the msync guarantees.

Peter, I thought we went through the nonlinear problem ages ago and decided
it was OK?
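The "wrong page" case follows from how objrmap finds a page's address: for a linear VMA the address is computed straight from the file offset, roughly vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT). The toy userspace model below (hypothetical toy_ types, 4 KiB pages, nothing like the kernel's real structures) shows that once pages have been shuffled the way remap_file_pages() allows, the pte at the computed address maps some other page, so a page_check_address()-style check fails. It does not model the separate point Miklos makes further down, that page_mkclean() only walks mapping->i_mmap and never reaches nonlinear VMAs at all.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_SLOTS   4

/* Hypothetical, heavily simplified stand-in for a VMA and its ptes. */
struct toy_vma {
    unsigned long vm_start;                 /* first virtual address */
    unsigned long vm_pgoff;                 /* file page at vm_start if linear */
    unsigned long slot_pgoff[TOY_NR_SLOTS]; /* file page each virtual page maps */
};

/* Linear objrmap-style guess: where should file page 'pgoff' be mapped? */
static unsigned long toy_linear_address(const struct toy_vma *vma,
                                        unsigned long pgoff)
{
    return vma->vm_start + ((pgoff - vma->vm_pgoff) << TOY_PAGE_SHIFT);
}

/* Rough analogue of page_check_address(): does addr really map pgoff? */
static bool toy_check_address(const struct toy_vma *vma, unsigned long addr,
                              unsigned long pgoff)
{
    unsigned long slot;

    if (addr < vma->vm_start)
        return false;
    slot = (addr - vma->vm_start) >> TOY_PAGE_SHIFT;
    if (slot >= TOY_NR_SLOTS)
        return false;
    return vma->slot_pgoff[slot] == pgoff;
}

/* Point one virtual page at a different file page, nonlinear-style. */
static void toy_remap(struct toy_vma *vma, unsigned long slot,
                      unsigned long pgoff)
{
    vma->slot_pgoff[slot] = pgoff;
}
```

In a freshly created linear VMA every check succeeds; after swapping the file pages behind virtual pages 0 and 3, the lookups for those two file pages land on the wrong pte while untouched pages are still found.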


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:59:44AM +0100, Nick Piggin wrote:
> Apart from a handful of trivial if (pte_file()) cases throughout mm/,
> our maintenance burden basically now amounts to the following patch.
> Even the rmap.c change looks bigger than it is because I split out
> the nonlinear unmapping code from try_to_unmap_file. Not too bad, eh? :)

Oh, there is a bit more nonlinear mmap list manipulation I'd forgotten
about too... makes things a little bit worse, but not too much.


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> > > > Dirty page accounting doesn't work either on
> > > > non-linear mappings
> > >
> > > It doesn't?  Confused - these things don't have anything to do with each
> > > other do they?
> >
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> >
>
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
>
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> algorithms and I guess it'll break the msync guarantees.
>
> Peter, I thought we went through the nonlinear problem ages ago and decided
> it was OK?

msync breakage is bad, but otherwise I don't know that we care about
dirty page writeout efficiency.

But I think we discovered that those msync changes are bogus anyway
because there is a small race window where pte could be dirtied without
page being set dirty?




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Nick Piggin <[EMAIL PROTECTED]> wrote:

> After these patches, I don't think there is too much burden. The main
> thing left really is just the objrmap stuff, but that is just handled
> with a minimal 'dumb' algorithm that doesn't cost much.

ok. What do you think about the sys_remap_file_pages_prot() thing that 
Paolo has done in a nicely split up form - does that complicate things 
in any fundamental way? That is what is useful to UML.

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> >
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> >
>
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
>
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.

It won't even get that far, because it only looks at vmas on
mapping->i_mmap, and not on i_mmap_nonlinear.

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:

> On Wed, Mar 07, 2007 at 01:07:56AM -0800, Andrew Morton wrote:
> > On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
> >
> > > > > Dirty page accounting doesn't work either on
> > > > > non-linear mappings
> > > >
> > > > It doesn't?  Confused - these things don't have anything to do with each
> > > > other do they?
> > >
> > > Look in page_mkclean().  Where does it handle non-linear mappings?
> > >
> >
> > OK, I'd forgotten about that.  It won't break dirty memory accounting,
> > but it'll potentially break dirty memory balancing.
> >
> > If we have the wrong page (due to nonlinear), page_check_address() will
> > fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> > algorithms and I guess it'll break the msync guarantees.
> >
> > Peter, I thought we went through the nonlinear problem ages ago and decided
> > it was OK?
>
> msync breakage is bad, but otherwise I don't know that we care about
> dirty page writeout efficiency.

Well.  We made so many changes to support the synchronous
dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
the old-style approach still works.  It might seem to, most of the time. 
But if it _is_ subtly broken, boy it's going to take a long time for us to
find out.

> But I think we discovered that those msync changes are bogus anyway
> because there is a small race window where pte could be dirtied without
> page being set dirty?

Dunno, I don't recall that.  We dirty the page before the pte...


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 09:53:23AM +0100, Ingo Molnar wrote:
>
> * Andrew Morton <[EMAIL PROTECTED]> wrote:
>
> > > btw., if we decide that nonlinear isn't worth the continuing
> > > maintenance pain, we could internally implement/emulate
> > > sys_remap_file_pages() via a call to mremap() and essentially
> > > deprecate it, without breaking the ABI - and remove all the
> > > nonlinear code. (This would split fremap areas into separate vmas)
> > >
> >
> > I'm rather regretting having merged it - I don't think it has been
> > used for much.
> >
> > Paolo's UML speedup patches might use nonlinear though.
>
> yes, i wrote the first, prototype version of that for UML, it needs an
> extended version of the syscall, sys_remap_file_pages_prot():
>
>  http://redhat.com/~mingo/remap-file-pages-patches/remap-file-pages-prot-2.6.4-rc1-mm1-A1
>
> i also wrote an x86 hypervisor kind of thing for UML, called
> 'sys_vcpu()', which allows UML to execute guest user-mode in a box,
> which also relies on sys_remap_file_pages_prot():
>
>  http://redhat.com/~mingo/remap-file-pages-patches/vcpu-2.6.4-rc2-mm1-A2
>
> which reduced the UML guest syscall overhead from 30 usecs to 4 usecs
> (with native syscalls taking 2 usecs, on the box i tested, years ago).
>
> So it certainly looked useful to me - but wasn't really picked up widely.
>
> We'll always have the option to get rid of it (and hence completely
> reverse the decision to merge it) without breaking the ABI, by emulating
> the API via mremap(). That eliminates the UML speedup though. So no need
> to feel sorry about having merged it, we can easily revisit that
> years-old 'do we want it' decision, without any ABI worries.

Depending on whether anyone wants it, and what features they want, we
could emulate the old syscall, and make a new restricted one which is
much less intrusive.

For example, if we can operate only on MAP_ANONYMOUS memory and specify
that nonlinear mappings effectively mlock the pages, then we can get
rid of all the objrmap and unmap_mapping_range handling, forget about
the writeout and msync problems...



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Miklos Szeredi
> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
>
> Dunno, I don't recall that.  We dirty the page before the pte...

That's the one I just submitted a fix for ;)

  http://lkml.org/lkml/2007/3/6/308

Miklos


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar <[EMAIL PROTECTED]> wrote:
> btw., if we decide that nonlinear isn't worth the continuing maintenance
> pain, we could internally implement/emulate sys_remap_file_pages() via a
> call to mremap() and essentially deprecate it, without breaking the ABI
> - and remove all the nonlinear code. (This would split fremap areas into
> separate vmas)

On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
> I'm rather regretting having merged it - I don't think it has been used for
> much.
> Paolo's UML speedup patches might use nonlinear though.

Guess what major real-life application not only uses nonlinear daily
but would even be very happy to see it extended with non-vma-creating
protections and more? It's not terribly typical for things to be
truncated while remap_file_pages() is doing its work, though it's been
proposed as a method of dynamism. It won't stress remap_file_pages() vs.
truncate() in any meaningful way, though, as userspace will be rather
diligent about clearing in-use data out of the file offset range to be
truncated away anyway, and all that via O_DIRECT.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
* Nick Piggin <[EMAIL PROTECTED]> wrote:
> After these patches, I don't think there is too much burden. The main
> thing left really is just the objrmap stuff, but that is just handled
> with a minimal 'dumb' algorithm that doesn't cost much.

On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> ok. What do you think about the sys_remap_file_pages_prot() thing that
> Paolo has done in a nicely split up form - does that complicate things
> in any fundamental way? That is what is useful to UML.

Oracle would love it. You don't want to know how far back I've been
asked to backport that.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Peter Zijlstra
On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
> On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> > > > Dirty page accounting doesn't work either on
> > > > non-linear mappings
> > >
> > > It doesn't?  Confused - these things don't have anything to do with each
> > > other do they?
> >
> > Look in page_mkclean().  Where does it handle non-linear mappings?
> >
>
> OK, I'd forgotten about that.  It won't break dirty memory accounting,
> but it'll potentially break dirty memory balancing.
>
> If we have the wrong page (due to nonlinear), page_check_address() will
> fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
> algorithms and I guess it'll break the msync guarantees.
>
> Peter, I thought we went through the nonlinear problem ages ago and decided
> it was OK?

Can recollect as much, I modelled it after page_referenced() and can't
find any VM_NONLINEAR specific code in there either.

Will have a hard look, but if it's broken, then page_referenced is
equally broken it seems, which would make page reclaim funny in the
light of nonlinear mappings.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Ingo Molnar

* Bill Irwin <[EMAIL PROTECTED]> wrote:

> * Nick Piggin <[EMAIL PROTECTED]> wrote:
> > After these patches, I don't think there is too much burden. The main
> > thing left really is just the objrmap stuff, but that is just handled
> > with a minimal 'dumb' algorithm that doesn't cost much.
>
> On Wed, Mar 07, 2007 at 10:22:52AM +0100, Ingo Molnar wrote:
> > ok. What do you think about the sys_remap_file_pages_prot() thing that
> > Paolo has done in a nicely split up form - does that complicate things
> > in any fundamental way? That is what is useful to UML.
>
> Oracle would love it. You don't want to know how far back I've been
> asked to backport that.

ok, cool! Then the first step would be for you to talk to Paolo and to 
pick up the patches, review them, nurse it in -mm, etc. Suffering in 
silence is just a pointless act of masochism, not an efficient 
upstream-merge tactic ;-)

Ingo


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:26:38AM -0800, Andrew Morton wrote:
> On Wed, 7 Mar 2007 10:18:23 +0100 Nick Piggin <[EMAIL PROTECTED]> wrote:
>
> >
> > msync breakage is bad, but otherwise I don't know that we care about
> > dirty page writeout efficiency.
>
> Well.  We made so many changes to support the synchronous
> dirty-the-page-when-we-dirty-the-pte thing that I'm rather doubtful that
> the old-style approach still works.  It might seem to, most of the time.
> But if it _is_ subtly broken, boy it's going to take a long time for us to
> find out.

I can't think of anything that should have caused breakage (except for
the msync thing). We're still careful about not dropping pte dirty bits.

> > But I think we discovered that those msync changes are bogus anyway
> > because there is a small race window where pte could be dirtied without
> > page being set dirty?
>
> Dunno, I don't recall that.  We dirty the page before the pte...

I don't think it is really that simple. There is a big comment in
clear_page_dirty_for_io.



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Andrew Morton
On Wed, 7 Mar 2007 01:29:03 -0800 Bill Irwin [EMAIL PROTECTED] wrote:

 On Wed, 7 Mar 2007 09:27:55 +0100 Ingo Molnar [EMAIL PROTECTED] wrote:
  btw., if we decide that nonlinear isn't worth the continuing maintenance 
  pain, we could internally implement/emulate sys_remap_file_pages() via a 
  call to mremap() and essentially deprecate it, without breaking the ABI 
  - and remove all the nonlinear code. (This would split fremap areas into 
  separate vmas)
 
 On Wed, Mar 07, 2007 at 12:35:20AM -0800, Andrew Morton wrote:
  I'm rather regretting having merged it - I don't think it has been used for
  much.
  Paolo's UML speedup patches might use nonlinear though.
 
 Guess what major real-life application not only uses nonlinear daily
 but would even be very happy to see it extended with non-vma-creating
 protections and more?

uh-oh.  SQL server?

 It's not terribly typical for things to be
 truncated while remap_file_pages() is doing its work, though it's been
 proposed as a method of dynamism. It won't stress remap_file_pages() vs.
 truncate() in any meaningful way, though, as userspace will be rather
 diligent about clearing in-use data out of the file offset range to be
 truncated away anyway, and all that via O_DIRECT.

The problem here isn't related to truncate or direct-IO.  It's just
plain-old MAP_SHARED.  nonlinear VMAs are now using the old-style
dirty-memory management.  msync() is basically a no-op and the code is
wildly tricky and pretty much untested.  The chances that we broke it are
considerable.
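Ingo's suggestion quoted above, emulating sys_remap_file_pages() by splitting the window into separate vmas, can be approximated from userspace with plain mmap(MAP_FIXED). The sketch below is only an analogy under stated assumptions (Linux/POSIX mmap semantics, a writable /tmp standing in for a tmpfs/shm object); it is not the in-kernel emulation, and remap_window() and window_demo() are hypothetical names. Because every window slot ends up as an ordinary linear MAP_SHARED mapping, the pages stay on the normal dirty-tracking and msync() path that Andrew is worried about, at the cost of one vma per remapped slot.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a kernel or libc API): point one window slot
 * at file page `pgoff` by mapping over it with MAP_FIXED. Each call may
 * leave a separate vma behind, which is the cost of this emulation. */
static int remap_window(char *slot, size_t len, int fd, off_t pgoff,
                        long pagesz)
{
    void *p = mmap(slot, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, pgoff * pagesz);
    return p == MAP_FAILED ? -1 : 0;
}

/* Hypothetical demo: builds a 4-page shared object tagged 'A'..'D',
 * opens a 2-page linear window on pages 0-1, rearranges it to show
 * file pages 3 and 2, then writes through the window. Returns 0 on
 * success and reports what each step observed. */
int window_demo(char *slot0, char *slot1, char *page3_after_write)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    const size_t npages = 4;
    size_t objlen, winlen;
    char path[] = "/tmp/nonlinear-XXXXXX";   /* assumes a writable /tmp */
    char *obj = MAP_FAILED, *win = MAP_FAILED;
    int rc = -1;
    int fd;

    assert(pagesz > 0);
    objlen = npages * (size_t)pagesz;
    winlen = 2 * (size_t)pagesz;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* keep only the descriptor */
    if (ftruncate(fd, (off_t)objlen) != 0)
        goto out;

    /* The backing object, mapped linearly so we can inspect it. */
    obj = mmap(NULL, objlen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (obj == MAP_FAILED)
        goto out;
    for (size_t i = 0; i < npages; i++)
        obj[i * (size_t)pagesz] = (char)('A' + i);

    /* A 2-page window, initially linear onto file pages 0 and 1. */
    win = mmap(NULL, winlen, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (win == MAP_FAILED)
        goto out;

    /* Nonlinear layout: slot 0 -> file page 3, slot 1 -> file page 2. */
    if (remap_window(win, (size_t)pagesz, fd, 3, pagesz) != 0 ||
        remap_window(win + pagesz, (size_t)pagesz, fd, 2, pagesz) != 0)
        goto out;
    *slot0 = win[0];
    *slot1 = win[pagesz];

    win[0] = 'd';                         /* lands in file page 3 */
    *page3_after_write = obj[3 * (size_t)pagesz];
    rc = 0;
out:
    if (win != MAP_FAILED)
        munmap(win, winlen);
    if (obj != MAP_FAILED)
        munmap(obj, objlen);
    close(fd);
    return rc;
}
```

Calling window_demo() should report 'D' and 'C' for the two slots and 'd' for file page 3 after the write, since both mappings share the same page-cache pages.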



Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Bill Irwin
On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
 Depending on whether anyone wants it, and what features they want, we
 could emulate the old syscall, and make a new restricted one which is
 much less intrusive.
 For example, if we can operate only on MAP_ANONYMOUS memory and specify
 that nonlinear mappings effectively mlock the pages, then we can get
 rid of all the objrmap and unmap_mapping_range handling, forget about
 the writeout and msync problems...

Anonymous-only would make it a doorstop for Oracle, since its entire
motive for using it is to window into objects larger than user virtual
address spaces (this likely also applies to UML, though they should
really chime in to confirm). Restrictions to tmpfs and/or ramfs would
likely be liveable, though I suspect some things might want to do it to
shm segments (I'll ask about that one). There's definitely no need for a
persistent backing store for the object to be remapped in Oracle's case,
in any event. It's largely the in-core destination and source of IO, not
something saved on-disk itself.


-- wli


Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 10:32:22AM +0100, Peter Zijlstra wrote:
 On Wed, 2007-03-07 at 01:07 -0800, Andrew Morton wrote:
  On Wed, 07 Mar 2007 09:51:57 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote:
  
 Dirty page accounting doesn't work either on
 non-linear mappings

It doesn't?  Confused - these things don't have anything to do with each
other do they?
   
   Look in page_mkclean().  Where does it handle non-linear mappings?
   
  
  OK, I'd forgotten about that.  It won't break dirty memory accounting,
  but it'll potentially break dirty memory balancing.
  
  If we have the wrong page (due to nonlinear), page_check_address() will
  fail and we'll leave the pte dirty.  That puts us back to the pre-2.6.17
  algorithms and I guess it'll break the msync guarantees.
  
  Peter, I thought we went through the nonlinear problem ages ago and decided
  it was OK?
 
 Can recollect as much, I modelled it after page_referenced() and can't
 find any VM_NONLINEAR specific code in there either.
 
 Will have a hard look, but if it's broken, then page_referenced is
 equally broken, it seems, which would make page reclaim funny in the
 light of nonlinear mappings.

page_referenced is just a heuristic, and it ignores nonlinear mappings;
the page will then get filtered down to try_to_unmap.

Page reclaim is already funny for nonlinear mappings, page_referenced
is the least of its worries ;) It works, though.
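The failure Andrew describes above (with a nonlinear vma, the linearly computed address points at a pte that maps some other file page, so page_check_address() fails and the dirty pte is left alone) can be illustrated with a toy model. This is a simplified userspace sketch, not kernel code: struct model_vma, model_vma_address() and model_mkclean() are hypothetical names, and 4 KiB pages and a 4-slot window are assumptions. The address calculation mirrors the linear rmap formula vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT); the "pte check" just compares recorded file offsets instead of pfns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */
#define MODEL_NR_SLOTS   4    /* assumption: 4-page window */

/* Toy vma: each slot records which file page its pte maps and whether
 * that pte is dirty. For a linear vma, slot i maps vm_pgoff + i. */
struct model_vma {
    unsigned long vm_start;
    unsigned long vm_pgoff;
    unsigned long pte_pgoff[MODEL_NR_SLOTS];
    bool pte_dirty[MODEL_NR_SLOTS];
};

/* Linear formula: where file page `pgoff` would sit if the vma were
 * linear. Returns -1 if it falls outside the window. */
static long model_vma_address(const struct model_vma *vma,
                              unsigned long pgoff)
{
    if (pgoff < vma->vm_pgoff || pgoff - vma->vm_pgoff >= MODEL_NR_SLOTS)
        return -1;
    return (long)(vma->vm_start +
                  ((pgoff - vma->vm_pgoff) << MODEL_PAGE_SHIFT));
}

/* Rough analogue of cleaning one page in one vma: locate the pte via
 * the linear formula and clean it only if it really maps `pgoff`.
 * Returns true if a dirty pte was found and cleaned. */
static bool model_mkclean(struct model_vma *vma, unsigned long pgoff)
{
    long addr = model_vma_address(vma, pgoff);
    if (addr < 0)
        return false;
    size_t slot = ((unsigned long)addr - vma->vm_start) >> MODEL_PAGE_SHIFT;
    assert(slot < MODEL_NR_SLOTS);
    if (vma->pte_pgoff[slot] != pgoff)   /* the "page_check_address" miss */
        return false;
    bool was_dirty = vma->pte_dirty[slot];
    vma->pte_dirty[slot] = false;
    return was_dirty;
}
```

In a linear vma the lookup finds and cleans the dirty pte. If slot 1 has been remapped to another file page and the page is actually mapped (and dirtied) through slot 2, the lookup lands on slot 1, the check misses, and slot 2 stays dirty.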




Re: [patch 4/6] mm: merge populate and nopage into fault (fixes nonlinear)

2007-03-07 Thread Nick Piggin
On Wed, Mar 07, 2007 at 01:44:20AM -0800, Bill Irwin wrote:
 On Wed, Mar 07, 2007 at 10:28:21AM +0100, Nick Piggin wrote:
  Depending on whether anyone wants it, and what features they want, we
  could emulate the old syscall, and make a new restricted one which is
  much less intrusive.
  For example, if we can operate only on MAP_ANONYMOUS memory and specify
  that nonlinear mappings effectively mlock the pages, then we can get
  rid of all the objrmap and unmap_mapping_range handling, forget about
  the writeout and msync problems...
 
 Anonymous-only would make it a doorstop for Oracle, since its entire
 motive for using it is to window into objects larger than user virtual

Uh, duh, yes. I don't mean MAP_ANONYMOUS; I was just thinking of the shmem
inode that sits behind MAP_ANONYMOUS|MAP_SHARED. Of course if you don't
have a file descriptor to get a pgoff, then remap_file_pages is a doorstop
for everyone ;)

 address spaces (this likely also applies to UML, though they should
 really chime in to confirm). Restrictions to tmpfs and/or ramfs would
 likely be liveable, though I suspect some things might want to do it to
 shm segments (I'll ask about that one). There's definitely no need for a
 persistent backing store for the object to be remapped in Oracle's case,
 in any event. It's largely the in-core destination and source of IO, not
 something saved on-disk itself.

Yeah, tmpfs/shm segs are what I was thinking about. If UML can live with
that as well, then I think it might be a good option.
