Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Ingo Molnar
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > i've attached an updated version of trace-it.c, which will turn this > > off itself, using a sysctl. I also made WAKEUP_TIMING default-off. > > ok. http://userweb.kernel.org/~akpm/to-ingo.txt is the trace of > > taskset -c 0

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Andrew Morton
On Fri, 6 Apr 2007 11:08:22 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > * Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > getting a good trace of it is easy: pick up the latest -rt kernel > > > from: > > > > > > http://redhat.com/~mingo/realtime-preempt/ > > > > > > enable EVENT_TRACING

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-06 Thread Ingo Molnar
* Andrew Morton <[EMAIL PROTECTED]> wrote: > > getting a good trace of it is easy: pick up the latest -rt kernel > > from: > > > > http://redhat.com/~mingo/realtime-preempt/ > > > > enable EVENT_TRACING in that kernel, run the workload and do: > > > > scripts/trace-it > to-ingo.txt >

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Ulrich Drepper wrote: Nick Piggin wrote: Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's kernels using down_write(mmap_sem) for MADV_DONTNEED is better than mmap/mprotect, which have more fundamental locking requirements, more overhead and no benefits (except debugging,

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
Nick Piggin wrote: > Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's > kernels using down_write(mmap_sem) for MADV_DONTNEED is better than > mmap/mprotect, which have more fundamental locking requirements, more > overhead and no benefits (except debugging, I suppose). It's

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Ulrich Drepper wrote: In case somebody wants to play around with Rik patch or another madvise-based patch, I have x86-64 glibc binaries which can use it: http://people.redhat.com/drepper/rpms These are based on the latest Fedora rawhide version. They should work on older systems, too, but

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin
Rik van Riel wrote: Nick Piggin wrote: Oh, also: something like this patch would help out MADV_DONTNEED, as it means it can run concurrently with page faults. I think the locking will work (but needs forward porting). Ironically, your patch decreases throughput on my quad core test system,

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Andrew Morton wrote: #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS I wonder which way you're using, and whether using the other way changes things. I'm using the default Fedora config file, which has NR_CPUS defined to 64 and CONFIG_SPLIT_PTLOCK_CPUS to 4, so I am using the split locks.

Re: missing madvise functionality

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 14:38:30 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Nick Piggin wrote: > > > Oh, also: something like this patch would help out MADV_DONTNEED, as it > > means it can run concurrently with page faults. I think the locking will > > work (but needs forward porting). > >

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Andrew Morton
On Thu, 5 Apr 2007 21:11:29 +0200 Ingo Molnar <[EMAIL PROTECTED]> wrote: > > * David Howells <[EMAIL PROTECTED]> wrote: > > > But short of recording the lock sequence, I don't think there's anyway > > to find out for sure. printk probably won't cut it as a recording > > mechanism because its

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 13:48:58 +0100 David Howells <[EMAIL PROTECTED]> wrote: > Andrew Morton <[EMAIL PROTECTED]> wrote: > > > > > What we effectively have is 32 threads on a single CPU all doing > > > > for (ever) { > > down_write() > > up_write() > >

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread Ingo Molnar
* David Howells <[EMAIL PROTECTED]> wrote: > But short of recording the lock sequence, I don't think there's anyway > to find out for sure. printk probably won't cut it as a recording > mechanism because its overheads are too great. getting a good trace of it is easy: pick up the latest -rt

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Nick Piggin wrote: Oh, also: something like this patch would help out MADV_DONTNEED, as it means it can run concurrently with page faults. I think the locking will work (but needs forward porting). Ironically, your patch decreases throughput on my quad core test system, with Jakub's test

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Jakub Jelinek wrote: + /* FIXME: POSIX says that MADV_DONTNEED cannot throw away data. */ case MADV_DONTNEED: + case MADV_FREE: error = madvise_dontneed(vma, prev, start, end); break; I think you should only use the new behavior for madvise

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
In case somebody wants to play around with Rik patch or another madvise-based patch, I have x86-64 glibc binaries which can use it: http://people.redhat.com/drepper/rpms These are based on the latest Fedora rawhide version. They should work on older systems, too, but you screw up your

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Andrew Morton wrote: On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: Rik van Riel wrote: MADV_DONTNEED, unpatched, 1000 loops real 0m13.672s user 0m1.217s sys 0m45.712s MADV_DONTNEED, with patch, 1000 loops real 0m4.169s user 0m2.033s sys

Re: preemption and rwsems (was: Re: missing madvise functionality)

2007-04-05 Thread David Howells
Andrew Morton <[EMAIL PROTECTED]> wrote: > > What we effectively have is 32 threads on a single CPU all doing > > for (ever) { > down_write() > up_write() > down_read() > up_read(); > } That's not quite so. In that test

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Eric Dumazet wrote: Could you please add this patch and see if it helps on your machine ? [PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem Avoids cache line dirtying I could, but I already know it's not going to help much. How do I know this? I already have 66% idle time

Re: missing madvise functionality

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 03:39:29 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Rik van Riel wrote: > > > MADV_DONTNEED, unpatched, 1000 loops > > > > real 0m13.672s > > user 0m1.217s > > sys 0m45.712s > > > > > > MADV_DONTNEED, with patch, 1000 loops > > > > real 0m4.169s > > user

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
On Thu, 05 Apr 2007 03:31:24 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Jakub Jelinek wrote: > > > My guess is that all the page zeroing is pretty expensive as well and > > takes significant time, but I haven't profiled it. > > With the attached patch (Andrew, I'll change the details

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Rik van Riel wrote: MADV_DONTNEED, unpatched, 1000 loops real 0m13.672s user 0m1.217s sys 0m45.712s MADV_DONTNEED, with patch, 1000 loops real 0m4.169s user 0m2.033s sys 0m3.224s I just noticed something fun with these numbers. Without the patch, the system (a quad

Re: missing madvise functionality

2007-04-05 Thread Rik van Riel
Jakub Jelinek wrote: My guess is that all the page zeroing is pretty expensive as well and takes significant time, but I haven't profiled it. With the attached patch (Andrew, I'll change the details around if you want - I just wanted something to test now), your test case run time went down

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
Ulrich Drepper wrote: Eric Dumazet wrote: Database workload, where the user multi threaded app is constantly accessing GBytes of data, so L2 cache hit is very small. If you want to oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is in the top 5. We did have a workload

Re: missing madvise functionality

2007-04-05 Thread Jakub Jelinek
On Thu, Apr 05, 2007 at 03:31:24AM -0400, Rik van Riel wrote: > >My guess is that all the page zeroing is pretty expensive as well and > >takes significant time, but I haven't profiled it. > > With the attached patch (Andrew, I'll change the details around > if you want - I just wanted something

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
On Thu, 05 Apr 2007 04:31:55 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Eric Dumazet wrote: > > > Could you please add this patch and see if it helps on your machine ? > > > > [PATCH] VM : mm_struct's mmap_cache should be close to mmap_sem > > > > Avoids cache line dirtying > > I could,

Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
Eric Dumazet wrote: > Database workload, where the user multi threaded app is constantly > accessing GBytes of data, so L2 cache hit is very small. If you want to > oprofile it, with say a CPU_CLK_UNHALTED:5000 event, then find_vma() is > in the top 5. We did have a workload with lots of Java and

Re: missing madvise functionality

2007-04-05 Thread Eric Dumazet
Nick Piggin wrote: Eric Dumazet wrote: >> This was not a working patch, just to throw the idea, since the answers I got showed I was not understood. In this case, find_extend_vma() should of course have one struct vm_area_cache * argument, like find_vma() One single cache on one mm is

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: >> Oh dear. On Wed, Apr 04, 2007 at 11:51:05AM -0700, Andrew Morton wrote: > what's all this about? I rewrote Jakub's testcase and included it as a MIME attachment. Current working version inline below. Also at

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Nick Piggin wrote: Jakub Jelinek wrote: On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: On Wed, 04 Apr 2007 20:05:54 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: @@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u unsigned long start; addr &= PAGE_MASK; - vma = find_vma(mm,addr); + vma = find_vma(mm,addr,>vmacache);

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Hugh Dickins wrote: On Wed, 4 Apr 2007, Rik van Riel wrote: Hugh Dickins wrote: (I didn't understand how Rik would achieve his point 5, _no_ lock contention while repeatedly re-marking these pages, but never mind.) The CPU marks them accessed when they are reused. The VM only moves the

preemption and rwsems (was: Re: missing madvise functionality)

2007-04-04 Thread Andrew Morton
On Tue, 3 Apr 2007 16:29:37 -0400 Jakub Jelinek <[EMAIL PROTECTED]> wrote: > #include > #include > #include > #include > > void * > tf (void *arg) > { > (void) arg; > size_t ps = sysconf (_SC_PAGE_SIZE); > void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE, >

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 04 Apr 2007 14:08:47 -0400 Rik van Riel <[EMAIL PROTECTED]> wrote: > Andrew Morton wrote: > > > There are other ways of doing it - I guess we could use a new page flag to > > indicate that this is one-of-those-pages, and add new code to handle it in > > all the right places. > > That's

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 4 Apr 2007 06:09:18 -0700 William Lee Irwin III <[EMAIL PROTECTED]> wrote: > > On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote: > > void * > > tf (void *arg) > > { > > (void) arg; > > size_t ps = sysconf (_SC_PAGE_SIZE); > > void *p = mmap (NULL, 128 * ps, PROT_READ

Re: missing madvise functionality

2007-04-04 Thread Anton Blanchard
Hi, > Oh. I was assuming that we'd want to unmap these pages from pagetables and > mark then super-easily-reclaimable. So a later touch would incur a minor > fault. > > But you think that we should leave them mapped into pagetables so no such > fault occurs. That would be very nice. The

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Andrew Morton wrote: > > The treatment is identical to clean swapcache pages, with the sole > exception that they don't actually consume any swap space - hence the fake > swapcache entry thing. I see, sneaking through try_to_unmap's anon PageSwapCache assumptions as simply as

Re: missing madvise functionality

2007-04-04 Thread Rik van Riel
Andrew Morton wrote: There are other ways of doing it - I guess we could use a new page flag to indicate that this is one-of-those-pages, and add new code to handle it in all the right places. That's what I did. I'm currently working on the zap_page_range() side of things. One thing which

Re: missing madvise functionality

2007-04-04 Thread Andrew Morton
On Wed, 4 Apr 2007 10:15:41 +0100 (BST) Hugh Dickins <[EMAIL PROTECTED]> wrote: > On Tue, 3 Apr 2007, Andrew Morton wrote: > > > > All of which indicates that if we can remove the down_write(mmap_sem) from > > this glibc operation, things should get a lot better - there will be no > > additional

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Rik van Riel wrote: > Hugh Dickins wrote: > > > (I didn't understand how Rik would achieve his point 5, _no_ lock > > contention while repeatedly re-marking these pages, but never mind.) > > The CPU marks them accessed when they are reused. > > The VM only moves the reused

Re: missing madvise functionality

2007-04-04 Thread Rik van Riel
Hugh Dickins wrote: (I didn't understand how Rik would achieve his point 5, _no_ lock contention while repeatedly re-marking these pages, but never mind.) The CPU marks them accessed when they are reused. The VM only moves the reused pages back to the active list on memory pressure. This

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Wed, 4 Apr 2007, Marko Macek wrote: > Ulrich Drepper wrote: > > A solution for this problem is a madvise() operation with the following > > property: > > > > - the content of the address range can be discarded > > > > - if an access to a page in the range happens in the future it must > >

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, Apr 04, 2007 at 06:09:18AM -0700, William Lee Irwin III wrote: > for (--i; i >= 0; --i) { > if (pthread_join(th[i], NULL)) { > perror("main: pthread_join failed"); > ret = EXIT_FAILURE; > } > } Obligatory

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Tue, Apr 03, 2007 at 04:29:37PM -0400, Jakub Jelinek wrote: > void * > tf (void *arg) > { > (void) arg; > size_t ps = sysconf (_SC_PAGE_SIZE); > void *p = mmap (NULL, 128 * ps, PROT_READ | PROT_WRITE, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > if (p == MAP_FAILED) >

Re: missing madvise functionality

2007-04-04 Thread Eric Dumazet
On Wed, 04 Apr 2007 20:05:54 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > > > @@ -1638,7 +1652,7 @@ find_extend_vma(struct mm_struct * mm, u > > unsigned long start; > > > > addr &= PAGE_MASK; > > - vma = find_vma(mm,addr); > > + vma = find_vma(mm,addr,>vmacache); > > if

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: Well, I believe this one is too expensive. I was thinking of a light one : This one seems worse. Passing your vm_area_cache around everywhere, which is just intrusive and dangerous because it becomes decoupled from the mm struct you are passing around. Watch this: @@

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: On Wed, 04 Apr 2007 18:55:18 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: Peter Zijlstra wrote: On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private

Re: missing madvise functionality

2007-04-04 Thread Eric Dumazet
On Wed, 04 Apr 2007 18:55:18 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > Peter Zijlstra wrote: > > On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: > > > >>Eric Dumazet wrote: > > > > > >>>I do think such workloads might benefit from a vma_cache not shared by > >>>all threads but

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
William Lee Irwin III wrote: On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote: + rcu_read_lock(); + do { + t->vma_cache_sequence = -1; + t = next_thread(t); + } while (t != curr); +

Re: missing madvise functionality

2007-04-04 Thread Hugh Dickins
On Tue, 3 Apr 2007, Andrew Morton wrote: > > All of which indicates that if we can remove the down_write(mmap_sem) from > this glibc operation, things should get a lot better - there will be no > additional context switches at all. > > And we can surely do that if all we're doing is looking up

Re: missing madvise functionality

2007-04-04 Thread William Lee Irwin III
On Wed, Apr 04, 2007 at 06:55:18PM +1000, Nick Piggin wrote: > + rcu_read_lock(); > + do { > + t->vma_cache_sequence = -1; > + t = next_thread(t); > + } while (t != curr); > + rcu_read_unlock();

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Peter Zijlstra wrote: On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private to each thread. A sequence could invalidate the cache(s). ie instead of a mm->mmap_cache, having

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Jakub Jelinek wrote: On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma &&

Re: missing madvise functionality

2007-04-04 Thread Peter Zijlstra
On Wed, 2007-04-04 at 12:22 +1000, Nick Piggin wrote: > Eric Dumazet wrote: > > I do think such workloads might benefit from a vma_cache not shared by > > all threads but private to each thread. A sequence could invalidate the > > cache(s). > > > > ie instead of a mm->mmap_cache, having a

Re: missing madvise functionality

2007-04-04 Thread Jakub Jelinek
On Wed, Apr 04, 2007 at 05:46:12PM +1000, Nick Piggin wrote: > Does mmap(PROT_NONE) actually free the memory? Yes. /* Clear old maps */ error = -ENOMEM; munmap_back: vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent); if (vma && vma->vm_start < addr + len) {

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Nick Piggin wrote: Ulrich Drepper wrote: People might remember the thread about mysql not scaling and pointing the finger quite happily at glibc. Well, the situation is not like that. The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a

Re: missing madvise functionality

2007-04-04 Thread Nick Piggin
Ulrich Drepper wrote: People might remember the thread about mysql not scaling and pointing the finger quite happily at glibc. Well, the situation is not like that. The problem is glibc has to work around kernel limitations. If the malloc implementation detects that a large chunk of

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Eric Dumazet
On Tue, 03 Apr 2007 23:54:42 -0700 Ulrich Drepper <[EMAIL PROTECTED]> wrote: > Eric Dumazet wrote: > > You were CC on this one, you can find an archive here : > > You cc:ed my gmail account. I don't pick out mails sent to me there. > If you want me to look at something you have to send it to my

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Ulrich Drepper
Eric Dumazet wrote: > You were CC on this one, you can find an archive here : You cc:ed my gmail account. I don't pick out mails sent to me there. If you want me to look at something you have to send it to my @redhat.com address. -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Eric Dumazet
Ulrich Drepper wrote: Nick Piggin wrote: Sad. Although Ulrich did seem interested at one point I think? Ulrich, do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about. You were CC on this one, you can find an archive here :

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Nick Piggin
Ulrich Drepper wrote: Nick Piggin wrote: Sad. Although Ulrich did seem interested at one point I think? Ulrich, do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about. Private futexes. -- SUSE Labs, Novell Inc. - To unsubscribe from

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Ulrich Drepper
Nick Piggin wrote: > Sad. Although Ulrich did seem interested at one point I think? Ulrich, > do you agree at least with the interface that Eric is proposing? I have no idea what you're talking about.

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Nick Piggin
(sorry to change the subject, I was initially going to send the threaded vma cache patches on list, but then decided they didn't have enough changelog!) Andrew Morton wrote: On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: Andrew, do you have any objections to putting

Re: [patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Andrew Morton
On Wed, 04 Apr 2007 16:09:40 +1000 Nick Piggin <[EMAIL PROTECTED]> wrote: > Andrew, do you have any objections to putting Eric's fairly > important patch at least into -mm? you know what to do ;)

[patches] threaded vma patches (was Re: missing madvise functionality)

2007-04-04 Thread Nick Piggin
Eric Dumazet wrote: Nick Piggin wrote: Eric Dumazet wrote: I do think such workloads might benefit from a vma_cache not shared by all threads but private to each thread. A sequence could invalidate the cache(s). ie instead of a mm->mmap_cache, having a mm->sequence, and each thread
