[patch 5/6] mm: remap ZERO_PAGE mappings

2005-07-26 Thread Nick Piggin
zero page COW faults. This change is required in order to be able to detect whether a pte points to a ZERO_PAGE using only its (pte, vaddr) pair. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/mremap.c === ---

[patch 3/6] mm: cleanup rmap

2005-07-26 Thread Nick Piggin
3/6 Thanks to Bill Irwin for pointing this out. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/rmap.c === --- linux-2.6.orig/mm/rmap.c +++ linux-2.6/mm/rmap.c @@ -448,16 +448,12 @@ void page_add_ano

Re: [patch 0/6] remove PageReserved

2005-07-26 Thread Nick Piggin
Nick Piggin wrote: Hi Andrew, If you're feeling like -mm is getting too stable, then you might consider giving these patches a spin? (unless anyone else raises an objection). Patches are against 2.6.13-rc3-git7 -- SUSE Labs, Novell Inc.

[patch 6/6] mm: core remove PageReserved

2005-07-26 Thread Nick Piggin
ed to determine whether a struct page points to valid memory or not. This still needs to be addressed. Many thanks to Hugh Dickins for input. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/mm.h ==

[patch 6/6] mm: core remove PageReserved (take 2)

2005-07-26 Thread Nick Piggin
Nick Piggin wrote: 6/6 Actually I think Hugh gave me some feedback about the introduced `print_invalid_pfn` function, which I ignored. So here is patch 6 again, with print_invalid_pfn renamed invalid_pfn, and using a macro to alleviate the requirement of passing in the function name by hand

Re: [patch 0/6] remove PageReserved

2005-07-26 Thread Nick Piggin
Kumar Gala wrote: Most of the arch code is just reserved memory reporting, which isn't very interesting and could easily be removed. Some arch users are a bit more subtle, however they *should not* break, because all the places that set and clear PageReserved are basically intact. What is th

Re: [patch 2/6] mm: micro-optimise rmap

2005-07-27 Thread Nick Piggin
Alexander Nyberg wrote: void page_add_anon_rmap(struct page *page, linear_page_index() here too? Hi Alexander, Yes, that's what patch 3/6 did :)

Re: [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable

2005-07-27 Thread Nick Piggin
Steven Rostedt wrote: OK, still looks like the generic ffb can be changed. Unless I'm missing something, this shows that I probably should be sending in a patch now to replace the find_first_bit. Ingo's sched_find_first_bit is still the winner, but that is customized for the scheduler, until we n

Re: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Nick Piggin
Keith Owens wrote: On Thu, 28 Jul 2005 09:41:18 +0200, Ingo Molnar <[EMAIL PROTECTED]> wrote: i'm wondering, is the switch_stack at the same/similar place as next->thread_info? If yes then we could simply do a prefetch(next->thread_info). No, they can be up to 30K apart. See include/asm-ia

Re: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: No, they can be up to 30K apart. See include/asm-ia64/ptrace.h. thread_info is at ~0xda0, depending on the config. The switch_stack can be as high as 0x7bd0 in the kernel stack, depending on why the task is sleeping.

Re: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: such as? Not sure. thread_info? Maybe next->timestamp or some other fields in next, something in next->mm? next->thread_info we could and should prefetch - but from the generic scheduler code (see the pa

Re: Add prefetch switch stack hook in scheduler function

2005-07-28 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: [...] prefetch_area(void *first_addr, void *last_addr) (or as addr,len) Yep. We have prefetch_range. Yeah, then a specific field _within_ next->mm or thread_info may want to be fetched. In short, I don't

VIA PCI routing problem

2005-07-28 Thread Nick Piggin
Hi, Sorry for taking so long to track this down. I just got motivated today. I have a VIA SMP system and somewhere between 2.6.12-rc3 and 2.6.12 the USB mouse started moving around really slowly. Anyway, it turns out that the attached patch (against 2.6.13-rc3-git8) fixes the problem. Let me kno

Re: VIA PCI routing problem

2005-07-28 Thread Nick Piggin
Bjorn Helgaas wrote: Can you try this: [...] If that doesn't help, remove it and see if this does: [...] Can you also include "lspci" output? Neither worked. I'll open a bugzilla and include lspci and dmesg there.

Re: VIA PCI routing problem

2005-07-28 Thread Nick Piggin
Brown, Len wrote: Fix two systems, break another... Nick, can you open a bugzilla on this and put your lspci -vv and dmesg into it. Apparently the quirk is good for some machines and not as good for others and we need to get smarter about when to apply it. OK, done. I put it under ACPI tho

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin
Chen, Kenneth W wrote: What sort of workload needs SD_WAKE_AFFINE and SD_WAKE_BALANCE? SD_WAKE_AFFINE is not useful in conjunction with interrupt binding. In fact, it does more harm than good, causing detrimental process migration and destroying process cache affinity, etc. Also SD_WAKE_BA

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin
Chen, Kenneth W wrote: Nick Piggin wrote on Thursday, July 28, 2005 4:35 PM Wake balancing provides an opportunity to provide some input bias into the load balancer. For example, if you started 100 pairs of tasks which communicate through a pipe. On a 2 CPU system without wake balancing

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin
Chen, Kenneth W wrote: Nick Piggin wrote on Thursday, July 28, 2005 6:25 PM Well pipes are just an example. It could be any type of communication. What's more, even the synchronous wakeup uses the wake balancing path (although that could be modified to only do wake balancing for synch wa

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-28 Thread Nick Piggin
Chen, Kenneth W wrote: Nick Piggin wrote on Thursday, July 28, 2005 6:46 PM I'd like to try making them less aggressive first if possible. Well, that's exactly what I'm trying to do: make them not aggressive at all by not performing any load balance :-) The workload gets

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Nick Piggin
Chen, Kenneth W wrote: Nick Piggin wrote on Thursday, July 28, 2005 7:01 PM This clearly outlines an issue with the implementation. Optimizing for one type of workload has a detrimental effect on another workload and vice versa. Yep. That comes up fairly regularly when tuning the scheduler

Re: Delete scheduler SD_WAKE_AFFINE and SD_WAKE_BALANCE flags

2005-07-29 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote:

processes      1              2              3              4
2.6.13-rc4:    187, 183, 179  260, 259, 256  340, 320, 349  504, 496, 500
no wake-bal:   180, 180, 177  254, 254, 253  268, 270, 348  345, 290, 500

Numbers ar

Re: [sched, patch] better wake-balancing, #3

2005-07-29 Thread Nick Piggin
Ingo Molnar wrote: * Ingo Molnar <[EMAIL PROTECTED]> wrote: there's an even simpler way: only do wakeup-balancing if this_cpu is idle. (tbench results are still OK, and other workloads improved.) here's an updated patch. It handles one more detail: on SCHED_SMT we should check the idleness

Re: [sched, patch] better wake-balancing, #3

2005-07-30 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: I don't really like having a hard cutoff like that -wake balancing can be important for IO workloads, though I haven't measured for a long time. [...] well, i have measured it, and it was a win for just about

[patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
incorrect. Fix this by reporting a raced (uncompleted) fault and retrying the lookup and fault in get_user_pages before making the assumption that we have a writeable page. Great work by Robin Holt <[EMAIL PROTECTED]> to debug the problem. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: Feedback please, anyone. it looks good to me, but wouldnt it be simpler (in terms of patch and architecture impact) to always retry the follow_page() in get_user_pages(), in case of a minor fault? The sequence of minor

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
Ingo Molnar wrote: Hugh's posting said: "it's trying to avoid an endless loop of finding the pte not writable when ptrace is modifying a page which the user is currently protected against writing to (setting a breakpoint in readonly text, perhaps?)" i'm wondering, why should that case

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
Linus Torvalds wrote: On Mon, 1 Aug 2005, Nick Piggin wrote: Not sure if this should be fixed for 2.6.13. It can result in pagecache corruption: so I guess that answers my own question. Hell no. This patch is clearly untested and must _not_ be applied: Yes, I meant that the problem

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
Linus Torvalds wrote: Instead, I'd suggest changing the logic for "lookup_write". Make it require that the page table entry is _dirty_ (not writable), and then remove the line that says: lookup_write = write && !force; and you're now done. A successful mm fault for write _should_ a

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-01 Thread Nick Piggin
On Mon, 2005-08-01 at 20:45 -0700, Linus Torvalds wrote: > > On Tue, 2 Aug 2005, Nick Piggin wrote: > > > > Surely this introduces integrity problems when `force` is not set? > > "force" changes how we test the vma->vm_flags, that was always the > mean

Re: [Patch] don't kick ALB in the presence of pinned task

2005-08-01 Thread Nick Piggin
Siddha, Suresh B wrote: Jack Steiner brought this issue at my OLS talk. Take a scenario where two tasks are pinned to two HT threads in a physical package. Idle packages in the system will keep kicking migration_thread on the busy package without any success. We will run into similar scenarios

Re: [Patch] don't kick ALB in the presence of pinned task

2005-08-02 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: Hmm, I would have hoped the new "all_pinned" logic should have handled this case properly. [...] no, active_balance is a different case, not covered by the all_pinned logic. This is a HT-special scenari

[patch 0/2] sched: reduce locking

2005-08-02 Thread Nick Piggin
Hi, I've had these patches around for a while, and I'd like to get rid of them. They could possibly even go in 2.6.13. I haven't really done performance testing because it is difficult to get real workloads going that really stress these things. There are small improvements on things like tbench

[patch 1/2] sched: reduce locking in newidle balancing

2005-08-02 Thread Nick Piggin
ads. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-08-02 21:35:36.0 +1000 +++ linux-2.6/kernel/sched.c2005-08-02 21:56:40.0 +1000 @

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-02 Thread Nick Piggin
for every clean, writeable pte it encounters (when being called for write). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/memory.c === --- linux-2.6.orig/mm/memory.c +++ linux-2.6/mm/memory.c @@ -811,15 +

[patch 2/2] sched: reduce locking in periodic balancing

2005-08-02 Thread Nick Piggin
e runqueues won't be stable anyway, so load balancing is always an inexact operation. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-08-02

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-03 Thread Nick Piggin
Hugh Dickins wrote: On Tue, 2 Aug 2005, Linus Torvalds wrote: Go for it, I think whatever we do won't be wonderfully pretty. Here we are: get_user_pages quite untested, let alone the racy case, but I think it should work. Please all hack it around as you see fit, I'll check mail when I get

Re: [patch 2/2] sched: reduce locking in periodic balancing

2005-08-03 Thread Nick Piggin
Ingo Molnar wrote: [...] Thanks for the corrections. btw., holding the runqueue lock during the initial scanning portion of load-balancing is one of the top PREEMPT_RT critical paths on SMP. (It's not bad, but it's one of the factors that makes SMP latencies higher.) Good, I'm glad they

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-03 Thread Nick Piggin
Hugh Dickins wrote: Stupidity was the reason I thought handle_mm_fault couldn't be inline: I was picturing it static inline within mm/memory.c, failed to make the great intellectual leap you've achieved by moving it to include/linux/mm.h. Well it was one of my finer moments, so don't be too h

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-03 Thread Nick Piggin
Linus Torvalds wrote: On Wed, 3 Aug 2005, Nick Piggin wrote: Oh, it gets rid of the -1 for VM_FAULT_OOM. Doesn't seem like there is a good reason for it, but might that break out of tree drivers? Ok, I applied this because it was reasonably pretty and I liked the approach. It seems

Re: [patch 2.6.13-rc4] fix get_user_pages bug

2005-08-04 Thread Nick Piggin
Alexander Nyberg wrote: On Wed, Aug 03, 2005 at 09:12:37AM -0700 Linus Torvalds wrote: Ok, I applied this because it was reasonably pretty and I liked the approach. It seems buggy, though, since it was using "switch ()" to test the bits (wrongly, afaik), and I'm going to apply the appended

Re: [PATCH] dyn-tick3 tweaks

2005-08-04 Thread Nick Piggin
Con Kolivas wrote: Something like this on top is cleaner and quieter. I'll add this to pending changes for another version. Index: linux-2.6.13-rc5-ck2/arch/i386/kernel/timers/timer_tsc.c ===

Re: [PATCH] dyn-tick3 tweaks respin

2005-08-04 Thread Nick Piggin
On Fri, 2005-08-05 at 13:20 +1000, Con Kolivas wrote: > Like this I assume you meant? > Yeah that looks good. Nick

Re: [PATCH] struct file cleanup : the very large file_ra_state is now allocated only on demand.

2005-08-17 Thread Nick Piggin
Andi Kleen wrote: I would just set the ra pointer to a single global structure if the allocation fails. Then you can avoid all the other checks. It will slow down things and trash some state, but not fail and nobody should expect good performance after out of memory anyways. The only check sti

Re: sched_yield() makes OpenLDAP slow

2005-08-17 Thread Nick Piggin
Joseph Fannin wrote: On Thu, Aug 18, 2005 at 02:50:16AM +0200, Bernardo Innocenti wrote: The relative timestamp reveals that slapd is spending 50ms after yielding. Meanwhile, GCC is probably being scheduled for a whole quantum. Reading the man-page of sched_yield() it seems this isn't the c

Re: [PATCH] struct file cleanup : the very large file_ra_state is now allocated only on demand.

2005-08-17 Thread Nick Piggin
Andi Kleen wrote: You don't want to always have bad performance though, so you could attempt to allocate if either the pointer is null _or_ it points to the global structure? Remember it's after a GFP_KERNEL OOM. If that fails most likely you have deadlocked somewhere else already because Lin
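
The fallback idea discussed in this thread can be sketched roughly as follows. This is a minimal userspace illustration, not the patch under discussion: `sketch_file`, `sketch_ra_state`, `sketch_get_ra` and the injectable allocator are all hypothetical names. It assumes the readahead pointer falls back to one shared global structure when allocation fails, and that a later call retries the allocation if the pointer is NULL or still points at the global.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
struct sketch_ra_state { unsigned long start; unsigned long size; };
struct sketch_file { struct sketch_ra_state *ra; };

/* Single shared fallback used when a private state cannot be allocated. */
static struct sketch_ra_state sketch_global_ra;

/* The allocator is passed in so that failure can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t);

static struct sketch_ra_state *sketch_get_ra(struct sketch_file *f,
                                             sketch_alloc_fn alloc)
{
    /* Try to allocate if we have nothing yet, or only the shared fallback. */
    if (f->ra == NULL || f->ra == &sketch_global_ra) {
        struct sketch_ra_state *ra = alloc(sizeof(*ra));
        if (ra != NULL) {
            ra->start = 0;
            ra->size = 0;
            f->ra = ra;
        } else {
            /* Degrade rather than fail: share the global state. */
            f->ra = &sketch_global_ra;
        }
    }
    return f->ra;
}

/* Allocator that always fails, for exercising the fallback path. */
static void *sketch_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

With this shape, callers never see a NULL pointer, and a file that started on the shared structure moves to a private one as soon as an allocation succeeds.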

Re: sched_yield() makes OpenLDAP slow

2005-08-18 Thread Nick Piggin
Hi Howard, Thanks for joining the discussion. One request, if I may, can you retain the CC list on posts please? Howard Chu wrote: > AFAIKS, sched_yield should only really be used by realtime applications that know exactly what they're doing. pthread_yield() was deleted from the POSIX thre

Re: sched_yield() makes OpenLDAP slow

2005-08-19 Thread Nick Piggin
Robert Hancock wrote: I fail to see how sched_yield is going to be very helpful in this situation. Since that call can sleep for anywhere from zero to a long time, it's going to give unpredictable results. Well, not sleep technically, but yield the CPU for some undefined a

Re: sched_yield() makes OpenLDAP slow

2005-08-20 Thread Nick Piggin
Howard Chu wrote: Lee Revell wrote: On Sat, 2005-08-20 at 11:38 -0700, Howard Chu wrote: > But I also found that I needed to add a new yield(), to work around > yet another unexpected issue on this system - we have a number of > threads waiting on a condition variable, and the thread holding t

Re: CONFIG_PRINTK_TIME woes

2005-08-21 Thread Nick Piggin
Andrew Morton wrote: Andrew Morton <[EMAIL PROTECTED]> wrote: How about we give each arch a printk_clock()? Which might be as simple as this.. sched_clock() shouldn't really be taken outside kernel/sched.c, especially for things like this. It actually has some fundamental problems even

Re: CONFIG_PRINTK_TIME woes

2005-08-21 Thread Nick Piggin
Andrew Morton wrote: yup. Why not use something like do_gettimeofday? (or I'm sure one of our time keepers can suggest the right thing to use). do_gettimeofday() takes locks, so a) we can't do printk from inside it and Dang, yeah maybe this is the showstopper. b) if you do a printk-fr

Re: CONFIG_PRINTK_TIME woes

2005-08-23 Thread Nick Piggin
David S. Miller wrote: This is a useful feature, please do not lobotomize it just because it's difficult to implement on ia64. Just make a "printk_get_timestamp_because_ia64_sucks()" interface or something like that :-) I was a bit unclear when I raised this issue. It is not just an ia64 prob

Re: process creation time increases linearly with shmem

2005-08-24 Thread Nick Piggin
Ray Fucillo wrote: I am seeing process creation time increase linearly with the size of the shared memory segment that the parent touches. The attached forktest.c is a very simple user program that illustrates this behavior, which I have tested on various kernel versions from 2.4 through 2.6.

Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains build fix

2005-08-24 Thread Nick Piggin
Paul Jackson wrote: Dinakar wrote: Can we hold on to this patch for a while, as I reported yesterday, Sure - though I guess it's Linus or Andrew who will have to do the holding. I sent it off contingent on the approval of yourself, Hawkes and Nick. I get the feeling that the problem woul

Re: [PATCH 2.6.13-rc6] cpu_exclusive sched domains build fix

2005-08-24 Thread Nick Piggin
Paul Jackson wrote: So long as the cpuset code stops making any calls to partition_sched_domains() whatsoever, then we should be back where we were in 2.6.12, so far as the scheduler is concerned - right? That's right - sorry I just meant disabling the dynamic sched domains behaviour of the cp

Re: [PATCH] removes filp_count_lock and changes nr_files type to atomic_t

2005-08-25 Thread Nick Piggin
On Thu, 2005-08-25 at 12:41 +0200, Eric Dumazet wrote: > OK, here is a new clean patch that address this problem (nothing assumed > about > atomics) > Would you just be able to add the atomic sysctl handler that Christoph suggested? This introduces lost update problems. 2 CPUs may store to nr
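
A lost update of the kind mentioned here can be illustrated with a small deterministic sketch. The message is truncated, so the exact scenario it describes is not visible; the code below only replays one generic unlucky interleaving of two CPUs doing a non-atomic load, add, store on a shared counter, and contrasts it with C11 atomic increments. `plain_counter` and `atomic_counter` are hypothetical stand-ins, not kernel variables.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for a shared counter. */
static int plain_counter;
static atomic_int atomic_counter;

/*
 * Replay one interleaving of two CPUs each doing "load, add 1, store"
 * without atomicity: both load before either one stores.
 */
static int lost_update_demo(void)
{
    plain_counter = 0;
    int cpu0_seen = plain_counter;  /* CPU 0 loads 0 */
    int cpu1_seen = plain_counter;  /* CPU 1 loads 0 */
    plain_counter = cpu0_seen + 1;  /* CPU 0 stores 1 */
    plain_counter = cpu1_seen + 1;  /* CPU 1 stores 1, overwriting CPU 0 */
    return plain_counter;
}

/* The same two increments as indivisible read-modify-write operations. */
static int atomic_update_demo(void)
{
    atomic_store(&atomic_counter, 0);
    atomic_fetch_add(&atomic_counter, 1);  /* CPU 0 */
    atomic_fetch_add(&atomic_counter, 1);  /* CPU 1 */
    return atomic_load(&atomic_counter);
}
```

In the first function one of the two increments is lost; in the second, both are counted regardless of how the two operations are ordered.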

Re: process creation time increases linearly with shmem

2005-08-25 Thread Nick Piggin
Ray Fucillo wrote: Nick Piggin wrote: fork() can be changed so as not to set up page tables for MAP_SHARED mappings. I think that has other tradeoffs like initially causing several unavoidable faults reading libraries and program text. What kind of application are you using? The

Re: process creation time increases linearly with shmem

2005-08-25 Thread Nick Piggin
Andi Kleen wrote: Would it be worth trying to do something like this? Maybe. Shouldn't be very hard though - you just need to check if the VMA is backed by an object and if yes don't call copy_page_range for it. I think it just needs (untested) I think you need to check for MAP_SHARED a

Re: [PATCH] removes filp_count_lock and changes nr_files type to atomic_t

2005-08-25 Thread Nick Piggin
Eric Dumazet wrote: Nick Piggin wrote: Would you just be able to add the atomic sysctl handler that Christoph suggested? Quite a lot of work indeed, and it would force to convert 3 int (nr_files, nr_free_files, max_files) to 3 atomic_t. I feel bad introducing a lot of sysctl rework

Re: [PATCH] removes filp_count_lock and changes nr_files type to atomic_t

2005-08-25 Thread Nick Piggin
Eric Dumazet wrote: Furthermore, a lazy sync would mean to change sysctl proc_handler for "file-nr" to perform a synchronize before calling proc_dointvec, this would be really obscure. I was only using your terminology (ie. the 'lazy' synch after the atomic is updated). Actually, a better

Re: process creation time increases linearly with shmem

2005-08-25 Thread Nick Piggin
Rik van Riel wrote: On Thu, 25 Aug 2005, Nick Piggin wrote: fork() can be changed so as not to set up page tables for MAP_SHARED mappings. I think that has other tradeoffs like initially causing several unavoidable faults reading libraries and program text. Actually, libraries and program

Re: [PATCH 2.6.13-rc7 2/2] completely disable cpu_exclusive sched domain

2005-08-25 Thread Nick Piggin
Paul Jackson wrote: At the suggestion of Nick Piggin and Dinakar, totally disable the facility to allow cpu_exclusive cpusets to define dynamic sched domains in Linux 2.6.13, in order to avoid problems first reported by John Hawkes (corrupt sched data structures and kernel oops). This has been

Re: process creation time increases linearly with shmem

2005-08-26 Thread Nick Piggin
Hugh Dickins wrote: On Thu, 25 Aug 2005, Linus Torvalds wrote: That said, I think it's a valid optimization. Especially as the child _probably_ doesn't need it (ie there's at least some likelihood of an execve() or similar). I agree, seems a great idea to me (sulking because I was too dumb

Re: process creation time increases linearly with shmem

2005-08-27 Thread Nick Piggin
Linus Torvalds wrote: On Fri, 26 Aug 2005, Rik van Riel wrote: On Fri, 26 Aug 2005, Hugh Dickins wrote: Well, I still don't think we need to test vm_file. We can add an anon_vma test if you like, if we really want to minimize the fork overhead, in favour of later faults. Do we? When you

Re: process creation time increases linearly with shmem

2005-08-27 Thread Nick Piggin
Hugh Dickins wrote: On Sun, 28 Aug 2005, Nick Piggin wrote: This is the condition I ended up with. Any good? if (!(vma->vm_flags & (VM_HUGETLB|VM_NONLINEAR|VM_RESERVED))) { if (vma->vm_flags & VM_MAYSHARE) return 0; if (vma->vm_file && !vma->anon_vma) retu

Re: [PATCH] make radix tree gang lookup faster by using a bitmap search

2005-08-28 Thread Nick Piggin
James Bottomley wrote: On Sun, 2005-08-28 at 18:35 -0700, Andrew Morton wrote: It does make the tree higher and hence will incur some more cache missing when descending the tree. Actually, I don't think it does: the common user is the page tree. Obviously, I've changed nothing on 64 bits,

Re: process creation time increases linearly with shmem

2005-08-29 Thread Nick Piggin
Ray Fucillo wrote: Nick Piggin wrote: How does the following look? (I changed the comment a bit). Andrew, please apply if nobody objects. Nick, I applied this latest patch to a 2.6.12 kernel and found that it does resolve the problem. Prior to the patch on this machine, I was seeing

Re: [PATCH] make radix tree gang lookup faster by using a bitmap search

2005-08-29 Thread Nick Piggin
Sonny Rao wrote: On Mon, Aug 29, 2005 at 01:37:48PM +1000, Nick Piggin wrote: s/common/only ? But the page tree is indexed by file offset rather than virtual address, and we try to span the file's pagecache with the smallest possible tree. So it will tend to make the trees taller.

Re: [PATCH] make radix tree gang lookup faster by using a bitmap search

2005-08-29 Thread Nick Piggin
James Bottomley wrote: On Tue, 2005-08-30 at 10:56 +1000, Nick Piggin wrote: Gang lookup is mainly used on IO paths but also on truncate, which is a reasonably fast path on some workloads (James, this is my suggestion for what you should test - truncate). Actually, I don't think

Re: [PATCH] make radix tree gang lookup faster by using a bitmap search

2005-08-30 Thread Nick Piggin
Sonny Rao wrote: On Tue, Aug 30, 2005 at 12:53:18PM +1000, Nick Piggin wrote: For testing regular lookups, yeah that's more difficult. For a microbenchmark you can use sparse files, which can be a good trick for testing pagecache performance without the IO. I have a feeling that te

Re: strange CPU speedups with SMP on Athlon 64 X2

2005-08-31 Thread Nick Piggin
Nathan Becker wrote: I would be happy to post my exact C source that I use to do the benchmark, but I wanted to get some feedback first in case I'm just doing something stupid. Also, since I'm not subscribed to this list, please cc me directly regarding this topic. Hi Nathan, Cache issu

Re: Where is the performance bottleneck?

2005-08-31 Thread Nick Piggin
Holger Kiehl wrote:

3236497  total                   1.4547
2507913  default_idle        52248.1875
 158752  shrink_zone            43.3275
 121584  copy_user_generic_c  3199.5789
  34271  __wake_up_bit

Re: Where is the performance bottleneck?

2005-08-31 Thread Nick Piggin
Holger Kiehl wrote: meminfo.dump:

MemTotal:    8124172 kB
MemFree:       23564 kB
Buffers:     7825944 kB
Cached:        19216 kB
SwapCached:        0 kB
Active:        25708 kB
Inactive:    7835548 kB
HighTotal:         0 kB
HighFree:          0 kB

[PATCH 2.6.13] lockless pagecache 2/7

2005-09-01 Thread Nick Piggin
2/7 Implement atomic_cmpxchg for i386 and ppc64. Is there any architecture that won't be able to implement such an operation? -- SUSE Labs, Novell Inc. Introduce an atomic_cmpxchg operation. Implement this for i386 and ppc64. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index
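
For readers unfamiliar with the operation, compare-and-exchange semantics can be sketched in portable C11. This is an illustration only, not the i386 or ppc64 code from the patch: `sketch_cmpxchg` and `sketch_add_unless` are hypothetical names, and the sketch assumes the convention that the operation returns the value it observed, so the caller can tell whether the exchange took place.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * If *v equals old, store new_val. Always return the value observed
 * in *v; the exchange happened exactly when the return value == old.
 */
static int sketch_cmpxchg(atomic_int *v, int old, int new_val)
{
    int expected = old;
    /* On failure, C11 writes the observed value back into expected. */
    atomic_compare_exchange_strong(v, &expected, new_val);
    return expected;
}

/* Typical retry loop: add a to *v unless *v currently equals u. */
static int sketch_add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);
    while (cur != u) {
        int seen = sketch_cmpxchg(v, cur, cur + a);
        if (seen == cur)
            return 1;   /* our update was installed */
        cur = seen;     /* value changed under us; retry with it */
    }
    return 0;           /* *v was u, nothing added */
}
```

The retry loop is the usual way such a primitive gets used: read, compute the desired value, and attempt the exchange until no other updater has intervened.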

[PATCH 2.6.13] lockless pagecache 1/7

2005-09-01 Thread Nick Piggin
1/7 Remove PageReserved rollup. -- SUSE Labs, Novell Inc. Index: linux-2.6/mm/memory.c === --- linux-2.6.orig/mm/memory.c +++ linux-2.6/mm/memory.c @@ -333,6 +333,21 @@ out: } /* + * This function is called to print an error whe

[PATCH 2.6.13] lockless pagecache 5/7

2005-09-01 Thread Nick Piggin
5/7 -- SUSE Labs, Novell Inc. Make radix tree lookups safe to be performed without locks. Readers are protected against nodes being deleted by using RCU based freeing. Readers are protected against new node insertion by using memory barriers to ensure the node itself will be properly written befo
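
The insertion-side ordering described here can be sketched with C11 release/acquire operations. This is a simplified userspace analogue, not the radix tree code: a single pointer slot stands in for a tree node link, `sketch_node`, `sketch_publish` and `sketch_lookup` are hypothetical names, and RCU-based freeing is not shown at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node { int key; int value; };

/* One published pointer, standing in for a link inside the tree. */
static _Atomic(struct sketch_node *) sketch_root;

/* Writer: fill in the node completely, then publish the pointer. */
static void sketch_publish(struct sketch_node *node, int key, int value)
{
    node->key = key;
    node->value = value;
    /* Release: the stores above become visible before the pointer does. */
    atomic_store_explicit(&sketch_root, node, memory_order_release);
}

/* Reader: takes no lock; the acquire load pairs with the release store. */
static int sketch_lookup(int key, int *value_out)
{
    struct sketch_node *node =
        atomic_load_explicit(&sketch_root, memory_order_acquire);
    if (node == NULL || node->key != key)
        return 0;
    *value_out = node->value;
    return 1;
}
```

A reader that sees the new pointer is guaranteed to see the initialised fields behind it, which is the property the memory barriers in the description are there to provide.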

[PATCH 2.6.13] lockless pagecache 4/7

2005-09-01 Thread Nick Piggin
4/7 -- SUSE Labs, Novell Inc. From: Hans Reiser <[EMAIL PROTECTED]> Reiser4 uses radix trees to solve a problem reiser4_readdir has serving NFS requests. Unfortunately, the radix tree API lacks an operation suitable for modifying an existing entry. This patch adds radix_tree_lookup_slot which return

[PATCH 2.6.13] lockless pagecache 3/7

2005-09-01 Thread Nick Piggin
3/7 -- SUSE Labs, Novell Inc. If we can be sure that elevating the page_count on a pagecache page will pin it, we can speculatively run this operation, and subsequently check to see if we hit the right page rather than relying on holding a lock or otherwise pinning a reference to the page. This
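
The "pin speculatively, then re-check" pattern can be sketched as below. This is an assumption-laden userspace illustration, not the patch's implementation: `sketch_page`, `sketch_slot` and the helper names are hypothetical, a single atomic pointer stands in for the pagecache lookup, and the sketch assumes the page structure's memory stays valid even after its count drops to zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical miniature of a page with a reference count. */
struct sketch_page {
    atomic_int count;   /* 0 means the page is free and must not be pinned */
};

/* One slot of a hypothetical lookup structure. */
static _Atomic(struct sketch_page *) sketch_slot;

/* Take a reference only if the count is currently non-zero. */
static int sketch_get_unless_zero(struct sketch_page *page)
{
    int cur = atomic_load(&page->count);
    while (cur != 0) {
        /* On failure, cur is refreshed with the current count. */
        if (atomic_compare_exchange_weak(&page->count, &cur, cur + 1))
            return 1;
    }
    return 0;
}

static void sketch_put(struct sketch_page *page)
{
    atomic_fetch_sub(&page->count, 1);
}

/* Lockless lookup: pin speculatively, then re-check the slot. */
static struct sketch_page *sketch_find_get_page(void)
{
    for (;;) {
        struct sketch_page *page = atomic_load(&sketch_slot);
        if (page == NULL)
            return NULL;
        if (!sketch_get_unless_zero(page))
            continue;       /* page was being freed; look again */
        if (page == atomic_load(&sketch_slot))
            return page;    /* still the right page, reference held */
        sketch_put(page);   /* slot changed under us; drop and retry */
    }
}
```

The re-check after taking the reference is what replaces holding a lock across the lookup: if the slot no longer points at the page, the speculative reference is dropped and the lookup starts over.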

[PATCH 2.6.13] lockless pagecache 6/7

2005-09-01 Thread Nick Piggin
6/7 -- SUSE Labs, Novell Inc. Use the speculative get_page and the lockless radix tree lookups to introduce lockless page cache lookups (ie. no mapping->tree_lock). The only atomicity change this should introduce is the use of a non atomic pagevec lookup for truncate, however what atomicity gu

[PATCH 2.6.13] lockless pagecache 7/7

2005-09-01 Thread Nick Piggin
7/7 -- SUSE Labs, Novell Inc. With practically all the read locks gone from mapping->tree_lock, convert the lock from an rwlock back to a spinlock. The remaining locks including the read locks mainly deal with IO submission and not the lookup fastpaths. Index: linux-2.6/fs/buffer.c

Re: New lockless pagecache

2005-09-01 Thread Nick Piggin
Nick Piggin wrote: I think this is getting pretty stable. No guarantees of course, but it would be great if anyone gave it a test. Or review, I might add. While I understand such a review is still quite difficult, this code really is far less complex than the previous lockless pagecache

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-02 Thread Nick Piggin
Andi Kleen wrote: Alan Cox <[EMAIL PROTECTED]> writes: On Fri, 2005-09-02 at 16:29 +1000, Nick Piggin wrote: 2/7 Implement atomic_cmpxchg for i386 and ppc64. Is there any architecture that won't be able to implement such an operation? i386, sun4c, Actually we have cmpx

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-02 Thread Nick Piggin
Christoph Lameter wrote: On Fri, 2 Sep 2005, Nick Piggin wrote: Implement atomic_cmpxchg for i386 and ppc64. Is there any architecture that won't be able to implement such an operation? Something like that used to be part of the page fault scalability patchset. You contributed to it

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-02 Thread Nick Piggin
Bear with me Dave, I'll repeat myself a bit, for the benefit of lkml. Andi Kleen wrote: Yeah quite a few. I suspect most MIPS also would have a problem in this area. cmpxchg can be done with LL/SC can't it? Any MIPS should have that. Right. On PARISC, I don't see where they are emulating

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-02 Thread Nick Piggin
David S. Miller wrote: From: Nick Piggin <[EMAIL PROTECTED]> Date: Sat, 03 Sep 2005 07:22:18 +1000 This atomic_cmpxchg, unlike a "regular" cmpxchg, has the advantage that the memory altered should always be going through the atomic_ accessors, and thus should be implementabl

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-02 Thread Nick Piggin
Alan Cox wrote: but I suspect that SMP isn't supported on those CPUs without ll/sc, and thus an atomic_cmpxchg could be emulated by disabling interrupts. It's obviously emulatable on any platform - the question is at what cost. For x86 it probably isn't a big problem as there are very very fe
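
The emulation idea can be sketched in userspace, with one loud caveat: a user program cannot disable interrupts, so a simple test-and-set lock stands in for that step here. `sketch_emu_lock` and `sketch_emulated_cmpxchg` are hypothetical names, and the sketch only works if every other access to the variable also goes through the same lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for "interrupts disabled": one global test-and-set lock. */
static atomic_flag sketch_emu_lock = ATOMIC_FLAG_INIT;

/*
 * Emulated compare-and-exchange on a plain int: the compare and the
 * conditional store happen inside the critical section, so they look
 * indivisible to anyone else who takes the same lock.
 */
static int sketch_emulated_cmpxchg(int *v, int old, int new_val)
{
    int seen;

    while (atomic_flag_test_and_set_explicit(&sketch_emu_lock,
                                             memory_order_acquire))
        ;   /* spin until the lock is free */
    seen = *v;
    if (seen == old)
        *v = new_val;
    atomic_flag_clear_explicit(&sketch_emu_lock, memory_order_release);
    return seen;
}
```

The cost question raised in the thread shows up directly: every operation now pays for acquiring and releasing the lock, whether or not the exchange succeeds.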

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-03 Thread Nick Piggin
Alan Cox wrote: On Sat, 2005-09-03 at 11:40 +1000, Nick Piggin wrote: We'll see how things go. I'm fairly sure that for my usage it will be a win even if it is costly. It is replacing an atomic_inc_return, and a read_lock/read_unlock pair. Make sure you bench both AMD and Intel -

Re: [PATCH 2.6.13] lockless pagecache 2/7

2005-09-05 Thread Nick Piggin
Alan Cox wrote: On Sun, 2005-09-04 at 11:01 +1000, Nick Piggin wrote: I would be surprised if it was a big loss... but I'm assuming a locked cmpxchg isn't outlandishly expensive. Basically: read_lock_irqsave(cacheline1); atomic_inc_return(cacheline2); read_unlock_irqrestore(

Re: RFC: i386: kill !4KSTACKS

2005-09-06 Thread Nick Piggin
On Tue, 2005-09-06 at 09:13 +0200, Andi Kleen wrote: > At some point we undoubtedly will need to increase it further, > the logical point would be when Linux switches to larger softpage > sizes. Is this really a "when"? Hugh and wli were both working on this and IIRC neither could show enough

Re: kbuild & C++

2005-09-07 Thread Nick Piggin
Budde, Marco wrote: make life more difficult. If you do not like any kind of abstraction, why are you using C instead of pure assembler? This has nothing to do with the linux kernel anymore, so can the thread be killed from lkml please? (Not to be rude; understand the s/n ratio is bad at the

Re: [PATCH 1/2] (repost) New System call, unshare (fwd)

2005-09-07 Thread Nick Piggin
Janak Desai wrote: - tsk->min_flt = tsk->maj_flt = 0; - tsk->nvcsw = tsk->nivcsw = 0; + /* +* If the process memory is being duplicated as part of the +* unshare system call, we are working with the current process +* and not a newly allocated task stru

Re: [PATCH 2.6.13] lockless pagecache 5/7

2005-09-08 Thread Nick Piggin
Christoph Lameter wrote: I wonder if it may not be better to use a seqlock for the tree_lock? A seqlock requires no writes at all if the tree has not been changed. RCU still requires the incrementing of a (local) counter. Ah, but the seqlock's write side will cause cacheline bouncing in the

Re: [PATCH 2.6.13] lockless pagecache 7/7

2005-09-09 Thread Nick Piggin
Christoph Lameter wrote: For Itanium (and I guess also for ppc64 and sparc64) the performance of write_lock/unlock is the same as spin_lock/unlock. There is at least one case where concurrent reads would be allowed without this patch. Yep, I picked up another one that was easy to make lockl

Re: general config preemption Q: preempt-model and Big-Lock Preemption

2008-01-06 Thread Nick Piggin
On Saturday 05 January 2008 14:25, Linda Walsh wrote: > A question that comes to mind every time I go through the settings > for "Preemption Model" and "Preempt The Big Kernel Lock". > > Do each of the combinations "make sense", or are some "no-ops"? > For model, we have 1) no forced (server), 2) V

Re: [patch 02/20] make the inode i_mmap_lock a reader/writer lock

2008-01-07 Thread Nick Piggin
On Thursday 03 January 2008 19:55, Ingo Molnar wrote: > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > Have you done anything more with allowing > 256 CPUS in this > > > spinlock patch? We've been testing with 1k cpus and to verify with > > > -mm

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-07 Thread Nick Piggin
On Thursday 03 January 2008 03:27, David Howells wrote: > Nick Piggin <[EMAIL PROTECTED]> wrote: > > Then make a PG_private2 bit and use that. > > To what end? Are you suggesting I should have: > > PG_private2 = PG_private | PG_fscache No. I mean call the bi

Re: [PATCH 10/28] FS-Cache: Recruit a couple of page flags for cache management [try #2]

2008-01-07 Thread Nick Piggin
On Tuesday 08 January 2008 00:09, David Howells wrote: > Nick Piggin <[EMAIL PROTECTED]> wrote: > > No. I mean call the bit PG_private2. That way non-pagecache and > > filesystems that don't use fscache can use it. > > The bit is called PG_owner_priv_2, and then

Re: free_pages_check

2008-01-07 Thread Nick Piggin
On Tuesday 08 January 2008 13:43, Yinghai Lu wrote: > wonder why free_pages_check in mm/page_alloc.c is using bitwise OR rather than logical OR > > @@ -450,9 +450,9 @@ static inline void __free_one_page(struc > > static inline int free_pages_check(struct page *page) > { > - if (unlikely(page_mapcount(pag

Re: free_pages_check

2008-01-07 Thread Nick Piggin
On Tuesday 08 January 2008 16:44, H. Peter Anvin wrote: > Nick Piggin wrote: > > On Tuesday 08 January 2008 13:43, Yinghai Lu wrote: > >> wonder why free_pages_check mm/page_alloc.c is using bit OR than logical > >> OR > >> > >> @@ -450,9 +450,

[patch 1/3] drm: nopage

2008-01-08 Thread Nick Piggin
lt. Remove redundant vma range checks. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED] Cc: linux-kernel@vger.kernel.org --- drivers/char/drm/drm_vm.c | 131 +- 1 file changed, 61 insertions(+), 70 deletions(-) Index: linux-2.6/

[patch 3/3] mm: remove nopage

2008-01-08 Thread Nick Piggin
it in the core mm code and documentation (and a few stray references to it in comments). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED] Cc: linux-kernel@vger.kernel.org --- Documentation/feature-removal-schedule.txt |9 Documentation/filesystems/Locking
