Re: VST and Sched Load Balance

2005-04-07 Thread Nick Piggin
Srivatsa Vaddagiri wrote: I think a potential area which VST may need to address is scheduler load balance. If idle CPUs stop taking local timer ticks for some time, then during that period it could cause the various runqueues to go out of balance, since the idle CPUs will no longer pull tasks f

Re: VST and Sched Load Balance

2005-04-07 Thread Nick Piggin
Srivatsa Vaddagiri wrote: Hmm ..I guess we could restrict the max time a idle CPU will sleep taking into account its balance interval. But whatever heuristics we follow to maximize balance_interval of about-to-sleep idle CPU, don't we still run the risk of idle cpu being woken up and going immedi

Re: [PATCH 2/6] freepgt2: sys_mincore ignore FIRST_USER_PGD_NR

2005-04-07 Thread Nick Piggin
David Howells wrote: Hugh Dickins <[EMAIL PROTECTED]> wrote: Remove use of FIRST_USER_PGD_NR from sys_mincore: it's inconsistent (no other syscall refers to it), unnecessary (sys_mincore loops over vmas further down) and incorrect (misses user addresses in ARM's first pgd). You should make it use

Re: 2.6.12-rc2-mm1

2005-04-07 Thread Nick Piggin
On Thu, 2005-04-07 at 18:08 -0700, Siddha, Suresh B wrote: > On Thu, Apr 07, 2005 at 03:11:12AM +1000, Nick Piggin wrote: > > Using the attached patch, a puny dual PIII-650 with ~400MB RAM swapped > > itself to death after 2 infinite loop tasks had been pinned to one > >

Re: VST and Sched Load Balance

2005-04-07 Thread Nick Piggin
Srivatsa Vaddagiri wrote: On Thu, Apr 07, 2005 at 05:10:24PM +0200, Ingo Molnar wrote: Interaction with VST is not a big issue right now because this only matters on SMP boxes which is a rare (but not unprecedented) target for embedded platforms. Well, I don't think VST is targetting just powe

Re: [patch] use cheaper elv_queue_empty when unplug a device

2005-04-08 Thread Nick Piggin
Jens Axboe wrote: On Wed, Mar 30 2005, Nick Piggin wrote: So Kenneth if you could look into this one as well, to see if it is worthwhile, that would be great. For that to work, you have to change the get_io_context() allocation to be GFP_ATOMIC. Yes of course, thanks for picking that up. I guess

Re: [patch] use cheaper elv_queue_empty when unplug a device

2005-04-08 Thread Nick Piggin
Jens Axboe wrote: On Fri, Apr 08 2005, Nick Piggin wrote: I guess this isn't a problem, as io contexts should be allocated comparatively rarely. It would be possible to move it out of the lock though if we really want to. Lets just keep it inside the lock, for the fast case it should just

Re: [patch] sched: unlocked context-switches

2005-04-08 Thread Nick Piggin
Ingo Molnar wrote: * Luck, Tony <[EMAIL PROTECTED]> wrote: tested on x86, and all other arches should work as well, but if an architecture has irqs-off assumptions in its switch_to() logic it might break. (I havent found any but there may such assumptions.) The ia64_switch_to() code includes a s

Re: [patch] sched: unlocked context-switches

2005-04-09 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: I did propose doing unconditionally unlocked switches a while back when my patch first popped up - you were against it then, but I guess you've had second thoughts? the reordering of switch_to() and the switch_mm()-relate

Re: Processes stuck on D state on Dual Opteron

2005-04-09 Thread Nick Piggin
Claudio Martins wrote: On Tuesday 05 April 2005 03:12, Andrew Morton wrote: Claudio Martins <[EMAIL PROTECTED]> wrote: While stress testing 2.6.12-rc2 on an HP DL145 I get processes stuck in D state after some time. This machine is a dual Opteron 248 with 2GB (ECC) on one node (the other node h

Re: Processes stuck on D state on Dual Opteron

2005-04-10 Thread Nick Piggin
or the tip. I booted with nmi_watchdog=0 and was able to get a full sysrq-t as well as a sysrq-m. Since it might be a little too big for the list, I've put it on a text file at: http://193.136.132.235/dl145/dump1-2.6.12-rc2.txt I also made a run with the mempool-can-fail patch from Nick Pi

Re: Processes stuck on D state on Dual Opteron

2005-04-11 Thread Nick Piggin
the lower zone protection for DMA ends up as), however you are well above all the "emergency watermarks" in ZONE_NORMAL. Also: I also made a run with the mempool-can-fail patch from Nick Piggin. With this I got some nice memory allocation errors from the md threads when the trouble started. T

Re: Processes stuck on D state on Dual Opteron

2005-04-11 Thread Nick Piggin
Nick Piggin wrote: The common theme seems to be: try_to_free_pages, swap_writepage, mempool_alloc, down/down_failed in .text.lock.md. Next I would suspect md/raid1 - maybe some deadlock in an uncommon memory allocation failure path? I'll see if I can reproduce it here. No luck yet (on SMP

Re: Processes stuck on D state on Dual Opteron

2005-04-11 Thread Nick Piggin
Claudio Martins wrote: Right. I'm using two Seagate ATA133 disks (ide controller is AMD-8111) each with 4 partitions, so I get 4 md Raid1 devices. The first one, md0, is for swap. The rest are ~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/md1 4.6G 1.9G 2.6

Re: [patch 4/5] sched: RCU sched domains

2005-04-11 Thread Nick Piggin
Paul E. McKenney wrote: On Thu, Apr 07, 2005 at 05:58:40PM +1000, Nick Piggin wrote: OK thanks for the good explanation. So I'll keep it as is for now, and whatever needs cleaning up later can be worked out as it comes up. Looking forward to the split of synchronize_kernel() into synchroniz

Re: Processes stuck on D state on Dual Opteron

2005-04-11 Thread Nick Piggin
On Tue, 2005-04-12 at 01:22 +0100, Claudio Martins wrote: > On Monday 11 April 2005 23:59, Nick Piggin wrote: > > > > > OK, I'll try them in a few minutes and report back. > > > > I'm not overly hopeful. If they fix the problem, then it's likely >

Re: [patch] sched: unlocked context-switches

2005-04-11 Thread Nick Piggin
On Mon, 2005-04-11 at 18:06 -0700, David Mosberger wrote: > I had to refresh my memory with a quick Google search that netted [1] > (look for "Disable interrupts during context switch"). Actually, it > wasn't really a deadlock, but rather a livelock, since a CPU got stuck > on an infinite page-no

Re: 2.6.12-rc2-mm3

2005-04-11 Thread Nick Piggin
Andrew Morton wrote: So it turns out that patch was broken. I've fixed it locally and the results are good, but odd. The machine is a 4GB x86_64 with aic79xx controllers and MAXTOR ATLAS10K4_73WLS disks. ext2 filesystem. The workload is continuous pagecache writeback versus read-lots-of-little-fi

Re: 2.6.12-rc2-mm3

2005-04-11 Thread Nick Piggin
On Mon, 2005-04-11 at 23:19 -0700, Andrew Morton wrote: > Nick Piggin <[EMAIL PROTECTED]> wrote: > > > > >- The effects of tcq on AS are much less disastrous than I thought they > > > were. Do I have the wrong workload? Memory fails me. Or did we fix >

Re: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Nick Piggin
Nick Piggin wrote: Chen, Kenneth W wrote: I like the patch a lot and already did bench it on our db setup. However, I'm seeing a negative regression compare to a very very crappy patch (see attached, you can laugh at me for doing things like that :-). OK - if we go that way, perhap

Re: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Nick Piggin
Nick Piggin wrote: Nick Piggin wrote: Chen, Kenneth W wrote: I like the patch a lot and already did bench it on our db setup. However, I'm seeing a negative regression compare to a very very crappy patch (see attached, you can laugh at me for doing things like that :-). OK - if we go tha

[patch 0/9] various (mainly mempool fixes and block layer improvements)

2005-04-12 Thread Nick Piggin
Andrew, please consider. Ken, you'll probably have something similar to this if you were following various random threads closely and picking out my various random patches ;) -- SUSE Labs, Novell Inc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message

[patch 1/9] GFP_ZERO fix

2005-04-12 Thread Nick Piggin
1/9 -- SUSE Labs, Novell Inc. __GFP_ZERO really shouldn't tempt fate. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/gfp.h === --- linux-2.6.orig/include/linux/gfp.h 2005-04-12 22:05:4

[patch 6/9] blk: unplug later

2005-04-12 Thread Nick Piggin
6/9 -- SUSE Labs, Novell Inc. get_request_wait needn't unplug the device immediately. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/block/ll_rw_blk.c === --- linux-2.6.orig/drivers/block/ll_rw_bl

[patch 7/9] blk: efficiency improvements

2005-04-12 Thread Nick Piggin
ue. This is reported to help efficiency. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/block/ll_rw_blk.c === --- linux-2.6.orig/drivers/block/ll_rw_blk.c 2005-04-12 22:26:14.0 +1000 +

[patch doh/9] mempool simplify alloc

2005-04-12 Thread Nick Piggin
_harder' if it hits the page allocator. So if allocation still fails, then we can probably afford to hit the pool->lock - and what's the alternative? Try page reclaim and hit zone->lru_lock? Signed-off-by: Nick Piggin <[EMAIL

[patch 2/9] mempool gfp flag

2005-04-12 Thread Nick Piggin
2/9 -- SUSE Labs, Novell Inc. Mempools have 2 problems. The first is that mempool_alloc can possibly get stuck in __alloc_pages when they should opt to fail, and take an element from their reserved pool. The second is that it will happily eat emergency PF_MEMALLOC reserves instead of going to the

[patch 4/9] blk: no memory barrier

2005-04-12 Thread Nick Piggin
sts in flight. They will wake up waiters when they are retired. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/block/ll_rw_blk.c === --- linux-2.6.orig/drivers/block/ll_rw_blk.c 2005-04-12 22:05:

[patch 5/9] blk: branch hints

2005-04-12 Thread Nick Piggin
5/9 -- SUSE Labs, Novell Inc. Sprinkle around a few branch hints in the block layer. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/block/ll_rw_blk.c === --- linux-2.6.orig/drivers/block/ll_rw_blk.c

[patch 8/9] blk: reduce locking

2005-04-12 Thread Nick Piggin
8/9 -- SUSE Labs, Novell Inc. Change around locking a bit for a result of 1-2 less spin lock unlock pairs in request submission paths. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/drivers/block/ll_rw

[patch 3/9] no PF_MEMALLOC tinkering

2005-04-12 Thread Nick Piggin
. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/swap_state.c === --- linux-2.6.orig/mm/swap_state.c 2005-04-12 22:05:44.0 +1000 +++ linux-2.6/mm/swap_state.c 2005-04-12 22:26:12.0

Re: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Nick Piggin
Chen, Kenneth W wrote: On Tue, Apr 12 2005, Nick Piggin wrote: Actually the patches I have sent you do fix real bugs, but they also make the block layer less likely to recurse into page reclaim, so it may be eg. hiding the problem that Neil's patch fixes. Jens Axboe wrote on Tuesday, Apr

Re: [patch 1/9] GFP_ZERO fix

2005-04-12 Thread Nick Piggin
Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \ - __GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \ - __GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GF

Re: [patch 2/9] mempool gfp flag

2005-04-12 Thread Nick Piggin
Andrew Morton wrote: Index: linux-2.6/include/linux/gfp.h === --- linux-2.6.orig/include/linux/gfp.h 2005-04-12 22:26:10.0 +1000 +++ linux-2.6/include/linux/gfp.h 2005-04-12 22:26:11.0 +1000 @@ -38,14 +38,16 @@ s

Re: [patch 3/9] no PF_MEMALLOC tinkering

2005-04-12 Thread Nick Piggin
Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: PF_MEMALLOC is really not a tool for tinkering. It is pretty specifically used to prevent recursion into page reclaim, and to prevent low memory deadlocks. The mm/swap_state.c code was the only legitimate tinkerer. Its conce

Re: [patch 6/9] blk: unplug later

2005-04-12 Thread Nick Piggin
Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: get_request_wait needn't unplug the device immediately. Probably. But what if the get_request(q, rw, GFP_NOIO); did some sleeping? It can't sleep unless it returns the request, because it is using mempool allocs. So any

Re: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Nick Piggin
Chen, Kenneth W wrote: Nick Piggin wrote on Tuesday, April 12, 2005 4:09 AM Chen, Kenneth W wrote: I like the patch a lot and already did bench it on our db setup. However, I'm seeing a negative regression compare to a very very crappy patch (see attached, you can laugh at me for doing t

Re: Processes stuck on D state on Dual Opteron

2005-04-12 Thread Nick Piggin
Claudio Martins wrote: On Tuesday 12 April 2005 01:46, Andrew Morton wrote: Claudio Martins <[EMAIL PROTECTED]> wrote: I think I'm going to give a try to Neil's patch, but I'll have to apply some patches from -mm. Just this one if you're using 2.6.12-rc2: --- 25/drivers/md/md.c~avoid-deadlock-in-s

Re: [patch] sched: unlocked context-switches

2005-04-12 Thread Nick Piggin
David Mosberger wrote: On Tue, 12 Apr 2005 12:12:45 +1000, Nick Piggin <[EMAIL PROTECTED]> said: >> Now, Ingo says that the order is reversed with his patch, i.e., >> switch_mm() happens after switch_to(). That means flush_tlb_mm() >> may now see a current->ac

Re: [patch] sched: fix active load balance

2005-04-13 Thread Nick Piggin
Siddha, Suresh B wrote: On Wed, Apr 13, 2005 at 10:08:28PM +0200, Ingo Molnar wrote: * Siddha, Suresh B <[EMAIL PROTECTED]> wrote: - for_each_domain(target_cpu, sd) { + for_each_domain(target_cpu, sd) if ((sd->flags & SD_LOAD_BALANCE) && - cpu_isse

Re: [PATCH] sched: fix never executed code due to expression always false

2005-04-14 Thread Nick Piggin
Jesper Juhl wrote: There are two expressions in kernel/sched.c that are always false since they test for <0 but the result of the expression is unsigned so they will never be less than zero. This patch implement the logic that I believe is intended without the signedness issue and without the na

Re: [PATCH] sched: fix never executed code due to expression always false

2005-04-14 Thread Nick Piggin
Jesper Juhl wrote: On Fri, 15 Apr 2005, Nick Piggin wrote: Jesper Juhl wrote: There are two expressions in kernel/sched.c that are always false since they test for <0 but the result of the expression is unsigned so they will never be less than zero. This patch implement the logic that I beli

Re: [PATCH] sched: fix never executed code due to expression always false

2005-04-14 Thread Nick Piggin
Jesper Juhl wrote: As per this patch perhaps? : Thanks. I'll make sure it gets to the right place if nobody picks it up. Signed-off-by: Jesper Juhl <[EMAIL PROTECTED]> --- linux-2.6.12-rc2-mm3-orig/kernel/sched.c 2005-04-11 21:20:56.0 +0200 +++ linux-2.6.12-rc2-mm3/kernel/sched.c 2005-04-

Re: [PATCH] sched: fix never executed code due to expression always false

2005-04-14 Thread Nick Piggin
On Fri, 2005-04-15 at 12:59 +1000, Herbert Xu wrote: > Jesper Juhl <[EMAIL PROTECTED]> wrote: > > > > - if (unlikely((long long)now - prev->timestamp < 0)) > > + if (unlikely(((long long)now - (long long)prev->timestamp) > > < 0)) > > You can write this as > > (long l

Re: 2.6.12-rc2: >100% memory usage

2005-04-14 Thread Nick Piggin
On Fri, 2005-04-15 at 12:48 +0800, Michael Deegan wrote: > Hi folks, > > I noticed something unusual on my home desktop machine (K6II, 448M RAM, runs > KDE, samba, nfsd. 2.6.12-rc2 on Debian sarge). The machine seems to feel > slightly sluggish; it seems to swap a fair bit more than it did under >

Re: 2.6.12-rc2: >100% memory usage

2005-04-14 Thread Nick Piggin
On Thu, 2005-04-14 at 22:20 -0700, Randy.Dunlap wrote: > On Fri, 15 Apr 2005 14:59:05 +1000 Nick Piggin wrote: > > | On Fri, 2005-04-15 at 12:48 +0800, Michael Deegan wrote: > | > Hi folks, > | > > | > I noticed something unusual on my home desktop machine (K6II,

Re: [patch] sched: fix sched domain degenerate

2005-04-15 Thread Nick Piggin
esh Siddha <[EMAIL PROTECTED]> Catch more (hopefully all) cases. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-04-15 22:52:25.0 +1000

Re: [patch] sched: fix sched domain degenerate

2005-04-15 Thread Nick Piggin
Siddha, Suresh B wrote: On Fri, Apr 15, 2005 at 11:03:20PM +1000, Nick Piggin wrote: Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-04-15 22:52:25.0 +1000 +++ linux-2.6/kernel/sched.c 2005

[patch 1/2] mm: remap ZERO_PAGE mappings

2005-08-06 Thread Nick Piggin
the mapping it needs for good cache performance, and as well do_wp_page is now able to always correctly detect and optimise zero page COW faults. This change is required in order to be able to detect whether a pte points to a ZERO_PAGE using only its (pte, vaddr) pair. Signed-off-by: Nick Piggin

[RFC][patch 0/2] mm: remove PageReserved

2005-08-06 Thread Nick Piggin
Hi, I'll be looking to send these off to Andrew after 2.6.14 opens, with the aim of having them merged by 2.6.15 hopefully. It doesn't look like they'll be able to easily free up a page flag for 2 reasons. First, PageReserved will probably be kept around for at least one release. Second, swsusp

[patch 2/2] mm: core remove PageReserved

2005-08-06 Thread Nick Piggin
wsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed. Many thanks to Hugh Dickins for input. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-08 Thread Nick Piggin
Daniel Phillips wrote: On Sunday 07 August 2005 13:28, Nick Piggin wrote: If anyone has an issue with the patches or my merge plan, let's get some discussion going. You forgot to mention what replaces PageReserved: the VM_RESERVED vma flag, which is now added to the whole zap_pte

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-08 Thread Nick Piggin
Nigel Cunningham wrote: Hi. On Tue, 2005-08-09 at 07:09, Daniel Phillips wrote: It doesn't look like they'll be able to easily free up a page flag for 2 reasons. First, PageReserved will probably be kept around for at least one release. Second, swsusp and some arch code (ioremap) wants to know

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-08 Thread Nick Piggin
Nigel Cunningham wrote: Hi Nick et al. On Tue, 2005-08-09 at 14:59, Nick Piggin wrote: Changing the e820 code so it sets PageNosave instead of PageReserved, along with a couple of modifications in swsusp itself should get rid of the swsusp dependency. That would work for swsusp, but there

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Russell King wrote: On Tue, Aug 09, 2005 at 02:59:53PM +1000, Nick Piggin wrote: That would work for swsusp, but there are other users that want to know if a struct page is valid ram (eg. ioremap), so in that case swsusp would not be able to mess with the flag. The usage of "valid ram&

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Arjan van de Ven wrote: On Tue, 2005-08-09 at 08:08 +0100, Russell King wrote: Can we straighten out the terminology so it's less confusing please? and. can we make a general page_is_ram() function that does what it says? on x86 it can go via the e820 table, other architectures can do

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Benjamin Herrenschmidt wrote: I have no problem keeping PG_reserved for that, and _ONLY_ for that. (though i'd rather see it renamed then). I'm just afraid by doing so, some drivers will jump in the gap and abuse it again... Sure it would be renamed (better yet may be a slower page_is_valid()

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Arjan van de Ven wrote: On Tue, 2005-08-09 at 19:31 +1000, Nick Piggin wrote: Arjan van de Ven wrote: and. can we make a general page_is_ram() function that does what it says? on x86 it can go via the e820 table, other architectures can do whatever they need That would be very

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Hugh Dickins wrote: You're right (though I imagine might sometimes be holes rather than RAM). Yep. These holes are what I have in mind, and random other things like the !(bad_ppro && page_kills_ppro(pfn)) check. [...] I think Nick is treating the "use" of PageReserved in ioremap much too

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-09 Thread Nick Piggin
Hugh Dickins wrote: On Tue, 9 Aug 2005, Nick Piggin wrote: But in either case: I agree that it is probably not a great loss to remove the check, although considering it will be needed for swsusp anyway... swsusp (and I think crashdump has a similar need) is a very different case: it&#

Re: allow the load to grow upto its cpu_power (was Re: [Patch] don't kick ALB in the presence of pinned task)

2005-08-09 Thread Nick Piggin
Siddha, Suresh B wrote: For example, lets take two nodes each having two physical packages. And assume that there are two tasks and both of them are on (may or may n't be pinned) two packages in node-0 Todays load balance will detect that there is an imbalance between the two nodes and will try

Re: sched_domains SD_BALANCE_FORK and sched_balance_self

2005-08-09 Thread Nick Piggin
Siddha, Suresh B wrote: On Tue, Aug 09, 2005 at 03:19:58PM -0700, Martin J. Bligh wrote: --On Tuesday, August 09, 2005 15:03:32 -0700 "Siddha, Suresh B" <[EMAIL PROTECTED]> wrote: Balance on clone make some sort of sense, since you know they're not going to exec afterwards. We've thrashed t

Re: allow the load to grow upto its cpu_power (was Re: [Patch] don't kick ALB in the presence of pinned task)

2005-08-10 Thread Nick Piggin
On Tue, 2005-08-09 at 19:03 -0700, Siddha, Suresh B wrote: > On Wed, Aug 10, 2005 at 10:27:44AM +1000, Nick Piggin wrote: > > Yeah this makes sense. Thanks. > > > > I think we'll only need your first line change to fix this, though. > > > > Your second chang

Re: [RFC][patch 0/2] mm: remove PageReserved

2005-08-11 Thread Nick Piggin
Benjamin Herrenschmidt wrote: On Tue, 2005-08-09 at 20:41 +0100, Russell King wrote: On Tue, Aug 09, 2005 at 07:38:52AM -0700, Martin J. Bligh wrote: pfn_valid() doesn't tell you it's RAM or not - it tells you whether you have a backing struct page for that address. Could be an IO mapped devi

[patch 0/7] lockless pagecache 2

2005-08-11 Thread Nick Piggin
This is my second attempt at a lockless pagecache. Patches are against 2.6.13-rc6, and have had reasonable stressing (albeit on small SMPs). Main changes since last seen: * Code clarity and commenting improvement. * Fix race where multiple concurrent failed speculative reference takers could

[patch 1/7] mm: remove PageReserved rollup

2005-08-11 Thread Nick Piggin
1/7 This rollup is a patchset all on its own. There is a recent thread on linux-kernel if it interests you. Required by lockless pagecache for consistent page refcounting -- SUSE Labs, Novell Inc. Index: linux-2.6/mm/rmap.c === --

[patch 3/7] mm: speculative get_page

2005-08-11 Thread Nick Piggin
3/7 -- SUSE Labs, Novell Inc. If we can be sure that elevating the page_count on a pagecache page will pin it, we can speculatively run this operation, and subsequently check to see if we hit the right page rather than relying on holding a lock or otherwise pinning a reference to the page. This

[patch 2/7] mm: PG_free flag

2005-08-11 Thread Nick Piggin
2/7 -- SUSE Labs, Novell Inc. In a future patch we can no longer rely on page_count being stable at any time, so we can no longer overload PagePrivate && page_count == 0 to mean the page is free and on the buddy lists. Index: linux-2.6/include/linux/page-flags.h

[patch 4/7] radix-tree: lookup_slot

2005-08-11 Thread Nick Piggin
4/7 Required by lockless pagecache in order to get a pointer to a pagecache struct page. -- SUSE Labs, Novell Inc. From: Hans Reiser <[EMAIL PROTECTED]> Reiser4 uses radix trees to solve a trouble reiser4_readdir has serving nfs requests. Unfortunately, radix tree api lacks an operation suita

[patch 5/7] radix-tree: lockless readside

2005-08-11 Thread Nick Piggin
5/7 -- SUSE Labs, Novell Inc. Make radix tree lookups safe to be performed without locks. Readers are protected against nodes being deleted by using RCU based freeing. Readers are protected against new node insertion by using memory barriers to ensure the node itself will be properly written bef

[patch 6/7] mm: lockless pagecache

2005-08-11 Thread Nick Piggin
6/7 -- SUSE Labs, Novell Inc. Use the speculative get_page and the lockless radix tree lookups to introduce lockless page cache lookups (ie. no mapping->tree_lock). The only atomicity changes this should introduce is the use of a non atomic pagevec lookup for truncate, however what atomicity gu

[patch 7/7] mm: spinlock tree_lock

2005-08-11 Thread Nick Piggin
7/7 -- SUSE Labs, Novell Inc. With practically all the read locks gone from mapping->tree_lock, convert the lock from an rwlock back to a spinlock. The remaining locks including the read locks mainly deal with IO submission and not the lookup fastpaths. Index: linux-2.6/fs/buffer.c

Re: [patch 6/7] mm: lockless pagecache

2005-08-11 Thread Nick Piggin
Hi Pekka, Pekka Enberg wrote: Hi Nick, On 8/11/05, Nick Piggin <[EMAIL PROTECTED]> wrote: +unsigned find_get_pages_nonatomic(struct address_space *mapping, pgoff_t start, + unsigned int nr_pages, struct page **pages) +{ + unsigned int i; + unsign

Re: allow the load to grow upto its cpu_power (was Re: [Patch] don't kick ALB in the presence of pinned task)

2005-08-11 Thread Nick Piggin
Siddha, Suresh B wrote: On Thu, Aug 11, 2005 at 01:09:10PM +1000, Nick Piggin wrote: I have a variation on the 2nd part of your patch which I think I would prefer. IMO it kind of generalises the current imbalance calculation to handle this case rather than introducing a new special case

Re: allow the load to grow upto its cpu_power (was Re: [Patch] don't kick ALB in the presence of pinned task)

2005-08-11 Thread Nick Piggin
Siddha, Suresh B wrote: On Fri, Aug 12, 2005 at 09:49:36AM +1000, Nick Piggin wrote: Well, it is a departure from our current idea of balancing. That idea is already changing from the first line of the patch. And the change is "allowing the load to grow upto the sched group's

Re: [patch 5/7] radix-tree: lockless readside

2005-08-11 Thread Nick Piggin
On Thu, 2005-08-11 at 18:37 -0700, Paul E. McKenney wrote: > On Thu, Aug 11, 2005 at 10:25:47PM +1000, Nick Piggin wrote: > > 5/7 > > > > -- > > SUSE Labs, Novell Inc. > > > > > Make radix tree lookups safe to be performed without locks. > > Read

Re: [patch 6/7] mm: lockless pagecache

2005-08-11 Thread Nick Piggin
On Thu, 2005-08-11 at 18:49 -0700, Paul E. McKenney wrote: > On Thu, Aug 11, 2005 at 10:28:04PM +1000, Nick Piggin wrote: > > 6/7 > > > > -- > > SUSE Labs, Novell Inc. > > > > > Use the speculative get_page and the lockless radix tree lookups > &

Re: [patch 5/7] radix-tree: lockless readside

2005-08-11 Thread Nick Piggin
On Thu, 2005-08-11 at 18:37 -0700, Paul E. McKenney wrote: > On Thu, Aug 11, 2005 at 10:25:47PM +1000, Nick Piggin wrote: > > 5/7 > > > > -- > > SUSE Labs, Novell Inc. > > > > > Make radix tree lookups safe to be performed without locks. > > Read

Re: [patch 5/7] radix-tree: lockless readside

2005-08-12 Thread Nick Piggin
Nick Piggin wrote: With the above, we can meet the same requirements of the current find_get_page. Which basically are: x) If the page was ever[1] in pagecache, it may be returned y) If the pagecache was ever[2] empty, NULL may be returned Oh, I missed a couple of "obvious" o

Re: [PATCH] eliminte NMI entry/ exit code

2005-08-12 Thread Nick Piggin
George Anzinger wrote: The NMI entry and exit code fiddles with bits in the preempt count. If an NMI happens while some other code is doing the same, bits will be lost. This patch removes this modify code from the NMI path till we can come up with something better. Humour me for a minute

Re: [patch 18/39] remap_file_pages protection support: add VM_FAULT_SIGSEGV

2005-08-15 Thread Nick Piggin
Russell King wrote: On Fri, Aug 12, 2005 at 08:21:45PM +0200, [EMAIL PROTECTED] wrote: @@ -632,10 +632,11 @@ static inline int page_mapped(struct pag * Used to decide whether a process gets delivered SIGBUS or * just gets major/minor fault counters bumped up. */ -#define VM_FAULT_OOM (-1)

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-18 Thread Nick Piggin
Dinakar Guniguntala wrote: Here's an attempt at dynamic sched domains aka isolated cpusets Very good, I was wondering when someone would try to implement this ;) It needs some work. A few initial comments on the kernel/sched.c change - sorry, don't have too much time right now... --- linux-2.6.12-r

Re: PATCH [PPC64]: dead processes never reaped

2005-04-18 Thread Nick Piggin
On Tue, 2005-04-19 at 11:07 +1000, Benjamin Herrenschmidt wrote: > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote: > > > > Hi, > > > > The patch below appears to fix a problem where a number of dead processes > > linger on the system. On a highly loaded system, dozens of processes > > w

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-18 Thread Nick Piggin
On Mon, 2005-04-18 at 22:54 -0700, Paul Jackson wrote: > Now, onto the real stuff. > > This same issue, in a strange way, comes up on the memory side, > as well as on the cpu side. > > First, let me verify one thing. I understand that the _key_ > purpose of your patch is not so much to isolate

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-19 Thread Nick Piggin
On Mon, 2005-04-18 at 23:59 -0700, Paul Jackson wrote: > Nick wrote: > > Basically you just have to know that it has the > > capability to partition the system in an arbitrary disjoint set > > of sets of cpus. > > > > If you can make use of that, then we're in business ;) > > You read fast ;) >

Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets

2005-04-19 Thread Nick Piggin
On Tue, 2005-04-19 at 00:19 -0700, Paul Jackson wrote: > Nick wrote: > > It doesn't work if you have *most* jobs bound to either > > {0, 1, 2, 3} or {4, 5, 6, 7} but one which should be allowed > > to use any CPU from 0-7. > > How bad does it not work? > > My understanding is that Dinakar's patch

Re: E1000 - page allocation failure - saga continues :(

2005-04-19 Thread Nick Piggin
On Tue, 2005-04-19 at 09:23 +0200, Yann Dupont wrote: > Lukas Hejtmanek a écrit : > >Btw, are you using some TCP tweaks? E.g. I have default TCP window size 1MB. > > > > > > > Do you have turned NAPI on ??? I tried without it off on e1000 and ... > surprise ! > Don't have any messages since 12H

Re: E1000 - page allocation failure - saga continues :(

2005-04-19 Thread Nick Piggin
On Tue, 2005-04-19 at 10:15 +0200, Yann Dupont wrote: > Nick Piggin a écrit : > > > > >>Do you have turned NAPI on ??? I tried without it off on e1000 and ... > >>surprise ! > >>Don't have any messages since 12H now (usually I got those in less than 1

Re: [PATCH scsi-misc-2.6 01/05] scsi: make blk layer set REQ_SOFTBARRIER when a request is dispatched

2005-04-20 Thread Nick Piggin
On Wed, 2005-04-20 at 16:40 +0900, Tejun Heo wrote: > Hello, Jens. > > On Wed, Apr 20, 2005 at 08:30:10AM +0200, Jens Axboe wrote: > > Do it on requeue, please - not on the initial spotting of the request. > > This is the reworked version of the patch. It sets REQ_SOFTBARRIER > in two places -

Re: [PATCH scsi-misc-2.6 01/05] scsi: make blk layer set REQ_SOFTBARRIER when a request is dispatched

2005-04-20 Thread Nick Piggin
Jens Axboe wrote: On Wed, Apr 20 2005, Tejun Heo wrote: Well, yeah, all schedulers have dispatch queue (noop has only the dispatch queue) and use them to defer/requeue, so no reordering will happen, but I'm not sure they are required to be like this or just happen to be implemented so. Precisely,

Re: [PATCH scsi-misc-2.6 01/05] scsi: make blk layer set REQ_SOFTBARRIER when a request is dispatched

2005-04-20 Thread Nick Piggin
Jens Axboe wrote: On Wed, Apr 20 2005, Nick Piggin wrote: I guess this could be one use of 'reordering' after a requeue. Yeah, or perhaps the io scheduler might determine that a request has higher prio than a requeued one. I'm not sure what semantics to place I guess this is

Re: Serious performance degradation on a RAID with kernel 2.6.10-bk7 and later

2005-04-20 Thread Nick Piggin
On Wed, 2005-04-20 at 10:55 -0600, jmerkey wrote: > > For 3Ware, you need to chage the queue depths, and you will see > dramatically improved performance. 3Ware can take requests > a lot faster than Linux pushes them out. Try changing this instead, you > won't be going to sleep all the time wait

[patch] fix race in __block_prepare_write

2005-04-20 Thread Nick Piggin
condition. BUG sighted on a 2-way Itanium2 system with 16K PAGE_SIZE running fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2 where $DIR is a new ext2 filesystem with 4K blocks that is quite small (causing get_block to fail often with -ENOSPC). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]>

[patch] fix race in __block_prepare_write (again)

2005-04-20 Thread Nick Piggin
light. __mpage_writepage BUGs on this condition. BUG sighted on a 2-way Itanium2 system with 16K PAGE_SIZE running fsstress -v -d $DIR/tmp -n 1000 -p 1000 -l 2 where $DIR is a new ext2 filesystem with 4K blocks that is quite small (causing get_block to fail often with -ENOSPC). Signed-off-by: Nick P

Re: [patch] fix race in __block_prepare_write (again)

2005-04-21 Thread Nick Piggin
On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote: > Any reason why you left the goto out? It would be IMO much cleaner to > remove the label "out" altogether and replace the single "goto out" with a > "break" (which is fine since the goto happens inside the for loop > immediately af

Re: [patch] fix race in __block_prepare_write (again)

2005-04-21 Thread Nick Piggin
On Thu, 2005-04-21 at 08:10 +0100, Anton Altaparmakov wrote: > And one more thing... > > On Thu, 2005-04-21 at 08:01 +0100, Anton Altaparmakov wrote: > > On Thu, 21 Apr 2005, Nick Piggin wrote: > > > ... I somehow didn't send it to Andrew last time.

[patch 0/6] remove PageReserved

2005-07-26 Thread Nick Piggin
Hi Andrew, If you're feeling like -mm is getting too stable, then you might consider giving these patches a spin? (unless anyone else raises an objection). Ben thought I should get moving with them soon. Not much change from last time. A bit of ppc64 input from Ben, and some rmap.c input from H

[patch 1/6] mm: comment rmap

2005-07-26 Thread Nick Piggin
1/6 Just be clear that VM_RESERVED pages here are a bug, and the test is not there because they are expected. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/rmap.c === --- linux-2.6.orig/mm/rmap.c +++ lin

[patch 2/6] mm: micro-optimise rmap

2005-07-26 Thread Nick Piggin
2/6 Microoptimise page_add_anon_rmap. Although these expressions are used only in the taken branch of the if() statement, the compiler can't reorder them inside because atomic_inc_and_test is a barrier. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.

[patch 4/6] mm: remove atomic

2005-07-26 Thread Nick Piggin
eing freed). Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/mm/page_alloc.c === --- linux-2.6.orig/mm/page_alloc.c +++ linux-2.6/mm/page_alloc.c @@ -329,7 +329,7 @@ static inline void free_pages_check(cons
