Re: -mm merge plans for 2.6.23

2007-07-26 Thread Nick Piggin
Andrew Morton wrote: On Thu, 26 Jul 2007 15:53:37 +1000 Nick Piggin [EMAIL PROTECTED] wrote: Not that I want to say anything about swap prefetch getting merged: my inbox is already full of enough helpful suggestions about that, give them the kernel interfaces, they can do it themselves

Re: [patch] sched: introduce SD_BALANCE_FORK for ht/mc/smp domains

2007-07-26 Thread Nick Piggin
On Thu, Jul 26, 2007 at 03:34:56PM -0700, Suresh B wrote: On Fri, Jul 27, 2007 at 12:18:30AM +0200, Ingo Molnar wrote: * Siddha, Suresh B [EMAIL PROTECTED] wrote: Introduce SD_BALANCE_FORK for HT/MC/SMP domains. For HT/MC, as caches are shared, SD_BALANCE_FORK is the right thing

[patch][rfc] 2.6.23-rc1 mm: NUMA replicated pagecache

2007-07-27 Thread Nick Piggin
-2.6/mm/replication.c === --- /dev/null +++ linux-2.6/mm/replication.c @@ -0,0 +1,609 @@ +/* + * linux/mm/replication.c + * + * NUMA pagecache replication + * + * Copyright (C) 2007 Nick Piggin, SuSE Labs + */ +#include <linux/init.h>

Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-08-22 Thread Nick Piggin
On Tue, Aug 21, 2007 at 05:29:27PM +0200, Peter Zijlstra wrote: [ now with CCs ] On Tue, 2007-08-21 at 02:28 +0200, Nick Piggin wrote: I do of course. There is one thing to have a real lock deadlock in some core path, and another to have this memory deadlock in a known-to-be-dodgy

Re: [RFC 2/9] Use NOMEMALLOC reclaim to allow reclaim if PF_MEMALLOC is set

2007-08-22 Thread Nick Piggin
On Tue, Aug 21, 2007 at 04:07:15PM +0200, Peter Zijlstra wrote: On Tue, 2007-08-21 at 02:39 +0200, Nick Piggin wrote: Although interestingly, we are not guaranteed to have enough memory to completely initialise writeout of a single page. Yes, that is due to the unbounded nature

Re: [PATCH,RESEND] documentation: atomic_add_unless() doesn't imply mb() on failure

2007-08-23 Thread Nick Piggin
as an example. We should either change the implementation, or fix the docs. Did this end up getting merged? If not, it should, thanks. Acked-by: Nick Piggin [EMAIL PROTECTED] Signed-off-by: Oleg Nesterov [EMAIL PROTECTED] Documentation/atomic_ops.txt |3 ++- Documentation/memory-barriers.txt

Re: [RFC 2/9] Use NOMEMALLOC reclaim to allow reclaim if PF_MEMALLOC is set

2007-08-23 Thread Nick Piggin
On Thu, Aug 23, 2007 at 11:26:48AM +0200, Peter Zijlstra wrote: On Thu, 2007-08-23 at 05:38 +0200, Nick Piggin wrote: On Tue, Aug 21, 2007 at 04:07:15PM +0200, Peter Zijlstra wrote: On Tue, 2007-08-21 at 02:39 +0200, Nick Piggin wrote: Although interestingly, we are not guaranteed

[patch] 2.6.23-rc3: fsblock

2007-08-24 Thread Nick Piggin
Hi, I'm still plugging away at fsblock slowly. Haven't really got around to finishing up any big new features, but there has been a lot of bug fixing and little API changes since last release. I still think fsblock has merit, and even if a more extent-based approach ends up working better for

Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd get_page_from_freelist() : No more no page failures.

2007-08-26 Thread Nick Piggin
[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] Sent: Friday, August 24, 2007 3:11 PM Subject: Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd get_page_from_freelist() : No more no page failures. Mailer added a HTML subpart and chopped the earlier email :^( Hi Mitchell, Is it

Re: [patch 01/28] Fall back on interrupt disable in cmpxchg8b on 80386 and 80486

2007-08-27 Thread Nick Piggin
Mathieu Desnoyers wrote: Q: What's the reason to have cmpxchg64_local on 32 bit architectures? Without that need all this would just be a few simple defines. A: cmpxchg64_local on 32 bits architectures takes unsigned long long parameters, but cmpxchg_local only takes longs. Since we have

Re: oom-killer with 27G free swap and overcommit_memory=2

2007-08-27 Thread Nick Piggin
Patrick J. LoPresti wrote: I am using Linux 2.6.16.46-0.12-smp (SUSE 10 SP1 stock kernel). I do intend to bother SUSE, but I am hoping some kind kernel savant can help me interpret these log messages and/or give me some suggestions for how to proceed. My system is a SunFire x4100 (x86_64) with

Re: [PATCH -rt 1/8] introduce PICK_FUNCTION

2007-08-28 Thread Nick Piggin
Daniel Walker wrote: PICK_FUNCTION() is similar to the other PICK_OP style macros, and was created to replace them all. I used variable argument macros to handle PICK_FUNC_2ARG/PICK_FUNC_1ARG. Otherwise the macros are similar to the original macros used for semaphores. The entire system is used

Re: [07/36] Use page_cache_xxx in mm/filemap_xip.c

2007-08-28 Thread Nick Piggin
Christoph Hellwig wrote: On Tue, Aug 28, 2007 at 09:49:38PM +0200, Jörn Engel wrote: On Tue, 28 August 2007 12:05:58 -0700, [EMAIL PROTECTED] wrote: - index = *ppos >> PAGE_CACHE_SHIFT; - offset = *ppos & ~PAGE_CACHE_MASK; + index = page_cache_index(mapping, *ppos); +
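Note on the snippet above: the removed lines in the quoted diff split a file position into a page index (a shift) and an in-page offset (a mask). A minimal sketch of that arithmetic follows. The helper names demo_page_index() and demo_page_offset() are made up for illustration, and the 4 KiB page size is an assumption; this is not the page_cache_index() from the patch set, which also takes a mapping argument.

```c
#include <stdint.h>

/* Assumed for illustration: 4 KiB page cache pages (shift of 12). */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)
#define PAGE_CACHE_MASK  (~(PAGE_CACHE_SIZE - 1))

/* Hypothetical helper: which page of the file does byte 'pos' fall in? */
static inline uint64_t demo_page_index(uint64_t pos)
{
    return pos >> PAGE_CACHE_SHIFT;
}

/* Hypothetical helper: byte offset of 'pos' within that page. */
static inline uint64_t demo_page_offset(uint64_t pos)
{
    return pos & ~PAGE_CACHE_MASK;
}
```

With these assumed values, position 8195 lands in page 2 at offset 3.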

Re: [patch 01/28] Fall back on interrupt disable in cmpxchg8b on 80386 and 80486

2007-08-28 Thread Nick Piggin
Mathieu Desnoyers wrote: * Nick Piggin ([EMAIL PROTECTED]) wrote: Mathieu Desnoyers wrote: Q: What's the reason to have cmpxchg64_local on 32 bit architectures? Without that need all this would just be a few simple defines. A: cmpxchg64_local on 32 bits architectures takes unsigned long

Re: [RFC] : mm : / Patch / code : Suggestion :snip kswapd get_page_from_freelist() : No more no page failures. (WHY????)

2007-08-28 Thread Nick Piggin
Mitchell Erblich wrote: Nick Piggin wrote: Nick Piggin, et al, First diffs would generate a lot of noise, since I rip and insert a lot of code based on whether I think the code is REALLY needed for MY TEST environment. These suggestions are basically minimal merge

Re: [linux-pm] [RFC][PATCH 0/2 -mm] kexec based hibernation

2007-08-28 Thread Nick Piggin
Huang, Ying wrote: On Mon, 2007-08-27 at 09:28 +0800, Hu, Fenghua wrote: One quick question is, can it improve hibernation/wakeup time? In general, for kexec based hibernation, what increases hibernation/wakeup time: - One extra Linux boot is needed to hibernate and wakeup. What decreases

Re: [Tech-board-discuss] Re: [Ksummit-2007-discuss] Re: Linux Foundation Technical Advisory Board Elections

2007-08-28 Thread Nick Piggin
Daniel Phillips wrote: On Friday 24 August 2007 03:45, Theodore Tso wrote: As I said; what's wrong with just using SPI membership? It's not like it is remotely hard for kernel hackers to gain membership in SPI. And somebody else takes care of the bureaucracy for you. Given the huge overlap

Re: [patch] fix the max path calculation in radix-tree.c

2007-08-29 Thread Nick Piggin
On Wed, Aug 29, 2007 at 05:39:18PM -0400, Jeff Moyer wrote: Nick Piggin [EMAIL PROTECTED] writes: On Tue, Aug 21, 2007 at 03:48:42PM -0400, Jeff Moyer wrote: Hi, A while back, Nick Piggin introduced a patch to reduce the node memory usage for small files (commit

Re: uncached page allocator

2007-08-30 Thread Nick Piggin
Peter Zijlstra wrote: On Tue, 2007-08-21 at 16:05 +1000, Dave Airlie wrote: So you can see why some sort of uncached+writecombined page cache would be useful, I could just allocate a bunch of pages at startup as uncached+writecombined, and allocate pixmaps from them and when I bind/free the

Re: [patch 1/2] spinlock: lockbreak cleanup

2007-08-13 Thread Nick Piggin
On Sat, Aug 11, 2007 at 02:07:43AM +0200, Andi Kleen wrote: Nick, These two patches make my P4 (single socket HT) test box not boot. I dropped them for now. Some oopses Sorry, the trylock had a race where it would not work correctly :( Have fixed it now and will do more testing and

Re: [patch][rfc] 2.6.23-rc1 mm: NUMA replicated pagecache

2007-08-13 Thread Nick Piggin
On Fri, Aug 10, 2007 at 05:08:18PM -0400, Lee Schermerhorn wrote: On Wed, 2007-08-08 at 16:25 -0400, Lee Schermerhorn wrote: On Fri, 2007-07-27 at 10:42 +0200, Nick Piggin wrote: Hi, Just got a bit of time to take another look at the replicated pagecache patch. The nopage vs

Re: [patch][rfc] 2.6.23-rc1 mm: NUMA replicated pagecache

2007-08-13 Thread Nick Piggin
On Mon, Aug 13, 2007 at 10:05:01AM -0400, Lee Schermerhorn wrote: On Mon, 2007-08-13 at 09:43 +0200, Nick Piggin wrote: Replication may be putting more stress on some locks. It will cause more tlb flushing that can not be batched well, which could cause the call_lock to get hotter

Re: lmbench ctxsw regression with CFS

2007-08-13 Thread Nick Piggin
On Mon, Aug 13, 2007 at 08:00:38PM -0700, Andrew Morton wrote: On Mon, 13 Aug 2007 14:30:31 +0200 Jens Axboe [EMAIL PROTECTED] wrote: On Mon, Aug 06 2007, Nick Piggin wrote: What CPU did you get these numbers on? Do the indirect calls hurt much on those without an indirect

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Nick Piggin
Paul E. McKenney wrote: On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote: Paul E. McKenney [EMAIL PROTECTED] wrote: On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote: Chris Snook [EMAIL PROTECTED] wrote: cpu_relax() contains a barrier, so it should do the right thing.

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-13 Thread Nick Piggin
Chris Snook wrote: David Howells wrote: Chris Snook [EMAIL PROTECTED] wrote: cpu_relax() contains a barrier, so it should do the right thing. For non-smp architectures, I'm concerned about interacting with interrupt handlers. Some drivers do use atomic_* operations. I'm not sure that

Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-08-15 Thread Nick Piggin
On Tue, Aug 14, 2007 at 07:21:03AM -0700, Christoph Lameter wrote: The following patchset implements recursive reclaim. Recursive reclaim is necessary if we run out of memory in the writeout path from reclaim. This is f.e. important for stacked filesystems or anything that does complicated

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin
Paul E. McKenney wrote: On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote: Maybe it is the safe way to go, but it does obscure cases where there is a real need for barriers. I prefer burying barriers into other primitives. When they should naturally be there, eg. locking

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin
Segher Boessenkool wrote: Please check the definition of cache coherence. Which of the twelve thousand such definitions? :-) Every definition I have seen says that writes to a single memory location have a serial order as seen by all CPUs, and that a read will return the most recent write

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

2007-08-15 Thread Nick Piggin
Paul E. McKenney wrote: On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote: Especially since several big architectures don't have volatile in their atomic_get and _set, I think it would be a step backwards to add them in as a just-in-case thing now (unless there is a better reason

Re: [patch 1/2] i386: use asm() like the other atomic operations already do.

2007-08-15 Thread Nick Piggin
Herbert Xu wrote: On Wed, Aug 15, 2007 at 01:02:23PM -0400, Chris Snook wrote: Herbert Xu wrote: I'm still unconvinced why we need this because nobody has brought up any examples of kernel code that legitimately need this. There's plenty of kernel code that *wants* this though. If we can

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Nick Piggin
Segher Boessenkool wrote: Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would

Re: [RFC 0/9] Reclaim during GFP_ATOMIC allocs

2007-08-15 Thread Nick Piggin
On Tue, Aug 14, 2007 at 08:30:21AM -0700, Christoph Lameter wrote: This is the extended version of the reclaim patchset. It enables reclaim from clean file backed pages during GFP_ATOMIC allocs. A bit invasive since many locks must now be taken with saving flags. But it works. Tested by

Re: [PATCH] Fix VM_FAULT flags conversion for hugetlb

2007-08-15 Thread Nick Piggin
On Tue, Aug 14, 2007 at 02:41:21PM -0500, Adam Litke wrote: It seems a simple mistake was made when converting follow_hugetlb_page() over to the VM_FAULT flags bitmask stuff: (commit 83c54070ee1a2d05c89793884bea1a03f2851ed4). By using the wrong bitmask, hugetlb_fault() failures are

Re: [PATCH] calculation of pgoff in do_linear_fault() uses mixed units

2007-08-15 Thread Nick Piggin
as PAGE_SHIFT, but should that ever change this calculation would break. Signed-off-by: Dean Nelson [EMAIL PROTECTED] Acked-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/mm/memory.c === --- linux-2.6.orig/mm/memory.c 2007-08

Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-08-15 Thread Nick Piggin
On Wed, Aug 15, 2007 at 03:12:06PM +0200, Peter Zijlstra wrote: On Wed, 2007-08-15 at 14:22 +0200, Nick Piggin wrote: On Tue, Aug 14, 2007 at 07:21:03AM -0700, Christoph Lameter wrote: The following patchset implements recursive reclaim. Recursive reclaim is necessary if we run out

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin
Segher Boessenkool wrote: Part of the motivation here is to fix heisenbugs. If I knew where they By the same token we should probably disable optimisations altogether since that too can create heisenbugs. Almost everything is a tradeoff; and so is this. I don't believe most people would

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin
Chris Snook wrote: Herbert Xu wrote: On Thu, Aug 16, 2007 at 03:48:54PM -0400, Chris Snook wrote: Can you find an actual atomic_read code snippet there that is broken without the volatile modifier? A whole bunch of atomic_read uses will be broken without the volatile modifier once we

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin
Paul E. McKenney wrote: On Thu, Aug 16, 2007 at 06:42:50PM +0800, Herbert Xu wrote: In fact, volatile doesn't guarantee that the memory gets read anyway. You might be reading some stale value out of the cache. Granted this doesn't happen on x86 but when you're coding for the kernel you

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin
Paul Mackerras wrote: Nick Piggin writes: So i386 and x86-64 don't have volatiles there, and it saves them a few K of kernel text. What you need to justify is why it is a good I'm really surprised it's as much as a few K. I tried it on powerpc and it only saved 40 bytes (10 instructions

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-16 Thread Nick Piggin
Paul Mackerras wrote: Nick Piggin writes: Why are people making these undocumented and just plain false assumptions about atomic_t? Well, it has only been false since December 2006. Prior to that atomics *were* volatile on all platforms. Hmm, although I don't think it has ever been

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: #define atomic_read_volatile(v) \ ({ \ forget((v)->counter);\ ((v)->counter); \ }) where: *vomit* :)

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Stefan Richter wrote: Nick Piggin wrote: I don't know why people would assume volatile of atomics. AFAIK, most of the documentation is pretty clear that all the atomic stuff can be reordered etc. except for those that modify and return a value. Which documentation is there? Documentation

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Herbert Xu wrote: On Fri, Aug 17, 2007 at 01:43:27PM +1000, Paul Mackerras wrote: BTW, the sort of missing barriers that triggered this thread aren't that subtle. It'll result in a simple lock-up if the loop condition holds upon entry. At which

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Sure, now that I learned of these properties I can start to audit code and insert barriers where I believe they are needed, but this simply means that almost all occurrences of atomic_read will get barriers (unless there already

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Also, why would you want to make these insane accessors for atomic_t types? Just make sure everybody knows the basics of barriers, and they can apply that knowledge to atomic_t and all other lockless memory accesses as well

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: [...] Granted, the above IS buggy code. But, the stated objective is to avoid heisenbugs. ^^ Anyway, why are you making up code snippets that are buggy in other ways in order to support

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. Probably you didn't mean that, but no, schedule() is not barrier because it sleeps. It's a barrier because it's

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: I think they would both be equally ugly, You think both these are equivalent in terms of looks: | while (!atomic_read(v)) { | while (!atomic_read_xxx(v

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Because they should be thinking about them in terms of barriers, over which the compiler / CPU is not to reorder accesses or cache memory operations, rather than special volatile accesses. This is obviously just a taste thing
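Note on the snippet above: the argument is that polling loops should be reasoned about in terms of barriers rather than volatile accesses. A minimal user-space sketch of a compiler-only barrier in a polling loop follows. The barrier() definition is the usual GCC/Clang empty-asm idiom with a "memory" clobber; demo_wait_for_flag() is a made-up name. This only stops the compiler from caching the value in a register. It is an assumption-laden illustration, not kernel code, and it orders nothing at the CPU level.

```c
/* Compiler-only barrier (GCC/Clang inline asm): the "memory" clobber tells
 * the compiler that memory may have changed, so cached values are dropped. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical helper: spin until *flag is nonzero, counting the polls.
 * Without the barrier the compiler could hoist the load out of the loop. */
static unsigned long demo_wait_for_flag(const int *flag)
{
    unsigned long polls = 0;

    while (!*flag) {
        barrier();  /* force *flag to be re-read on the next iteration */
        polls++;
    }
    return polls;
}
```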

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-17 Thread Nick Piggin
Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: On Fri, 17 Aug 2007, Nick Piggin wrote: Satyam Sharma wrote: It is very obvious. msleep calls schedule() (ie. sleeps), which is always a barrier. Probably you didn't mean that, but no, schedule

Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-08-20 Thread Nick Piggin
On Mon, Aug 20, 2007 at 05:51:34AM +0200, Peter Zijlstra wrote: On Thu, 2007-08-16 at 05:29 +0200, Nick Piggin wrote: Well perhaps it doesn't work for networked swap, because dirty accounting doesn't work the same way with anonymous memory... but for _filesystems_, right? I mean

Re: [RFC 0/3] Recursive reclaim (on __PF_MEMALLOC)

2007-08-20 Thread Nick Piggin
On Mon, Aug 20, 2007 at 12:15:01PM -0700, Christoph Lameter wrote: On Mon, 20 Aug 2007, Peter Zijlstra wrote: What Christoph is proposing is doing recursive reclaim and not initiating writeout. This will only work _IFF_ there are clean pages about. Which in the general case need

Re: [RFC 2/9] Use NOMEMALLOC reclaim to allow reclaim if PF_MEMALLOC is set

2007-08-20 Thread Nick Piggin
On Mon, Aug 20, 2007 at 11:14:08PM +0200, Peter Zijlstra wrote: On Mon, 2007-08-20 at 13:27 -0700, Christoph Lameter wrote: On Mon, 20 Aug 2007, Peter Zijlstra wrote: Plus the same issue can happen today. Writes are usually not completed during reclaim. If the writes are

Re: [patch] fix the max path calculation in radix-tree.c

2007-08-21 Thread Nick Piggin
On Tue, Aug 21, 2007 at 03:48:42PM -0400, Jeff Moyer wrote: Hi, A while back, Nick Piggin introduced a patch to reduce the node memory usage for small files (commit cfd9b7df4abd3257c9e381b0e445817b26a51c0c): -#define RADIX_TREE_MAP_SHIFT 6 +#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL
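Note on the snippet above: once the map shift depends on the configuration, any bound on the tree's height has to be derived from the shift rather than assumed. A minimal sketch of that calculation as a ceiling division follows. The names DEMO_DIV_ROUND_UP and demo_max_path() are made up for illustration, and this is not claimed to be the text of the patch under discussion.

```c
/* Ceiling division: smallest integer >= n / d, for positive n and d. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of tree levels needed to cover 'index_bits'
 * bits of index when each level consumes 'map_shift' bits. */
static unsigned int demo_max_path(unsigned int index_bits,
                                  unsigned int map_shift)
{
    return DEMO_DIV_ROUND_UP(index_bits, map_shift);
}
```

For a 64-bit index, a shift of 6 needs 11 levels and a shift of 4 needs 16, so a bound sized for one shift does not carry over to the other.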

Re: [rfc] balance-on-fork NUMA placement

2007-08-05 Thread Nick Piggin
On Fri, Aug 03, 2007 at 01:10:13PM -0700, Suresh B wrote: On Fri, Aug 03, 2007 at 02:20:10AM +0200, Nick Piggin wrote: On Thu, Aug 02, 2007 at 11:33:39AM -0700, Martin Bligh wrote: Nick Piggin wrote: On Wed, Aug 01, 2007 at 03:52:11PM -0700, Martin Bligh wrote: And so forth. Initial

Re: [ck] Re: -mm merge plans for 2.6.23

2007-08-05 Thread Nick Piggin
Matthew Hawkins wrote: On 7/25/07, Nick Piggin [EMAIL PROTECTED] wrote: I guess /proc/meminfo, /proc/zoneinfo, /proc/vmstat, /proc/slabinfo before and after the updatedb run with the latest kernel would be a first step. top and vmstat output during the run wouldn't hurt either. Hi Nick

Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]

2007-08-05 Thread Nick Piggin
. It has not. Concerns that were raised (by specifically Nick Piggin) weren't being addressed. I may have missed them, but what I saw from him weren't specific issues, but instead a nebulous 'something better may come along later' Something better, ie. the problems with page reclaim being

Re: lmbench ctxsw regression with CFS

2007-08-05 Thread Nick Piggin
On Sat, Aug 04, 2007 at 08:50:37AM +0200, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: Oh good. Thanks for getting to the bottom of it. We have normally disliked too much runtime tunables in the scheduler, so I assume these are mostly going away or under a CONFIG option

Re: RFT: updatedb morning after problem [was: Re: -mm merge plans for 2.6.23]

2007-08-06 Thread Nick Piggin
--- [EMAIL PROTECTED] wrote: On Mon, 6 Aug 2007, Nick Piggin wrote: [EMAIL PROTECTED] wrote: On Sun, 29 Jul 2007, Rene Herman wrote: On 07/29/2007 01:41 PM, [EMAIL PROTECTED] wrote: I agree that tinkering with the core VM code should not be done lightly

Re: [patch] radix-tree: use indirect bit

2007-08-06 Thread Nick Piggin
On Mon, Aug 06, 2007 at 11:40:55AM -0700, Andrew Morton wrote: On Thu, 2 Aug 2007 07:24:46 +0200 Nick Piggin [EMAIL PROTECTED] wrote: Rather than sign direct radix-tree pointers with a special bit, sign the indirect one that hangs off the root. This means that, given a lookup_slot

[patch 1/2] spinlock: lockbreak cleanup

2007-08-07 Thread Nick Piggin
do they even get bloated up with that break_lock then?). Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h +++ linux-2.6/include/linux/sched.h

[patch 2/2] x86_64: ticket lock spinlock

2007-08-07 Thread Nick Piggin
sections short, and ensure locks are reasonably fair (which this patch does). Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/include/asm-x86_64/spinlock.h === --- linux-2.6.orig/include/asm-x86_64/spinlock.h +++ linux-2.6

Re: [patch 2/2] x86_64: ticket lock spinlock

2007-08-08 Thread Nick Piggin
On Wed, Aug 08, 2007 at 01:31:58PM -0400, [EMAIL PROTECTED] wrote: On Wed, 08 Aug 2007 06:24:44 +0200, Nick Piggin said: After this, we can no longer spin on any locks with preempt enabled, and cannot reenable interrupts when spinning on an irq safe lock, because at that point we have

Re: [patch 2/2] x86_64: ticket lock spinlock

2007-08-08 Thread Nick Piggin
On Wed, Aug 08, 2007 at 12:26:55PM +0200, Andi Kleen wrote: * * (the type definitions are in asm/spinlock_types.h) */ +#if (NR_CPUS > 256) +#error spinlock supports a maximum of 256 CPUs +#endif + static inline int __raw_spin_is_locked(raw_spinlock_t *lock) { -
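Note on the snippet above: the 256-CPU limit in the quoted #error follows naturally if each ticket is stored in 8 bits. A minimal user-space sketch of a ticket lock with 8-bit counters follows, using C11 atomics. The struct and function names are made up for illustration; the actual x86_64 patch is written in assembly and is not reproduced here.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ticket lock: 'next' is the next ticket to hand out,
 * 'owner' is the ticket currently allowed to hold the lock.
 * 8-bit tickets can tell apart at most 256 concurrent waiters. */
struct demo_ticket_lock {
    _Atomic uint8_t next;
    _Atomic uint8_t owner;
};

static void demo_ticket_lock(struct demo_ticket_lock *l)
{
    /* Take a ticket; the counter wraps modulo 256. */
    uint8_t me = atomic_fetch_add(&l->next, 1);

    /* Wait for our turn: waiters are served in FIFO order. */
    while (atomic_load(&l->owner) != me)
        ;
}

static void demo_ticket_unlock(struct demo_ticket_lock *l)
{
    /* Pass the lock to the next ticket holder. */
    atomic_fetch_add(&l->owner, 1);
}
```

Because tickets are served in order, a waiter cannot be starved by later arrivals, which is the fairness property the patch description refers to.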

Re: 2.6.23-rc2-mm1: kernel BUG at mm/swap_state.c:78!

2007-08-09 Thread Nick Piggin
On Thu, Aug 09, 2007 at 04:37:35PM +0100, Hugh Dickins wrote: On Thu, 9 Aug 2007, Mariusz Kozlowski wrote: Hello, Nothing unusual happening, allmodconfig compiling etc. Not sure why it says kernel was tainted though ... hmmm. [ cut here ] kernel BUG at

Re: [13/18] x86_64: Allow fallback for the stack

2007-10-09 Thread Nick Piggin
On Wednesday 10 October 2007 04:39, Christoph Lameter wrote: On Mon, 8 Oct 2007, Nick Piggin wrote: The tight memory restrictions on stack usage do not come about because of the difficulty in increasing the stack size :) It is because we want to keep stack sizes small! Increasing

Re: remove zero_page (was Re: -mm merge plans for 2.6.24)

2007-10-09 Thread Nick Piggin
On Wednesday 10 October 2007 00:52, Linus Torvalds wrote: On Tue, 9 Oct 2007, Nick Piggin wrote: I have done some tests which indicate a couple of very basic common tools don't do much zero-page activity (ie. kbuild). And also combined with some logical arguments to say that a sane app

Re: [13/18] x86_64: Allow fallback for the stack

2007-10-09 Thread Nick Piggin
On Wednesday 10 October 2007 11:26, Christoph Lameter wrote: On Tue, 9 Oct 2007, Nick Piggin wrote: We already use 32k stacks on IA64. So the memory argument fail there. I'm talking about generic code. The stack size is set in arch code not in generic code. Generic code must assume a 4K

Re: remove zero_page (was Re: -mm merge plans for 2.6.24)

2007-10-09 Thread Nick Piggin
On Wednesday 10 October 2007 12:22, Linus Torvalds wrote: On Tue, 9 Oct 2007, Nick Piggin wrote: Where do you suggest I go from here? Is there any way I can convince you to try it? Make it a config option? (just kidding) No, I'll take the damn patch, but quite frankly, I think your

Re: howto boost write(2) performance?

2007-10-09 Thread Nick Piggin
On Tuesday 09 October 2007 23:50, Michael Stiller wrote: Hi list, i'm developing an application (in C) which needs to write about 1Gbit/s (125Mb/s) to a disk array attached via U320 SCSI. It runs on Dual Core 2 Xeons @2Ghz utilizing kernel 2.6.22.7. I buffer the data in (currently 4) 400Mb

Re: remove zero_page (was Re: -mm merge plans for 2.6.24)

2007-10-10 Thread Nick Piggin
On Wednesday 10 October 2007 15:20, Linus Torvalds wrote: On Wed, 10 Oct 2007, Hugh Dickins wrote: On Tue, 9 Oct 2007, Nick Piggin wrote: by it ;) To prove my point: the *first* approach I posted to fix this problem was exactly a patch to special-case the zero_page refcounting which

Re: 2.6.23 spinlock hang in kswapd under heavy disk write loads

2007-10-10 Thread Nick Piggin
On Thursday 11 October 2007 01:33, Berkley Shands wrote: 2.6.23 with CONFIG_DEBUG_SPINLOCK on does not hang under very high write loads to either an LSIELP (write rate 1.1GB/Sec) or to a highpoint RR2340 (write rate 1.0GB/Sec). With CONFIG_DEBUG_SPINLOCK off however, the system hangs with

Re: 2.6.23 spinlock hang in kswapd under heavy disk write loads

2007-10-10 Thread Nick Piggin
On Friday 12 October 2007 10:56, Berkley Shands wrote: 100% reproducible on the two motherboards in question. Does not happen on any other motherboard I have in my possession (not tyan, not uniwide, not socket 940...) No errors, no dmesg, nothing with debug_spinlock set. sysrq shows lots

Re: Linux 2.6.23

2007-10-11 Thread Nick Piggin
On Wednesday 10 October 2007 20:14, Ingo Molnar wrote: * Nicholas Miell [EMAIL PROTECTED] wrote: Does CFS still generate the following sysbench graphs with 2.6.23, or did that get fixed? http://people.freebsd.org/~kris/scaling/linux-pgsql.png

Re: 2.6.23 spinlock hang in kswapd under heavy disk write loads

2007-10-11 Thread Nick Piggin
On Friday 12 October 2007 02:23, Mr. Berkley Shands wrote: With DEBUG_SLAB on, I can run only a very short time under 2.6.23 before a kernel panic. [ 626.028180] eth0: too many iterations (6) in nv_nic_irq. [ 626.167583] eth0: too many iterations (6) in nv_nic_irq. [ 626.206729] eth0: too

Re: Linux 2.6.23

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 15:46, Ingo Molnar wrote: * Nick Piggin [EMAIL PROTECTED] wrote: ;) I think you snipped the important bit: the peak is terrible but it has virtually no dropoff and performs better under load than the default 2.6.21 scheduler. (verbatim) hm, i understood

Re: [PATCH] mm: avoid dirtying shared mappings on mlock

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 02:57, Nick Piggin wrote: On Friday 12 October 2007 19:03, Peter Zijlstra wrote: Subject: mm: avoid dirtying shared mappings on mlock Suleiman noticed that shared mappings get dirtied when mlocked. Avoid this by teaching make_pages_present about this case

Re: [PATCH 00/31] Remove iget() and read_inode() [try #4]

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 19:07, David Howells wrote: Hi Linus, Here's a set of patches that remove all calls to iget() and all read_inode() functions. They should be removed for two reasons: firstly they don't lend themselves to good error handling, and secondly their presence is a

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 10:25:34AM +0200, Jarek Poplawski wrote: On 04-10-2007 07:23, Nick Piggin wrote: According to latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture

Re: [PATCH] mm: avoid dirtying shared mappings on mlock

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 19:03, Peter Zijlstra wrote: Subject: mm: avoid dirtying shared mappings on mlock Suleiman noticed that shared mappings get dirtied when mlocked. Avoid this by teaching make_pages_present about this case. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] Acked-by:

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 11:12:13AM +0200, Jarek Poplawski wrote: On Fri, Oct 12, 2007 at 10:42:34AM +0200, Helge Hafting wrote: Jarek Poplawski wrote: On 04-10-2007 07:23, Nick Piggin wrote: According to latest memory ordering specification documents from Intel and AMD, both

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-12 Thread Nick Piggin
On Fri, Oct 12, 2007 at 11:55:05AM +0200, Jarek Poplawski wrote: On Fri, Oct 12, 2007 at 10:57:33AM +0200, Nick Piggin wrote: I don't know quite what you're saying... the CPUs could probably get performance by having weakly ordered loads, OTOH I think the Intel ones might already do

Re: [PATCH] mm: avoid dirtying shared mappings on mlock

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 20:37, Peter Zijlstra wrote: On Fri, 2007-10-12 at 02:57 +1000, Nick Piggin wrote: On Friday 12 October 2007 19:03, Peter Zijlstra wrote: Subject: mm: avoid dirtying shared mappings on mlock Suleiman noticed that shared mappings get dirtied when mlocked

Re: [PATCH] mm: avoid dirtying shared mappings on mlock

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 20:50, Peter Zijlstra wrote: On Fri, 2007-10-12 at 04:14 +1000, Nick Piggin wrote: On Friday 12 October 2007 20:37, Peter Zijlstra wrote: The pages will still be read-only due to dirty tracking, so the first write will still do page_mkwrite(). Which can

[patch 1/2] hdaps: fix locking

2007-10-14 Thread Nick Piggin
produces warnings, but I don't actually know if it does the right thing (because I don't really know what the driver does or how to test it anyway!). --- hdaps was using incorrect mutex_trylock return code. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/drivers/hwmon/hdaps.c
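Note on the snippet above: the kernel's mutex_trylock() returns 1 when the lock was acquired and 0 on contention, the opposite of the semaphore-style down_trylock(), so a caller that tests the result the wrong way round takes the wrong branch. A minimal user-space sketch of the convention follows. The demo_mutex type and the demo_* functions are made up stand-ins built on a C11 atomic_flag; this is not the hdaps code.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a mutex, for illustration only. */
struct demo_mutex {
    atomic_flag locked;
};

/* Follows the mutex_trylock() convention: 1 = acquired, 0 = contended. */
static int demo_mutex_trylock(struct demo_mutex *m)
{
    return !atomic_flag_test_and_set(&m->locked);
}

static void demo_mutex_unlock(struct demo_mutex *m)
{
    atomic_flag_clear(&m->locked);
}

/* Correct use of the return code: bail out when the lock was NOT taken. */
static int demo_try_update(struct demo_mutex *m, int *value)
{
    if (!demo_mutex_trylock(m))
        return -EBUSY;

    (*value)++;
    demo_mutex_unlock(m);
    return 0;
}
```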

Re: [patch 1/2] hdaps: fix locking

2007-10-14 Thread Nick Piggin
On Sun, Oct 14, 2007 at 09:25:23AM +0200, Nick Piggin wrote: Here are a couple of fixes for the hdaps driver. I have kind of been blocking out the bug traces caused by these (the 2nd patch, actually) thinking that it's one of those transient / churn things... but it's getting annoying now

Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 09:12, Jeremy Fitzhardinge wrote: David Chinner wrote: You mean xfs_buf.c. Yes, sorry. And yes, we delay unmapping pages until we have a batch of them to unmap. vmap and vunmap do not scale, so this batching helps alleviate some of the worst of the

Re: Interaction between Xen and XFS: stray RW mappings

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 10:57, Jeremy Fitzhardinge wrote: Nick Piggin wrote: Yes, as Dave said, vmap (more specifically: vunmap) is very expensive because it generally has to invalidate TLBs on all CPUs. I see. I'm looking at some more general solutions to this (already have some

Re: [RFC] vivi, videobuf_to_vmalloc() and related breakage

2007-10-14 Thread Nick Piggin
On Monday 15 October 2007 12:01, Al Viro wrote: AFAICS, videobuf-vmalloc use of mem->vma and mem->vmalloc is bogus. You obtain the latter with vmalloc_user(); so far, so good. Then you have retval=remap_vmalloc_range(vma, mem->vmalloc,0); where vma is given to you by mmap();

Re: ARCH_FREE_PTE_NR 5350 on x86_64

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 16:54, Alok kataria wrote: Hi, Looking at the tlb_flush code path and its co-relation with ARCH_FREE_PTE_NR, on x86-64 architecture. I think we still don't use the ARCH_FREE_PTE_NR of 5350 as the caching value for the mmu_gathers structure, instead fallback to using

OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 18:04, Rob Landley wrote: On Sunday 14 October 2007 8:45:03 pm Theodore Tso wrote: excuse for conflating different categories of devices in the first place. See the thinkpad Ultrabay drive example above. Last week I drove my laptop so deep into swap (with a

Re: [rfc][patch 3/3] x86: optimise barriers

2007-10-15 Thread Nick Piggin
On Mon, Oct 15, 2007 at 09:44:05AM +0200, Jarek Poplawski wrote: On Fri, Oct 12, 2007 at 08:13:52AM -0700, Linus Torvalds wrote: On Fri, 12 Oct 2007, Jarek Poplawski wrote: ... So no, there's no way a software person could have afforded to say it seems to work on my setup even

Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 18:28, Christian Borntraeger wrote: Andrew, this is a resend of a bugfix patch. Ramdisk seems a bit unmaintained, so decided to send the patch to you :-). I have CCed Ted, who did work on the code in the 90s. I found no current email address of Chad Page. This really

Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:05, Christian Borntraeger wrote: Am Montag, 15. Oktober 2007 schrieb Nick Piggin: On Monday 15 October 2007 18:28, Christian Borntraeger wrote: Andrew, this is a resend of a bugfix patch. Ramdisk seems a bit unmaintained, so decided to send the patch to you

Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:36, Andi Kleen wrote: David Chinner [EMAIL PROTECTED] writes: And yes, we delay unmapping pages until we have a batch of them to unmap. vmap and vunmap do not scale, so this batching helps alleviate some of the worst of the problems. You're keeping vmaps

Re: OOM killer gripe (was Re: What still uses the block layer?)

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:52, Rob Landley wrote: On Monday 15 October 2007 8:37:44 am Nick Piggin wrote: Virtual memory isn't perfect. I've _always_ been able to come up with examples where it just doesn't work for me. This doesn't mean VM overcommit should be abolished, because

Re: [PATCH resend] ramdisk: fix zeroed ramdisk pages on memory pressure

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 19:16, Andrew Morton wrote: On Tue, 16 Oct 2007 00:06:19 +1000 Nick Piggin [EMAIL PROTECTED] wrote: On Monday 15 October 2007 18:28, Christian Borntraeger wrote: Andrew, this is a resend of a bugfix patch. Ramdisk seems a bit unmaintained, so decided to send

Re: Interaction between Xen and XFS: stray RW mappings

2007-10-15 Thread Nick Piggin
On Monday 15 October 2007 21:07, Andi Kleen wrote: On Tue, Oct 16, 2007 at 12:56:46AM +1000, Nick Piggin wrote: Is this true even if you don't write through those old mappings? I think it happened for reads too. It is a little counter intuitive because in theory the CPU doesn't need

Re: nfs mmap adventure (was: 2.6.23-mm1)

2007-10-15 Thread Nick Piggin
On Tuesday 16 October 2007 00:06, David Howells wrote: Peter Zijlstra [EMAIL PROTECTED] wrote: I get funny SIGBUS' like so: fault if (->page_mkwrite() < 0) nfs_vm_page_mkwrite() nfs_write_begin() nfs_flush_incompatible() nfs_wb_page()

Re: [git pull] scheduler updates for v2.6.24

2007-10-15 Thread Nick Piggin
On Tuesday 16 October 2007 00:17, Ingo Molnar wrote: Linus, please pull the latest scheduler git tree from: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched.git It contains lots of scheduler updates from lots of people - hopefully the last big one for quite some time.
