Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-21 Thread Nick Piggin
Hugh Dickins wrote: On Mon, 21 Mar 2005, David S. Miller wrote: On Tue, 22 Mar 2005 15:14:54 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: Question, Dave: flush_tlb_pgtables after Hugh's patch is also possibly not being called with enough range to cover all page tables that have been f

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
Andrew Morton wrote: With these six patches the ppc64 is hitting the BUG in exit_mmap(): BUG_ON(mm->nr_ptes);/* This is just debugging */ fairly early in boot. No doubt Hugh will have this fixed before long... but if you have time to spare, you may just try hitting it on the head and ma

Re: help needed pls. scheduler(kernel 2.6) + hyperthreaded related questions?

2005-03-22 Thread Nick Piggin
Arun Srinivas wrote: Pls. help me. I went through the sched.c for kernel 2.6 and saw that it supports hyperthreading. I would be glad if someone could answer this question (if I am not wrong, an HT processor has 2 architectural states and one execution unit, i.e., two pipeline streams): 1) when the

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
Hugh Dickins wrote: On Tue, 22 Mar 2005, David S. Miller wrote: On Tue, 22 Mar 2005 19:36:46 +0000 (GMT) Hugh Dickins <[EMAIL PROTECTED]> wrote: I notice that although both i386 and sparc64 use pgtable-nopud.h, the i386 pud_clear does nothing at all and the sparc64 pud_clear resets to 0. This was

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
David S. Miller wrote: On Wed, 23 Mar 2005 10:32:10 +1100 Nick Piggin <[EMAIL PROTECTED]> wrote: I think David's on the right track - I think there's something a bit wrong at the top. In my reply to Andrew in this thread I posted a patch which may at least get things working... W

Re: [PATCH 1/5] freepgt: free_pgtables use vma list

2005-03-22 Thread Nick Piggin
David S. Miller wrote: On Tue, 22 Mar 2005 17:10:13 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: Hugh Dickins <[EMAIL PROTECTED]> wrote: On Tue, 22 Mar 2005, Luck, Tony wrote: > > But I'm still confused by all the math on addr/end at each > level. You think the rest of us are not ;-? umm, give

Re: help needed pls. scheduler(kernel 2.6) + hyperthreaded related questions?

2005-03-22 Thread Nick Piggin
Arun Srinivas wrote: If the SMT (apart from SMP) support is enabled in the .config file, does the kernel recognize the 2 logical processors as 2 logical or 2 physical processors? You shouldn't be able to select SMT if SMP is not enabled. If SMT and SMP are selected, then the scheduler will recog

[PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace)

2005-02-05 Thread Nick Piggin
Nick Piggin wrote: Something like the following (untested) extension of Bodo's work could be the minimal fix for 2.6.11. As I've said though, I'd consider it a hack and prefer to do something about the locking. That could be done after 2.6.11 though. Depends how you feel. I think th

Re: Race condition in ptrace

2005-02-03 Thread Nick Piggin
Bodo Stroesser wrote: Working with the new UML skas0 mode on my Xeon HT host, sporadically I saw some processes on UML segfaulting. In all cases, I could track this down to be caused by a gs segment register, that had the wrong contents. This again is caused by a problem in the host linux: A ptra

Re: [PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace)

2005-02-05 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: When a task is put to sleep, it is dequeued from the runqueue while it is still running. The problem is that the runqueue lock can be dropped and retaken in schedule() before the task actually schedules off, and wait_task_inacti

Re: [PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace)

2005-02-05 Thread Nick Piggin
Nick Piggin wrote: Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: When a task is put to sleep, it is dequeued from the runqueue while it is still running. The problem is that the runqueue lock can be dropped and retaken in schedule() before the task actually schedules of

Re: Race condition in ptrace

2005-02-04 Thread Nick Piggin
Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: Bodo Stroesser wrote: Nick Piggin wrote: Bodo Stroesser wrote: I don't see how this could help because AFAIKS, child->saving is only set and cleared while the runqueue is locked. And the same runqueue lock is taken by wait

Re: Race condition in ptrace

2005-02-04 Thread Nick Piggin
Bodo Stroesser wrote: Nick Piggin wrote: Bodo Stroesser wrote: I don't see how this could help because AFAIKS, child->saving is only set and cleared while the runqueue is locked. And the same runqueue lock is taken by wait_task_inactive. Sorry, that's not right. There are some routines c

Re: Race condition in ptrace

2005-02-04 Thread Nick Piggin
Nick Piggin wrote: Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: Andrew, IMO this is another bug to hold 2.6.11 for. Sure. I wouldn't consider Bodo's patch to be the one to use though.. No. Something similar could be done that works on all architectures and all wa

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-02-03 Thread Nick Piggin
On Wed, 2005-02-02 at 14:09 +1100, Nick Piggin wrote: > On Tue, 2005-02-01 at 18:49 -0800, Christoph Lameter wrote: > > On Wed, 2 Feb 2005, Nick Piggin wrote: > > I mean we could just speculatively copy, risk copying crap and > > discard that later when we find that the

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-02-01 Thread Nick Piggin
On Tue, 2005-02-01 at 11:01 -0800, Christoph Lameter wrote: > On Tue, 1 Feb 2005, Nick Piggin wrote: > > A per-pte lock is sufficient for this case, of course, which is why the > > pte-locked system is completely free of the page table lock. > > Introducing pte locking

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-02-01 Thread Nick Piggin
On Tue, 2005-02-01 at 17:20 -0800, Christoph Lameter wrote: > On Wed, 2 Feb 2005, Nick Piggin wrote: > > > > The unmapping in rmap.c would change the pte. This would be discovered > > > after acquiring the spinlock later in do_wp_page. Which would then lead to > >

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-01-31 Thread Nick Piggin
Christoph Lameter wrote: Slightly OT: are you still planning to move the update_mem_hiwater and friends crud out of these fastpaths? It looks like at least that function is unsafe to be lockless. @@ -1316,21 +1318,27 @@ static int do_wp_page(struct mm_struct * flush_cache_pa

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-01-31 Thread Nick Piggin
Christoph Lameter wrote: The page fault handler attempts to use the page_table_lock only for short time periods. It repeatedly drops and reacquires the lock. When the lock is reacquired, checks are made if the underlying pte has changed before replacing the pte value. These locations are a good fit
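The check-before-replace pattern described in this message (sample the pte, drop the lock, then confirm the pte is unchanged before installing a new value) can be sketched in userspace with a single compare-and-swap. This is only an illustration under stated assumptions: `fake_pte_t` and `pte_install_if_unchanged` are hypothetical names, a plain C11 atomic stands in for an architecture-specific pte, and none of this is code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page table entry; real ptes are arch-specific. */
typedef atomic_ulong fake_pte_t;

/*
 * Install new_val only if the entry still holds the value the caller
 * sampled earlier. Returns true on success, false if another thread
 * changed the entry in the meantime (the caller would then retry).
 */
static bool pte_install_if_unchanged(fake_pte_t *pte, unsigned long seen,
                                     unsigned long new_val)
{
    assert(pte != NULL);
    /* On failure the CAS stores the current value into 'seen'; it is discarded. */
    return atomic_compare_exchange_strong(pte, &seen, new_val);
}
```

A caller that loses the race simply re-reads the entry and tries again, which is the same recovery the lock-based version performs after it notices the pte changed.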

Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

2005-02-08 Thread Nick Piggin
Dinakar Guniguntala wrote: On Mon, Feb 07, 2005 at 03:59:49PM -0800, Matthew Dobson wrote: Sorry to reply to a long-quiet thread, but I've been trading emails with Paul Jackson on this subject recently, and I've been unable to convince either him or myself that merging CPUSETs and CKRM is as easy a

Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

2005-02-08 Thread Nick Piggin
Martin J. Bligh wrote: What about your proposed sched domain changes? Can't sched domains be used to handle the CPU groupings and the existing code in cpusets that handle memory continue as is? Weren't sched domains supposed to give the scheduler better knowledge of the CPU groupings after all? sched d

Re: [Lse-tech] [PATCH] cpusets - big numa cpu and memory placement

2005-02-08 Thread Nick Piggin
Matthew Dobson wrote: Nick Piggin wrote: I didn't really follow where that idea went, but I think at least a few people thought that sort of functionality wasn't nearly fancy enough! :) Well, that's about how far the idea was supposed to go. ;) I think named hierarchical sche

Re: 2.6.11-rc3-mm2

2005-02-10 Thread Nick Piggin
On Thu, 2005-02-10 at 18:09 -0800, Matt Mackall wrote: > On Thu, Feb 10, 2005 at 04:47:27PM -0800, Chris Wright wrote: > > * Matt Mackall ([EMAIL PROTECTED]) wrote: > > > What happened to the RT rlimit code from Chris? > > > > I still have it, but I had the impression Ingo didn't like it as a long

Re: 2.6.11-rc3-mm2

2005-02-10 Thread Nick Piggin
On Thu, 2005-02-10 at 22:41 -0500, Paul Davis wrote: > [ the best solution is ] > > [ my preferred solution is ... ] > > [ it would be better if ... ] > > [ this is a kludge and it should be done instead like ... ] > > did nobody read what andrew wrote and what JOQ pointed out? >

Re: 2.6.11-rc3-mm2

2005-02-10 Thread Nick Piggin
On Fri, 2005-02-11 at 17:34 +1100, Peter Williams wrote: > Nick Piggin wrote: > > I can't say much about it because I'm not putting my hand up to > > do anything. Just mentioning that rlimit would be better if not > > for the userspace side of the equation. I think

Re: Linux 2.6.8.1 CPU Scheduler Documentation

2005-02-13 Thread Nick Piggin
On Mon, 2005-02-14 at 06:28 +0100, Willy Tarreau wrote: > Hello Josh, > > On Sun, Feb 13, 2005 at 06:23:15PM -0600, Josh Aas wrote: > > Hello, > > > > I have written an introduction to the Linux 2.6.8.1 CPU scheduler > > implementation. It should help people to understand what is going on in >

Re: [PATCH] Fix possible race with 4level-fixup.h

2005-02-17 Thread Nick Piggin
Benjamin Herrenschmidt wrote: Hi! When using 4level-fixup.h, a PMD page may end up being freed before the matching PGD entry is cleared due to the way the compatibility macros work. This can cause nasty races on some architectures. This patch fixes it by defining pud_clear() to be pgd_clear(). Th

Re: [PATCH] Fix possible race with 4level-fixup.h

2005-02-17 Thread Nick Piggin
Benjamin Herrenschmidt wrote: Index: linux-work/include/asm-generic/4level-fixup.h === --- linux-work.orig/include/asm-generic/4level-fixup.h 2005-01-24 17:09:49.0 +1100 +++ linux-work/include/asm-generic/4level-fixup.h

[PATCH 1/2] optimise copy page range

2005-02-17 Thread Nick Piggin
Some of you have seen this before. Just resending because I based my next patch on top of this one. Suggested by Linus: optimise a condition in the clear_p?d_range functions. Results in one less conditional branch on i386 with gcc-3.4.4 Signed-off-by: Nick Piggin <[EMAIL PROTEC

[PATCH 2/2] page table iterators

2005-02-17 Thread Nick Piggin
I am pretty surprised myself that I was able to consolidate all "page table range" functions into a single type of iterator (well, there are a couple of variations, but it's not too bad). I thought at least the functions which allocate new page tables would have to be separate from those which don'

[PATCH 1/2] folded page table walkers

2005-02-17 Thread Nick Piggin
(Note these are all obviously just RFCs, until after 2.6.11, at which time I'll send them off to Andrew if they've proven OK). Anyway, here is the patch to fold the page table walkers nicely for 2 and 3 level implementations... now we are getting some good reasons why people should convert to the -

[PATCH 2/2] trim unused functions

2005-02-17 Thread Nick Piggin
And lastly, redo this vital optimisation that David had to remove earlier. Saves a few bytes. --- linux-2.6-npiggin/include/asm-generic/pgtable-nopmd.h |2 ++ linux-2.6-npiggin/include/asm-generic/pgtable-nopud.h |2 ++ linux-2.6-npiggin/mm/memory.c |8 +

Re: [PATCH 2/2] page table iterators

2005-02-17 Thread Nick Piggin
Linus Torvalds wrote: On Fri, 18 Feb 2005, Nick Piggin wrote: I am pretty surprised myself that I was able to consolidate all "page table range" functions into a single type of iterator (well, there are a couple of variations, but it's not too bad). Ok, this is post-2.6.11 mate

Re: [PATCH 2/2] page table iterators

2005-02-20 Thread Nick Piggin
Andi Kleen wrote: On Thu, Feb 17, 2005 at 03:30:31PM -0800, David S. Miller wrote: On Fri, 18 Feb 2005 00:03:42 +0100 Andi Kleen <[EMAIL PROTECTED]> wrote: And to be honest we only have about 6 or 7 of these walkers in the whole kernel. And 90% of them are in memory.c While doing 4level I think I

Re: [PATCH 2/2] page table iterators

2005-02-21 Thread Nick Piggin
Benjamin Herrenschmidt wrote: All of them are slightly differently implemented, some check overflow, some don't, some have redundant checking, some aren't even consistent between all 3/4 loops of a given walk routine set, and we have seen the tendency to introduce subtle bugs in one of them when the

Re: [PATCH 2/2] page table iterators

2005-02-21 Thread Nick Piggin
Nick Piggin wrote: Haven't yet pulled out a pre-4-level kernel to see how 3-level compares I guess I'll do that now. Close. Before 4level: 119.5us, after folded walkers: 132.8us I think most of this is now coming from clear_page_range, rather than the actual traversing of the page table

Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature

2005-01-26 Thread Nick Piggin
On Wed, 2005-01-26 at 16:27 -0600, Jack O'Quin wrote: > Ingo Molnar <[EMAIL PROTECTED]> writes: > > > - exported the current RT-average value to /proc/stat (it's the last > >field in the cpu lines) > > > e.g. the issue Con and others raised: privileged tasks. By default, the > > root user w

Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature

2005-01-26 Thread Nick Piggin
On Wed, 2005-01-26 at 20:31 -0600, Jack O'Quin wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > > I'm a bit concerned about this kind of policy and breakage of > > userspace APIs going into the kernel. I mean, if an app > > succeeds in gaining S

Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature

2005-01-26 Thread Nick Piggin
On Wed, 2005-01-26 at 23:15 -0600, Jack O'Quin wrote: > Nick Piggin <[EMAIL PROTECTED]> writes: > > > But the important elements are lost. The standard provides a > > deterministic scheduling order, and a deterministic scheduling > > latency > > Where

Re: possible performance issue in 4-level page tables

2005-02-01 Thread Nick Piggin
Zou Nan hai wrote: There is a performance regression of lmbench lat_proc fork result on ia64. in 2.6.10 I got Process fork+exit:164.8438 microseconds. in 2.6.11-rc2 Process fork+exit:183.8621 microseconds. I believe this regression was caused by the 4-level page tables change. Since most of

Re: 2.6.11-rc2-mm2

2005-02-01 Thread Nick Piggin
Andrew Morton wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.11-rc2/2.6.11-rc2-mm2/ Changes since 2.6.11-rc2-mm1: Just a couple of things: +task_size-is-variable.patch +use-mm_vm_size-in-exit_mmap.patch I didn't hear back about my comments on this patch. I don't see why M

Re: page fault scalability patch V16 [3/4]: Drop page_table_lock in handle_mm_fault

2005-02-01 Thread Nick Piggin
On Tue, 2005-02-01 at 18:49 -0800, Christoph Lameter wrote: > On Wed, 2 Feb 2005, Nick Piggin wrote: > > > Well yeah, but the interesting case is when that isn't a lock ;) > > > > I'm not saying what you've got is no good. I'm sure it would be fine

Re: 2.6.10: kswapd spins like crazy

2005-02-03 Thread Nick Piggin
On Thu, 2005-02-03 at 11:29 +0100, Terje Fåberg wrote: > I recently upgraded my desktop from 2.4.28 to > 2.6.10. Even under moderate memory pressure kswapd > regularly eats almost all available cpu time > whenever there is a little more IO throughput, > like copying large files. The system is extr

Re: 2.6.10: kswapd spins like crazy

2005-02-03 Thread Nick Piggin
Terje Fåberg wrote: Terje Fåberg <[EMAIL PROTECTED]> wrote: The kernel is compiling right now, but I cannot reboot this machine until six or seven o'clock tonight (CET). I will report then. Well, well, I rebooted the same kernel, now with MAGIC-SYSRQ enabled. At first the kswapd-effect wouldn

Re: 2.6.10: kswapd spins like crazy

2005-02-03 Thread Nick Piggin
Nick Piggin wrote: Hmm, your DMA zone has no active pages, and pages_scanned (which triggers all_unreclaimable) is only incremented when scanning the active list. But I wonder, if the pages can't be freed, why aren't they being put on the active list? Oh, attached should be a minimal

Re: 2.6.10: kswapd spins like crazy

2005-02-03 Thread Nick Piggin
Andrew Morton wrote: Nick Piggin <[EMAIL PROTECTED]> wrote: Oh, attached should be a minimal fix if you would like to try it out. ... --- linux-2.6/mm/vmscan.c~vmscan-minfix 2005-02-04 11:52:37.0 +1100 +++ linux-2.6-npiggin/mm/vmscan.c 2005-02-04 11:53:32.0 +1100 @@

Re: A scrub daemon (prezeroing)

2005-02-03 Thread Nick Piggin
On Thu, 2005-02-03 at 22:26 -0800, Christoph Lameter wrote: > On Fri, 4 Feb 2005, Paul Mackerras wrote: > > > As has my scepticism about pre-zeroing actually providing any benefit > > on ppc64. Nevertheless, the only definitive answer is to actually > > measure the performance both ways. > > Of

Re: 2.6.10: kswapd spins like crazy

2005-02-04 Thread Nick Piggin
Terje Fåberg wrote: Terje Fåberg <[EMAIL PROTECTED]> wrote: I'll continue to do the same things I did yesterday before kswapd started to spin. Looks very good so far. I am unable to reproduce the bad kswapd behaviour with your patch, Nick. To double-check I booted into the old kernel an hour a

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
Hugh Dickins wrote: On Sun, 20 Feb 2005, Nick Piggin wrote: Open coding is probably the smaller evil. And they're really not changed that often. My opinion FWIW: I'm all for regularizing the pagetable loops to work the same way, changing their variables to use the same names, impro

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote: > On Wed, 23 Feb 2005 02:06:28 +0000 (GMT) > Hugh Dickins <[EMAIL PROTECTED]> wrote: > > > I've not seen Dave's bitmap walking functions (for clearing?), > > would they fit in better with my way? > Hugh: I'll have more of a look through y

Re: [PATCH 2/2] page table iterators

2005-02-22 Thread Nick Piggin
On Tue, 2005-02-22 at 20:31 -0800, David S. Miller wrote: > I just got also reminded that we walk these damn pagetables completely > twice every exit, once to unmap the VMAs pte mappings, once again to > zap the page tables. It might be fruitful to explore combining > those two steps, perhaps not

Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Nick Piggin
Hugh Dickins wrote: On Wed, 23 Feb 2005, Lee Revell wrote: Thanks, your patch fixes the copy_pte_range latency. clear_page_range is also problematic. Yes, I saw that from your other traces too. I know there are plans to improve clear_page_range during 2.6.12, but I didn't realize that it had beco

Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Nick Piggin
Hugh Dickins wrote: I'm off to bed, but since your appetite for looking at patches is greater than mine, I'll throw what I'm currently testing over the wall to you now. Against 2.6.11-rc4-bk9, but my starting point was obviously your patches. Not yet split up, but clearly should be. Yeah you've s

Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Nick Piggin
Lee Revell wrote: On Thu, 2005-02-24 at 10:27 +1100, Nick Piggin wrote: If you are using i386 with 2-level page tables (no highmem), then the behaviour should be more or less identical. Odd. IIRC last time I really tested this a few months ago, the worst case latency on that machine was about

Re: More latency regressions with 2.6.11-rc4-RT-V0.7.39-02

2005-02-23 Thread Nick Piggin
Lee Revell wrote: On Thu, 2005-02-24 at 12:29 +1100, Nick Piggin wrote: Lee Revell wrote: IIRC last time I really tested this a few months ago, the worst case latency on that machine was about 150us. Currently its 422us from the same clear_page_range code path. Well it should be pretty trivial to

Re: [PATCH 2/2] page table iterators

2005-02-23 Thread Nick Piggin
On Thu, 2005-02-24 at 05:12 +0000, Hugh Dickins wrote: > On Thu, 24 Feb 2005, Nick Piggin wrote: > > OK after sleeping on it, I'm warming to your way. > > > > I don't think it makes something like David's modifications any > > easier, but mine di

[PATCH 0/13] Multiprocessor CPU scheduler patches

2005-02-23 Thread Nick Piggin
Hi, I hope that you can include the following set of CPU scheduler patches in -mm soon, if you have no other significant performance work going on. There are some fairly significant changes, with a few basic aims: * Improve SMT behaviour * Improve CMP behaviour, CMP/NUMA scheduling (ie. Opteron)

[PATCH 1/13] timestamp fixes

2005-02-23 Thread Nick Piggin
1/13 Some fixes for unsynchronised TSCs. A task's timestamp may have been set by another CPU. Although we try to adjust this correctly with the timestamp_last_tick field, there is no guarantee this will be exactly right. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux

[PATCH 2/13] improve pinned task handling

2005-02-23 Thread Nick Piggin
ancing faster, and ensure the migration threads don't start running which is another problem observed in the wild. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.

[PATCH 3/13] rework schedstats

2005-02-23 Thread Nick Piggin
3/13 I have an updated userspace parser for this thing, if you are still keeping it on your website. Move balancing fields into struct sched_domain, so we can get more useful results on systems with multiple domains (eg SMT+SMP, CMP+NUMA, SMP+NUMA, etc). Signed-off-by: Nick Piggin <[EM

[PATCH 4/13] find_busiest_group fixlets

2005-02-23 Thread Nick Piggin
4/13 Fix up a few small warts in the periodic multiprocessor rebalancing code. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c 2005-02-24 17:31:28.431609701

[PATCH 5/13] find_busiest_group cleanup

2005-02-23 Thread Nick Piggin
5/13 Cleanup find_busiest_group a bit. New sched-domains code means we can't have groups without a CPU. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c === --- linux-2.6.orig/kernel/sched.c

[PATCH 7/13] better active balancing heuristic

2005-02-23 Thread Nick Piggin
7/13 Fix up active load balancing a bit so it doesn't get called when it shouldn't. Reset the nr_balance_failed counter at more points where we have found conditions to be balanced. This reduces too aggressive active balancing seen on some workloads. Signed-off-by: Nick Piggin <[EM

[PATCH 9/13] less affine wakups

2005-02-23 Thread Nick Piggin
ance tolerance is now set at half the domain's imbalance, so we get the opportunity to do wake balancing before the more random periodic rebalancing gets performed. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/kernel/sched.c =

[PATCH 8/13] generalised CPU load averaging

2005-02-23 Thread Nick Piggin
ily increased). So generally a higher number will result in more conservative balancing. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-i386/topology.h === --- linux-2.6.orig/include/asm-i386/topology.

[PATCH 12/13] schedstats additions for sched-balance-fork

2005-02-23 Thread Nick Piggin
12/13 Add SCHEDSTAT statistics for sched-balance-fork. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/linux/sched.h === --- linux-2.6.orig/include/linux/sched.h 2005-02-24 17:39:07.616911007

[PATCH 10/13] remove aggressive idle balancing

2005-02-23 Thread Nick Piggin
10/13 Remove the very aggressive idle stuff that has recently gone into 2.6 - it is going against the direction we are trying to go. Hopefully we can regain performance through other methods. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-i386/topo

[PATCH 11/13] sched-domains aware balance-on-fork

2005-02-23 Thread Nick Piggin
is for the new tasks to be sent to a different socket, but more often than not, we would first load up our sibling core, or fill two cores of a single remote socket before selecting a new one. This gives large improvements to STREAM on such systems. Signed-off-by: Nick Piggin <[EMAIL PROTEC

[PATCH 13/13] basic tuning

2005-02-23 Thread Nick Piggin
13/13 Do some basic initial tuning. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux-2.6/include/asm-x86_64/topology.h === --- linux-2.6.orig/include/asm-x86_64/topology.h 2005-02-24 17:39:07.615911131 +1100 +++

[PATCH 6/13] no aggressive idle balancing

2005-02-24 Thread Nick Piggin
6/13 Remove the special casing for idle CPU balancing. Things like this are hurting for example on SMT, where a single sibling being idle doesn't really warrant a really aggressive pull over the NUMA domain, for example. Signed-off-by: Nick Piggin <[EMAIL PROTECTED]> Index: linux

Re: [PATCH 1/13] timestamp fixes

2005-02-24 Thread Nick Piggin
On Thu, 2005-02-24 at 08:46 +0100, Ingo Molnar wrote: > * Nick Piggin <[EMAIL PROTECTED]> wrote: > > > 1/13 > > > > ugh, has this been tested? It needs the patch below. > Yes. Which might also explain why I didn't see -ve intervals :( Thanks Ingo.

Re: [PATCH 10/13] remove aggressive idle balancing

2005-02-24 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: [PATCH 6/13] no aggressive idle balancing [PATCH 8/13] generalised CPU load averaging [PATCH 9/13] less affine wakups [PATCH 10/13] remove aggressive idle balancing they look fine, but these are the really scary ones :-) Maybe we

Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Nick Piggin
Hugh Dickins wrote: On Thu, 24 Feb 2005, Nick Piggin wrote: pud_addr_end? next = pud_addr_end(addr, end); Hmm, yes, I'll go with that, thanks (unless a better idea follows). Something I do intend on top of what I sent before, is another set of three macros, like
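For readers unfamiliar with the `next = pud_addr_end(addr, end);` idiom discussed here, the following is a rough userspace sketch of what a helper of that shape might compute: the next table-sized boundary after `addr`, clamped to `end`. It is an assumption-laden illustration, not the macro from the patch; `range_addr_end` and its `size` parameter are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical generic form of a helper like pud_addr_end(): return the
 * next 'size'-aligned boundary after addr, clamped to end.
 * 'size' must be a power of two. The "- 1" comparisons keep the result
 * correct when addr + size wraps past the top of the address space,
 * or when end == 0 is used to mean "top of the address space".
 */
static unsigned long range_addr_end(unsigned long addr, unsigned long end,
                                    unsigned long size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    unsigned long boundary = (addr + size) & ~(size - 1);
    return (boundary - 1 < end - 1) ? boundary : end;
}
```

A walker built on such a helper would compute `next` once per iteration, handle the sub-range from `addr` to `next`, and stop when `next` equals `end`, so every level of the walk shares the same loop shape.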

Re: [PATCH 12/13] schedstats additions for sched-balance-fork

2005-02-24 Thread Nick Piggin
Ingo Molnar wrote: * Nick Piggin <[EMAIL PROTECTED]> wrote: [PATCH 11/13] sched-domains aware balance-on-fork [PATCH 12/13] schedstats additions for sched-balance-fork [PATCH 13/13] basic tuning STREAMS numbers tricky. It's pretty much the only benchmark that 1) relies on being able

Re: [PATCH 2/2] page table iterators

2005-02-24 Thread Nick Piggin
Hugh Dickins wrote: At one stage I was adding unlikelies to all the p??_bads, then it seemed more sensible to hide that in a new macro (which of course must do the none and bad tests inline, before going off to the function). Yeah that sounds OK. I think (un)likely can propagate through inline func

Re: [PATCH 3/13] rework schedstats

2005-02-25 Thread Nick Piggin
Rick Lindsley wrote: I have an updated userspace parser for this thing, if you are still keeping it on your website. Sure, be happy to include it, thanks! Send it along. Is it for version 11 or version 12? Version 12. I can send it to you next week. This was actually directed at Andrew, w

Re: [PATCH 12/13] schedstats additions for sched-balance-fork

2005-02-25 Thread Nick Piggin
Rick Lindsley wrote: There is little help we get from userspace, and i'm not sure we want to add scheduler overhead for this single benchmark - when something like a _tiny_ bit of NUMAlib use within the OpenMP library would probably solve things equally well! There has been a gene

Re: sched_yield behavior

2005-02-27 Thread Nick Piggin
Giovanni Tusa wrote: If I am not wrong, the scheduler will choose it again (it will be still the higher priority task, and the only of its priority list). I have to add an explicit sleep to effectively relinquish the CPU for some time, or the scheduler can deal with such a situation in another way

Re: Slowdown on high-load machines with 3000 sockets

2005-02-27 Thread Nick Piggin
Christian Schmid wrote: I already tried with 300 KB and even used a perl-hash as a horribly slow buffer for a readahead-replacement. It still slowed down on the syswrite to the socket. That's the strange thing. Do you have to use manual readahead though? What is the performance like if you just l

Re: Slowdown on high-load machines with 3000 sockets

2005-02-28 Thread Nick Piggin
Christian Schmid wrote: This issue has been tracked down more. This bug does NOT appear if I disable the preemptive kernel. Maybe this helps. Yes, it may help - can you boot with profile=schedule and get the results for say, a 30 second period while the application is experiencing problems? So: start

Re: swapper: page allocation failure. order:1, mode:0x20

2005-02-28 Thread Nick Piggin
Robert Hancock wrote: Bernd Schubert wrote: Oh no, not these page allocation problems again. In summer I already posted problems with page allocation errors with 2.6.7, but to me it seemed that nobody cared. That time we got those problems every morning during the cron jobs and our main file serv

Re: [PATCH 1/13] timestamp fixes

2005-03-01 Thread Nick Piggin
Andrew Theurer wrote: Nick, can you describe the system you run the DB tests on? Do you have any cpu idle time stats and hopefully some context switch rate stats? Yeah, it is dbt3-pgsql on OSDL's 8-way STP machines. I think they're PIII Xeons with 2MB L2 cache. I had been having some difficulty r

Re: [PATCH 1/13] timestamp fixes

2005-03-01 Thread Nick Piggin
Nick Piggin wrote: Andrew Theurer wrote: Nick, can you describe the system you run the DB tests on? Do you have any cpu idle time stats and hopefully some context switch rate stats? Yeah, it is dbt3-pgsql on OSDL's 8-way STP machines. I think they're PIII Xeons with 2MB L2 cache.

Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Nick Piggin
Corey Minyard wrote: Arjan van de Ven wrote: Just doing an atomic operation is not faster than doing a lock, an atomic operation, then an unlock? Am I missing something? if the lock and the atomic are on the same cacheline they're the same cost on most modern cpus... Ah, I see. Not likely

Re: [PATCH] New operation for kref to help avoid locks

2005-03-01 Thread Nick Piggin
Corey Minyard wrote: Nick Piggin wrote: Is get_with_check actually going to be useful for anything? It seems like it promotes complex and potentially unsafe schemes. It is certainly more complex to use this, and I'm guessing that's why Greg rejected it. Certainly a valid problem. e
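One way to read the idea behind an operation like get_with_check is "take a reference only if the object is not already on its way to being freed". The sketch below shows that general pattern with C11 atomics. It is an assumption about the intent rather than the kref patch itself, and `struct ref`, `ref_get_unless_zero`, and `ref_put` are hypothetical userspace names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace reference count, not the kernel's struct kref. */
struct ref { atomic_int count; };

/* Take a reference only if the count is still nonzero. */
static bool ref_get_unless_zero(struct ref *r)
{
    assert(r != NULL);
    int old = atomic_load(&r->count);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current count and we retry. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false; /* count hit zero: the object is being torn down */
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_put(struct ref *r)
{
    assert(r != NULL);
    return atomic_fetch_sub(&r->count, 1) == 1;
}
```

The complexity concern raised in the thread shows up even in this small form: every caller of the conditional get has to handle the failure path, and the memory holding the counter must remain valid while the check runs.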

[patch] nicksched for 2.6.11

2005-03-02 Thread Nick Piggin
I've had a few queries about this, so by "popular" demand, I've put my latest nicksched stuff here: www.kerneltrap.org/~npiggin/2.6.11-nicksched.gz It includes all the multiprocessor stuff that's in -mm, and also my alternate scheduler policy. Nick

Re: RFD: Kernel release numbering

2005-03-02 Thread Nick Piggin
Andrew Morton wrote: Jeff Garzik <[EMAIL PROTECTED]> wrote: IMO too confusing. 2.6.even: bugfixes only 2.6.odd: bugfixes and features. That doesn't even confuse me! I actually second Matt's request; -RCs à la 2.4. Then your above becomes: 2.6.x-rc: bugfixes only 2.6.x-pre: bugfixes and features And

Re: Page fault scalability patch V18: Drop first acquisition of ptl

2005-03-02 Thread Nick Piggin
Andrew Morton wrote: Christoph Lameter <[EMAIL PROTECTED]> wrote: On Wed, 2 Mar 2005, Andrew Morton wrote: Earlier releases back in September 2004 had some pte locking code (and AFAIK Nick also played around with pte locking) but that was less efficient than atomic operations. How much less effici

Re: Page fault scalability patch V18: Drop first acquisition of ptl

2005-03-02 Thread Nick Piggin
Benjamin Herrenschmidt wrote: However, if this pte_cmpxchg() thing is used for removing access, then sparc64 can't use it. In such a case a race in the TLB handler would result in using an invalid PTE. I could "spin" on some lock bit, but there is no way I'm adding instructions to the carefully c

Re: Page fault scalability patch V18: Drop first acquisition of ptl

2005-03-02 Thread Nick Piggin
Benjamin Herrenschmidt wrote: On Fri, 2005-03-04 at 04:19 +1100, Nick Piggin wrote: You don't want to do that for all architectures, as I said earlier. eg. i386 can concurrently set the dirty bit with the MMU (which won't honour the lock). So you then need an atomic lock, atomic pte

Re: RFD: Kernel release numbering

2005-03-04 Thread Nick Piggin
Andrew Morton wrote: Thomas Gleixner <[EMAIL PROTECTED]> wrote: I don't see that the releases are stable. They are defined stable by proclamation. If they were stable we'd release the darn things! *obviously* -rc kernels are expected to still have problems. Release the -rc kernel when it is st

Re: kswapd (& clock?) problems in 2.6.10

2005-01-15 Thread Nick Piggin
[EMAIL PROTECTED] wrote: Since installing 2.6.10 kswapd has decided to loop wildly on two occasions. Both occasions happened after starting a big compile. Checking vmstat, I noticed a steady stream of io/bi figures ranging between 2-5K (which is about what I can get out of this box during normal o

Re: BUG: Slowdown on 3000 socket-machines tracked down

2005-03-06 Thread Nick Piggin
Christian Schmid wrote: Today I tested with 5000 sockets. The problem is the same as above, but as more sockets come in, it just doesn't claim more bandwidth as it SHOULD of course do. It seems it doesn't slow down but it just doesn't scale anymore. The bandwidth doesn't go over 80 MB/Sec, no m

Re: BUG: Slowdown on 3000 socket-machines tracked down

2005-03-06 Thread Nick Piggin
Ben Greear wrote: Christian Schmid wrote: Ben Greear wrote: How many bytes are you sending with each call to write()/sendto() whatever? I am using sendfile-call every 100 ms per socket with the poll-api. So basically around 40 kb per round. My application is single-threaded, uses non-blocking

Re: [PATCH 10/13] remove aggressive idle balancing

2005-03-06 Thread Nick Piggin
Siddha, Suresh B wrote: By code inspection, I see an issue with this patch [PATCH 10/13] remove aggressive idle balancing Why are we removing cpu_and_siblings_are_idle check from active_load_balance? In case of SMT, we want to give prioritization to an idle package while doing active_load_

Re: BUG: Slowdown on 3000 socket-machines tracked down

2005-03-06 Thread Nick Piggin
Willy Tarreau wrote: On Mon, Mar 07, 2005 at 04:14:37PM +1100, Nick Piggin wrote: I think you would have better luck in reproducing this problem if you did the full sendfile thing. I think it is becoming disk bound due to page reclaim problems, which is causing the slowdown. In that case

Re: BUG: Slowdown on 3000 socket-machines tracked down

2005-03-06 Thread Nick Piggin
Nick Piggin wrote: Willy Tarreau wrote: thousands of sockets). I never had enough time to investigate more, so I went back to 2.4. I have heard other complaints about this, and they are definitely related to the scheduler (not saying yours is, but it is very possible). Oh, and if you could dig

Re: [patch] nicksched for 2.6.11

2005-03-06 Thread Nick Piggin
Prakash Punnoor wrote: Nick Piggin schrieb: I've had a few queries about this, so by "popular" demand, I've put my latest nicksched stuff here: www.kerneltrap.org/~npiggin/2.6.11-nicksched.gz It includes all the multiprocessor stuff that's in -mm, and also my alternate sc

Re: [patch 2/5] setup_per_zone_lowmem_reserve() oops fix

2005-03-07 Thread Nick Piggin
[EMAIL PROTECTED] wrote: If you do 'echo 0 0 > /proc/sys/vm/lowmem_reserve_ratio' the kernel gets a divide-by-zero. Prevent that, and fiddle with some whitespace too. Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> Can we instead have a patch that makes the value zero turn off the lowmem reserve e

Re: [PATCH 10/13] remove aggressive idle balancing

2005-03-07 Thread Nick Piggin
Siddha, Suresh B wrote: Nick, On Mon, Mar 07, 2005 at 04:34:18PM +1100, Nick Piggin wrote: Active balancing should only kick in after the prescribed number of rebalancing failures - can_migrate_task will see this, and will allow the balancing to take place. We are resetting the nr_balance_failed
