Re: KLive: Linux Kernel Live Usage Monitor

2005-08-30 Thread Andrea Arcangeli
On Tue, Aug 30, 2005 at 05:56:33PM +0100, Alan Cox wrote: I doubt there is anything needed that can't be done in sh and nc here. Catching boots can be done by adding one to a boot number and sending that as well. How does suspend to disk handle uptime - if the uptime stops then sending the

Re: KLive: Linux Kernel Live Usage Monitor

2005-08-30 Thread Andrea Arcangeli
On Tue, Aug 30, 2005 at 10:08:38AM -0700, Wilkerson, Bryan P wrote: they're work, I'm not sure I'd trust or use the data unless it was somehow authenticated. I doubt many testers would be willing to register on yet another website just for this. So I doubt adding authentication is a good

Re: KLive: Linux Kernel Live Usage Monitor

2005-08-30 Thread Andrea Arcangeli
On Tue, Aug 30, 2005 at 06:11:26PM -0400, Bill Davidsen wrote: the system, like load. A week running while I was on vacation doesn't test much, a week running on a loaded server tests other things. btw, I thought about adding the load average too but it wasn't really interesting, since

Re: KLive: Linux Kernel Live Usage Monitor

2005-08-31 Thread Andrea Arcangeli
On Wed, Aug 31, 2005 at 12:14:23PM -0700, [EMAIL PROTECTED] wrote: Do you want to try to handle version skew ? All kernels built from GIT trees look like 2.6.13 until Linus releases 2.6.14-rc1. Possible approaches (requiring changes to the kernel Makefile). 1) Use the SHA1 of HEAD to provide

Re: KLive: Linux Kernel Live Usage Monitor

2005-08-31 Thread Andrea Arcangeli
On Wed, Aug 31, 2005 at 04:28:59PM +0200, Sven Ladegast wrote: Why not generating a unique system ID at compilation stage of the kernel if the appropriate kernel option is enabled? This needn't have something to do with klive...just a unique kernel-ID or something like that. I could also store

Re: KLive: Linux Kernel Live Usage Monitor

2005-09-01 Thread Andrea Arcangeli
On Wed, Aug 31, 2005 at 08:20:51PM +0200, Pavel Machek wrote: Well, you could remove everything that is not valid kernel text from backtrace. What if the corruption wrote the ssh key inside the kernel text? As suggested before, I suspect the only way would be to make it optional. Oh and

Re: KLive: Linux Kernel Live Usage Monitor

2005-09-01 Thread Andrea Arcangeli
On Wed, Aug 31, 2005 at 08:32:00PM +0200, Pavel Machek wrote: I'd say ignore suspend. Machines using it are probably not connected to network, anyway, and it stresses system quite a lot. Currently even if you're not connected to the network it's fine. As long as you connect sometime. If a

[patch] i386 seccomp fix for auditing/ptrace

2005-09-04 Thread Andrea Arcangeli
exit_code 0 signal 0 The seccomp_test.py completed successfully, thank you for testing. Thanks. Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED] diff -r 1df7bfbb783f arch/i386/kernel/ptrace.c --- a/arch/i386/kernel/ptrace.c Fri Sep 2 09:01:35 2005 +++ b/arch/i386/kernel/ptrace.c Mon Sep 5 05

Re: VM: Fix nasty and subtle race in shared mmap'ed page writeback

2007-01-29 Thread Andrea Arcangeli
On Mon, Jan 29, 2007 at 03:08:44PM +0100, Andrea Gelmini wrote: On Mon, Jan 22, 2007 at 10:10:39AM +0100, Peter Zijlstra wrote: On Fri, 2007-01-12 at 01:39 +0100, Andrea Gelmini wrote: Hi, I can't do the test 'till next week. Thanks a lot for your time, Gelma Have you

Re: O_DIRECT question

2007-01-29 Thread Andrea Arcangeli
On Sun, Jan 28, 2007 at 06:03:08PM +0100, Denis Vlasenko wrote: I still don't see much difference between O_SYNC and O_DIRECT write semantic. O_DIRECT is about avoiding the copy_user between cache and userland, when working with devices that run faster than ram (think >=100M/sec, quite standard

Re: O_DIRECT question

2007-01-30 Thread Andrea Arcangeli
On Tue, Jan 30, 2007 at 01:50:41PM -0500, Phillip Susi wrote: It should return the number of bytes successfully written before the error, giving you the location of the first error. Also using smaller individual writes ( preferably issued in parallel ) also allows the problem spot to be

Re: O_DIRECT question

2007-01-30 Thread Andrea Arcangeli
On Tue, Jan 30, 2007 at 08:57:20PM +0100, Andrea Arcangeli wrote: Please try yourself, it's simple enough: time dd if=/dev/hda of=/dev/null bs=16M count=100 time dd if=/dev/hda of=/dev/null bs=16M count=100 iflag=sync sorry, reading won't help much to exercise sync

Re: O_DIRECT question

2007-01-30 Thread Andrea Arcangeli
On Tue, Jan 30, 2007 at 06:07:14PM -0500, Phillip Susi wrote: It most certainly matters where the error happened because "you are screwed" is not an acceptable outcome in a mission critical application. An I/O error is not an acceptable outcome in a mission critical app, all mission critical

Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday

2007-02-01 Thread Andrea Arcangeli
On Thu, Feb 01, 2007 at 12:20:59PM +0100, Andi Kleen wrote: I think a better way to do this would be to define a new CLOCK_THREAD_MONOTONOUS (or better name) timer for clock_gettime(). [and my currently stalled vdso patches that implement clock_gettime as a vsyscall] Then also an

Re: [patch 0/9] x86_64: reliable TSC-based gettimeofday

2007-02-01 Thread Andrea Arcangeli
On Thu, Feb 01, 2007 at 01:02:41PM +0100, Andi Kleen wrote: I don't think so because having per process state in a vsyscall is quite costly. You would need to allocate at least one more page to each process, which I think would be excessive. You would need one page per cpu and to check a

Re: O_DIRECT question

2007-01-21 Thread Andrea Arcangeli
Hello everyone, This is a long thread about O_DIRECT surprisingly without a single bugreport in it, that's a good sign that O_DIRECT is starting to work well in 2.6 too ;) On Fri, Jan 12, 2007 at 02:47:48PM -0800, Andrew Morton wrote: On Fri, 12 Jan 2007 15:35:09 -0700 Erik Andersen [EMAIL

Re: Why active list and inactive list?

2007-01-22 Thread Andrea Arcangeli
On Tue, Jan 23, 2007 at 01:10:46AM +0100, Niki Hammler wrote: Dear Linux Developers/Enthusiasts, For a course at my university I'm implementing parts of an operating system where I get most ideas from the Linux Kernel (which I like very much). One book I gain information from is [1].

Re: Why active list and inactive list?

2007-01-22 Thread Andrea Arcangeli
On Tue, Jan 23, 2007 at 07:01:33AM +0530, Balbir Singh wrote: This makes me wonder if it makes sense to split up the LRU into page cache LRU and mapped pages LRU. I see two benefits 1. Currently based on swappiness, we might walk an entire list searching for page cache pages or mapped

Re: rdtscp vgettimeofday

2006-12-11 Thread Andrea Arcangeli
On Mon, Dec 11, 2006 at 01:17:25PM -0800, dean gaudet wrote: rdtscp doesn't solve anything extra [..] [..] lsl-based vgetcpu is relatively slow Well, if you accept to run slow there's nothing to solve in the first place indeed. If nothing else rdtscp should avoid the mess of restarting a

Re: rdtscp vgettimeofday

2006-12-11 Thread Andrea Arcangeli
On Mon, Dec 11, 2006 at 03:15:44PM -0800, dean gaudet wrote: rdtscp gets you 2 of the 5 values you need to compute the time. anything can happen between when you do the rdtscp and do the other 3 reads: the computation is (((tsc-A)*B)>>N)+C where N is a constant, and A, B, C are per-cpu
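
A minimal sketch of the computation quoted above, reading it as (((tsc-A)*B)>>N)+C with a right shift by the constant N. The function name, the meanings attached to A, B and C, and the sample values are assumptions made for illustration; the snippet only says that A, B and C are per-cpu values.

```python
def tsc_to_ns(tsc, a, b, n, c):
    """Scale a TSC reading to nanoseconds with fixed-point arithmetic.

    a: TSC value at the last per-cpu update (assumed meaning)
    b: multiplier, roughly ns-per-cycle scaled by 2**n (assumed meaning)
    n: constant shift
    c: nanoseconds at the last per-cpu update (assumed meaning)
    """
    return (((tsc - a) * b) >> n) + c


# Hypothetical 2 GHz CPU: 0.5 ns per cycle, so b = 0.5 * 2**10 = 512 with n = 10.
elapsed = tsc_to_ns(tsc=3_000_000, a=1_000_000, b=512, n=10, c=5_000)
print(elapsed)
```

The point made in the message is visible here: the result is only meaningful if tsc, a, b and c all belong to the same CPU and the same update, so all five inputs have to be read consistently.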

Re: KLive: Linux Kernel Live Usage Monitor

2005-09-05 Thread Andrea Arcangeli
On Wed, Aug 31, 2005 at 09:47:01PM +0200, Andrea Arcangeli wrote: I'm thinking to add optional aggregations for (\d+)\.(\d+)\.(\d+)\D and for different archs. So you can watch ia64 only or 2.6.13 only etc... The -tiger-smp/-generic-up makes life harder indeed ;). I now implemented some basic
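
The aggregation idea can be sketched roughly as follows, using the (\d+)\.(\d+)\.(\d+)\D pattern from the message. The function name and the sample release strings are hypothetical, and the fallback for strings without a trailing non-digit is an assumption, not something the message specifies.

```python
import re
from collections import Counter

# Pattern quoted in the message: three numeric fields followed by a non-digit.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\D")


def aggregate(releases):
    """Count kernel release strings by their major.minor.patch prefix."""
    counts = Counter()
    for release in releases:
        m = VERSION_RE.match(release)
        if m:
            counts[".".join(m.groups())] += 1
        else:
            # A bare "2.6.13" has no trailing non-digit, so the pattern
            # does not match; fall back to the raw string (assumption).
            counts[release] += 1
    return counts


sample = ["2.6.13-tiger-smp", "2.6.13-generic-up", "2.6.12-rc5", "2.6.13"]
print(aggregate(sample))
```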

Re: KLive: Linux Kernel Live Usage Monitor

2005-09-05 Thread Andrea Arcangeli
On Tue, Sep 06, 2005 at 12:05:07AM +0200, Marc Giger wrote: Hi Andrea Two little details: The following line does not print what you expect on alpha's: MHZ = int(re.search(r' (\d+)\.?\d?', os.popen("grep -i mhz /proc/cpuinfo | head -n 1").read()).group(1)) Thanks
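
A hedged sketch of a more defensive version of that lookup: it applies the same regular expression to the text of /proc/cpuinfo but returns None instead of raising when no MHz line is present. The function name and the sample input are hypothetical, and this is not the fix that went into the klive client.

```python
import re


def parse_mhz(cpuinfo_text):
    """Return the integer MHz from the first 'mhz' line, or None if absent."""
    for line in cpuinfo_text.splitlines():
        if "mhz" in line.lower():
            # Same pattern as the quoted line: a space, then the integer part.
            m = re.search(r" (\d+)\.?\d?", line)
            return int(m.group(1)) if m else None
    return None


# Hypothetical x86-style input.
print(parse_mhz("processor\t: 0\ncpu MHz\t\t: 2400.000\n"))
# Input with no MHz line at all.
print(parse_mhz("processor\t: 0\n"))
```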

Re: [PATCH 0/3] Minor changes to common hugetlb code for ARM

2012-09-12 Thread Andrea Arcangeli
argument passing in mm/huge_memory.c Both: Reviewed-by: Andrea Arcangeli aarca...@redhat.com Steve Capper (1): mm: Introduce HAVE_ARCH_TRANSPARENT_HUGEPAGE This was already introduced by the s390 THP support which I reviewed a few days ago, and it's already included in -mm, so it can

Re: [PATCH v3 10/10] thp: implement refcounting for huge zero page

2012-09-13 Thread Andrea Arcangeli
it would be more correct if __GFP_MOVABLE was clear, like (GFP_TRANSHUGE | __GFP_ZERO) & ~__GFP_MOVABLE because this page isn't really movable (it's only reclaimable). The xchg vs xchgcmp locking also looks good. Reviewed-by: Andrea Arcangeli aarca...@redhat.com Thanks, Andrea -- To unsubscribe from
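
The flag expression suggested in this review is plain bit clearing: OR in the zeroing flag, then AND with the complement of the movable flag. A small sketch with hypothetical bit values (the real kernel definitions differ):

```python
# Hypothetical bit values, chosen only for illustration.
GFP_MOVABLE_BIT = 0b0100   # stands in for __GFP_MOVABLE
GFP_ZERO_BIT = 0b1000      # stands in for __GFP_ZERO
GFP_TRANSHUGE = 0b0111     # stands in for GFP_TRANSHUGE, includes the movable bit

# Set the zero bit, then clear the movable bit.
flags = (GFP_TRANSHUGE | GFP_ZERO_BIT) & ~GFP_MOVABLE_BIT
print(bin(flags))
```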

Re: [PATCH v3 10/10] thp: implement refcounting for huge zero page

2012-09-13 Thread Andrea Arcangeli
Hi Kirill, On Thu, Sep 13, 2012 at 08:37:58PM +0300, Kirill A. Shutemov wrote: On Thu, Sep 13, 2012 at 07:16:13PM +0200, Andrea Arcangeli wrote: Hi Kirill, On Wed, Sep 12, 2012 at 01:07:53PM +0300, Kirill A. Shutemov wrote: - hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Andrea Arcangeli
Hi Michel, On Tue, Sep 04, 2012 at 02:20:52AM -0700, Michel Lespinasse wrote: This change fixes an anon_vma locking issue in the following situation: - vma has no anon_vma - next has an anon_vma - vma is being shrunk / next is being expanded, due to an mprotect call We need to take next's

Re: [PATCH 2/7] mm: fix potential anon_vma locking issue in mprotect()

2012-09-04 Thread Andrea Arcangeli
On Tue, Sep 04, 2012 at 02:53:47PM -0700, Michel Lespinasse wrote: I think the minimal fix would actually be: if (vma->anon_vma && (importer || start != vma->vm_start)) { anon_vma = vma->anon_vma; + else if (next->anon_vma && adjust_next) + anon_vma =

Re: [RFC v2 PATCH 0/7] thp: transparent hugepages on s390

2012-09-04 Thread Andrea Arcangeli
Hi Andrew and Martin, On Fri, Aug 31, 2012 at 12:47:02PM -0700, Andrew Morton wrote: On Fri, 31 Aug 2012 09:07:57 +0200 Martin Schwidefsky schwidef...@de.ibm.com wrote: I grabbed them all. Patches 1-3 look sane to me and I cheerfully didn't read the s390 changes at all. Hopefully

Re: [RFC v2 PATCH 1/7] thp: remove assumptions on pgtable_t type

2012-09-04 Thread Andrea Arcangeli
Hi Gerald, On Wed, Aug 29, 2012 at 05:32:58PM +0200, Gerald Schaefer wrote: +#ifndef __HAVE_ARCH_PGTABLE_DEPOSIT +extern void pgtable_deposit(struct mm_struct *mm, pgtable_t pgtable); +#endif One minor nitpick on the naming of the two functions: considering that those are global exports, that

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
Hi Kirill, On Thu, Aug 16, 2012 at 06:15:53PM +0300, Kirill A. Shutemov wrote: for (i = 0; i < pages_per_huge_page; i++, p = mem_map_next(p, page, i)) { It may be more optimal to avoid a multiplication/shiftleft before the add, and to do: for (i = 0, vaddr = haddr; i

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote: Hm.. I think with static_key we can avoid cache overhead here. I'll try. Could you elaborate on the static_key? Is it some sort of self modifying code? Thanks, for review. Could you take a look at huge zero page patchset? ;)

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 09, 2012 at 12:08:18PM +0300, Kirill A. Shutemov wrote: +static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd, + unsigned long address) +{ + pgtable_t pgtable; + pmd_t _pmd; + unsigned long haddr = address & HPAGE_PMD_MASK; + struct
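
The haddr line in the quoted patch rounds the faulting address down to the start of its huge page by masking off the low bits. A sketch of that arithmetic, assuming 2 MiB huge pages (a 21-bit shift, as on x86-64); the function name and the sample address are hypothetical:

```python
HPAGE_PMD_SHIFT = 21                      # assumption: 2 MiB huge pages
HPAGE_PMD_SIZE = 1 << HPAGE_PMD_SHIFT
HPAGE_PMD_MASK = ~(HPAGE_PMD_SIZE - 1)    # clears the low 21 bits


def huge_page_base(address):
    """Round an address down to the huge page boundary."""
    return address & HPAGE_PMD_MASK


print(hex(huge_page_base(0x7F0000345678)))
```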

Re: [PATCH, RFC 0/9] Introduce huge zero page

2012-08-16 Thread Andrea Arcangeli
Hi Andrew, On Thu, Aug 16, 2012 at 12:20:23PM -0700, Andrew Morton wrote: That's a pretty big improvement for a rather fake test case. I wonder how much benefit we'd see with real workloads? The same discussion happened about the zero page in general and there's no easy answer. I seem to

Re: [PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 09, 2012 at 12:08:17PM +0300, Kirill A. Shutemov wrote: From: Kirill A. Shutemov kirill.shute...@linux.intel.com It's required to implement huge zero pmd splitting. This isn't bisectable with the next one, it'd fail on wfg 0-DAY kernel build testing backend, however this is

Re: [PATCH v3 6/7] mm: make clear_huge_page cache clear only around the fault address

2012-08-16 Thread Andrea Arcangeli
On Thu, Aug 16, 2012 at 09:37:25PM +0300, Kirill A. Shutemov wrote: On Thu, Aug 16, 2012 at 08:29:44PM +0200, Andrea Arcangeli wrote: On Thu, Aug 16, 2012 at 07:43:56PM +0300, Kirill A. Shutemov wrote: Hm.. I think with static_key we can avoid cache overhead here. I'll try. Could you

Re: [PATCH 00/19] sched-numa rewrite

2012-08-08 Thread Andrea Arcangeli
Hi everyone, On Tue, Jul 31, 2012 at 09:12:04PM +0200, Peter Zijlstra wrote: Hi all, After having had a talk with Rik about all this NUMA nonsense where he proposed the scheme implemented in the next to last patch, I came up with a related means of doing the home-node selection. I've

Re: [PATCH 02/19] mm/mpol: Remove NUMA_INTERLEAVE_HIT

2012-08-09 Thread Andrea Arcangeli
Hi, On Tue, Jul 31, 2012 at 09:12:06PM +0200, Peter Zijlstra wrote: Since the NUMA_INTERLEAVE_HIT statistic is useless on its own; it wants to be compared to either a total of interleave allocations or to a miss count, remove it. Fixing it would be possible, but since we've gone years

Re: [PATCH 04/19] mm, thp: Preserve pgprot across huge page split

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:08PM +0200, Peter Zijlstra wrote: If we marked a THP with our special PROT_NONE protections, ensure we don't lose them over a split. Collapse seems to always allocate a new (huge) page which should already end up on the new target node so losing protections

Re: [PATCH 05/19] mm, mpol: Create special PROT_NONE infrastructure

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:09PM +0200, Peter Zijlstra wrote: +static bool pte_prot_none(struct vm_area_struct *vma, pte_t pte) +{ + /* + * If we have the normal vma->vm_page_prot protections we're not a + * 'special' PROT_NONE page. + * + * This means we cannot get

Re: [PATCH 07/19] mm/mpol: Add MPOL_MF_NOOP

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:11PM +0200, Peter Zijlstra wrote: From: Lee Schermerhorn lee.schermerh...@hp.com This patch augments the MPOL_MF_LAZY feature by adding a NOOP policy to mbind(). When the NOOP policy is used with the 'MOVE and 'LAZY flags, mbind() [check_range()] will walk the

Re: [PATCH 10/19] mm, mpol: Use special PROT_NONE to migrate pages

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:14PM +0200, Peter Zijlstra wrote: +#ifdef CONFIG_NUMA /* - * Do fancy stuff... + * For NUMA systems we use the special PROT_NONE maps to drive + * lazy page migration, see MPOL_MF_LAZY and related. */ + page = vm_normal_page(vma,

Re: [PATCH 15/19] sched: Implement home-node awareness

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:19PM +0200, Peter Zijlstra wrote: @@ -2699,6 +2705,29 @@ select_task_rq_fair(struct task_struct * } rcu_read_lock(); + if (sched_feat_numa(NUMA_BIAS) && node != -1) { + int node_cpu; + + node_cpu =

Re: [PATCH 18/19] sched, numa: Per task memory placement for big processes

2012-08-09 Thread Andrea Arcangeli
On Tue, Jul 31, 2012 at 09:12:22PM +0200, Peter Zijlstra wrote: Implement a per-task memory placement scheme for 'big' tasks (as per the last patch). It relies on a regular PROT_NONE 'migration' fault to scan the memory space of the process and uses a two stage migration scheme to reduce the

Re: mm: kernel BUG at mm/memory.c:1230

2012-08-21 Thread Andrea Arcangeli
59af0d4348eb07087097e310f60422b994dd3a2c Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli aarca...@redhat.com Date: Tue, 21 Aug 2012 19:32:23 +0200 Subject: [PATCH] thp: make pmd_present more accurate In many places !pmd_present has been converted to pmd_none. For pmds that's equivalent and pmd_none is quicker so using pmd_none

[PATCH 14/36] autonuma: call autonuma_setup_new_exec()

2012-08-22 Thread Andrea Arcangeli
This resets all per-thread and per-process statistics across exec syscalls or after kernel threads detach from the mm. The past statistical NUMA information is unlikely to be relevant for the future in these cases. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca

[PATCH 22/36] autonuma: make khugepaged pte_numa aware

2012-08-22 Thread Andrea Arcangeli
and make it tunable with sysfs too. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm/huge_memory.c | 33 +++-- 1 files changed, 31 insertions(+), 2 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 08fd33c..a65590f 100644 --- a/mm

[PATCH 21/36] autonuma: call autonuma_split_huge_page()

2012-08-22 Thread Andrea Arcangeli
This is needed to make sure the tail pages are also queued into the migration queues of knuma_migrated across a transparent hugepage split. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm/huge_memory.c |2 ++ 1 files changed, 2 insertions

[PATCH 11/36] autonuma: add page structure fields

2012-08-22 Thread Andrea Arcangeli
allocated page_autonuma of 32 bytes per page (only allocated if booted on NUMA hardware, unless noautonuma is passed as parameter to the kernel at boot). Yet another later patch introduces the autonuma_list and reduces the size of the page_autonuma from 32 to 12 bytes. Signed-off-by: Andrea

[PATCH 17/36] autonuma: prevent select_task_rq_fair to return -1

2012-08-22 Thread Andrea Arcangeli
-by: Andrea Arcangeli aarca...@redhat.com --- kernel/sched/fair.c | 11 +++ 1 files changed, 11 insertions(+), 0 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 42a88fa..677b99e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -2794,6 +2794,17

[PATCH 35/36] autonuma: add knuma_migrated/allow_first_fault in sysfs

2012-08-22 Thread Andrea Arcangeli
, but it reduces some initial thrashing in case of NUMA false sharing. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma_flags.h | 20 mm/autonuma.c |7 +-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git

[PATCH 13/36] autonuma: autonuma_enter/exit

2012-08-22 Thread Andrea Arcangeli
faults to start. All other actions follow after that. If knuma_scand doesn't run, AutoNUMA is fully bypassed. If knuma_scand is stopped, soon all other AutoNUMA gears will settle down too. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- kernel/fork.c

[PATCH 18/36] autonuma: teach CFS about autonuma affinity

2012-08-22 Thread Andrea Arcangeli
and task_autonuma_cpu will always return true in that case. Includes fixes from Hillf Danton dhi...@gmail.com. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- kernel/sched/fair.c | 71 ++ 1 files changed, 59 insertions(+), 12 deletions(-) diff

[PATCH 07/36] autonuma: mm_autonuma and task_autonuma data structures

2012-08-22 Thread Andrea Arcangeli
Define the two data structures that collect the per-process (in the mm) and per-thread (in the task_struct) statistical information that is the input of the CPU follow memory algorithms in the NUMA scheduler. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma_types.h

[PATCH 01/36] autonuma: make set_pmd_at always available

2012-08-22 Thread Andrea Arcangeli
set_pmd_at() will also be used for the knuma_scand/pmd = 1 (default) mode even when TRANSPARENT_HUGEPAGE=n. Make it available so the build won't fail. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/x86/include/asm/paravirt.h |2 -- 1 files

[PATCH 15/36] autonuma: alloc/free/init task_autonuma

2012-08-22 Thread Andrea Arcangeli
it on NUMA hardware. So the non NUMA hardware only pays the memory of a pointer in the kernel stack (which remains NULL at all times in that case). If the kernel is compiled with CONFIG_AUTONUMA=n, not even the pointer is allocated on the kernel stack of course. Signed-off-by: Andrea Arcangeli aarca

[PATCH 32/36] autonuma: boost khugepaged scanning rate

2012-08-22 Thread Andrea Arcangeli
Until THP native migration is implemented it's safer to boost khugepaged scanning rate because all memory migrations are splitting the hugepages. So the regular rate of scanning becomes too low when lots of memory is migrated. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm

[PATCH 27/36] autonuma: add CONFIG_AUTONUMA and CONFIG_AUTONUMA_DEFAULT_ENABLED

2012-08-22 Thread Andrea Arcangeli
Add the config options to allow building the kernel with AutoNUMA. If CONFIG_AUTONUMA_DEFAULT_ENABLED is =y, then /sys/kernel/mm/autonuma/enabled will be equal to 1, and AutoNUMA will be enabled automatically at boot. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/Kconfig

[PATCH 23/36] autonuma: retain page last_nid information in khugepaged

2012-08-22 Thread Andrea Arcangeli
When pages are collapsed try to keep the last_nid information from one of the original pages. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm/huge_memory.c | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index

[PATCH 28/36] autonuma: page_autonuma

2012-08-22 Thread Andrea Arcangeli
is booted on real NUMA hardware and noautonuma is not passed as a parameter to the kernel. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma.h | 18 +++- include/linux/autonuma_types.h | 55 + include/linux/mm_types.h | 26 include/linux

[PATCH 10/36] autonuma: CPU follows memory algorithm

2012-08-22 Thread Andrea Arcangeli
and cleanups from Hillf Danton dhi...@gmail.com. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma_sched.h | 50 include/linux/mm_types.h |5 + include/linux/sched.h |3 + kernel/sched/core.c|1 + kernel/sched/fair.c

[PATCH 00/36] AutoNUMA24

2012-08-22 Thread Andrea Arcangeli
it does nothing at all. Changelog from alpha11 to alpha13: o autonuma_balance optimization (take the fast path when process is in the preferred NUMA node) TODO: o THP native migration (orthogonal and also needed for cpuset/migrate_pages(2)/numa/sched). Andrea Arcangeli (35): autonuma

[PATCH 05/36] autonuma: teach gup_fast about pmd_numa

2012-08-22 Thread Andrea Arcangeli
Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/x86/mm/gup.c | 13 - 1 files changed, 12 insertions(+), 1 deletions(-) diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index dd74e46..02c5ec5 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c

[PATCH 31/36] autonuma: shrink the per-page page_autonuma struct size

2012-08-22 Thread Andrea Arcangeli
for now). This means the max RAM configuration fully supported by AutoNUMA becomes AUTONUMA_LIST_MAX_PFN_OFFSET multiplied by 32767 nodes multiplied by the PAGE_SIZE (assume 4096 here, but for some archs it's bigger). 4096*32767*(0xffffffff-3)>>(10*5) = 511 PetaBytes. Signed-off-by: Andrea Arcangeli
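
The 511 PetaBytes figure can be checked with a few lines of arithmetic. This sketch assumes that AUTONUMA_LIST_MAX_PFN_OFFSET is 0xffffffff-3 and that the final step is a right shift by 10*5 bits (bytes to PiB); both are readings of the formula in the message, not values confirmed from the patch itself.

```python
PAGE_SIZE = 4096
MAX_NODES = 32767
# Assumption: the per-node PFN offset limit is 0xffffffff - 3.
AUTONUMA_LIST_MAX_PFN_OFFSET = 0xFFFFFFFF - 3

max_bytes = PAGE_SIZE * MAX_NODES * AUTONUMA_LIST_MAX_PFN_OFFSET
# Five shifts of 10 bits convert bytes to PiB (KiB, MiB, GiB, TiB, PiB).
max_pib = max_bytes >> (10 * 5)
print(max_pib)
```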

[PATCH 24/36] autonuma: numa hinting page faults entry points

2012-08-22 Thread Andrea Arcangeli
This is where the numa hinting page faults are detected and are passed over to the AutoNUMA core logic. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/huge_mm.h |2 ++ mm/huge_memory.c| 18 ++ mm/memory.c | 31

[PATCH 12/36] autonuma: knuma_migrated per NUMA node queues

2012-08-22 Thread Andrea Arcangeli
the memory in a round robin fashion from all remote nodes to the daemon's local node. The head that belongs to the local node that knuma_migrated runs on, for now must be empty and it's not being used. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/mmzone.h | 18

[PATCH 26/36] autonuma: link mm/autonuma.o and kernel/sched/numa.o

2012-08-22 Thread Andrea Arcangeli
Link the AutoNUMA core and scheduler object files in the kernel if CONFIG_AUTONUMA=y. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- kernel/sched/Makefile |1 + mm/Makefile |1 + 2 files changed, 2 insertions(+), 0 deletions(-) diff --git a/kernel/sched/Makefile b

[PATCH 02/36] autonuma: export is_vma_temporary_stack() even if CONFIG_TRANSPARENT_HUGEPAGE=n

2012-08-22 Thread Andrea Arcangeli
is_vma_temporary_stack() is needed by mm/autonuma.c too, and without this the build breaks with CONFIG_TRANSPARENT_HUGEPAGE=n. Reported-by: Petr Holasek phola...@redhat.com Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/huge_mm.h

[PATCH 03/36] autonuma: define _PAGE_NUMA_PTE and _PAGE_NUMA_PMD

2012-08-22 Thread Andrea Arcangeli
established by ioremap, never on pmds so there's no risk of collision with Xen. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/x86/include/asm/pgtable_types.h | 28 1 files changed, 28 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm

[PATCH 08/36] autonuma: define the autonuma flags

2012-08-22 Thread Andrea Arcangeli
These flags are the ones tweaked through sysfs, they control the behavior of autonuma, from enabling or disabling it, to selecting various runtime options. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma_flags.h | 129 1 files

[PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
...@gmail.com. Math documentation on autonuma_last_nid in the header of last_nid_set() reworked from sched-numa code by Peter Zijlstra a.p.zijls...@chello.nl. Signed-off-by: Andrea Arcangeli aarca...@redhat.com Signed-off-by: Hillf Danton dhi...@gmail.com --- mm/autonuma.c | 1619

[PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
* Page migration is yet to be observed/verified Signed-off-by: Vaidyanathan Srinivasan sva...@linux.vnet.ibm.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/powerpc/include/asm/pgtable.h| 48 - arch/powerpc/include/asm/pte-hash64-64k.h |4

[PATCH 34/36] autonuma: make the AUTONUMA_SCAN_PMD_FLAG conditional to CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD

2012-08-22 Thread Andrea Arcangeli
Remove the sysfs entry /sys/kernel/mm/autonuma/knuma_scand/pmd and force the knuma_scand pmd mode off if CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD is not set by the architecture. Enable AutoNUMA for PPC64. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/Kconfig |3 +++ arch

[PATCH 06/36] autonuma: introduce kthread_bind_node()

2012-08-22 Thread Andrea Arcangeli
This function makes it easy to bind the per-node knuma_migrated threads to their respective NUMA nodes. Those threads take memory from the other nodes (in round robin with an incoming queue for each remote node) and they move that memory to their local node. Signed-off-by: Andrea Arcangeli aarca
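
The round-robin draining described in this message might look roughly like the sketch below: one incoming queue per remote node, visited in turn, with each dequeued page "moved" to the local node. This is a userspace toy with hypothetical names and data, not the kernel implementation.

```python
from collections import deque


def drain_round_robin(queues, local_node):
    """Take one page at a time from each remote node's queue, in turn.

    queues: dict mapping node id -> deque of pages waiting to migrate
            to local_node (hypothetical representation).
    Returns the (source_node, page) pairs in the order they were taken.
    """
    migrated = []
    remote = [n for n in sorted(queues) if n != local_node]
    while any(queues[n] for n in remote):
        for n in remote:
            if queues[n]:
                migrated.append((n, queues[n].popleft()))
    return migrated


queues = {0: deque(), 1: deque(["a", "b"]), 2: deque(["c"])}
print(drain_round_robin(queues, local_node=0))
```

Visiting the queues in turn keeps one busy remote node from starving the others.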

[PATCH 09/36] autonuma: core autonuma.h header

2012-08-22 Thread Andrea Arcangeli
Header that defines the generic AutoNUMA specific functions. All functions are defined unconditionally, but are only linked into the kernel if CONFIG_AUTONUMA=y. When CONFIG_AUTONUMA=n, their call sites are optimized away at build time (or the kernel wouldn't link). Signed-off-by: Andrea

[PATCH 16/36] autonuma: alloc/free/init mm_autonuma

2012-08-22 Thread Andrea Arcangeli
hardware the memory cost is reduced to one pointer per mm. To get rid of the pointer in each mm, the kernel can be compiled with CONFIG_AUTONUMA=n. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- kernel/fork.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git

[PATCH 36/36] autonuma: add mm_autonuma working set estimation

2012-08-22 Thread Andrea Arcangeli
are never used. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma_flags.h | 25 ++--- mm/autonuma.c | 25 + 2 files changed, 47 insertions(+), 3 deletions(-) diff --git a/include/linux/autonuma_flags.h b

[PATCH 20/36] autonuma: default mempolicy follow AutoNUMA

2012-08-22 Thread Andrea Arcangeli
-by: Andrea Arcangeli aarca...@redhat.com --- mm/mempolicy.c | 12 ++-- 1 files changed, 10 insertions(+), 2 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index bd92431..19a8f72 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1951,10 +1951,18 @@ retry_cpuset

[PATCH 29/36] autonuma: autonuma_migrate_head[0] dynamic size

2012-08-22 Thread Andrea Arcangeli
Reduce the autonuma_migrate_head array entries from MAX_NUMNODES to num_possible_nodes() or zero if autonuma is not possible. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/x86/mm/numa.c |6 -- arch/x86/mm/numa_32.c |3 ++- include/linux

[PATCH 25/36] autonuma: reset autonuma page data when pages are freed

2012-08-22 Thread Andrea Arcangeli
When pages are freed abort any pending migration. If knuma_migrated arrives first it will notice because get_page_unless_zero would fail. You can safely ignore the #ifdef because a later patch (page_autonuma) clears it. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm/page_alloc.c

[PATCH 30/36] autonuma: bugcheck page_autonuma fields on newly allocated pages

2012-08-22 Thread Andrea Arcangeli
Debug tweak. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- include/linux/autonuma.h | 19 +++ mm/page_alloc.c |3 ++- 2 files changed, 21 insertions(+), 1 deletions(-) diff --git a/include/linux/autonuma.h b/include/linux/autonuma.h index 1d87ecc

[PATCH 04/36] autonuma: pte_numa() and pmd_numa()

2012-08-22 Thread Andrea Arcangeli
pages to migrate queues. They are extremely quick, absolutely non-blocking and do not allocate memory. The generic implementation is used when CONFIG_AUTONUMA=n. Acked-by: Rik van Riel r...@redhat.com Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- arch/x86/include/asm/pgtable.h | 65

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: On 08/21/2012 11:06 PM, Andrea Arcangeli wrote: CPU0 CPU1 oldpage[1] == 0 (both guest & host) oldpage[0] = 1 trigger do_wp_page We always do ptep_clear_flush before

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 11:51:17AM +0800, Xiao Guangrong wrote: Hmm, in KSM code, i found this code in replace_page: set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot)); It is possible to establish a writable pte, no? Hugh already answered this thanks. Further details on the

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
Hi Andrew, On Wed, Aug 22, 2012 at 12:15:35PM -0700, Andrew Morton wrote: On Wed, 22 Aug 2012 18:29:55 +0200 Andrea Arcangeli aarca...@redhat.com wrote: On Wed, Aug 22, 2012 at 02:03:41PM +0800, Xiao Guangrong wrote: On 08/21/2012 11:06 PM, Andrea Arcangeli wrote: CPU0

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 12:58:05PM -0700, Andrew Morton wrote: If you can suggest some text I'll type it in right now. Ok ;), I tried below: This is safe to start by updating the secondary MMUs, because the relevant primary MMU pte invalidate must have already happened with a ptep_clear_flush

Re: [PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
Hi Andi, On Wed, Aug 22, 2012 at 01:19:04PM -0700, Andi Kleen wrote: Andrea Arcangeli aarca...@redhat.com writes: +/* + * In this function we build a temporal CPU_node-page relation by + * using a two-stage autonuma_last_nid filter to remove short/unlikely + * relations
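
One way to picture a two-stage last_nid filter is sketched below. It assumes the filter remembers the node of the previous hinting fault on a page and only treats the relation as stable when two consecutive faults come from the same node; that mechanism is an assumption for illustration, since the quoted comment is cut off before it describes the details, and all names here are hypothetical.

```python
def should_migrate(page, faulting_nid):
    """Return True only when two consecutive faults come from the same node.

    page: dict holding the remembered node under "last_nid"
          (hypothetical stand-in for the per-page field).
    """
    same_as_last = page.get("last_nid") == faulting_nid
    page["last_nid"] = faulting_nid   # remember this fault for the next one
    return same_as_last


page = {}
print([should_migrate(page, nid) for nid in (1, 1, 2, 2)])
```

Under that assumption a single stray access from another node never triggers a migration on its own, which is the "remove short/unlikely relations" effect the comment describes.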

Re: [PATCH 00/36] AutoNUMA24

2012-08-22 Thread Andrea Arcangeli
On Wed, Aug 22, 2012 at 11:40:48PM +0200, Ingo Molnar wrote: * Rik van Riel r...@redhat.com wrote: On 08/22/2012 10:58 AM, Andrea Arcangeli wrote: Hello everyone, Before the Kernel Summit, I think it's good idea to post a new AutoNUMA24 and to go through a new review cycle

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
On Thu, Aug 23, 2012 at 08:01:47AM +1000, Benjamin Herrenschmidt wrote: On Wed, 2012-08-22 at 16:59 +0200, Andrea Arcangeli wrote: diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h index 2e0e411..5f03079 100644 --- a/arch/powerpc/include/asm/pgtable.h

Re: [PATCH 19/36] autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection

2012-08-22 Thread Andrea Arcangeli
Hi Andi, On Thu, Aug 23, 2012 at 12:37:33AM +0200, Andi Kleen wrote: This comment seems quite accurate to me (btw I taken it from sched-numa rewrite with minor changes). I had expected it to describe the next function. If it's a strategic overview maybe it should be somewhere else.

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-22 Thread Andrea Arcangeli
Hi Benjamin, On Thu, Aug 23, 2012 at 08:56:34AM +1000, Benjamin Herrenschmidt wrote: What I mean here is that it's fine as a proof of concept ;-) I don't like it being in a series aimed at upstream... We can try to flush out the issues, but as it is, the patch isn't upstreamable imho. Well

Re: [PATCH 33/36] autonuma: powerpc port

2012-08-23 Thread Andrea Arcangeli
Hi Benjamin, On Thu, Aug 23, 2012 at 03:11:00PM +1000, Benjamin Herrenschmidt wrote: Basically PROT_NONE turns into _PAGE_PRESENT without _PAGE_USER for us. Maybe the simplest is to implement pte_numa as !_PAGE_USER too. No need to clear the _PAGE_PRESENT bit and to alter pte_present() if

Re: thp and memory barrier assumptions

2012-08-03 Thread Andrea Arcangeli
ad51771a2c3fa697fa0267edda23b48d0b85f023 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli aarca...@redhat.com Date: Fri, 3 Aug 2012 21:10:44 +0200 Subject: [PATCH] thp: document barrier() in wrprotect THP fault path Inline doc. Signed-off-by: Andrea Arcangeli aarca...@redhat.com --- mm/memory.c |6 ++ 1 files

Re: [RFC] page-table walkers vs memory order

2012-08-04 Thread Andrea Arcangeli
On Tue, Jul 24, 2012 at 02:51:05PM -0700, Hugh Dickins wrote: Since then, I think THP has made the rules more complicated; but I believe Andrea paid a great deal of attention to that kind of issue. There were many issues, one unexpected was 1a5a9906d4e8d1976b701f889d8f35d54b928f25. Keep in

Re: [RFC] page-table walkers vs memory order

2012-08-04 Thread Andrea Arcangeli
On Sat, Aug 04, 2012 at 03:02:45PM -0700, Paul E. McKenney wrote: OK, I'll bite. ;-) :)) The most sane way for this to happen is with feedback-driven techniques involving profiling, similar to what is done for basic-block reordering or branch prediction. The idea is that you compile the

Re: [PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

2012-08-17 Thread Andrea Arcangeli
On Fri, Aug 17, 2012 at 11:12:33AM +0300, Kirill A. Shutemov wrote: I've used do_huge_pmd_wp_page_fallback() as template for my code. What's difference between these two code paths? Why is do_huge_pmd_wp_page_fallback() safe? Good point. do_huge_pmd_wp_page_fallback works only on the current

Re: [PATCH 00/19] sched-numa rewrite

2012-08-17 Thread Andrea Arcangeli
Hi, On Wed, Aug 08, 2012 at 02:43:34PM -0400, Rik van Riel wrote: While the sched-numa code is relatively small and clean, the current version does not seem to offer a significant performance improvement over not having it, and in one of the tests performance actually regresses vs. mainline.

Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

2012-08-21 Thread Andrea Arcangeli
(set_pte_at_notify must always run under the PT lock of course). How about this: = From 160a0b1b2be9bf96c45b30d9423f8196ecebe351 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli aarca...@redhat.com Date: Tue, 21 Aug 2012 16:48:11 +0200 Subject: [PATCH] mmu_notifier: fix race in set_pte_at_notify usage

Re: [PATCH] mmu notifiers #v5

2008-02-04 Thread Andrea Arcangeli
On Mon, Feb 04, 2008 at 11:09:01AM -0800, Christoph Lameter wrote: On Sun, 3 Feb 2008, Andrea Arcangeli wrote: Right but that pin requires taking a refcount which we cannot do. GRU can use my patch without the pin. XPMEM obviously can't use my patch as my invalidate_page[s] are under

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Mon, Feb 04, 2008 at 10:11:24PM -0800, Christoph Lameter wrote: Zero problems only if you find having a single callout for every page acceptable. So the invalidate_range in your patch is only working invalidate_pages is only a further optimization that was straightforward in some places

Re: [PATCH] mmu notifiers #v5

2008-02-05 Thread Andrea Arcangeli
On Tue, Feb 05, 2008 at 10:17:41AM -0800, Christoph Lameter wrote: The other approach will not have any remote ptes at that point. Why would there be a coherency issue? It never happens that two threads writes to two different physical pages by working on the same process virtual address. This
