[PATCH 6/7] sched: rt-group: per group period

2008-01-04 Thread Peter Zijlstra
Steven asked for per group periods in order to get closer to RMA or EDF scheduling. Use the fancy new hrtimers to provide a per group period Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h|2 kernel/sched.c | 225

[PATCH 1/7] sched: rt throttling vs no_hz

2008-01-04 Thread Peter Zijlstra
We need to teach no_hz about the rt throttling because it's tick driven. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h|2 ++ kernel/sched.c | 23 ++- kernel/sched_rt.c| 30 -- kernel/time

Re: [PATCH 6/7] sched: rt-group: per group period

2008-01-05 Thread Peter Zijlstra
Could you please fold this into the 6/7 patch. It reverts a wandering chunk (the 32768 thing), but more importantly it fixes !FAIR_GROUP_SCHED compilation. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions

Re: 2.6.24-rc6: possible recursive locking detected

2008-01-05 Thread Peter Zijlstra
On Sat, 2008-01-05 at 18:12 +1100, Herbert Xu wrote: On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote: [ 1310.670986] = [ 1310.671690] [ INFO: possible recursive locking detected ] [ 1310.672097] 2.6.24-rc6 #1 [

Re: 2.6.24-rc6: possible recursive locking detected

2008-01-05 Thread Peter Zijlstra
On Sat, 2008-01-05 at 17:53 +0100, Peter Zijlstra wrote: On Sat, 2008-01-05 at 18:12 +1100, Herbert Xu wrote: On Fri, Jan 04, 2008 at 09:30:49AM +0100, Ingo Molnar wrote: [ 1310.670986] = [ 1310.671690] [ INFO: possible recursive locking

Re: [PATCH] block2mtd lockdep_init_map warning

2008-01-06 Thread Peter Zijlstra
On Sun, 2008-01-06 at 14:13 +0100, Jörn Engel wrote: Ingo, Peter, does either of you actually care about this problem? In the last round when I debugged this problem there was a notable lack of reaction from either of you. Yeah I do, I just know very little about the module stuff and

[PATCH 11/11] sched: rt-group: interface

2008-01-06 Thread Peter Zijlstra
Change the rt_ratio interface to rt_runtime_us, to match rt_period_us. This avoids picking a granularity for the ratio. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h |8 +++ kernel/sched.c| 116 ++ kernel

[PATCH 02/11] sched: load_balance_monitor rename

2008-01-06 Thread Peter Zijlstra
Don't start the load_balance_monitor when there is only a single CPU. Rename the kthread because its name is currently longer than TASK_COMM_LEN. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) Index: linux-2.6/kernel

[PATCH 10/11] sched: rt-group: EDF

2008-01-06 Thread Peter Zijlstra
Use a simple Earliest Deadline First implementation to schedule the realtime groups. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h |1 kernel/sched.c| 13 + kernel/sched_rt.c | 115 +++--- 3 files

[PATCH 08/11] sched: rt-group: deal with PI

2008-01-06 Thread Peter Zijlstra
Steven mentioned the fun case where a lock holding task will be throttled. Simple fix: allow groups that have boosted tasks to run anyway. This is of course not quite correct. Needs more tricks. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched.c|3 +++ kernel/sched_rt.c

[PATCH 01/11] sched: rt throttling vs no_hz

2008-01-06 Thread Peter Zijlstra
We need to teach no_hz about the rt throttling because it's tick driven. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h|2 ++ kernel/sched.c | 23 ++- kernel/sched_rt.c| 30 -- kernel/time

[PATCH 00/11] another rt group sched update

2008-01-06 Thread Peter Zijlstra
this time compile tested on all 16 combinations of: CONFIG_SMP CONFIG_FAIR_GROUP_SCHED CONFIG_HIGH_RES_TIMERS CONFIG_NO_HZ ran some but not all combinations

[PATCH 03/11] hrtimer: clean up cpu-base locking tricks

2008-01-06 Thread Peter Zijlstra
In order to more easily allow for the scheduler to use timers, clean up the locking a bit. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/hrtimer.c | 109 +++ kernel/time/tick-sched.c |8 --- 2 files changed, 102 insertions

[PATCH 05/11] hrtimer: unlock hrtimer_wakeup

2008-01-06 Thread Peter Zijlstra
hrtimer_wakeup creates a base->lock rq->lock lock dependency. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold base->lock. This fully untangles hrtimer locks from the scheduler locks, and allows hrtimer usage in the scheduler proper. Signed-off-by: Peter Zijlstra

[PATCH 09/11] sched: rt-group: dynamic period ticks

2008-01-06 Thread Peter Zijlstra
Disable the period updates for inactive groups. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched.c| 158 -- kernel/sched_rt.c | 54 ++ 2 files changed, 53 insertions(+), 159 deletions(-) Index: linux-2.6

[PATCH 04/11] hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallback

2008-01-06 Thread Peter Zijlstra
Currently all highres=off timers are run from softirq context, but HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context. Fix this up by splitting it similar to the highres=on case. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/hrtimer.h |5 - kernel

[PATCH 06/11] sched: rt-group: reduce rescheduling

2008-01-06 Thread Peter Zijlstra
Only reschedule if the new group has a higher prio task. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched_rt.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) Index: linux-2.6/kernel/sched_rt.c

[PATCH 07/11] sched: rt-group: per group period

2008-01-06 Thread Peter Zijlstra
Steven asked for per group periods in order to get closer to RMA or EDF scheduling. Use the fancy new hrtimers to provide a per group period Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h|2 kernel/sched.c | 229

Re: [stable] [PATCH] lockdep: fix mismatched lockdep_depth/curr_chain_hash

2007-10-25 Thread Peter Zijlstra
On Mon, 2007-10-08 at 10:39 -0700, Greg KH wrote: On Mon, Oct 08, 2007 at 07:36:10PM +0200, Peter Zijlstra wrote: On Mon, 2007-10-08 at 10:24 -0700, Greg KH wrote: On Fri, Oct 05, 2007 at 11:31:26AM +0200, Peter Zijlstra wrote: Stable team, please consider this patch

Re: aim7 -30% regression in 2.6.24-rc1

2007-10-26 Thread Peter Zijlstra
On Fri, 2007-10-26 at 17:43 +0800, Zhang, Yanmin wrote: I tested 2.6.24-rc1 on my x86_64 machine which has 2 quad-core processors. Comparing with 2.6.23, aim7 has about -30% regression. I did a bisect and found patch

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-26 Thread Peter Zijlstra
On Wed, 2007-10-03 at 15:35 +0200, Kay Sievers wrote: On Wed, 2007-10-03 at 12:37 +0200, Peter Zijlstra wrote: On Wed, 2007-10-03 at 12:15 +0200, Kay Sievers wrote: On Tue, 2007-10-02 at 22:05 +1000, Nick Piggin wrote: On Tuesday 02 October 2007 21:40, Peter Zijlstra wrote: On Tue

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-26 Thread Peter Zijlstra
On Fri, 2007-10-26 at 17:33 +0200, Kay Sievers wrote: On Fri, 2007-10-26 at 17:22 +0200, Peter Zijlstra wrote: On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote: On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote: I appreciate the sysfs people their opinion that /sys/bdi

Re: 2.6.24-rc1: First impressions

2007-10-26 Thread Peter Zijlstra
On Fri, 2007-10-26 at 17:22 +0200, Ingo Molnar wrote: * Martin Knoblauch [EMAIL PROTECTED] wrote: Hi , just to give some feedback on 2.6.24-rc1. For some time I am tracking IO/writeback problems that hurt system responsiveness big-time. I tested Peters stuff together with

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-26 Thread Peter Zijlstra
On Fri, 2007-10-26 at 17:10 +0200, Kay Sievers wrote: On Fri, 2007-10-26 at 16:48 +0200, Peter Zijlstra wrote: I appreciate the sysfs people their opinion that /sys/bdi/ might not be the best from their POV, however I'm not seeing where to hook the BDI object from so that it all makes

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-26 Thread Peter Zijlstra
This crashes and burns on bootup, but I'm too tired to figure out what I did wrong... will give it another try tomorrow.. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/genhd.c |2 fs/fuse/inode.c |2 fs/nfs/client.c |2 include

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-26 Thread Peter Zijlstra
On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote: This crashes and burns on bootup, but I'm too tired to figure out what I did wrong... will give it another try tomorrow.. Ok, can't sleep.. took a look. I have several problems here. The thing that makes it go *boom* is the __ATTR_NULL

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-27 Thread Peter Zijlstra
On Fri, 2007-10-26 at 19:40 -0700, Greg KH wrote: On Sat, Oct 27, 2007 at 03:18:08AM +0200, Peter Zijlstra wrote: On Fri, 2007-10-26 at 22:04 +0200, Peter Zijlstra wrote: This crashes and burns on bootup, but I'm too tired to figure out what I did wrong... will give it another try

Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
Hi, I had me a little look at bdi usage in networked filesystems. NFS, CIFS, (smbfs), AFS, CODA and NCP And of those, NFS is the only one that I could find that creates backing_dev_info structures. The rest seems to fall back to default_backing_dev_info. With my recent per bdi dirty limit

Re: BUG: lock held when returning to user space

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 17:12 +0200, Jiri Kosina wrote: On Sat, 27 Oct 2007, Gabriel C wrote: I found that today in dmesg after booting current git ( ec3b67c11df42362ccda81261d62829042f223f0 ) : ... [ 592.752777] [ 592.752781] [

Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 11:22 -0400, Jan Harkes wrote: On Sat, Oct 27, 2007 at 11:34:26AM +0200, Peter Zijlstra wrote: I had me a little look at bdi usage in networked filesystems. NFS, CIFS, (smbfs), AFS, CODA and NCP And of those, NFS is the only one that I could find that creates

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote: Ah, I see a few problems. Here, try this version instead. It's compile-tested only, and should be a lot simpler. Note, we still are not setting the parent to the new bdi structure properly, so the devices will show up in

Re: BUG: lock held when returning to user space

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 08:47 -0700, Arjan van de Ven wrote: On Sat, 27 Oct 2007 17:12:41 +0200 (CEST) Jiri Kosina [EMAIL PROTECTED] wrote: On Sat, 27 Oct 2007, Gabriel C wrote: I found that today in dmesg after booting current git ( ec3b67c11df42362ccda81261d62829042f223f0 ) :

Re: [PATCH 1/2] irq_flags_t: intro and core annotations

2007-10-27 Thread Peter Zijlstra
On Sun, 2007-10-28 at 00:14 +0400, Alexey Dobriyan wrote: On Sat, Oct 27, 2007 at 09:20:43PM +0200, Roman Zippel wrote: On Sun, 21 Oct 2007, Alexey Dobriyan wrote: So far remedies were: a) grep(1) -- obviously fragile. I tried at some point grepping for spin_lock_irqsave(), found

Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote: On 10/27/07, Peter Zijlstra [EMAIL PROTECTED] wrote: Hi, I had me a little look at bdi usage in networked filesystems. NFS, CIFS, (smbfs), AFS, CODA and NCP And of those, NFS is the only one that I could find that creates

Re: per BDI dirty limit (was Re: -mm merge plans for 2.6.24)

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 23:08 +0200, Kay Sievers wrote: On Sat, 2007-10-27 at 09:02 -0700, Greg KH wrote: Ah, I see a few problems. Here, try this version instead. It's compile-tested only, and should be a lot simpler. Note, we still are not setting the parent to the new bdi structure

Re: Networked filesystems vs backing_dev_info

2007-10-27 Thread Peter Zijlstra
On Sat, 2007-10-27 at 23:30 +0200, Peter Zijlstra wrote: On Sat, 2007-10-27 at 16:02 -0500, Steve French wrote: On 10/27/07, Peter Zijlstra [EMAIL PROTECTED] wrote: Hi, I had me a little look at bdi usage in networked filesystems. NFS, CIFS, (smbfs), AFS, CODA and NCP

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote: On 10/29/07, Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold [EMAIL PROTECTED] wrote: The problem original occurs with the fb_defio driver (driver/video/fb_defio.c). This driver use

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 11:11 +0100, Peter Zijlstra wrote: On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote: On 10/29/07, Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold [EMAIL PROTECTED] wrote: The problem original occurs

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote: On 10/29/07, Andrew Morton [EMAIL PROTECTED] wrote: On Mon, 22 Oct 2007 16:40:57 +0200 Stefani Seibold [EMAIL PROTECTED] wrote: The problem original occurs with the fb_defio driver (driver/video/fb_defio.c). This driver use

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 13:51 -0400, Jaya Kumar wrote: On 10/29/07, Peter Zijlstra [EMAIL PROTECTED] wrote: On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote: An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255. I understood from the thread that PeterZ is looking

Re: [stable] 2.6.23 regression: top displaying 9999% CPU usage

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 21:41 +0100, Ingo Molnar wrote: * Christian Borntraeger [EMAIL PROTECTED] wrote: - return clock_t_to_cputime(utime); + p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime)); + return p->prev_utime; } [...] I dont think it will work. It will make

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-29 Thread Peter Zijlstra
On Mon, 2007-10-29 at 19:17 +0100, Peter Zijlstra wrote: On Mon, 2007-10-29 at 13:51 -0400, Jaya Kumar wrote: On 10/29/07, Peter Zijlstra [EMAIL PROTECTED] wrote: On Mon, 2007-10-29 at 01:17 -0700, Jaya Kumar wrote: An aside, I just tested that deferred IO works fine on 2.6.22.10/pxa255

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-30 Thread Peter Zijlstra
On Mon, 2007-10-29 at 21:22 -0400, Jaya Kumar wrote: On 10/29/07, Peter Zijlstra [EMAIL PROTECTED] wrote: [ also, remap_vmalloc_range() suffers similar issues, only file and anon have proper rmap ] I'm not sure we want full rmap for remap_pfn/vmalloc_range, but perhaps we could

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-30 Thread Peter Zijlstra
On Tue, 2007-10-30 at 12:39 +, Hugh Dickins wrote: On Tue, 30 Oct 2007, Stefani Seibold wrote: the question is how can i get all pte's from a vmalloc'ed memory. Due to the zeroed mapping pointer i dont see how to do this? The mapping pointer is zeroed because you've done nothing to

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-30 Thread Peter Zijlstra
On Tue, 2007-10-30 at 09:16 -0400, Jaya Kumar wrote: On 10/30/07, Peter Zijlstra [EMAIL PROTECTED] wrote: So page->index does what you want it to, identify which part of the framebuffer this particular page belongs to. Ok. I'm attempting to walk the code sequence. Here's what I think

Re: vm_ops.page_mkwrite() fails with vmalloc on 2.6.23

2007-10-30 Thread Peter Zijlstra
On Tue, 2007-10-30 at 15:47 +, Hugh Dickins wrote: On Tue, 30 Oct 2007, Peter Zijlstra wrote: On Tue, 2007-10-30 at 09:16 -0400, Jaya Kumar wrote: - defio mmap adds this vma to private list (equivalent of address_space or anon_vma) - foreach vma { foreach page

[PATCH 32/33] nfs: fix various memory recursions possible with swap over NFS.

2007-10-30 Thread Peter Zijlstra
GFP_NOFS is not enough, since swap traffic is IO, hence fall back to GFP_NOIO. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/pagelist.c |2 +- fs/nfs/write.c|6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) Index: linux-2.6/fs/nfs/write.c

[PATCH 05/33] mm: kmem_estimate_pages()

2007-10-30 Thread Peter Zijlstra
Provide a method to get the upper bound on the pages needed to allocate a given number of objects from a given kmem_cache. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/slab.h |3 + mm/slub.c| 82 +++ 2 files

[PATCH 24/33] mm: prepare swap entry methods for use in page methods

2007-10-30 Thread Peter Zijlstra
Move around the swap entry methods in preparation for use from page methods. Also provide a function to obtain the swap_info_struct backing a swap cache page. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/mm.h |8 include/linux/swap.h| 48

[PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-30 Thread Peter Zijlstra
Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation contexts that are entitled to it. Care is taken to only touch the SLUB slow path. This is done to ensure reserve pages don't leak out and get consumed. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux

[PATCH 16/33] netvm: network reserve infrastructure

2007-10-30 Thread Peter Zijlstra
to charge SKB data against in the fallback path. The consumers for these reserves are sockets marked with: SOCK_MEMALLOC Such sockets are to be used to service the VM (iow. to swap over). They must be handled kernel side, exposing such a socket to user-space is a BUG. Signed-off-by: Peter Zijlstra

[PATCH 08/33] mm: emergency pool

2007-10-30 Thread Peter Zijlstra
Provide means to reserve a specific amount of pages. The emergency pool is separated from the min watermark because ALLOC_HARDER and ALLOC_HIGH modify the watermark in a relative way and thus do not ensure a strict minimum. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux

[PATCH 30/33] nfs: swap vs nfs_writepage

2007-10-30 Thread Peter Zijlstra
For now just use the ->writepage() path for swap traffic. Trond would like to see ->swap_page() or some such additional a_op. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/write.c | 23 +++ 1 file changed, 23 insertions(+) Index: linux-2.6/fs/nfs/write.c

[PATCH 26/33] mm: methods for teaching filesystems about PG_swapcache pages

2007-10-30 Thread Peter Zijlstra
the expected result, even for PG_swapcache pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/mm.h | 26 ++ include/linux/pagemap.h |2 +- 2 files changed, 27 insertions(+), 1 deletion(-) Index: linux-2.6/include/linux/mm.h

[PATCH 02/33] mm: tag reseve pages

2007-10-30 Thread Peter Zijlstra
Tag pages allocated from the reserves with a non-zero page->reserve. This allows us to distinguish and account reserve pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/mm_types.h |1 + mm/page_alloc.c |4 +++- 2 files changed, 4 insertions(+), 1 deletion

[PATCH 28/33] nfs: teach the NFS client how to treat PG_swapcache pages

2007-10-30 Thread Peter Zijlstra
Replace all relevant occurrences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/file.c |8 fs/nfs/internal.h |7 --- fs/nfs/pagelist.c |6

[PATCH 22/33] netfilter: NF_QUEUE vs emergency skbs

2007-10-30 Thread Peter Zijlstra
Avoid memory getting stuck waiting for userspace, drop all emergency packets. This of course requires the regular storage route to not include an NF_QUEUE target ;-) Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- net/netfilter/core.c |3 +++ 1 file changed, 3 insertions(+) Index: linux

[PATCH 19/33] netvm: hook skb allocation to reserves

2007-10-30 Thread Peter Zijlstra
overhead to be limited to the later kind. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/mm_types.h |1 include/linux/skbuff.h | 25 +- net/core/skbuff.c| 173 +-- 3 files changed, 173 insertions(+), 26 deletions

[PATCH 13/33] net: wrap sk->sk_backlog_rcv()

2007-10-30 Thread Peter Zijlstra
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the generic sk_backlog_rcv behaviour. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/net/sock.h |5 + net/core/sock.c |4 ++-- net/ipv4/tcp.c |2 +- net/ipv4/tcp_timer.c |2

[PATCH 23/33] netvm: skb processing

2007-10-30 Thread Peter Zijlstra
In order to make sure emergency packets receive all memory needed to proceed ensure processing of emergency SKBs happens under PF_MEMALLOC. Use the (new) sk_backlog_rcv() wrapper to ensure this for backlog processing. Skip taps, since those are user-space again. Signed-off-by: Peter Zijlstra

[PATCH 25/33] mm: add support for non block device backed swap files

2007-10-30 Thread Peter Zijlstra
that the VM relies on it, and the address_space should take adequate measures (like reserving memory for mempools or the like). Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- Documentation/filesystems/Locking |9 + include/linux/buffer_head.h |2 - include/linux/fs.h

[PATCH 20/33] netvm: filter emergency skbs.

2007-10-30 Thread Peter Zijlstra
Toss all emergency packets not for a SOCK_MEMALLOC socket. This ensures our precious memory reserve doesn't get stuck waiting for user-space. The correctness of this approach relies on the fact that networks must be assumed lossy. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/net

[PATCH 01/33] mm: gfp_to_alloc_flags()

2007-10-30 Thread Peter Zijlstra
Factor out the gfp to alloc_flags mapping so it can be used in other places. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/internal.h | 11 ++ mm/page_alloc.c | 98 2 files changed, 67 insertions(+), 42 deletions

[PATCH 10/33] mm: __GFP_MEMALLOC

2007-10-30 Thread Peter Zijlstra
__GFP_MEMALLOC will allow the allocation to disregard the watermarks, much like PF_MEMALLOC. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/gfp.h |3 ++- mm/page_alloc.c |4 +++- 2 files changed, 5 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux

[PATCH 17/33] sysctl: propagate conv errors

2007-10-30 Thread Peter Zijlstra
Currently conv routines will only generate -EINVAL, allow for other errors to be propagated. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sysctl.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) Index: linux-2.6/kernel/sysctl.c

[PATCH 14/33] net: packet split receive api

2007-10-30 Thread Peter Zijlstra
Add some packet-split receive hooks. For one this allows to do NUMA node affine page allocs. Later on these hooks will be extended to do emergency reserve allocations for fragments. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- drivers/net/e1000/e1000_main.c |8 ++-- drivers/net

[PATCH 15/33] net: sk_allocation() - concentrate socket related allocations

2007-10-30 Thread Peter Zijlstra
Introduce sk_allocation(), this function allows to inject sock specific flags to each sock related allocation. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/net/sock.h|7 ++- net/ipv4/tcp_output.c | 11 ++- net/ipv6/tcp_ipv6.c | 14 +- 3 files

[PATCH 07/33] mm: serialize access to min_free_kbytes

2007-10-30 Thread Peter Zijlstra
There is a small race between the procfs caller and the memory hotplug caller of setup_per_zone_pages_min(). Not a big deal, but the next patch will add yet another caller. Time to close the gap. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/page_alloc.c | 16 +--- 1 file

[PATCH 11/33] mm: memory reserve management

2007-10-30 Thread Peter Zijlstra
Generic reserve management code. It provides methods to reserve and charge. Upon this, generic alloc/free style reserve pools could be built, which could fully replace mempool_t functionality. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/reserve.h | 54 + mm/Makefile

[PATCH 21/33] netvm: prevent a TCP specific deadlock

2007-10-30 Thread Peter Zijlstra
the SOCK_MEMALLOC sockets from the rmem limit. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/net/sock.h |7 --- net/core/stream.c |5 +++-- 2 files changed, 7 insertions(+), 5 deletions(-) Index: linux-2.6/include/net/sock.h

[PATCH 29/33] nfs: disable data cache revalidation for swapfiles

2007-10-30 Thread Peter Zijlstra
races with invalidate_inode_pages2() to deal with. And since we cannot set PG_private we cannot use page->private (which is already used by PG_swapcache pages anyway) to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs

[PATCH 00/33] Swap over NFS -v14

2007-10-30 Thread Peter Zijlstra
Hi, Another posting of the full swap over NFS series. [ I tried just posting the first part last time around, but that just gets more confusion by lack of a general picture ] [ patches against 2.6.23-mm1, also to be found online at:

[PATCH 27/33] nfs: remove mempools

2007-10-30 Thread Peter Zijlstra
With the introduction of the shared dirty page accounting in .19, NFS should not be able to surprise the VM with all dirty pages. Thus it should always be able to free some memory. Hence no more need for mempools. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/read.c | 15

[PATCH 31/33] nfs: enable swap on NFS

2007-10-30 Thread Peter Zijlstra
and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. (swapping continues over a server reset during heavy network traffic) Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/Kconfig | 18 fs

[PATCH 04/33] mm: allow mempool to fall back to memalloc reserves

2007-10-30 Thread Peter Zijlstra
Allow the mempool to use the memalloc reserves when all else fails and the allocation context would otherwise allow it. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/mempool.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) Index: linux-2.6/mm/mempool.c

[PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-30 Thread Peter Zijlstra
Allow PF_MEMALLOC to be set in softirq context. When running softirqs from a borrowed context save current->flags, ksoftirqd will have its own task_struct. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sched.h |4 kernel/softirq.c |3 +++ mm/page_alloc.c

[PATCH 12/33] selinux: tag avc cache alloc as non-critical

2007-10-30 Thread Peter Zijlstra
Failing to allocate a cache entry will only harm performance not correctness. Do not consume valuable reserve pages for something like that. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] Acked-by: James Morris [EMAIL PROTECTED] --- security/selinux/avc.c |2 +- 1 file changed, 1 insertion

[PATCH 18/33] netvm: INET reserves.

2007-10-30 Thread Peter Zijlstra
reserve protocol TX pages network RX reserve + IPv6 route cache + IPv4 route cache SKB data reserve + IPv6 fragment cache + IPv4 fragment cache Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/sysctl.h

[PATCH 33/33] nfs: do not warn on radix tree node allocation failures

2007-10-30 Thread Peter Zijlstra
GFP_ATOMIC failures are rather common, do not warn about them. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/inode.c |2 +- fs/nfs/write.c | 10 ++ 2 files changed, 11 insertions(+), 1 deletion(-) Index: linux-2.6/fs/nfs/inode.c

[PATCH 09/33] mm: system wide ALLOC_NO_WATERMARK

2007-10-30 Thread Peter Zijlstra
Change ALLOC_NO_WATERMARK page allocation such that the reserves are system wide - which they are per setup_per_zone_pages_min(), when we scrape the barrel, do it properly. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/page_alloc.c |6 ++ 1 file changed, 6 insertions(+) Index

Re: [RFC] Create kinst/ or ki/ directory ?

2007-10-30 Thread Peter Zijlstra
On Tue, 2007-10-30 at 13:24 -0400, Mathieu Desnoyers wrote: * Jeff Garzik ([EMAIL PROTECTED]) wrote: ... Pick a shorter word like probes or profile or what... or better yet... just leave most things in their current directories. ... How about something along the kinst or ki

Re: [PATCH 23/33] netvm: skb processing

2007-10-30 Thread Peter Zijlstra
On Tue, 2007-10-30 at 14:26 -0700, Stephen Hemminger wrote: On Tue, 30 Oct 2007 17:04:24 +0100 Peter Zijlstra [EMAIL PROTECTED] wrote: In order to make sure emergency packets receive all memory needed to proceed ensure processing of emergency SKBs happens under PF_MEMALLOC. Use

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra
On Tue, 2007-10-30 at 21:37 -0700, David Miller wrote: From: Nick Piggin [EMAIL PROTECTED] Date: Wed, 31 Oct 2007 14:26:32 +1100 Is it really worth all the added complexity of making swap over NFS files work, given that you could use a network block device instead? Don't be misled.

Re: aim7 -30% regression in 2.6.24-rc1

2007-10-31 Thread Peter Zijlstra
-by: Peter Zijlstra [EMAIL PROTECTED] --- diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 3b4efbe..0f34c91 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -94,6 +94,7 @@ static int two = 2; static int zero; static int one_hundred = 100; +static int int_max = INT_MAX; /* this is needed

Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 14:37 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation contexts that are entitled to it. Care is taken to only touch the SLUB slow path. This is done

Re: [PATCH 05/33] mm: kmem_estimate_pages()

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 14:43 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Provide a method to get the upper bound on the pages needed to allocate a given number of objects from a given kmem_cache. Fair enough, but just to make it a bit easier, can

Re: [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 14:51 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Allow PF_MEMALLOC to be set in softirq context. When running softirqs from a borrowed context save current->flags, ksoftirqd will have its own task_struct. What's

Re: [PATCH 09/33] mm: system wide ALLOC_NO_WATERMARK

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 14:52 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Change ALLOC_NO_WATERMARK page allocation such that the reserves are system wide - which they are per setup_per_zone_pages_min(), when we scrape the barrel, do it properly

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 08:50 +, Christoph Hellwig wrote: On Tue, Oct 30, 2007 at 09:37:53PM -0700, David Miller wrote: Don't be misled. Swapping over NFS is just a scarecrow for the seemingly real impetus behind these changes which is network storage stuff like iSCSI. So can we

Re: NBD was Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 12:18 +0100, Pavel Machek wrote: Hi! So please get the VM bits for swap over network blockdevices in first, Trouble with that part is that we don't have any sane network block devices atm, NBD is utter crap, and iSCSI is too complex to be called sane. Hey,

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 14:26 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Hi, Another posting of the full swap over NFS series. Hi, Is it really worth all the added complexity of making swap over NFS files work, given that you could use a network

Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 21:46 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote: On Wed, 2007-10-31 at 14:37 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS

Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 22:25 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 23:17, Peter Zijlstra wrote: On Wed, 2007-10-31 at 21:46 +1100, Nick Piggin wrote: And I'd prevent these ones from doing so. Without keeping track of reserve pages, which doesn't feel too clean

Re: [PATCH 00/33] Swap over NFS -v14

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote: Thoughts: 1) I absolutely agree that NFS is far more prominent and useful than any network block device, at the present time. 2) Nonetheless, swap over NFS is a pretty rare case. I view this work as interesting, but I really don't

Re: [PATCH 06/33] mm: allow PF_MEMALLOC from softirq context

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 21:49 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 21:42, Peter Zijlstra wrote: On Wed, 2007-10-31 at 14:51 +1100, Nick Piggin wrote: On Wednesday 31 October 2007 03:04, Peter Zijlstra wrote: Allow PF_MEMALLOC to be set in softirq context. When running

Re: [PATCH 03/33] mm: slub: add knowledge of reserve pages

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 13:54 +0100, Peter Zijlstra wrote: On Wed, 2007-10-31 at 22:25 +1100, Nick Piggin wrote: What I'm saying is that the slab allocator slowpath should always just check watermarks against the current task. Instead of this ->reserve stuff. So what you say is to allocate

Re: sched: fix new task startup crash

2007-10-31 Thread Peter Zijlstra
Hi, Commit: b9dca1e0fcb696716840a3bc8f20a6941b484dbf seems to me that by calling enqueue_fair_task() from task_new_fair() is wrong. The wakeup=1 in enqueue_fair_task() will cause all non-top sched_entities to be re-positioned by place_entity(). Although the current implementation thereof seems

Re: [PATCH] lockdep: fix mismatched lockdep_depth/curr_chain_hash

2007-10-31 Thread Peter Zijlstra
On Wed, 2007-10-31 at 11:44 -0400, Gregory Haskins wrote: Hi Greg, Here is the backported version of the patch. I applied it on top of 2.6.22.10. Let me know if you have any issues. -Greg Thanks Gregory!

[PATCH 6/6] sched: place_entity() comments

2007-10-31 Thread Peter Zijlstra
Add a few comments to place_entity(). Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- kernel/sched_fair.c | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) Index: linux-2.6/kernel/sched_fair.c === --- linux

[PATCH 2/6] sched: make sched_slice() group scheduling savvy

2007-10-31 Thread Peter Zijlstra
-by: Peter Zijlstra [EMAIL PROTECTED] CC: Srivatsa Vaddagiri [EMAIL PROTECTED] --- kernel/sched_fair.c | 42 +++--- 1 file changed, 31 insertions(+), 11 deletions(-) Index: linux-2.6/kernel/sched_fair.c
