Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c

2007-02-21 Thread Srivatsa Vaddagiri
On Wed, Feb 21, 2007 at 05:30:10PM +0300, Oleg Nesterov wrote:
> Agreed. Note that we don't need the new "del_work". It is always safe to
> use cancel_work_sync() if we know that the workqueue is frozen, it won't
> block. We can also do
> 
>   if (!cancel_delayed_work())
>   cancel_work_sync();
> 
> but it is ok to do cancel_work_sync() unconditionally.

Argh... I should keep referring to recent sources. I didn't see
cancel_work_sync() in my sources (2.6.20-rc4) and hence invented that
del_work()! Anyway, thanks for pointing it out.

This change will probably let us do CPU_DOWN_PREPARE after
freeze_processes(). However, I will keep my fingers crossed on whether it
is really a good idea to send CPU_DOWN/UP_PREPARE after
freeze_processes() until we get more review/testing results.

-- 
Regards,
vatsa
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 08/29] mm: kmem_cache_objs_to_pages()

2007-02-21 Thread Pekka Enberg

Hi Peter,

On 2/21/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

Provide a method to calculate the number of pages needed to store a given
number of slab objects (upper bound when considering possible partial and
free slabs).
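A minimal userspace sketch of the basic arithmetic, assuming a cache described only by objects per slab and pages per slab (`struct slab_geometry` and `objs_to_pages()` are hypothetical names, not the patch's code or `struct kmem_cache`). It rounds the slab count up and converts slabs to pages; the upper bound described in the patch also allows for partial and free slabs, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a slab cache's geometry; these field
 * names are illustrative, not the kernel's struct kmem_cache. */
struct slab_geometry {
	unsigned int objs_per_slab;  /* objects that fit in one slab */
	unsigned int pages_per_slab; /* pages backing one slab (1 << order) */
};

/* Pages needed to hold nr_objs objects packed into fresh slabs:
 * round the slab count up, then convert slabs to pages.
 * Slack from partial or free slabs is NOT included here. */
static size_t objs_to_pages(const struct slab_geometry *g, size_t nr_objs)
{
	size_t slabs;

	assert(g->objs_per_slab > 0);
	slabs = (nr_objs + g->objs_per_slab - 1) / g->objs_per_slab;
	return slabs * g->pages_per_slab;
}
```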


So how does this work? You ask the slab allocator how many pages you
need for a given number of objects and then those pages are available
to it via the page allocator? Can other users also dip into those
reserves?

I would prefer we simply have an API for telling the slab allocator to
keep a certain number of pages in reserve for a cache rather than
exposing internals such as object size to the rest of the world.

Pekka


Re: [PATCH 2.6.21-rc1] serial: serial_txx9 driver update

2007-02-21 Thread Alan
> +static void serial_txx9_initialize(struct uart_port *port)
> +{
> + struct uart_txx9_port *up = (struct uart_txx9_port *)port;
> +
> + sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
> +#ifdef CONFIG_CPU_TX49XX
> + /* TX4925 BUG WORKAROUND.  Accessing SIOC register
> +  * immediately after soft reset causes bus error. */
> + iob();
> + udelay(1);
> +#endif

Given this costs 1uS in a path that is not performance critical, is it
worth putting the #ifdef/#endif in instead of having one set of code that
works for all?

> + while (sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST)
> + ;

Suppose it doesn't clear ? Should also use cpu_relax() in busy loops
so any processor variant with power management can do the right thing.
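Both questions can be handled by bounding the poll loop. A minimal userspace sketch, assuming a hypothetical register-read callback and reset bit (`FAKE_RESET_BIT`, `fake_read()`); `cpu_relax_stub()` stands in for the kernel's cpu_relax(), and the poll limit is an arbitrary assumption, not anything the TX49xx hardware documents:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAKE_RESET_BIT 0x8000u	/* hypothetical, not TXX9_SIFCR_SWRST */

/* Stand-in for the kernel's cpu_relax(); does nothing in userspace. */
static inline void cpu_relax_stub(void) { }

/* Hypothetical accessor returning the current status register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Spin until (reg & mask) == 0, giving up after max_polls reads.
 * Returns 0 on success or -ETIMEDOUT if the bit never cleared. */
static int poll_bit_clear(reg_read_fn rd, void *ctx, uint32_t mask,
			  unsigned long max_polls)
{
	assert(mask != 0);	/* a zero mask would trivially "succeed" */
	while (max_polls--) {
		if (!(rd(ctx) & mask))
			return 0;
		cpu_relax_stub();
	}
	return -ETIMEDOUT;
}

/* Simulated reset bit that clears after a fixed number of reads. */
struct fake_reg {
	unsigned int reads_until_clear;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	if (r->reads_until_clear == 0)
		return 0;
	r->reads_until_clear--;
	return FAKE_RESET_BIT;	/* reset still in progress */
}
```

A time-based limit (for example a udelay() per iteration) would be an alternative to counting polls.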

Neither of course are bugs you have added, just things you have moved that
seem worth asking about.

Acked-by: Alan Cox <[EMAIL PROTECTED]>



Re: Kernel oops in 2.6.18.3 with RAID5

2007-02-21 Thread Andrew Robinson

Update: I think that you can ignore this error. I am getting
segmentation faults when I attempt to rebuild the kernel. This is
exactly the same problem I had with slackware 10.1 with the 2.6.10
kernel. So I think it is a hardware issue. Memtest86 didn't show any
errors after 35 passes, so I'll have to check the CPU and motherboard.

Thanks to anyone who spent any time thinking about or researching this.

On 2/20/07, Andrew Robinson <[EMAIL PROTECTED]> wrote:

Here is the full dmesg log of the crash:

iret exception:  [#1]
SMP
Modules linked in: ppdev lp button ac battery ipv6 dm_snapshot
dm_mirror dm_mod loop tsdev rtc psmouse parport_pc parport floppy
serio_raw pcspkr i2c_nforce2 snd_intel8x0 snd_ac97_codec snd_
ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc shpchp
pci_hotplug i2c_core nvidia_agp agpgart evdev ext3 jbd mbcache raid456
md_mod xor ide_cd cdrom ide_disk sd_mod generic 8139too
amd74xx ide_core sata_sil 8139cp mii sata_nv libata scsi_mod forcedeth
ehci_hcd ohci_hcd usbcore thermal processor fan
CPU:0
EIP:0060:[]Not tainted VLI
EFLAGS: 0216   (2.6.18-3-686 #1)
EIP is at copy_data+0xff/0x14b [raid456]
eax: ddcce000   ebx: 1000   ecx: 000f   edx: c1f71000
esi: ddccefc4   edi: c1f71fc4   ebp:    esp: dd261e4c
ds: 007b   es: 007b   ss: 0068
Process md0_raid5 (pid: 1115, ti=dd26 task=dd0ed550 task.ti=dd26)
Stack: c1f71000 ddb55460 c1e377a0  ddcce000 1000 c1f71000 
      dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
    c0116d0a 06323c4f  dd20c384  c13c52e0 e000
Call Trace:
 [] handle_stripe+0x10da/0x2075 [raid456]
 [] find_busiest_group+0x177/0x46a
 [] __wake_up+0x2a/0x3d
 [] __release_stripe+0x10c/0x110 [raid456]
 [] release_stripe+0x21/0x2e [raid456]
 [] raid5d+0x10d/0x132 [raid456]
 [] md_thread+0xd7/0xed [md_mod]
 [] autoremove_wake_function+0x0/0x2d
 [] md_thread+0x0/0xed [md_mod]
 [] kthread+0xc2/0xef
 [] kthread+0x0/0xef
 [] kernel_thread_helper+0x5/0xb
Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34
24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02 
a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00
 00 00 e8
EIP: [] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
 <6>note: md0_raid5[1115] exited with preempt_count 1

I was having instability with this machine before (slackware 10.1 with
2.6.10 kernel) while compiling code (especially the kernel). I just
rebuilt it as a debian box. It never died in the raid array code
before though, just in gcc.

I have tested the machine's ram with memtest86 (3 passes) and will
more thoroughly check it tonight. Besides bad RAM, does anyone have
any other ideas on what may be causing the issue?


On 2/20/07, Andrew Robinson wrote:
> I can't seem to find sufficient information on what may have caused an
> oops. I am running a debian machine using kernel 2.6.18.3. Here is
> detailed information on the system:
>
> debian etch
> CPU: AMD athlon 2100+
> kernel package: linux-image-2.6.18-3-686
> raid5 array: 3 active, 1 spare on md0
> raid fs: ext3
> raid is physically across 2 on-board NVidia SATA ports and 2 ports
> from a SATA controller card
>
> I am at work, and this was a home computer. This is what I got from
> syslog when in SSH before it died:
>
> bserver kernel: iret exception:  [#1]
> bserver kernel: SMP
> bserver kernel: CPU:0
> bserver kernel: EIP is at copy_data+0xff/0x14b [raid456]
> bserver kernel: eax: ddcce000   ebx: 1000   ecx: 000f   edx: c1f71000
> bserver kernel: esi: ddccefc4   edi: c1f71fc4   ebp:    esp: dd261e4c
> bserver kernel: ds: 007b   es: 007b   ss: 0068
> bserver kernel: Process md0_raid5 (pid: 1115, ti=dd26 task=dd0ed550 task.ti=dd26)
> bserver kernel: Stack: c1f71000 ddb55460 c1e377a0  ddcce000 1000 c1f71000
> bserver kernel:   dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
> bserver kernel: c0116d0a 06323c4f  dd20c384  c13c52e0 e000
> bserver kernel: Call Trace:
> bserver kernel: Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02  a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00 00 e8
> bserver kernel: EIP: [] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
>
> The only similar message chain that I could find was about 2.6.19 and
> they recommended disabling preempting, but debian's 2.6.18.3 already
> has that disabled by default.
>
> Any ideas?
>
> Thanks,
> Andrew
>




[PATCH 14/29] netvm: INET reserves.

2007-02-21 Thread Peter Zijlstra
Add reserves for INET.

The two big users seem to be the route cache and ip-fragment cache.

Account the route cache to the auxiliary reserve.
Account the fragments to the skb reserve so that one can at least
overflow the fragment cache (avoids fragment deadlocks).
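The sysctl hook keeps the skb reserve in step with the fragment high threshold by applying only the difference between the old and new value, on top of the initial reservation made in ipfrag_init()/ipv6_frag_init(). A minimal userspace model of that accounting; `reserve_bytes`, `reserve_adjust()` and `set_frag_high_thresh()` are hypothetical stand-ins for skb_reserve_memory() and the proc handler:

```c
#include <assert.h>

/* Hypothetical global reserve, standing in for the skb reserve that
 * skb_reserve_memory() adjusts in the patch. */
static long reserve_bytes;

static void reserve_adjust(long delta)
{
	reserve_bytes += delta;
	assert(reserve_bytes >= 0);	/* never un-reserve more than reserved */
}

/* Model of the sysctl handler: remember the old threshold, let the
 * "write" update it, then adjust the reserve by the difference. */
static void set_frag_high_thresh(int *thresh, int new_value)
{
	int old = *thresh;

	*thresh = new_value;		/* proc_dointvec() in the patch */
	reserve_adjust((long)*thresh - old);
}
```

Because a read through /proc leaves the value unchanged, the delta is zero in that case, so the handler needs no separate write check.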

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 net/ipv4/ip_fragment.c |1 +
 net/ipv4/route.c   |   18 +-
 net/ipv4/sysctl_net_ipv4.c |   13 -
 net/ipv6/reassembly.c  |1 +
 net/ipv6/route.c   |   18 +-
 net/ipv6/sysctl_net_ipv6.c |   12 +++-
 6 files changed, 59 insertions(+), 4 deletions(-)

Index: linux-2.6-git/net/ipv4/sysctl_net_ipv4.c
===
--- linux-2.6-git.orig/net/ipv4/sysctl_net_ipv4.c   2007-02-20 15:12:56.0 +0100
+++ linux-2.6-git/net/ipv4/sysctl_net_ipv4.c   2007-02-20 16:41:28.0 +0100
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* From af_inet.c */
 extern int sysctl_ip_nonlocal_bind;
@@ -186,6 +187,16 @@ static int strategy_allowed_congestion_c
 
 }
 
+static int proc_dointvec_fragment(ctl_table *table, int write, struct file *filp,
+void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int ret;
+   int old_thresh = *(int *)table->data;
+   ret = proc_dointvec(table,write,filp,buffer,lenp,ppos);
+   skb_reserve_memory(*(int *)table->data - old_thresh);
+   return ret;
+}
+
 ctl_table ipv4_table[] = {
{
.ctl_name   = NET_IPV4_TCP_TIMESTAMPS,
@@ -291,7 +302,7 @@ ctl_table ipv4_table[] = {
.data   = &sysctl_ipfrag_high_thresh,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = &proc_dointvec
+   .proc_handler   = &proc_dointvec_fragment
},
{
.ctl_name   = NET_IPV4_IPFRAG_LOW_THRESH,
Index: linux-2.6-git/net/ipv6/sysctl_net_ipv6.c
===
--- linux-2.6-git.orig/net/ipv6/sysctl_net_ipv6.c   2007-02-20 15:12:56.0 +0100
+++ linux-2.6-git/net/ipv6/sysctl_net_ipv6.c   2007-02-20 16:41:28.0 +0100
@@ -15,6 +15,16 @@
 
 #ifdef CONFIG_SYSCTL
 
+static int proc_dointvec_fragment(ctl_table *table, int write, struct file *filp,
+void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int ret;
+   int old_thresh = *(int *)table->data;
+   ret = proc_dointvec(table,write,filp,buffer,lenp,ppos);
+   skb_reserve_memory(*(int *)table->data - old_thresh);
+   return ret;
+}
+
 static ctl_table ipv6_table[] = {
{
.ctl_name   = NET_IPV6_ROUTE,
@@ -44,7 +54,7 @@ static ctl_table ipv6_table[] = {
.data   = &sysctl_ip6frag_high_thresh,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = &proc_dointvec
+   .proc_handler   = &proc_dointvec_fragment
},
{
.ctl_name   = NET_IPV6_IP6FRAG_LOW_THRESH,
Index: linux-2.6-git/net/ipv4/ip_fragment.c
===
--- linux-2.6-git.orig/net/ipv4/ip_fragment.c   2007-02-20 15:12:56.0 +0100
+++ linux-2.6-git/net/ipv4/ip_fragment.c   2007-02-20 16:41:28.0 +0100
@@ -743,6 +743,7 @@ void ipfrag_init(void)
ipfrag_secret_timer.function = ipfrag_secret_rebuild;
ipfrag_secret_timer.expires = jiffies + sysctl_ipfrag_secret_interval;
add_timer(&ipfrag_secret_timer);
+   skb_reserve_memory(sysctl_ipfrag_high_thresh);
 }
 
 EXPORT_SYMBOL(ip_defrag);
Index: linux-2.6-git/net/ipv6/reassembly.c
===
--- linux-2.6-git.orig/net/ipv6/reassembly.c   2007-02-20 15:12:56.0 +0100
+++ linux-2.6-git/net/ipv6/reassembly.c 2007-02-20 16:41:28.0 +0100
@@ -772,4 +772,5 @@ void __init ipv6_frag_init(void)
ip6_frag_secret_timer.function = ip6_frag_secret_rebuild;
ip6_frag_secret_timer.expires = jiffies + sysctl_ip6frag_secret_interval;
add_timer(&ip6_frag_secret_timer);
+   skb_reserve_memory(sysctl_ip6frag_high_thresh);
 }
Index: linux-2.6-git/net/ipv4/route.c
===
--- linux-2.6-git.orig/net/ipv4/route.c 2007-02-20 15:12:56.0 +0100
+++ linux-2.6-git/net/ipv4/route.c  2007-02-20 16:41:28.0 +0100
@@ -2884,6 +2884,20 @@ static int ipv4_sysctl_rtcache_flush_str
return 0;
 }
 
+static int proc_dointvec_rt_size(ctl_table *table, int write, struct file *filp,
+void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+   int ret;
+   int new_pages;
+   int old_pages = kmem_cache_objs_to_pages(ipv4_dst_ops.kmem_cachep,
+   *(int *)table->data);
+   ret = 

Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?

2007-02-21 Thread Michal Piotrowski

On 21/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:

On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote:
> On 20/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote:
> > >
> > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit?
> >
> > I doubt that it is, but can you revert it ?
>
> I've been running the latest kernel without this patch for 3 hours.
>
> So far so good.

But you still have those softirq pending messages, right ?


Yes

(+ new NOHZ: local_softirq_pending 02)
http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/1944.html


I think those
are pointing to the root cause of this. Still no idea how to get hold of
them. All my systems refuse to produce that. Hrmpf.

tglx



Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)


[PATCH 15/29] netvm: hook skb allocation to reserves

2007-02-21 Thread Peter Zijlstra
Change the skb allocation api to indicate RX usage and use this to fall back to
the reserve when needed. Skbs allocated from the reserve are tagged in
skb->emergency.

Teach all other skb ops about emergency skbs and the reserve accounting.

Use the (new) packet split API to allocate and track fragment pages from the
emergency reserve. Do this using an atomic counter in page->index. This is
needed because the fragments have a different sharing semantic than that
indicated by skb_shinfo()->dataref. 

(NOTE the extra atomic overhead is only for those pages allocated from the
reserves - it does not affect the normal fast path.)
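The __alloc_skb() hunk below is truncated before the fallback logic, so the exact retry path is not visible here. A minimal userspace model of the policy described above, assuming simple counters for the normal allocator and the emergency reserve (`struct pools`, `struct buf`, `buf_alloc()` and `buf_free()` are hypothetical; only the SKB_ALLOC_* values come from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKB_ALLOC_FCLONE 0x01	/* values as in the patch */
#define SKB_ALLOC_RX     0x02

/* Hypothetical pools: free buffers in the normal allocator and in
 * the emergency reserve. */
struct pools {
	int normal_free;
	int reserve_free;
};

struct buf {
	bool emergency;	/* mirrors skb->emergency */
};

/* Try the normal pool first; only RX allocations may fall back to the
 * reserve, and buffers taken from it are tagged so the free path can
 * return them to the right place. Returns false on failure. */
static bool buf_alloc(struct pools *p, int flags, struct buf *out)
{
	if (p->normal_free > 0) {
		p->normal_free--;
		out->emergency = false;
		return true;
	}
	if ((flags & SKB_ALLOC_RX) && p->reserve_free > 0) {
		p->reserve_free--;
		out->emergency = true;
		return true;
	}
	return false;
}

/* The tag decides which pool gets the buffer back. */
static void buf_free(struct pools *p, const struct buf *b)
{
	if (b->emergency)
		p->reserve_free++;
	else
		p->normal_free++;
}
```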

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/linux/skbuff.h |   22 --
 net/core/skbuff.c  |  170 ++---
 2 files changed, 165 insertions(+), 27 deletions(-)

Index: linux-2.6-git/include/linux/skbuff.h
===
--- linux-2.6-git.orig/include/linux/skbuff.h   2007-02-15 12:31:05.0 +0100
+++ linux-2.6-git/include/linux/skbuff.h   2007-02-15 12:31:05.0 +0100
@@ -284,7 +284,8 @@ struct sk_buff {
nfctinfo:3;
__u8pkt_type:3,
fclone:2,
-   ipvs_property:1;
+   ipvs_property:1,
+   emergency:1;
__be16  protocol;
 
void(*destructor)(struct sk_buff *skb);
@@ -329,10 +330,19 @@ struct sk_buff {
 
 #include 
 
+#define SKB_ALLOC_FCLONE   0x01
+#define SKB_ALLOC_RX   0x02
+
+#ifdef CONFIG_NETVM
+#define skb_emergency(skb) unlikely((skb)->emergency)
+#else
+#define skb_emergency(skb) false
+#endif
+
 extern void kfree_skb(struct sk_buff *skb);
 extern void   __kfree_skb(struct sk_buff *skb);
 extern struct sk_buff *__alloc_skb(unsigned int size,
-  gfp_t priority, int fclone, int node);
+  gfp_t priority, int flags, int node);
 static inline struct sk_buff *alloc_skb(unsigned int size,
gfp_t priority)
 {
@@ -342,7 +352,7 @@ static inline struct sk_buff *alloc_skb(
 static inline struct sk_buff *alloc_skb_fclone(unsigned int size,
   gfp_t priority)
 {
-   return __alloc_skb(size, priority, 1, -1);
+   return __alloc_skb(size, priority, SKB_ALLOC_FCLONE, -1);
 }
 
 extern void   kfree_skbmem(struct sk_buff *skb);
@@ -1103,7 +1113,8 @@ static inline void __skb_queue_purge(str
 static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
  gfp_t gfp_mask)
 {
-   struct sk_buff *skb = alloc_skb(length + NET_SKB_PAD, gfp_mask);
+   struct sk_buff *skb =
+   __alloc_skb(length + NET_SKB_PAD, gfp_mask, SKB_ALLOC_RX, -1);
if (likely(skb))
skb_reserve(skb, NET_SKB_PAD);
return skb;
@@ -1149,6 +1160,7 @@ static inline struct sk_buff *netdev_all
 }
 
extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
+extern void __netdev_free_page(struct net_device *dev, struct page *page);
 
 /**
  * netdev_alloc_page - allocate a page for ps-rx on a specific device
@@ -1165,7 +1177,7 @@ static inline struct page *netdev_alloc_
 
 static inline void netdev_free_page(struct net_device *dev, struct page *page)
 {
-   __free_page(page);
+   __netdev_free_page(dev, page);
 }
 
 /**
Index: linux-2.6-git/net/core/skbuff.c
===
--- linux-2.6-git.orig/net/core/skbuff.c   2007-02-15 12:31:05.0 +0100
+++ linux-2.6-git/net/core/skbuff.c 2007-02-15 12:45:50.0 +0100
@@ -142,28 +142,36 @@ EXPORT_SYMBOL(skb_truesize_bug);
  * %GFP_ATOMIC.
  */
 struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
-   int fclone, int node)
+   int flags, int node)
 {
struct kmem_cache *cache;
struct skb_shared_info *shinfo;
struct sk_buff *skb;
u8 *data;
+   int emergency = 0;
 
-   cache = fclone ? skbuff_fclone_cache : skbuff_head_cache;
+   size = SKB_DATA_ALIGN(size);
+   cache = (flags & SKB_ALLOC_FCLONE)
+   ? skbuff_fclone_cache : skbuff_head_cache;
+#ifdef CONFIG_NETVM
+   if (flags & SKB_ALLOC_RX)
+   gfp_mask |= __GFP_NOMEMALLOC|__GFP_NOWARN;
+#endif
 
+retry_alloc:
/* Get the HEAD */
skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
if (!skb)
-   goto out;
+   goto noskb;
 
/* Get the DATA. Size must match skb_add_mtu(). */
-   size = SKB_DATA_ALIGN(size);
data = kmalloc_node_track_caller(size + sizeof(struct skb_shared_info),
   

[PATCH 01/29] mm: page allocation rank

2007-02-21 Thread Peter Zijlstra
Introduce page allocation rank.

This allocation rank is a measure of the 'hardness' of the page allocation,
where hardness refers to how deep we have to reach (and thereby whether
reclaim was activated) to obtain the page.

It basically is a mapping from the ALLOC_/gfp flags into a scalar quantity,
which allows for comparisons of the kind: 
  'would this allocation have succeeded using these gfp flags'.

For the gfp -> alloc_flags mapping we use the 'hardest' possible, those
used by __alloc_pages() right before going into direct reclaim.

The alloc_flags -> rank mapping is given by: 2*2^wmark - harder - 2*high
where wmark = { min = 1, low, high } and harder, high are booleans.
This gives:
  0 is the hardest possible allocation - ALLOC_NO_WATERMARK,
  1 is ALLOC_WMARK_MIN|ALLOC_HARDER|ALLOC_HIGH,
  ...
  15 is ALLOC_WMARK_HIGH|ALLOC_HARDER,
  16 is the softest allocation - ALLOC_WMARK_HIGH.
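The flag values and rank function below are taken from the patch's mm/internal.h hunk and compiled standalone so the mapping can be checked against the table: the watermark bits are 4, 8 and 16 (that is, 2*2^wmark), HARDER subtracts 1 and HIGH subtracts 2. `would_have_succeeded()` is a hypothetical helper showing one reading of the comparison described above, not code from the patch:

```c
#include <assert.h>

/* Flag values copied from the patch's mm/internal.h. */
#define ALLOC_HARDER		0x01
#define ALLOC_HIGH		0x02
#define ALLOC_WMARK_MIN		0x04
#define ALLOC_WMARK_LOW		0x08
#define ALLOC_WMARK_HIGH	0x10
#define ALLOC_NO_WATERMARKS	0x20

/* Same computation as the patch's alloc_flags_to_rank(): the watermark
 * bit gives 4, 8 or 16; HARDER subtracts 1 and HIGH subtracts 2. */
static int alloc_flags_to_rank(int alloc_flags)
{
	int rank;

	if (alloc_flags & ALLOC_NO_WATERMARKS)
		return 0;

	rank = alloc_flags & (ALLOC_WMARK_MIN | ALLOC_WMARK_LOW | ALLOC_WMARK_HIGH);
	rank -= alloc_flags & (ALLOC_HARDER | ALLOC_HIGH);
	return rank;
}

/* Hypothetical helper: a lower rank reaches deeper, so a request at
 * req_rank could have obtained a page allocated at page_rank only if
 * it may reach at least that deep. */
static int would_have_succeeded(int req_rank, int page_rank)
{
	return req_rank <= page_rank;
}
```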

Rank <= 4 will have woken up kswapd, and when also > 0 might have run into
direct reclaim.

Rank > 8 rarely happens and means lots of memory is free (due to a parallel
OOM kill).

The allocation rank is stored in page->index for successful allocations.

'offline' testing of the rank is made impossible by direct reclaim and
fragmentation issues. That is, it is impossible to tell if a given allocation
will succeed without actually doing it.

The purpose of this measure is to introduce some fairness into the slab
allocator.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/internal.h   |   89 
 mm/page_alloc.c |   58 ++--
 2 files changed, 106 insertions(+), 41 deletions(-)

Index: linux-2.6-git/mm/internal.h
===
--- linux-2.6-git.orig/mm/internal.h2007-01-08 11:53:13.0 +0100
+++ linux-2.6-git/mm/internal.h 2007-01-09 11:29:18.0 +0100
@@ -12,6 +12,7 @@
 #define __MM_INTERNAL_H
 
 #include 
+#include 
 
 static inline void set_page_count(struct page *page, int v)
 {
@@ -37,4 +38,92 @@ static inline void __put_page(struct pag
 extern void fastcall __init __free_pages_bootmem(struct page *page,
unsigned int order);
 
+#define ALLOC_HARDER        0x01 /* try to alloc harder */
+#define ALLOC_HIGH          0x02 /* __GFP_HIGH set */
+#define ALLOC_WMARK_MIN     0x04 /* use pages_min watermark */
+#define ALLOC_WMARK_LOW     0x08 /* use pages_low watermark */
+#define ALLOC_WMARK_HIGH    0x10 /* use pages_high watermark */
+#define ALLOC_NO_WATERMARKS 0x20 /* don't check watermarks at all */
+#define ALLOC_CPUSET        0x40 /* check for correct cpuset */
+
+/*
+ * get the deepest reaching allocation flags for the given gfp_mask
+ */
+static int inline gfp_to_alloc_flags(gfp_t gfp_mask)
+{
+   struct task_struct *p = current;
+   int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
+   const gfp_t wait = gfp_mask & __GFP_WAIT;
+
+   /*
+* The caller may dip into page reserves a bit more if the caller
+* cannot run direct reclaim, or if the caller has realtime scheduling
+* policy or is asking for __GFP_HIGH memory.  GFP_ATOMIC requests will
+* set both ALLOC_HARDER (!wait) and ALLOC_HIGH (__GFP_HIGH).
+*/
+   if (gfp_mask & __GFP_HIGH)
+   alloc_flags |= ALLOC_HIGH;
+
+   if (!wait) {
+   alloc_flags |= ALLOC_HARDER;
+   /*
+* Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
+* See also cpuset_zone_allowed() comment in kernel/cpuset.c.
+*/
+   alloc_flags &= ~ALLOC_CPUSET;
+   } else if (unlikely(rt_task(p)) && !in_interrupt())
+   alloc_flags |= ALLOC_HARDER;
+
+   if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
+   if (!in_interrupt() &&
+   ((p->flags & PF_MEMALLOC) ||
unlikely(test_thread_flag(TIF_MEMDIE))))
+   alloc_flags |= ALLOC_NO_WATERMARKS;
+   }
+
+   return alloc_flags;
+}
+
+#define MAX_ALLOC_RANK 16
+
+/*
+ * classify the allocation: 0 is hardest, 16 is easiest.
+ */
+static inline int alloc_flags_to_rank(int alloc_flags)
+{
+   int rank;
+
+   if (alloc_flags & ALLOC_NO_WATERMARKS)
+   return 0;
+
+   rank = alloc_flags & (ALLOC_WMARK_MIN|ALLOC_WMARK_LOW|ALLOC_WMARK_HIGH);
+   rank -= alloc_flags & (ALLOC_HARDER|ALLOC_HIGH);
+
+   return rank;
+}
+
+static inline int gfp_to_rank(gfp_t gfp_mask)
+{
+   /*
+* Although correct this full version takes a ~3% performance
+* hit on the network tests in aim9.
+*
+
+   return alloc_flags_to_rank(gfp_to_alloc_flags(gfp_mask));
+
+*
+* Just check the bare essential ALLOC_NO_WATERMARKS case this keeps
+* the aim9 results within the error margin.
+*/
+
+   if 

[PATCH 22/29] mm: add support for non block device backed swap files

2007-02-21 Thread Peter Zijlstra
A new address_space_operations method is added:
  int swapfile(struct address_space *, int)

When this method is found during sys_swapon() and returns no error,
swapper_space.a_ops will proxy to sis->swap_file->f_mapping->a_ops.

The swapfile method will be used to tell the address_space that the VM
relies on it, so that the address_space can take adequate measures (such as
reserving memory for mempools).
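A minimal userspace sketch of the proxying done in the patched swap_writepage(), assuming cut-down stand-in types (`struct page_model`, `struct aops_model` and `struct swap_area_model` are hypothetical). Only SWP_FILE's value is from the patch; the swapfile() enable/disable notification itself is not modeled:

```c
#include <assert.h>
#include <stddef.h>

#define SWP_FILE (1 << 2)	/* value from the patch's swap.h hunk */

/* Hypothetical, cut-down stand-ins for struct page and the
 * address_space_operations table; only what the dispatch needs. */
struct page_model { int id; };

struct aops_model {
	int (*writepage)(struct page_model *page);
};

struct swap_area_model {
	unsigned long flags;
	const struct aops_model *file_aops; /* swap_file->f_mapping->a_ops */
};

static int bio_writes, file_writes;

static int bio_writepage(struct page_model *page)
{
	(void)page;
	bio_writes++;	/* block-device path: build and submit a bio */
	return 0;
}

static int fs_writepage(struct page_model *page)
{
	(void)page;
	file_writes++;	/* filesystem path, e.g. NFS */
	return 0;
}

/* Same shape as the patched swap_writepage(): file-backed swap areas
 * proxy to the filesystem's writepage, others take the bio path. */
static int swap_writepage_model(struct swap_area_model *sis,
				struct page_model *page)
{
	if (sis->flags & SWP_FILE)
		return sis->file_aops->writepage(page);
	return bio_writepage(page);
}
```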

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Trond Myklebust <[EMAIL PROTECTED]>
---
 Documentation/filesystems/Locking |9 
 include/linux/fs.h|1 
 include/linux/swap.h  |3 ++
 mm/Kconfig|4 +++
 mm/page_io.c  |   42 ++
 mm/swap_state.c   |4 +++
 mm/swapfile.c |   22 +++
 7 files changed, 84 insertions(+), 1 deletion(-)

Index: linux-2.6/include/linux/swap.h
===
--- linux-2.6.orig/include/linux/swap.h
+++ linux-2.6/include/linux/swap.h
@@ -163,6 +163,7 @@ enum {
SWP_USED= (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap?*/
SWP_ACTIVE  = (SWP_USED | SWP_WRITEOK),
+   SWP_FILE= (1 << 2), /* file swap area */
/* add others here before... */
SWP_SCANNING= (1 << 8), /* refcount in scan_swap_map */
 };
@@ -265,6 +266,8 @@ extern int shmem_unuse(swp_entry_t entry
 /* linux/mm/page_io.c */
 extern int swap_readpage(struct file *, struct page *);
 extern int swap_writepage(struct page *page, struct writeback_control *wbc);
+extern void swap_sync_page(struct page *page);
+extern int swap_set_page_dirty(struct page *page);
extern int end_swap_bio_read(struct bio *bio, unsigned int bytes_done, int err);
 
 /* linux/mm/swap_state.c */
Index: linux-2.6/mm/page_io.c
===
--- linux-2.6.orig/mm/page_io.c
+++ linux-2.6/mm/page_io.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
@@ -110,6 +111,18 @@ int swap_writepage(struct page *page, st
unlock_page(page);
goto out;
}
+#ifdef CONFIG_SWAP_FILE
+   {
+   struct swap_info_struct *sis = page_swap_info(page);
+   if (sis->flags & SWP_FILE) {
+   ret = sis->swap_file->f_mapping->
+   a_ops->writepage(page, wbc);
+   if (!ret)
+   count_vm_event(PSWPOUT);
+   return ret;
+   }
+   }
+#endif
bio = get_swap_bio(GFP_NOIO, page_private(page), page,
end_swap_bio_write);
if (bio == NULL) {
@@ -128,6 +141,23 @@ out:
return ret;
 }
 
+#ifdef CONFIG_SWAP_FILE
+int swap_set_page_dirty(struct page *page)
+{
+   struct swap_info_struct *sis = page_swap_info(page);
+
+   if (sis->flags & SWP_FILE) {
+   const struct address_space_operations * a_ops =
+   sis->swap_file->f_mapping->a_ops;
+   if (a_ops->set_page_dirty)
+   return a_ops->set_page_dirty(page);
+   return __set_page_dirty_buffers(page);
+   }
+
+   return __set_page_dirty_nobuffers(page);
+}
+#endif
+
 int swap_readpage(struct file *file, struct page *page)
 {
struct bio *bio;
@@ -135,6 +165,18 @@ int swap_readpage(struct file *file, str
 
BUG_ON(!PageLocked(page));
ClearPageUptodate(page);
+#ifdef CONFIG_SWAP_FILE
+   {
+   struct swap_info_struct *sis = page_swap_info(page);
+   if (sis->flags & SWP_FILE) {
+   ret = sis->swap_file->f_mapping->
+   a_ops->readpage(sis->swap_file, page);
+   if (!ret)
+   count_vm_event(PSWPIN);
+   return ret;
+   }
+   }
+#endif
bio = get_swap_bio(GFP_KERNEL, page_private(page), page,
end_swap_bio_read);
if (bio == NULL) {
Index: linux-2.6/mm/swap_state.c
===
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -26,7 +26,11 @@
  */
 static const struct address_space_operations swap_aops = {
.writepage  = swap_writepage,
+#ifdef CONFIG_SWAP_FILE
+   .set_page_dirty = swap_set_page_dirty,
+#else
.set_page_dirty = __set_page_dirty_nobuffers,
+#endif
.migratepage= migrate_page,
 };
 
Index: linux-2.6/mm/swapfile.c
===
--- linux-2.6.orig/mm/swapfile.c

[PATCH 28/29] nfs: enable swap on NFS

2007-02-21 Thread Peter Zijlstra
Provide an ops->swapfile() implementation for NFS. This will set the
NFS socket to SOCK_VMIO and run socket reconnect under PF_MEMALLOC as well
as reset SOCK_VMIO before engaging the protocol ->connect() method.

PF_MEMALLOC should allow the allocation of struct socket and related objects
and the early (re)setting of SOCK_VMIO should allow us to receive the packets
required for the TCP connection buildup.

(swapping continues over a server reset during heavy network traffic)
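The xprtsock.c hunk saves current->flags into `pflags` before setting a flag for swapper transports, but it is cut off before the restore. A minimal userspace sketch of one common save/set/restore pattern, assuming a global `task_flags` in place of current->flags and an arbitrary bit value `PF_MEMALLOC_MODEL` (both hypothetical, as are the other names):

```c
#include <assert.h>

/* Arbitrary bit for illustration; not the kernel's PF_MEMALLOC value. */
#define PF_MEMALLOC_MODEL 0x1UL

/* Stand-in for current->flags. */
static unsigned long task_flags;

/* Set the flag for the duration of the reconnect, then restore only
 * that bit to its saved state. */
static void reconnect_under_memalloc(int swapper, void (*connect)(void))
{
	unsigned long pflags = task_flags;

	if (swapper)
		task_flags |= PF_MEMALLOC_MODEL;

	connect();

	task_flags = (task_flags & ~PF_MEMALLOC_MODEL) |
		     (pflags & PF_MEMALLOC_MODEL);
}

static int saw_memalloc;

/* Fake connect that records whether it ran with the flag set. */
static void fake_connect(void)
{
	saw_memalloc = !!(task_flags & PF_MEMALLOC_MODEL);
}
```

Restoring only the saved bit, rather than writing `pflags` back wholesale, keeps any other flag changes made during the connect, and leaves the flag set if the caller already held it.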

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
---
 fs/Kconfig  |   14 
 fs/nfs/file.c   |6 +
 include/linux/sunrpc/xprt.h |5 +++-
 net/sunrpc/sched.c  |   13 +++
 net/sunrpc/xprtsock.c   |   49 
 5 files changed, 81 insertions(+), 6 deletions(-)

Index: linux-2.6-git/fs/nfs/file.c
===
--- linux-2.6-git.orig/fs/nfs/file.c2007-02-21 12:15:16.0 +0100
+++ linux-2.6-git/fs/nfs/file.c 2007-02-21 12:15:19.0 +0100
@@ -324,6 +324,11 @@ static int nfs_launder_page(struct page 
return nfs_wb_page(page_file_mapping(page)->host, page);
 }
 
+static int nfs_swapfile(struct address_space *mapping, int enable)
+{
+   return xs_swapper(NFS_CLIENT(mapping->host)->cl_xprt, enable);
+}
+
 const struct address_space_operations nfs_file_aops = {
.readpage = nfs_readpage,
.readpages = nfs_readpages,
@@ -338,6 +343,7 @@ const struct address_space_operations nf
.direct_IO = nfs_direct_IO,
 #endif
.launder_page = nfs_launder_page,
+   .swapfile = nfs_swapfile,
 };
 
 static ssize_t nfs_file_write(struct kiocb *iocb, const struct iovec *iov,
Index: linux-2.6-git/include/linux/sunrpc/xprt.h
===
--- linux-2.6-git.orig/include/linux/sunrpc/xprt.h  2007-02-21 11:04:08.0 +0100
+++ linux-2.6-git/include/linux/sunrpc/xprt.h   2007-02-21 12:15:19.0 +0100
@@ -149,7 +149,9 @@ struct rpc_xprt {
unsigned intmax_reqs;   /* total slots */
unsigned long   state;  /* transport state */
unsigned char   shutdown   : 1, /* being shut down */
-   resvport   : 1; /* use a reserved port */
+   resvport   : 1, /* use a reserved port */
+   swapper: 1; /* we're swapping over this
+  transport */
 
/*
 * Connection of transports
@@ -241,6 +243,7 @@ voidxprt_disconnect(struct rpc_xprt 
*
  */
struct rpc_xprt *  xs_setup_udp(struct sockaddr *addr, size_t addrlen, struct rpc_timeout *to);
struct rpc_xprt *  xs_setup_tcp(struct sockaddr *addr, size_t addrlen, struct rpc_timeout *to);
+intxs_swapper(struct rpc_xprt *xprt, int enable);
 
 /*
  * Reserved bit positions in xprt->state
Index: linux-2.6-git/net/sunrpc/sched.c
===
--- linux-2.6-git.orig/net/sunrpc/sched.c   2007-02-21 11:04:08.0 
+0100
+++ linux-2.6-git/net/sunrpc/sched.c2007-02-21 12:15:19.0 +0100
@@ -751,10 +751,13 @@ void * rpc_malloc(struct rpc_task *task,
struct rpc_rqst *req = task->tk_rqstp;
gfp_t   gfp;
 
-   if (task->tk_flags & RPC_TASK_SWAPPER)
-   gfp = GFP_ATOMIC;
-   else
-   gfp = GFP_NOFS;
+   /*
+* this rpciod thread might be needed by reclaim, hence we cannot
+* wait on a regular alloc to succeed.
+*/
+   gfp = GFP_ATOMIC;
+   if (RPC_IS_SWAPPER(task))
+   gfp |= __GFP_EMERGENCY;
 
if (size > RPC_BUFFER_MAXSIZE) {
req->rq_buffer = kmalloc(size, gfp);
@@ -834,7 +837,7 @@ void rpc_init_task(struct rpc_task *task
 static struct rpc_task *
 rpc_alloc_task(void)
 {
-   return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOFS);
+   return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_NOIO);
 }
 
 static void rpc_free_task(struct rcu_head *rcu)
Index: linux-2.6-git/net/sunrpc/xprtsock.c
===
--- linux-2.6-git.orig/net/sunrpc/xprtsock.c2007-02-21 11:04:08.0 
+0100
+++ linux-2.6-git/net/sunrpc/xprtsock.c 2007-02-21 12:15:19.0 +0100
@@ -1215,11 +1215,15 @@ static void xs_udp_connect_worker(struct
container_of(work, struct sock_xprt, connect_worker.work);
struct rpc_xprt *xprt = &transport->xprt;
struct socket *sock = transport->sock;
+   unsigned long pflags = current->flags;
int err, status = -EIO;
 
if (xprt->shutdown || !xprt_bound(xprt))
goto out;
 
+   if (xprt->swapper)
+   current->flags |= 

[PATCH 13/29] netvm: link network to vm layer

2007-02-21 Thread Peter Zijlstra
Hook up networking to the memory reserve.

There are two kinds of reserves: skb and aux. 
 - skb reserves are used for incoming packets,
 - aux reserves are used for processing these packets.

The consumers for these reserves are sockets marked with:
  SOCK_VMIO

Such sockets are to be used to service the VM (i.e. to swap over). They
must be handled kernel-side; exposing such a socket to user space is a BUG.
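A minimal userspace model of the RX accounting in __rx_emergency_get(), using C11 atomics in place of the kernel's atomic_t, plus a compile-time check of the TX_RESERVE_PAGES arithmetic (4 * 44 * 3 = 528 pages). The `_model` functions and `roundup_pow2()` are hypothetical stand-ins. Note that the quoted hunk adds `size` but backs out a failed charge with atomic_dec(), which subtracts only 1; this sketch subtracts `size` so the counter stays balanced:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Constants from the patch's sock.h hunk. */
#define MAX_PAGES_PER_SKB 3
#define MAX_FRAGMENTS ((65536 + 1500 - 1) / 1500)	/* = 44 */
#define TX_RESERVE_PAGES (4 * MAX_FRAGMENTS * MAX_PAGES_PER_SKB)
_Static_assert(TX_RESERVE_PAGES == 528, "4 * 44 * 3");

static atomic_int rx_emergency_bytes;
static int skb_reserve_bytes;

/* Round up to a power of two, like roundup_pow_of_two() (bytes > 0). */
static int roundup_pow2(int bytes)
{
	int size = 1;

	while (size < bytes)
		size <<= 1;
	return size;
}

/* Charge a packet against 1.5x the skb reserve; on failure undo the
 * full charge so the counter stays balanced. */
static bool rx_emergency_get_model(int bytes, bool overcommit)
{
	int size = roundup_pow2(bytes);
	int nr = atomic_fetch_add(&rx_emergency_bytes, size) + size;
	int thresh = (3 * skb_reserve_bytes) / 2;

	if (nr < thresh || overcommit)
		return true;

	atomic_fetch_sub(&rx_emergency_bytes, size);
	return false;
}

static void rx_emergency_put_model(int bytes)
{
	atomic_fetch_sub(&rx_emergency_bytes, roundup_pow2(bytes));
}
```

With a 4096-byte reserve the threshold is 6144 bytes, so two 1500-byte packets (each charged as 2048) fit and a third is refused unless overcommit is requested.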

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/net/sock.h |   31 
 net/Kconfig|3 +
 net/core/sock.c|  134 +
 3 files changed, 168 insertions(+)

Index: linux-2.6-git/include/net/sock.h
===
--- linux-2.6-git.orig/include/net/sock.h   2007-02-20 15:06:17.0 +0100
+++ linux-2.6-git/include/net/sock.h2007-02-20 15:07:45.0 +0100
@@ -392,6 +392,7 @@ enum sock_flags {
SOCK_RCVTSTAMP, /* %SO_TIMESTAMP setting */
SOCK_LOCALROUTE, /* route locally only, %SO_DONTROUTE setting */
SOCK_QUEUE_SHRUNK, /* write queue has been shrunk recently */
+   SOCK_VMIO, /* the VM depends on us - make sure we're serviced */
 };
 
 static inline void sock_copy_flags(struct sock *nsk, struct sock *osk)
@@ -414,6 +415,36 @@ static inline int sock_flag(struct sock 
return test_bit(flag, &sk->sk_flags);
 }
 
+static inline int sk_has_vmio(struct sock *sk)
+{
+   return sock_flag(sk, SOCK_VMIO);
+}
+
+#define MAX_PAGES_PER_SKB 3
+#define MAX_FRAGMENTS ((65536 + 1500 - 1) / 1500)
+/*
+ * Guestimate the per request queue TX upper bound.
+ */
+#define TX_RESERVE_PAGES \
+   (4 * MAX_FRAGMENTS * MAX_PAGES_PER_SKB)
+
+extern atomic_t vmio_socks;
+
+static inline int sk_vmio_socks(void)
+{
+   return atomic_read(&vmio_socks);
+}
+
+extern int rx_emergency_get(int bytes);
+extern int rx_emergency_get_overcommit(int bytes);
+extern void rx_emergency_put(int bytes);
+
+extern void sk_adjust_memalloc(int socks, int tx_reserve_pages);
+extern void skb_reserve_memory(int skb_reserve_bytes);
+extern void aux_reserve_memory(int aux_reserve_pages);
+extern int sk_set_vmio(struct sock *sk);
+extern int sk_clear_vmio(struct sock *sk);
+
 static inline void sk_acceptq_removed(struct sock *sk)
 {
sk->sk_ack_backlog--;
Index: linux-2.6-git/net/core/sock.c
===
--- linux-2.6-git.orig/net/core/sock.c  2007-02-20 15:06:17.0 +0100
+++ linux-2.6-git/net/core/sock.c   2007-02-20 15:18:48.0 +0100
@@ -112,6 +112,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -196,6 +197,138 @@ __u32 sysctl_rmem_default __read_mostly 
 /* Maximal space eaten by iovec or ancilliary data plus some space */
 int sysctl_optmem_max __read_mostly = sizeof(unsigned long)*(2*UIO_MAXIOV+512);
 
+static atomic_t rx_emergency_bytes;
+
+static int skb_reserve_bytes;
+static int aux_reserve_pages;
+
+static DEFINE_SPINLOCK(memalloc_lock);
+static int rx_net_reserve;
+atomic_t vmio_socks;
+EXPORT_SYMBOL_GPL(vmio_socks);
+
+/*
+ * is there room for another emergency packet?
+ * we account in power of two units to approx the slab allocator.
+ */
+static int __rx_emergency_get(int bytes, bool overcommit)
+{
+   int size = roundup_pow_of_two(bytes);
+   int nr = atomic_add_return(size, &rx_emergency_bytes);
+   int thresh = (3 * skb_reserve_bytes) / 2;
+   if (nr < thresh || overcommit)
+   return 1;
+
+   atomic_sub(size, &rx_emergency_bytes);
+   return 0;
+}
+
+int rx_emergency_get(int bytes)
+{
+   return __rx_emergency_get(bytes, false);
+}
+
+int rx_emergency_get_overcommit(int bytes)
+{
+   return __rx_emergency_get(bytes, true);
+}
+
+void rx_emergency_put(int bytes)
+{
+   int size = roundup_pow_of_two(bytes);
+   atomic_sub(size, &rx_emergency_bytes);
+}
+
+/**
+ * sk_adjust_memalloc - adjust the global memalloc reserve for critical RX
+ * @socks: number of new %SOCK_VMIO sockets
+ * @tx_reserve_pages: number of pages to (un)reserve for TX
+ *
+ * This function adjusts the memalloc reserve based on system demand.
+ * The RX reserve is a limit, and only added once, not for each socket.
+ *
+ * NOTE:
+ *@tx_reserve_pages is an upper-bound of memory used for TX hence
+ *we need not account the pages like we do for RX pages.
+ */
+void sk_adjust_memalloc(int socks, int tx_reserve_pages)
+{
+   unsigned long flags;
+   int reserve = tx_reserve_pages;
+   int nr_socks;
+
+   spin_lock_irqsave(&memalloc_lock, flags);
+   nr_socks = atomic_add_return(socks, &vmio_socks);
+   BUG_ON(nr_socks < 0);
+
+   if (nr_socks) {
+   int skb_reserve_pages = skb_reserve_bytes / PAGE_SIZE;
+   int rx_pages = 2 * skb_reserve_pages + aux_reserve_pages;
+   reserve += rx_pages - rx_net_reserve;
+   rx_net_reserve = rx_pages;
+

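The power-of-two accounting in __rx_emergency_get() above can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions: roundup_pow2() stands in for the kernel's roundup_pow_of_two(), the 64 KiB skb_reserve_bytes value is arbitrary, and the model_* names are hypothetical. A failed charge is rolled back by the same rounded size that was added, so the counter returns to its previous value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Smallest power of two >= bytes; stands in for roundup_pow_of_two(). */
static int roundup_pow2(int bytes)
{
	int size = 1;

	assert(bytes > 0);
	while (size < bytes)
		size <<= 1;
	return size;
}

/* Hypothetical stand-ins for the globals in net/core/sock.c. */
static atomic_int rx_emergency_bytes;		/* zero-initialised */
static int skb_reserve_bytes = 64 * 1024;	/* arbitrary example value */

/* Model of __rx_emergency_get(): charge, then keep or roll back. */
static int model_rx_emergency_get(int bytes, bool overcommit)
{
	int size = roundup_pow2(bytes);
	/* old value + size is what atomic_add_return() would return */
	int nr = atomic_fetch_add(&rx_emergency_bytes, size) + size;
	int thresh = (3 * skb_reserve_bytes) / 2;

	if (nr < thresh || overcommit)
		return 1;

	/* Undo the full charge so the counter is unchanged on failure. */
	atomic_fetch_sub(&rx_emergency_bytes, size);
	return 0;
}

/* Model of rx_emergency_put(): release in the same rounded units. */
static void model_rx_emergency_put(int bytes)
{
	atomic_fetch_sub(&rx_emergency_bytes, roundup_pow2(bytes));
}
```

With a 64 KiB skb reserve the threshold is 96 KiB: a 1500-byte packet is charged 2048 bytes, while a 40000-byte request is charged 65536 bytes and is refused once the running total would reach the threshold, unless the overcommit variant is used.
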
[PATCH 11/29] net: packet split receive api

2007-02-21 Thread Peter Zijlstra
Add some packet-split receive hooks.

For one, this allows NUMA-node-affine page allocations. Later on these hooks
will be extended to do emergency reserve allocations for fragments.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 drivers/net/e1000/e1000_main.c |8 ++--
 drivers/net/sky2.c |   16 ++--
 include/linux/skbuff.h |   23 +++
 net/core/skbuff.c  |   20 
 4 files changed, 51 insertions(+), 16 deletions(-)

Index: linux-2.6-git/drivers/net/e1000/e1000_main.c
===
--- linux-2.6-git.orig/drivers/net/e1000/e1000_main.c   2007-02-14 08:31:12.0 +0100
+++ linux-2.6-git/drivers/net/e1000/e1000_main.c   2007-02-14 11:42:07.0 +0100
@@ -4412,12 +4412,8 @@ e1000_clean_rx_irq_ps(struct e1000_adapt
pci_unmap_page(pdev, ps_page_dma->ps_page_dma[j],
PAGE_SIZE, PCI_DMA_FROMDEVICE);
ps_page_dma->ps_page_dma[j] = 0;
-   skb_fill_page_desc(skb, j, ps_page->ps_page[j], 0,
-  length);
+   skb_add_rx_frag(skb, j, ps_page->ps_page[j], 0, length);
ps_page->ps_page[j] = NULL;
-   skb->len += length;
-   skb->data_len += length;
-   skb->truesize += length;
}
 
/* strip the ethernet crc, problem is we're using pages now so
@@ -4623,7 +4619,7 @@ e1000_alloc_rx_buffers_ps(struct e1000_a
if (j < adapter->rx_ps_pages) {
if (likely(!ps_page->ps_page[j])) {
ps_page->ps_page[j] =
-   alloc_page(GFP_ATOMIC);
+   netdev_alloc_page(netdev);
if (unlikely(!ps_page->ps_page[j])) {
adapter->alloc_rx_buff_failed++;
goto no_buffers;
Index: linux-2.6-git/include/linux/skbuff.h
===
--- linux-2.6-git.orig/include/linux/skbuff.h   2007-02-14 11:29:54.0 +0100
+++ linux-2.6-git/include/linux/skbuff.h   2007-02-14 11:59:04.0 +0100
@@ -813,6 +813,9 @@ static inline void skb_fill_page_desc(st
skb_shinfo(skb)->nr_frags = i + 1;
 }
 
+extern void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page,
+   int off, int size);
+
 #define SKB_PAGE_ASSERT(skb)   BUG_ON(skb_shinfo(skb)->nr_frags)
 #define SKB_FRAG_ASSERT(skb)   BUG_ON(skb_shinfo(skb)->frag_list)
 #define SKB_LINEAR_ASSERT(skb)  BUG_ON(skb_is_nonlinear(skb))
@@ -1148,6 +1151,26 @@ static inline struct sk_buff *netdev_all
return __netdev_alloc_skb(dev, length, GFP_ATOMIC);
 }
 
+extern struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask);
+
+/**
+ * netdev_alloc_page - allocate a page for ps-rx on a specific device
+ * @dev: network device to receive on
+ *
+ * Allocate a new page, node-local to the specified device.
+ *
+ * %NULL is returned if there is no free memory.
+ */
+static inline struct page *netdev_alloc_page(struct net_device *dev)
+{
+   return __netdev_alloc_page(dev, GFP_ATOMIC);
+}
+
+static inline void netdev_free_page(struct net_device *dev, struct page *page)
+{
+   __free_page(page);
+}
+
 /**
  * skb_cow - copy header of skb when it is required
  * @skb: buffer to cow
Index: linux-2.6-git/net/core/skbuff.c
===
--- linux-2.6-git.orig/net/core/skbuff.c   2007-02-14 11:29:54.0 +0100
+++ linux-2.6-git/net/core/skbuff.c 2007-02-14 12:01:40.0 +0100
@@ -279,6 +279,24 @@ struct sk_buff *__netdev_alloc_skb(struc
return skb;
 }
 
+struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
+{
+   int node = dev->dev.parent ? dev_to_node(dev->dev.parent) : -1;
+   struct page *page;
+
+   page = alloc_pages_node(node, gfp_mask, 0);
+   return page;
+}
+
+void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
+   int size)
+{
+   skb_fill_page_desc(skb, i, page, off, size);
+   skb->len += size;
+   skb->data_len += size;
+   skb->truesize += size;
+}
+
 static void skb_drop_list(struct sk_buff **listp)
 {
struct sk_buff *list = *listp;
@@ -2066,6 +2084,8 @@ EXPORT_SYMBOL(kfree_skb);
 EXPORT_SYMBOL(__pskb_pull_tail);
 EXPORT_SYMBOL(__alloc_skb);
 EXPORT_SYMBOL(__netdev_alloc_skb);
+EXPORT_SYMBOL(__netdev_alloc_page);
+EXPORT_SYMBOL(skb_add_rx_frag);
 EXPORT_SYMBOL(pskb_copy);
 EXPORT_SYMBOL(pskb_expand_head);
 EXPORT_SYMBOL(skb_checksum);
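
The point of skb_add_rx_frag() is that drivers no longer open-code the three length updates after skb_fill_page_desc(), as the e1000 hunk used to. A hypothetical userspace model of that bookkeeping follows; struct skb_model and MODEL_MAX_FRAGS are illustrative stand-ins, not kernel types (18 is what MAX_SKB_FRAGS works out to with 4 KiB pages).

```c
#include <assert.h>

#define MODEL_MAX_FRAGS 18	/* illustrative; MAX_SKB_FRAGS depends on PAGE_SIZE */

/* Hypothetical stand-ins for skb_frag_t and the relevant sk_buff fields. */
struct frag { void *page; int off; int size; };

struct skb_model {
	unsigned int len;	/* total bytes, linear part plus fragments */
	unsigned int data_len;	/* bytes held in page fragments */
	unsigned int truesize;	/* memory charged for this buffer */
	int nr_frags;
	struct frag frags[MODEL_MAX_FRAGS];
};

/* Mirrors skb_fill_page_desc(): record fragment i and bump nr_frags. */
static void fill_page_desc(struct skb_model *skb, int i, void *page,
			   int off, int size)
{
	assert(i >= 0 && i < MODEL_MAX_FRAGS);
	skb->frags[i].page = page;
	skb->frags[i].off = off;
	skb->frags[i].size = size;
	skb->nr_frags = i + 1;
}

/* Mirrors skb_add_rx_frag(): attach the page and update all three lengths. */
static void add_rx_frag(struct skb_model *skb, int i, void *page, int off,
			int size)
{
	fill_page_desc(skb, i, page, off, size);
	skb->len += size;
	skb->data_len += size;
	skb->truesize += size;
}
```

Keeping the three counters together in one helper is also what lets the later patches hook emergency-reserve accounting for fragments into a single place.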

[PATCH 06/29] mm: __GFP_EMERGENCY

2007-02-21 Thread Peter Zijlstra
__GFP_EMERGENCY will allow the allocation to disregard the watermarks, 
much like PF_MEMALLOC.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/linux/gfp.h |7 ++-
 mm/internal.h   |   10 +++---
 2 files changed, 13 insertions(+), 4 deletions(-)

Index: linux-2.6-git/include/linux/gfp.h
===
--- linux-2.6-git.orig/include/linux/gfp.h  2006-12-14 10:02:18.0 +0100
+++ linux-2.6-git/include/linux/gfp.h   2006-12-14 10:02:52.0 +0100
@@ -35,17 +35,21 @@ struct vm_area_struct;
 #define __GFP_HIGH ((__force gfp_t)0x20u)  /* Should access emergency pools? */
 #define __GFP_IO   ((__force gfp_t)0x40u)  /* Can start physical IO? */
 #define __GFP_FS   ((__force gfp_t)0x80u)  /* Can call down to low-level FS? */
+
 #define __GFP_COLD ((__force gfp_t)0x100u) /* Cache-cold page required */
 #define __GFP_NOWARN   ((__force gfp_t)0x200u) /* Suppress page allocation failure warning */
 #define __GFP_REPEAT   ((__force gfp_t)0x400u) /* Retry the allocation.  Might fail */
 #define __GFP_NOFAIL   ((__force gfp_t)0x800u) /* Retry for ever.  Cannot fail */
+
 #define __GFP_NORETRY  ((__force gfp_t)0x1000u)/* Do not retry.  Might fail */
 #define __GFP_NO_GROW  ((__force gfp_t)0x2000u)/* Slab internal usage */
 #define __GFP_COMP ((__force gfp_t)0x4000u)/* Add compound page metadata */
 #define __GFP_ZERO ((__force gfp_t)0x8000u)/* Return zeroed page on success */
+
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
 #define __GFP_THISNODE ((__force gfp_t)0x40000u)/* No fallback, no policies */
+#define __GFP_EMERGENCY  ((__force gfp_t)0x80000u) /* Use emergency reserves */
 
 #define __GFP_BITS_SHIFT 20/* Room for 20 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -54,7 +58,8 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
__GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
__GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
-   __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
+   __GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE| \
+   __GFP_EMERGENCY)
 
 /* This equals 0, but use constants in case they ever change */
 #define GFP_NOWAIT (GFP_ATOMIC & ~__GFP_HIGH)
Index: linux-2.6-git/mm/internal.h
===
--- linux-2.6-git.orig/mm/internal.h   2006-12-14 10:02:52.0 +0100
+++ linux-2.6-git/mm/internal.h 2006-12-14 10:02:52.0 +0100
@@ -75,7 +75,9 @@ static int inline gfp_to_alloc_flags(gfp
alloc_flags |= ALLOC_HARDER;
 
if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-   if (!in_irq() && (p->flags & PF_MEMALLOC))
+   if (gfp_mask & __GFP_EMERGENCY)
+   alloc_flags |= ALLOC_NO_WATERMARKS;
+   else if (!in_irq() && (p->flags & PF_MEMALLOC))
alloc_flags |= ALLOC_NO_WATERMARKS;
else if (!in_interrupt() &&
unlikely(test_thread_flag(TIF_MEMDIE)))
@@ -103,7 +105,7 @@ static inline int alloc_flags_to_rank(in
return rank;
 }
 
-static inline int gfp_to_rank(gfp_t gfp_mask)
+static __always_inline int gfp_to_rank(gfp_t gfp_mask)
 {
/*
 * Although correct this full version takes a ~3% performance
@@ -118,7 +120,9 @@ static inline int gfp_to_rank(gfp_t gfp_
 */
 
if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-   if (!in_irq() && (current->flags & PF_MEMALLOC))
+   if (gfp_mask & __GFP_EMERGENCY)
+   return 0;
+   else if (!in_irq() && (current->flags & PF_MEMALLOC))
return 0;
else if (!in_interrupt() &&
unlikely(test_thread_flag(TIF_MEMDIE)))

-- 



[PATCH 27/29] nfs: disable data cache revalidation for swapfiles

2007-02-21 Thread Peter Zijlstra
Do as Trond suggested:
  http://lkml.org/lkml/2006/8/25/348

Disable NFS data cache revalidation on swap files since it doesn't really 
make sense to have other clients change the file while you are using it.

Thereby we can stop setting PG_private on swap pages, since there ought to
be no further races with invalidate_inode_pages2() to deal with.

And since we cannot set PG_private we cannot use page->private (which is
already used by PG_swapcache pages anyway) to store the nfs_page. Thus
augment the new nfs_page_find_request logic.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
---
 fs/nfs/inode.c |6 ++
 fs/nfs/write.c |   35 +++
 2 files changed, 29 insertions(+), 12 deletions(-)

Index: linux-2.6-git/fs/nfs/inode.c
===
--- linux-2.6-git.orig/fs/nfs/inode.c   2007-02-21 11:04:08.0 +0100
+++ linux-2.6-git/fs/nfs/inode.c   2007-02-21 11:52:21.0 +0100
@@ -719,6 +719,12 @@ int nfs_revalidate_mapping_nolock(struct
struct nfs_inode *nfsi = NFS_I(inode);
int ret = 0;
 
+   /*
+* swapfiles are not supposed to be shared.
+*/
+   if (IS_SWAPFILE(inode))
+   goto out;
+
if ((nfsi->cache_validity & NFS_INO_REVAL_PAGECACHE)
|| nfs_attribute_timeout(inode) || NFS_STALE(inode)) {
ret = __nfs_revalidate_inode(NFS_SERVER(inode), inode);
Index: linux-2.6-git/fs/nfs/write.c
===
--- linux-2.6-git.orig/fs/nfs/write.c   2007-02-21 11:52:17.0 +0100
+++ linux-2.6-git/fs/nfs/write.c   2007-02-21 11:53:18.0 +0100
@@ -107,7 +107,7 @@ void nfs_writedata_release(void *wdata)
nfs_writedata_free(wdata);
 }
 
-static struct nfs_page *nfs_page_find_request_locked(struct page *page)
+static struct nfs_page *nfs_page_find_request_locked(struct nfs_inode *nfsi, struct page *page)
 {
struct nfs_page *req = NULL;
 
@@ -115,6 +115,10 @@ static struct nfs_page *nfs_page_find_re
req = (struct nfs_page *)page_private(page);
if (req != NULL)
atomic_inc(&req->wb_count);
+   } else if (unlikely(PageSwapCache(page))) {
+   req = radix_tree_lookup(&nfsi->nfs_page_tree, page_file_index(page));
+   if (req != NULL)
+   atomic_inc(&req->wb_count);
}
return req;
 }
@@ -122,10 +126,11 @@ static struct nfs_page *nfs_page_find_re
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
struct nfs_page *req = NULL;
-   spinlock_t *req_lock = &NFS_I(page_file_mapping(page)->host)->req_lock;
+   struct nfs_inode *nfsi = NFS_I(page_file_mapping(page)->host);
+   spinlock_t *req_lock = &nfsi->req_lock;
 
spin_lock(req_lock);
-   req = nfs_page_find_request_locked(page);
+   req = nfs_page_find_request_locked(nfsi, page);
spin_unlock(req_lock);
return req;
 }
@@ -248,12 +253,13 @@ static void nfs_end_page_writeback(struc
 static int nfs_page_mark_flush(struct page *page)
 {
struct nfs_page *req;
-   spinlock_t *req_lock = &NFS_I(page_file_mapping(page)->host)->req_lock;
+   struct nfs_inode *nfsi = NFS_I(page_file_mapping(page)->host);
+   spinlock_t *req_lock = &nfsi->req_lock;
int ret;
 
spin_lock(req_lock);
for(;;) {
-   req = nfs_page_find_request_locked(page);
+   req = nfs_page_find_request_locked(nfsi, page);
if (req == NULL) {
spin_unlock(req_lock);
return 1;
@@ -368,8 +374,14 @@ static int nfs_inode_add_request(struct 
if (nfs_have_delegation(inode, FMODE_WRITE))
nfsi->change_attr++;
}
-   SetPagePrivate(req->wb_page);
-   set_page_private(req->wb_page, (unsigned long)req);
+   /*
+* Swap-space should not get truncated. Hence no need to plug the race
+* with invalidate/truncate.
+*/
+   if (likely(!PageSwapCache(req->wb_page))) {
+   SetPagePrivate(req->wb_page);
+   set_page_private(req->wb_page, (unsigned long)req);
+   }
nfsi->npages++;
atomic_inc(&req->wb_count);
return 0;
@@ -386,8 +398,10 @@ static void nfs_inode_remove_request(str
BUG_ON (!NFS_WBACK_BUSY(req));
 
spin_lock(&nfsi->req_lock);
-   set_page_private(req->wb_page, 0);
-   ClearPagePrivate(req->wb_page);
+   if (likely(!PageSwapCache(req->wb_page))) {
+   set_page_private(req->wb_page, 0);
+   ClearPagePrivate(req->wb_page);
+   }
radix_tree_delete(&nfsi->nfs_page_tree, req->wb_index);
nfsi->npages--;
if (!nfsi->npages) {
@@ -600,7 +614,7 @@ static struct nfs_page * nfs_update_requ
 * A request for the page we wish to update
   

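The lookup order that nfs_page_find_request_locked() gains in the write.c hunk can be sketched in userspace as below. Assumptions: struct page_model and struct inode_model are hypothetical stand-ins, a small array replaces the nfs_page_tree radix tree, and the first branch is taken when PagePrivate() is set (the enclosing if is not visible in the quoted hunk).

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 8	/* hypothetical size of the per-inode lookup table */

struct req { int wb_count; };

/* Hypothetical stand-in for struct page: only what the lookup inspects. */
struct page_model {
	int page_private_set;		/* PagePrivate() */
	int swap_cache;			/* PageSwapCache() */
	struct req *private_req;	/* page_private() when PagePrivate */
	unsigned long file_index;	/* page_file_index() */
};

/* Stand-in for nfsi->nfs_page_tree: requests indexed by file offset. */
struct inode_model { struct req *tree[NR_SLOTS]; };

static struct req *find_request(struct inode_model *nfsi,
				struct page_model *page)
{
	struct req *req = NULL;

	if (page->page_private_set)
		req = page->private_req;		/* regular pagecache page */
	else if (page->swap_cache && page->file_index < NR_SLOTS)
		req = nfsi->tree[page->file_index];	/* swap page: by index */
	if (req != NULL)
		req->wb_count++;			/* caller gets a reference */
	return req;
}
```

Swap pages never get PG_private set under this patch, so for them the per-inode tree keyed by page_file_index() is the only way back to the nfs_page.
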
[PATCH 12/29] net: remove alloc_skb_from_cache

2007-02-21 Thread Peter Zijlstra
Let's get rid of the unused alloc_skb_from_cache() function.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/linux/skbuff.h |3 --
 net/core/dev.c |1 
 net/core/skbuff.c  |   71 ++---
 3 files changed, 11 insertions(+), 64 deletions(-)

Index: linux-2.6-git/include/linux/skbuff.h
===
--- linux-2.6-git.orig/include/linux/skbuff.h   2007-02-14 08:31:13.0 +0100
+++ linux-2.6-git/include/linux/skbuff.h   2007-02-14 10:11:36.0 +0100
@@ -345,9 +345,6 @@ static inline struct sk_buff *alloc_skb_
return __alloc_skb(size, priority, 1, -1);
 }
 
-extern struct sk_buff *alloc_skb_from_cache(struct kmem_cache *cp,
-   unsigned int size,
-   gfp_t priority);
 extern void   kfree_skbmem(struct sk_buff *skb);
 extern struct sk_buff *skb_clone(struct sk_buff *skb,
 gfp_t priority);
Index: linux-2.6-git/net/core/skbuff.c
===
--- linux-2.6-git.orig/net/core/skbuff.c   2007-02-14 08:31:12.0 +0100
+++ linux-2.6-git/net/core/skbuff.c 2007-02-14 10:11:16.0 +0100
@@ -198,61 +198,6 @@ nodata:
 }
 
 /**
- * alloc_skb_from_cache-   allocate a network buffer
- * @cp: kmem_cache from which to allocate the data area
- *   (object size must be big enough for @size bytes + skb overheads)
- * @size: size to allocate
- * @gfp_mask: allocation mask
- *
- * Allocate a new &sk_buff. The returned buffer has no headroom and
- * tail room of size bytes. The object has a reference count of one.
- * The return is the buffer. On a failure the return is %NULL.
- *
- * Buffers may only be allocated from interrupts using a @gfp_mask of
- * %GFP_ATOMIC.
- */
-struct sk_buff *alloc_skb_from_cache(struct kmem_cache *cp,
-unsigned int size,
-gfp_t gfp_mask)
-{
-   struct sk_buff *skb;
-   u8 *data;
-
-   /* Get the HEAD */
-   skb = kmem_cache_alloc(skbuff_head_cache,
-  gfp_mask & ~__GFP_DMA);
-   if (!skb)
-   goto out;
-
-   /* Get the DATA. */
-   size = SKB_DATA_ALIGN(size);
-   data = kmem_cache_alloc(cp, gfp_mask);
-   if (!data)
-   goto nodata;
-
-   memset(skb, 0, offsetof(struct sk_buff, truesize));
-   skb->truesize = size + sizeof(struct sk_buff);
-   atomic_set(&skb->users, 1);
-   skb->head = data;
-   skb->data = data;
-   skb->tail = data;
-   skb->end  = data + size;
-
-   atomic_set(&(skb_shinfo(skb)->dataref), 1);
-   skb_shinfo(skb)->nr_frags  = 0;
-   skb_shinfo(skb)->gso_size = 0;
-   skb_shinfo(skb)->gso_segs = 0;
-   skb_shinfo(skb)->gso_type = 0;
-   skb_shinfo(skb)->frag_list = NULL;
-out:
-   return skb;
-nodata:
-   kmem_cache_free(skbuff_head_cache, skb);
-   skb = NULL;
-   goto out;
-}
-
-/**
  * __netdev_alloc_skb - allocate an skbuff for rx on a specific device
  * @dev: network device to receive on
  * @length: length to allocate

-- 



[PATCH 23/29] mm: methods for teaching filesystems about PG_swapcache pages

2007-02-21 Thread Peter Zijlstra
In order to teach filesystems to handle swap cache pages, two new page
functions are introduced:

  pgoff_t page_file_index(struct page *);
  struct address_space *page_file_mapping(struct page *);

page_file_index - gives the offset of this page in the file in PAGE_CACHE_SIZE
blocks. Like page->index is for mapped pages, this function also gives the
correct index for PG_swapcache pages.

page_file_mapping - gives the mapping backing the actual page; that is, for
swap cache pages it gives swap_file->f_mapping.

page_offset() is modified to use page_file_index(), so that it will give the
expected result, even for PG_swapcache pages.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Trond Myklebust <[EMAIL PROTECTED]>
---
 include/linux/mm.h  |   25 +
 include/linux/pagemap.h |2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

Index: linux-2.6-git/include/linux/mm.h
===
--- linux-2.6-git.orig/include/linux/mm.h   2007-02-21 12:15:01.0 +0100
+++ linux-2.6-git/include/linux/mm.h   2007-02-21 12:15:07.0 +0100
@@ -594,6 +594,16 @@ static inline struct swap_info_struct *p
return get_swap_info_struct(swp_type(swap));
 }
 
+static inline
+struct address_space *page_file_mapping(struct page *page)
+{
+#ifdef CONFIG_SWAP_FILE
+   if (unlikely(PageSwapCache(page)))
+   return page_swap_info(page)->swap_file->f_mapping;
+#endif
+   return page->mapping;
+}
+
 static inline int PageAnon(struct page *page)
 {
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
@@ -611,6 +621,21 @@ static inline pgoff_t page_index(struct 
 }
 
 /*
+ * Return the file index of the page. Regular pagecache pages use ->index
+ * whereas swapcache pages use swp_offset(->private)
+ */
+static inline pgoff_t page_file_index(struct page *page)
+{
+#ifdef CONFIG_SWAP_FILE
+   if (unlikely(PageSwapCache(page))) {
+   swp_entry_t swap = { .val = page_private(page) };
+   return swp_offset(swap);
+   }
+#endif
+   return page->index;
+}
+
+/*
  * The atomic page->_mapcount, like _count, starts from -1:
  * so that transitions both from it and to it can be tracked,
  * using atomic_inc_and_test and atomic_add_negative(-1).
Index: linux-2.6-git/include/linux/pagemap.h
===
--- linux-2.6-git.orig/include/linux/pagemap.h  2007-02-21 12:14:54.0 +0100
+++ linux-2.6-git/include/linux/pagemap.h   2007-02-21 12:15:07.0 +0100
@@ -120,7 +120,7 @@ extern void __remove_from_page_cache(str
  */
 static inline loff_t page_offset(struct page *page)
 {
-   return ((loff_t)page->index) << PAGE_CACHE_SHIFT;
+   return ((loff_t)page_file_index(page)) << PAGE_CACHE_SHIFT;
 }
 
 static inline pgoff_t linear_page_index(struct vm_area_struct *vma,

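To see why page_offset() needs page_file_index(), consider a swapcache page: its page->index does not give its position in the swap file, while the swap entry stored in page_private() does. The sketch below is a userspace model only; it assumes 4 KiB pages and a generic swp_entry_t layout with the swap type in the top 5 bits, whereas real layouts are architecture specific. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_CACHE_SHIFT 12	/* assumes 4 KiB pages */
#define MODEL_SWAPFILES_SHIFT  5	/* bits reserved for the swap type */
#define MODEL_TYPE_SHIFT (64 - MODEL_SWAPFILES_SHIFT)

/* Hypothetical page model: a pagecache page or a swapcache page. */
struct page_model {
	int swap_cache;		/* PageSwapCache() */
	uint64_t index;		/* page->index */
	uint64_t private_val;	/* page_private(): swap entry when swapcache */
};

/* Pack a (type, offset) pair the way this model's swp_entry_t does. */
static uint64_t mk_swp_entry(unsigned type, uint64_t offset)
{
	return ((uint64_t)type << MODEL_TYPE_SHIFT) | offset;
}

static uint64_t model_swp_offset(uint64_t val)
{
	return val & ((UINT64_C(1) << MODEL_TYPE_SHIFT) - 1);
}

/* page_file_index(): position in the backing file, in page units. */
static uint64_t model_page_file_index(const struct page_model *p)
{
	if (p->swap_cache)
		return model_swp_offset(p->private_val);
	return p->index;
}

/* page_offset(): byte offset, now also right for swapcache pages. */
static uint64_t model_page_offset(const struct page_model *p)
{
	return model_page_file_index(p) << MODEL_PAGE_CACHE_SHIFT;
}
```

For example, a swapcache page at swap offset 7 maps to byte offset 7 << 12 = 28672 in the swap file, regardless of what page->index holds.
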
-- 



[PATCH 17/29] netvm: prevent a TCP specific deadlock

2007-02-21 Thread Peter Zijlstra
It could happen that all !SOCK_VMIO sockets have buffered so much data
that we're over the global rmem limit. This will prevent SOCK_VMIO buffers
from receiving data, which will prevent userspace from running, which is needed
to reduce the buffered data.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/net/sock.h  |7 ---
 net/core/stream.c   |5 +++--
 net/ipv4/tcp_ipv4.c |8 
 net/ipv6/tcp_ipv6.c |8 
 4 files changed, 23 insertions(+), 5 deletions(-)

Index: linux-2.6-git/include/net/sock.h
===
--- linux-2.6-git.orig/include/net/sock.h   2007-02-14 12:09:05.0 +0100
+++ linux-2.6-git/include/net/sock.h   2007-02-14 12:09:21.0 +0100
@@ -730,7 +730,8 @@ static inline struct inode *SOCK_INODE(s
 }
 
 extern void __sk_stream_mem_reclaim(struct sock *sk);
-extern int sk_stream_mem_schedule(struct sock *sk, int size, int kind);
+extern int sk_stream_mem_schedule(struct sock *sk, struct sk_buff *skb,
+   int size, int kind);
 
 #define SK_STREAM_MEM_QUANTUM ((int)PAGE_SIZE)
 
@@ -757,13 +758,13 @@ static inline void sk_stream_writequeue_
 static inline int sk_stream_rmem_schedule(struct sock *sk, struct sk_buff *skb)
 {
return (int)skb->truesize <= sk->sk_forward_alloc ||
-   sk_stream_mem_schedule(sk, skb->truesize, 1);
+   sk_stream_mem_schedule(sk, skb, skb->truesize, 1);
 }
 
 static inline int sk_stream_wmem_schedule(struct sock *sk, int size)
 {
return size <= sk->sk_forward_alloc ||
-  sk_stream_mem_schedule(sk, size, 0);
+  sk_stream_mem_schedule(sk, NULL, size, 0);
 }
 
 /* Used by processes to "lock" a socket state, so that
Index: linux-2.6-git/net/core/stream.c
===
--- linux-2.6-git.orig/net/core/stream.c   2007-02-14 12:09:05.0 +0100
+++ linux-2.6-git/net/core/stream.c 2007-02-14 12:09:21.0 +0100
@@ -207,7 +207,7 @@ void __sk_stream_mem_reclaim(struct sock
 
 EXPORT_SYMBOL(__sk_stream_mem_reclaim);
 
-int sk_stream_mem_schedule(struct sock *sk, int size, int kind)
+int sk_stream_mem_schedule(struct sock *sk, struct sk_buff *skb, int size, int kind)
 {
int amt = sk_stream_pages(size);
 
@@ -224,7 +224,8 @@ int sk_stream_mem_schedule(struct sock *
/* Over hard limit. */
if (atomic_read(sk->sk_prot->memory_allocated) > sk->sk_prot->sysctl_mem[2]) {
sk->sk_prot->enter_memory_pressure();
-   goto suppress_allocation;
+   if (!skb || !skb_emergency(skb))
+   goto suppress_allocation;
}
 
/* Under pressure. */

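The effect of the stream.c hunk on the hard-limit check can be summarised as a predicate. This is a hypothetical userspace model: struct skb_model and its emergency field stand in for skb_emergency(), which this patch calls but which is defined elsewhere in the series and not quoted here. A false result only means this particular check does not bail out; the real function goes on to the "Under pressure" checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb and the skb_emergency() test. */
struct skb_model { bool emergency; };

/*
 * True when the over-hard-limit branch jumps to suppress_allocation.
 * A NULL skb (the write-side caller) or a non-emergency skb is suppressed;
 * an skb allocated from the emergency reserve is let through.
 */
static bool suppress_over_hard_limit(long allocated, long hard_limit,
				     const struct skb_model *skb)
{
	if (allocated <= hard_limit)
		return false;		/* not over sysctl_mem[2] */
	return skb == NULL || !skb->emergency;
}
```

This is the whole deadlock fix in miniature: other sockets can still push the protocol over its global limit, but emergency skbs destined for SOCK_VMIO sockets are no longer refused because of it.
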
-- 



[PATCH 03/29] mm: allow PF_MEMALLOC from softirq context

2007-02-21 Thread Peter Zijlstra
Allow PF_MEMALLOC to be set in softirq context. When softirqs run in a borrowed
context, save current->flags (clearing PF_MEMALLOC) on entry and restore them
on exit; ksoftirqd has its own task_struct.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 kernel/softirq.c |3 +++
 mm/internal.h|   14 --
 2 files changed, 11 insertions(+), 6 deletions(-)

Index: linux-2.6-git/mm/internal.h
===
--- linux-2.6-git.orig/mm/internal.h   2006-12-14 10:02:52.0 +0100
+++ linux-2.6-git/mm/internal.h 2006-12-14 10:10:09.0 +0100
@@ -75,9 +75,10 @@ static int inline gfp_to_alloc_flags(gfp
alloc_flags |= ALLOC_HARDER;
 
if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-   if (!in_interrupt() &&
-   ((p->flags & PF_MEMALLOC) ||
-unlikely(test_thread_flag(TIF_MEMDIE))))
+   if (!in_irq() && (p->flags & PF_MEMALLOC))
+   alloc_flags |= ALLOC_NO_WATERMARKS;
+   else if (!in_interrupt() &&
+   unlikely(test_thread_flag(TIF_MEMDIE)))
alloc_flags |= ALLOC_NO_WATERMARKS;
}
 
@@ -117,9 +118,10 @@ static inline int gfp_to_rank(gfp_t gfp_
 */
 
if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-   if (!in_interrupt() &&
-   ((current->flags & PF_MEMALLOC) ||
-unlikely(test_thread_flag(TIF_MEMDIE))))
+   if (!in_irq() && (current->flags & PF_MEMALLOC))
+   return 0;
+   else if (!in_interrupt() &&
+   unlikely(test_thread_flag(TIF_MEMDIE)))
return 0;
}
 
Index: linux-2.6-git/kernel/softirq.c
===
--- linux-2.6-git.orig/kernel/softirq.c 2006-12-14 10:02:18.0 +0100
+++ linux-2.6-git/kernel/softirq.c  2006-12-14 10:02:52.0 +0100
@@ -209,6 +209,8 @@ asmlinkage void __do_softirq(void)
__u32 pending;
int max_restart = MAX_SOFTIRQ_RESTART;
int cpu;
+   unsigned long pflags = current->flags;
+   current->flags &= ~PF_MEMALLOC;
 
pending = local_softirq_pending();
account_system_vtime(current);
@@ -247,6 +249,7 @@ restart:
 
account_system_vtime(current);
_local_bh_enable();
+   current->flags = pflags;
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ

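The save/clear/restore pattern added to __do_softirq() can be modelled in userspace as follows. The PF_MEMALLOC value and the task_model type are illustrative assumptions, not kernel definitions. The handler may set PF_MEMALLOC itself, for example while processing an emergency packet, and the restore on exit discards that as well.

```c
#include <assert.h>

#define MODEL_PF_MEMALLOC 0x800u	/* illustrative value only */

struct task_model { unsigned int flags; };

static unsigned int seen_on_entry;	/* flags as the handler saw them */

/* Stand-in for a softirq handler that may raise PF_MEMALLOC itself. */
static void softirq_handler(struct task_model *cur, int emergency)
{
	seen_on_entry = cur->flags;
	if (emergency)
		cur->flags |= MODEL_PF_MEMALLOC;
}

/* Model of the __do_softirq() change on a borrowed task. */
static void model_do_softirq(struct task_model *cur, int emergency)
{
	unsigned int pflags = cur->flags;	/* save the borrowed task's flags */

	cur->flags &= ~MODEL_PF_MEMALLOC;	/* don't inherit PF_MEMALLOC */
	assert(!(cur->flags & MODEL_PF_MEMALLOC));
	softirq_handler(cur, emergency);
	cur->flags = pflags;			/* restore on exit */
}
```

Clearing on entry keeps an interrupted PF_MEMALLOC task from granting reserve access to unrelated softirq work; restoring on exit keeps softirq-set flags from leaking back into that task.
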
-- 



[PATCH 00/29] swap over networked storage -v11

2007-02-21 Thread Peter Zijlstra
(patches against 2.6.20-mm1)

There is a fundamental deadlock associated with paging: writing out a page to
free memory may itself require free memory to complete. The usual solution is to
keep a small amount of memory available at all times so we can overcome this
problem. This however assumes the amount of memory needed for writeout is
(constant and) smaller than the provided reserve.

It is this latter assumption that breaks when doing writeout over network.
The network stack can take up an unspecified amount of memory while waiting for a reply
to our write request. This re-introduces the deadlock; we might never complete
the writeout, for we might not have enough memory to receive the completion
message.

The proposed solution is simple: only allow traffic servicing the VM to make
use of the reserves. Since the VM is always present to service, this limited
amount of memory can sustain a full connection; after a packet has been
processed its memory can be re-used for the next packet.

This, however, implies knowing which packets are for whom, which, generally
speaking, we don't. Hence we need to receive all packets, but discard any
non-VM-bound packet allocated from the reserves as soon as we encounter it.
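
The receive policy in the paragraph above reduces to a small predicate. The structure and function names below are hypothetical and only illustrate the rule; the patches themselves implement it inside the networking code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of a received packet once its socket is known. */
struct rx_packet {
	bool from_reserve;	/* packet memory came from the emergency reserve */
	bool sock_vmio;		/* destination socket is marked SOCK_VMIO */
};

/*
 * Every packet is received; one that consumed reserve memory is kept only
 * if it is headed for a socket that services the VM.
 */
static bool keep_packet(const struct rx_packet *p)
{
	assert(p != 0);
	if (!p->from_reserve)
		return true;	/* normal memory: the usual rules apply */
	return p->sock_vmio;	/* reserve memory: VM-bound traffic only */
}
```

Dropping non-VM packets as early as possible is what bounds the reserve: each reserve-backed packet is either freed immediately or consumed by the VM, after which its memory can serve the next one.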

Recognising that a packet is headed for the VM also needs a little help, hence
we introduce the socket flag SOCK_VMIO to mark such sockets.

Of course, since we are paging, all this has to happen in kernel-space, as
user-space might just not be there.

Since packet processing might also require memory, this also implies that
those auxiliary allocations may use the reserves while an emergency packet is
being processed. This is accomplished by using PF_MEMALLOC.

How much memory to reserve is also an issue: enough to saturate both the route
cache and IP fragment reassembly, along with various constants.

This patch-set comes in 5 parts:

1) introduce the memory reserve and make the SLAB allocator play nice with it.
   patches 01-09

2) add some needed infrastructure to the network code
   patches 10-12

3) implement the idea outlined above
   patches 13-19

4) teach the swap machinery to use generic address_spaces
   patches 20-23

5) implement swap over NFS using all the new stuff
   patches 24-29
-- 



Re: [PATCH 02/29] mm: slab allocation fairness

2007-02-21 Thread Pekka Enberg

On 2/21/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

[AIM9 results go here]


Yes please. I would really like to know what we gain by making the
slab even more complex.


[PATCH 04/29] mm: serialize access to min_free_kbytes

2007-02-21 Thread Peter Zijlstra
There is a small race between the procfs caller and the memory hotplug caller
of setup_per_zone_pages_min(). Not a big deal, but the next patch will add yet
another caller. Time to close the gap.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/page_alloc.c |   16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

Index: linux-2.6-git/mm/page_alloc.c
===
--- linux-2.6-git.orig/mm/page_alloc.c  2007-01-15 09:58:49.0 +0100
+++ linux-2.6-git/mm/page_alloc.c   2007-01-15 09:58:51.0 +0100
@@ -95,6 +95,7 @@ static char * const zone_names[MAX_NR_ZO
 #endif
 };
 
+static DEFINE_SPINLOCK(min_free_lock);
 int min_free_kbytes = 1024;
 
 unsigned long __meminitdata nr_kernel_pages;
@@ -3074,12 +3075,12 @@ static void setup_per_zone_lowmem_reserv
 }
 
 /**
- * setup_per_zone_pages_min - called when min_free_kbytes changes.
+ * __setup_per_zone_pages_min - called when min_free_kbytes changes.
  *
  * Ensures that the pages_{min,low,high} values for each zone are set correctly
  * with respect to min_free_kbytes.
  */
-void setup_per_zone_pages_min(void)
+static void __setup_per_zone_pages_min(void)
 {
unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
@@ -3133,6 +3134,15 @@ void setup_per_zone_pages_min(void)
calculate_totalreserve_pages();
 }
 
+void setup_per_zone_pages_min(void)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&min_free_lock, flags);
+   __setup_per_zone_pages_min();
+   spin_unlock_irqrestore(&min_free_lock, flags);
+}
+
 /*
  * Initialise min_free_kbytes.
  *
@@ -3168,7 +3178,7 @@ static int __init init_per_zone_pages_mi
min_free_kbytes = 128;
if (min_free_kbytes > 65536)
min_free_kbytes = 65536;
-   setup_per_zone_pages_min();
+   __setup_per_zone_pages_min();
setup_per_zone_lowmem_reserve();
return 0;
 }

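The locking split in this patch (an unlocked worker, a locked wrapper for runtime callers, and a direct call from the init path) can be sketched in userspace as below. Assumptions: a C11 atomic_flag spin loop stands in for spin_lock_irqsave(), a single pages_min variable stands in for the per-zone watermarks, and 4 KiB pages are assumed, so pages_min = min_free_kbytes >> 2. The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define MODEL_PAGE_SHIFT 12	/* assumes 4 KiB pages */

static atomic_flag min_free_lock = ATOMIC_FLAG_INIT;
static int min_free_kbytes = 1024;
static unsigned long pages_min;	/* stand-in for per-zone watermark state */

/* Unlocked worker, like __setup_per_zone_pages_min(). */
static void setup_pages_min_unlocked(void)
{
	assert(min_free_kbytes >= 0);
	pages_min = (unsigned long)min_free_kbytes >> (MODEL_PAGE_SHIFT - 10);
}

/* Locked wrapper for runtime callers, like setup_per_zone_pages_min(). */
static void setup_pages_min_locked(void)
{
	while (atomic_flag_test_and_set_explicit(&min_free_lock,
						 memory_order_acquire))
		;	/* spin until the flag is free */
	setup_pages_min_unlocked();
	atomic_flag_clear_explicit(&min_free_lock, memory_order_release);
}

/* Boot path, like init_per_zone_pages_min(): clamp, then call the worker. */
static void init_pages_min(int kbytes)
{
	min_free_kbytes = kbytes;
	if (min_free_kbytes < 128)
		min_free_kbytes = 128;
	if (min_free_kbytes > 65536)
		min_free_kbytes = 65536;
	setup_pages_min_unlocked();
}
```

The init path presumably calls the unlocked worker directly because it runs once at boot, before the procfs and hotplug callers can race with it.
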
-- 



[PATCH 08/29] mm: kmem_cache_objs_to_pages()

2007-02-21 Thread Peter Zijlstra
Provide a method to calculate the number of pages needed to store a given
number of slab objects (upper bound when considering possible partial and
free slabs).

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/linux/slab.h |1 +
 mm/slab.c|6 ++
 2 files changed, 7 insertions(+)

Index: linux-2.6-git/include/linux/slab.h
===
--- linux-2.6-git.orig/include/linux/slab.h 2007-01-09 11:28:32.0 +0100
+++ linux-2.6-git/include/linux/slab.h  2007-01-09 11:30:16.0 +0100
@@ -43,6 +43,7 @@ typedef struct kmem_cache kmem_cache_t _
  */
 void __init kmem_cache_init(void);
 extern int slab_is_available(void);
+extern unsigned int kmem_cache_objs_to_pages(struct kmem_cache *, int);
 
 struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
unsigned long,
Index: linux-2.6-git/mm/slab.c
===
--- linux-2.6-git.orig/mm/slab.c   2007-01-09 11:30:00.0 +0100
+++ linux-2.6-git/mm/slab.c 2007-01-09 11:30:16.0 +0100
@@ -4482,3 +4482,9 @@ unsigned int ksize(const void *objp)
 
return obj_size(virt_to_cache(objp));
 }
+
+unsigned int kmem_cache_objs_to_pages(struct kmem_cache *cachep, int nr)
+{
+   return ((nr + cachep->num - 1) / cachep->num) << cachep->gfporder;
+}
+EXPORT_SYMBOL_GPL(kmem_cache_objs_to_pages);

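The formula in kmem_cache_objs_to_pages() rounds the object count up to whole slabs and then scales by the slab size in pages. A userspace sketch, using a hypothetical struct cache_model that holds only the two kmem_cache fields the formula reads:

```c
#include <assert.h>

/* Hypothetical stand-in for the two struct kmem_cache fields used. */
struct cache_model {
	unsigned int num;	/* objects per slab */
	unsigned int gfporder;	/* each slab is 2^gfporder pages */
};

/* Same formula as kmem_cache_objs_to_pages(): ceil(nr / num) slabs, in pages. */
static unsigned int objs_to_pages(const struct cache_model *c, int nr)
{
	assert(c->num > 0 && nr >= 0);
	return ((nr + c->num - 1) / c->num) << c->gfporder;
}
```

For a cache with 32 objects per order-0 slab, 100 objects need ceil(100 / 32) = 4 slabs, i.e. 4 pages; with order-1 slabs the same request needs 8 pages. Counting whole slabs is what makes the result an upper bound.
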
-- 



[PATCH 20/29] uml: rename arch/um remove_mapping()

2007-02-21 Thread Peter Zijlstra
When 'include/linux/mm.h' includes 'include/linux/swap.h', the global
remove_mapping() definition clashes with the arch/um one.

Rename the arch/um one.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Acked-by: Jeff Dike <[EMAIL PROTECTED]>
---
 arch/um/kernel/physmem.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Index: linux-2.6-git/arch/um/kernel/physmem.c
===
--- linux-2.6-git.orig/arch/um/kernel/physmem.c 2007-02-12 09:40:47.0 +0100
+++ linux-2.6-git/arch/um/kernel/physmem.c  2007-02-12 11:17:47.0 +0100
@@ -160,7 +160,7 @@ int physmem_subst_mapping(void *virt, in
 
 static int physmem_fd = -1;
 
-static void remove_mapping(struct phys_desc *desc)
+static void um_remove_mapping(struct phys_desc *desc)
 {
void *virt = desc->virt;
int err;
@@ -184,7 +184,7 @@ int physmem_remove_mapping(void *virt)
if(desc == NULL)
return 0;
 
-   remove_mapping(desc);
+   um_remove_mapping(desc);
return 1;
 }
 
@@ -205,7 +205,7 @@ void physmem_forget_descriptor(int fd)
page = list_entry(ele, struct phys_desc, list);
offset = page->offset;
addr = page->virt;
-   remove_mapping(page);
+   um_remove_mapping(page);
err = os_seek_file(fd, offset);
if(err)
panic("physmem_forget_descriptor - failed to seek "

-- 



[PATCH 07/29] mm: allow mempool to fall back to memalloc reserves

2007-02-21 Thread Peter Zijlstra
Allow the mempool to use the memalloc reserves when all else fails and
the allocation context would otherwise allow it.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/mempool.c |   10 ++
 1 file changed, 10 insertions(+)

Index: linux-2.6-git/mm/mempool.c
===
--- linux-2.6-git.orig/mm/mempool.c 2007-01-12 08:03:44.0 +0100
+++ linux-2.6-git/mm/mempool.c  2007-01-12 10:38:57.0 +0100
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include "internal.h"
 
 static void add_element(mempool_t *pool, void *element)
 {
@@ -229,6 +230,15 @@ repeat_alloc:
}
spin_unlock_irqrestore(&pool->lock, flags);
 
+   /* if we really had right to the emergency reserves try those */
+   if (gfp_to_alloc_flags(gfp_mask) & ALLOC_NO_WATERMARKS) {
+   if (gfp_temp & __GFP_NOMEMALLOC) {
+   gfp_temp &= ~(__GFP_NOMEMALLOC|__GFP_NOWARN);
+   goto repeat_alloc;
+   } else
+   gfp_temp |= __GFP_NOMEMALLOC|__GFP_NOWARN;
+   }
+
/* We must not sleep in the GFP_ATOMIC case */
if (!(gfp_mask & __GFP_WAIT))
return NULL;

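The control flow of the mempool hunk can be modelled outside the kernel. This is a sketch under stated assumptions: the SK_* flags, struct fake_mem and try_alloc() are hypothetical, the pool itself is assumed empty, and the sleep-and-retry path taken when __GFP_WAIT is set is left out. It shows the two passes: first without reserves, then, only for a caller entitled to ALLOC_NO_WATERMARKS, once more with the no-reserves flag cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits standing in for __GFP_NOMEMALLOC/__GFP_NOWARN. */
#define SK_NOMEMALLOC 0x1u
#define SK_NOWARN     0x2u

/* Hypothetical memory model: ordinary free pages plus an emergency
 * reserve that only callers without SK_NOMEMALLOC may touch. */
struct fake_mem {
	int free_pages;
	int reserve_pages;
};

static bool try_alloc(struct fake_mem *m, unsigned int flags)
{
	if (m->free_pages > 0) {
		m->free_pages--;
		return true;
	}
	if (!(flags & SK_NOMEMALLOC) && m->reserve_pages > 0) {
		m->reserve_pages--;
		return true;
	}
	return false;
}

/* Retry shape of the patch, pool assumed empty: the first pass stays out
 * of the reserves; an entitled caller then drops SK_NOMEMALLOC and
 * retries once, otherwise we give up (the real code would sleep and
 * loop when __GFP_WAIT is set). *passes counts allocator calls. */
static bool pool_alloc(struct fake_mem *m, bool entitled, int *passes)
{
	unsigned int flags = SK_NOMEMALLOC | SK_NOWARN;

	*passes = 0;
	for (;;) {
		(*passes)++;
		if (try_alloc(m, flags))
			return true;
		if (entitled && (flags & SK_NOMEMALLOC)) {
			flags &= ~(SK_NOMEMALLOC | SK_NOWARN);
			continue;
		}
		return false;
	}
}
```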
-- 



[PATCH 18/29] netfilter: notify about NF_QUEUE vs emergency skbs

2007-02-21 Thread Peter Zijlstra
Emergency skbs should never touch user-space; however, NF_QUEUE is fully user
configurable. Notify the user of the misconfiguration and try to continue.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 net/netfilter/core.c |5 +
 1 file changed, 5 insertions(+)

Index: linux-2.6-git/net/netfilter/core.c
===
--- linux-2.6-git.orig/net/netfilter/core.c 2007-02-14 12:09:07.0 +0100
+++ linux-2.6-git/net/netfilter/core.c  2007-02-14 12:09:18.0 +0100
@@ -187,6 +187,11 @@ next_hook:
kfree_skb(*pskb);
ret = -EPERM;
} else if ((verdict & NF_VERDICT_MASK)  == NF_QUEUE) {
+   if (unlikely((*pskb)->emergency)) {
+   printk(KERN_ERR "nf_hook: NF_QUEUE encountered for "
+   "emergency skb - skipping rule.\n");
+   goto next_hook;
+   }
NFDEBUG("nf_hook: Verdict = QUEUE.\n");
if (!nf_queue(*pskb, elem, pf, hook, indev, outdev, okfn,
  verdict >> NF_VERDICT_BITS))

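A rough model of the new netfilter behaviour, assuming hooks run in order and an accept verdict falls through to the next hook: a QUEUE verdict on an emergency packet is reported and the rule is skipped, much like the goto next_hook in the patch. The enum and run_hooks() are illustrative only, not netfilter's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical verdicts mirroring NF_ACCEPT/NF_DROP/NF_QUEUE. */
enum verdict { V_ACCEPT, V_DROP, V_QUEUE };

/* Walk the rules in order. A QUEUE verdict would hand the packet to
 * user space, so for an emergency packet it is reported and skipped.
 * Any other non-accept verdict ends the walk; a packet that passes
 * every rule is accepted. */
static enum verdict run_hooks(const enum verdict *rules, int n, bool emergency)
{
	for (int i = 0; i < n; i++) {
		if (rules[i] == V_QUEUE && emergency) {
			fprintf(stderr, "NF_QUEUE for emergency packet - skipping rule\n");
			continue;
		}
		if (rules[i] != V_ACCEPT)
			return rules[i];
	}
	return V_ACCEPT;
}
```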
-- 



[PATCH 16/29] netvm: filter emergency skbs.

2007-02-21 Thread Peter Zijlstra
Toss all emergency packets not for a SOCK_VMIO socket. This ensures our
precious memory reserve doesn't get stuck waiting for user-space.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/net/sock.h |3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6-git/include/net/sock.h
===
--- linux-2.6-git.orig/include/net/sock.h   2007-02-14 16:15:49.0 +0100
+++ linux-2.6-git/include/net/sock.h2007-02-14 16:16:27.0 +0100
@@ -926,6 +926,9 @@ static inline int sk_filter(struct sock 
 {
int err;
struct sk_filter *filter;
+
+   if (skb_emergency(skb) && !sk_has_vmio(sk))
+   return -EPERM;

err = security_sock_rcv_skb(sk, skb);
if (err)

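The sk_filter() change reduces to a two-input admission test. A minimal sketch, with hypothetical struct pkt and struct sock_desc in place of the real sk_buff and sock (whose skb_emergency() and sk_has_vmio() helpers come from this patch series):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical minimal packet and socket descriptors. */
struct pkt       { bool emergency; };
struct sock_desc { bool vmio; };

/* Packets allocated from the emergency reserve may only reach sockets
 * marked SOCK_VMIO; anything else is rejected with -EPERM so reserve
 * memory does not sit in a socket queue waiting for user space. */
static int filter_emergency(const struct pkt *p, const struct sock_desc *s)
{
	if (p->emergency && !s->vmio)
		return -EPERM;
	return 0;
}
```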
-- 



[PATCH 2.6.21-rc1] serial: serial_txx9 driver update

2007-02-21 Thread Atsushi Nemoto
Update the serial_txx9 driver.

 * Use platform_device.
 * Fix and clean up suspend/resume code.

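The suspend/resume rework boils down to this: the pm callback re-runs the same reset-and-configure sequence used at config time whenever the port is powered on (state 0 in the serial core's pm callback). A userspace sketch, assuming a made-up register model where the soft-reset bit clears itself after a few reads; struct fake_port, FAKE_SWRST and the helpers are hypothetical and only mirror the shape of serial_txx9_initialize() and serial_txx9_pm().

```c
#include <assert.h>

/* Hypothetical register model; the bit value and the "clears after
 * three reads" behaviour are made up for the sketch. */
#define FAKE_SWRST 0x8000u

struct fake_port {
	unsigned int fcr;      /* FIFO control register, holds FAKE_SWRST */
	unsigned int lcr;      /* line control register */
	int reset_reads_left;  /* reads until the "hardware" clears SWRST */
	int init_count;        /* how often the port was (re)initialized */
};

static unsigned int read_fcr(struct fake_port *p)
{
	if ((p->fcr & FAKE_SWRST) && --p->reset_reads_left <= 0)
		p->fcr &= ~FAKE_SWRST;
	return p->fcr;
}

/* Request a soft reset, spin until the bit clears, load defaults. */
static void port_initialize(struct fake_port *p)
{
	p->fcr = FAKE_SWRST;
	p->reset_reads_left = 3;
	while (read_fcr(p) & FAKE_SWRST)
		;
	p->lcr = 0x1u;         /* hypothetical default line setting */
	p->init_count++;
}

/* State 0 means fully powered: re-initialize; other states do nothing. */
static void port_pm(struct fake_port *p, unsigned int state)
{
	if (state == 0)
		port_initialize(p);
}
```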
Signed-off-by: Atsushi Nemoto <[EMAIL PROTECTED]>
---
diff --git a/drivers/serial/serial_txx9.c b/drivers/serial/serial_txx9.c
index f4440d3..124d056 100644
--- a/drivers/serial/serial_txx9.c
+++ b/drivers/serial/serial_txx9.c
@@ -38,6 +38,8 @@
  * Fix some spin_locks.
  * Do not call uart_add_one_port for absent ports.
  * 1.07Use CONFIG_SERIAL_TXX9_NR_UARTS.  Cleanup.
+ * 1.08Use platform_device.
+ * Fix and cleanup suspend/resume codes.
  */
 
 #if defined(CONFIG_SERIAL_TXX9_CONSOLE) && defined(CONFIG_MAGIC_SYSRQ)
@@ -50,7 +52,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -60,7 +62,7 @@
 
 #include 
 
-static char *serial_version = "1.07";
+static char *serial_version = "1.08";
 static char *serial_name = "TX39/49 Serial driver";
 
 #define PASS_LIMIT 256
@@ -94,12 +96,7 @@ static char *serial_name = "TX39/49 Seri
 
 struct uart_txx9_port {
struct uart_port port;
-
-   /*
-* We provide a per-port pm hook.
-*/
-   void (*pm)(struct uart_port *port,
- unsigned int state, unsigned int old);
+   /* No additional info for now */
 };
 
 #define TXX9_REGION_SIZE   0x24
@@ -277,6 +274,32 @@ static void serial_txx9_enable_ms(struct
/* TXX9-SIO can not control DTR... */
 }
 
+static void serial_txx9_initialize(struct uart_port *port)
+{
+   struct uart_txx9_port *up = (struct uart_txx9_port *)port;
+
+   sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
+#ifdef CONFIG_CPU_TX49XX
+   /* TX4925 BUG WORKAROUND.  Accessing SIOC register
+* immediately after soft reset causes bus error. */
+   iob();
+   udelay(1);
+#endif
+   while (sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST)
+   ;
+   /* TX Int by FIFO Empty, RX Int by Receiving 1 char. */
+   sio_set(up, TXX9_SIFCR,
+   TXX9_SIFCR_TDIL_MAX | TXX9_SIFCR_RDIL_1);
+   /* initial settings */
+   sio_out(up, TXX9_SILCR,
+   TXX9_SILCR_UMODE_8BIT | TXX9_SILCR_USBL_1BIT |
+   ((up->port.flags & UPF_TXX9_USE_SCLK) ?
+TXX9_SILCR_SCS_SCLK_BG : TXX9_SILCR_SCS_IMCLK_BG));
+   sio_quot_set(up, uart_get_divisor(port, 9600));
+   sio_out(up, TXX9_SIFLCR, TXX9_SIFLCR_RTSTL_MAX /* 15 */);
+   sio_out(up, TXX9_SIDICR, 0);
+}
+
 static inline void
 receive_chars(struct uart_txx9_port *up, unsigned int *status)
 {
@@ -657,9 +680,8 @@ static void
 serial_txx9_pm(struct uart_port *port, unsigned int state,
  unsigned int oldstate)
 {
-   struct uart_txx9_port *up = (struct uart_txx9_port *)port;
-   if (up->pm)
-   up->pm(port, state, oldstate);
+   if (state == 0)
+   serial_txx9_initialize(port);
 }
 
 static int serial_txx9_request_resource(struct uart_txx9_port *up)
@@ -732,7 +754,6 @@ static int serial_txx9_request_port(stru
 static void serial_txx9_config_port(struct uart_port *port, int uflags)
 {
struct uart_txx9_port *up = (struct uart_txx9_port *)port;
-   unsigned long flags;
int ret;
 
/*
@@ -749,30 +770,7 @@ static void serial_txx9_config_port(stru
if (up->port.line == up->port.cons->index)
return;
 #endif
-   spin_lock_irqsave(&up->port.lock, flags);
-   /*
-* Reset the UART.
-*/
-   sio_out(up, TXX9_SIFCR, TXX9_SIFCR_SWRST);
-#ifdef CONFIG_CPU_TX49XX
-   /* TX4925 BUG WORKAROUND.  Accessing SIOC register
-* immediately after soft reset causes bus error. */
-   iob();
-   udelay(1);
-#endif
-   while (sio_in(up, TXX9_SIFCR) & TXX9_SIFCR_SWRST)
-   ;
-   /* TX Int by FIFO Empty, RX Int by Receiving 1 char. */
-   sio_set(up, TXX9_SIFCR,
-   TXX9_SIFCR_TDIL_MAX | TXX9_SIFCR_RDIL_1);
-   /* initial settings */
-   sio_out(up, TXX9_SILCR,
-   TXX9_SILCR_UMODE_8BIT | TXX9_SILCR_USBL_1BIT |
-   ((up->port.flags & UPF_TXX9_USE_SCLK) ?
-TXX9_SILCR_SCS_SCLK_BG : TXX9_SILCR_SCS_IMCLK_BG));
-   sio_quot_set(up, uart_get_divisor(port, 9600));
-   sio_out(up, TXX9_SIFLCR, TXX9_SIFLCR_RTSTL_MAX /* 15 */);
-   spin_unlock_irqrestore(&up->port.lock, flags);
+   serial_txx9_initialize(port);
 }
 
 static int
@@ -818,7 +816,8 @@ static struct uart_ops serial_txx9_pops
 
 static struct uart_txx9_port serial_txx9_ports[UART_NR];
 
-static void __init serial_txx9_register_ports(struct uart_driver *drv)
+static void __init serial_txx9_register_ports(struct uart_driver *drv,
+ struct device *dev)
 {
int i;
 
@@ -827,6 +826,7 @@ static void __init serial_txx9_register_
 
up->port.line = i;
up->port.ops = &serial_txx9_pops;
+   up->port.dev = dev;
 

Re: High CPU usage with sata_nv

2007-02-21 Thread Matthew Fredrickson


On Feb 20, 2007, at 9:43 PM, Robert Hancock wrote:


Matthew Fredrickson wrote:
I have noticed something that might be related as well.  I am working 
on a device driver that would have periodic data errors due to 
exceptionally long interrupt handling latency.  I have come to the 
point that I suspect that it's the sata_nv driver, and now that we 
can't do the hdparm -u1 option for sata, it seems to be a bigger 
problem.


What kernel are you using? There were some complaints about latency 
problems (the ATA status register read taking a ridiculous amount of 
time to complete) on sata_nv previously, but 2.6.20 should eliminate 
that problem at least on nForce4 chipsets.




It's a 2.6.18 kernel.  What we're seeing (by means of the interrupt pin
on another card) is occasional, extremely large interrupt latency, on the
order of 500-700 microseconds (measured from the time the interrupt pin
goes low to the first couple of lines of code in the IRQ handler, which
clear it).  I figured it was some other driver on the system disabling
irqs for a long period of time, but it's difficult to trace what might be
doing that.


Matthew Fredrickson



Re: 2.6.20-git15 BUG: soft lockup detected on CPU#0! - timers?

2007-02-21 Thread Thomas Gleixner
On Tue, 2007-02-20 at 23:37 +0100, Michal Piotrowski wrote:
> On 20/02/07, Thomas Gleixner <[EMAIL PROTECTED]> wrote:
> > On Tue, 2007-02-20 at 19:54 +0100, Michal Piotrowski wrote:
> > >
> > > Might it be 6ba9b346e1e0eca65ec589d32de3a9fe32dc5de6 commit?
> >
> > I doubt that it is, but can you revert it ?
> 
> I've been using the latest kernel without this patch for 3 hours.
> 
> So far so good.

But you still have those softirq pending messages, right? I think those
are pointing to the root cause of this. Still no idea how to get hold of
them. All my systems refuse to produce that. Hrmpf.

tglx




Re: [BUG] PATA_PCMCIA does not work

2007-02-21 Thread Manuel Lauss
On Wed, Feb 21, 2007 at 03:37:29PM +, Alan wrote:
> Does this fix the oops ?
> 
> Alan
> 
> 
> diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c
> --- linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c 2007-02-20 13:37:58.0 +
> +++ linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c 2007-02-21 14:06:58.792707976 +
> @@ -308,7 +342,6 @@
>   if (info->ndev) {
>   struct ata_host *host = dev_get_drvdata(dev);
>   ata_host_detach(host);
> - dev_set_drvdata(dev, NULL);
>   }
>   info->ndev = 0;
>   pdev->priv = NULL;

This bug is also present in 2.6.21-rc1, and this patch
indeed fixes it.

Thanks,

-- 
 ml.


Re: [PATCH 09/29] selinux: tag avc cache alloc as non-critical

2007-02-21 Thread James Morris
On Wed, 21 Feb 2007, Peter Zijlstra wrote:

> Failing to allocate a cache entry will only harm performance.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  security/selinux/avc.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Acked-by: James Morris <[EMAIL PROTECTED]>

> 
> Index: linux-2.6-git/security/selinux/avc.c
> ===
> --- linux-2.6-git.orig/security/selinux/avc.c 2007-02-14 08:31:13.0 +0100
> +++ linux-2.6-git/security/selinux/avc.c  2007-02-14 10:10:47.0 +0100
> @@ -332,7 +332,7 @@ static struct avc_node *avc_alloc_node(v
>  {
>   struct avc_node *node;
>  
> - node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
> + node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
>   if (!node)
>   goto out;
>  
> 
> 

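Why __GFP_NOMEMALLOC is safe here: a lookup cache only saves work, so a failed node allocation costs a recomputation, never a wrong answer. A small sketch of that best-effort pattern, with a hypothetical fixed-size cache and an alloc_fails switch standing in for the allocation failing; none of these names are SELinux's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision cache with a handful of slots. */
#define SLOTS 4

struct cache {
	int key[SLOTS];
	bool val[SLOTS];
	int used;
	bool alloc_fails;   /* simulate the cache-node allocation failing */
	int slow_lookups;   /* how often the expensive path ran */
};

/* Stand-in for the expensive policy computation. */
static bool compute_decision(struct cache *c, int key)
{
	c->slow_lookups++;
	return key % 2 == 0;
}

static bool decide(struct cache *c, int key)
{
	for (int i = 0; i < c->used; i++)
		if (c->key[i] == key)
			return c->val[i];

	bool v = compute_decision(c, key);

	/* Caching is best effort: without a node the answer is still
	 * correct; only the next lookup pays the slow path again. */
	if (!c->alloc_fails && c->used < SLOTS) {
		c->key[c->used] = key;
		c->val[c->used] = v;
		c->used++;
	}
	return v;
}
```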
-- 
James Morris
<[EMAIL PROTECTED]>


Re: SATA problems

2007-02-21 Thread Pablo Sebastian Greco

Tejun Heo wrote:

Pablo Sebastian Greco wrote:
  

Tejun Heo wrote:


* Pablo, the bug you saw was bad interaction between blacklisted NCQ
device and dynamic queue depth adjustment.  Patches are submitted to fix
the problem.  Just drop the blacklist patch.  Your drives should work
fine in NCQ mode.  My gut feeling is that your problem is power related
from the beginning.
  
  

I had the same problems with a new power supply. Now everything is OK
with the old power supply and the new drives.



So, it was bad drives?  Are you using the same model or different ones?
 NCQ works okay now?

  
All I can say is that it is working now. Other things changed with the new 
drives, too: they run at 1.5Gbps instead of 3Gbps, and they don't use NCQ 
(I'm reattaching a full dmesg).
I've also found this firmware upgrade 
(http://www.samsung.com/Products/HardDiskDrive/support/faqs/faqs_20060414_246673.htm) 
for the old drives, but couldn't confirm whether it should be applied because 
the server is in Brazil and I live in Argentina. I won't be there to test it 
until April.


Thanks.
Pablo.
Linux version 2.6.19-1.2895.fc6 ([EMAIL PROTECTED]) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-51)) #1 SMP Wed Jan 10 18:50:56 EST 2007
Command line: ro root=LABEL=/
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009ec00 (usable)
 BIOS-e820: 0009ec00 - 0010 (reserved)
 BIOS-e820: 0010 - df938000 (usable)
 BIOS-e820: df938000 - df9d2000 (ACPI NVS)
 BIOS-e820: df9d2000 - dfa42000 (usable)
 BIOS-e820: dfa42000 - dfa9a000 (reserved)
 BIOS-e820: dfa9a000 - dfab8000 (usable)
 BIOS-e820: dfab8000 - dfb1a000 (ACPI NVS)
 BIOS-e820: dfb1a000 - dfb2c000 (usable)
 BIOS-e820: dfb2c000 - dfb3a000 (ACPI data)
 BIOS-e820: dfb3a000 - dfc0 (usable)
 BIOS-e820: ffc0 - ffc0c000 (reserved)
 BIOS-e820: 0001 - 00012000 (usable)
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
end_pfn_map = 1179648
DMI 2.4 present.
ACPI: RSDP (v002 INTEL ) @ 0x000f0350
ACPI: XSDT (v001 INTEL  S5000VSA 0x INTL 0x0113) @ 0xdfb39120
ACPI: FADT (v003 INTEL  S5000VSA 0x INTL 0x0113) @ 0xdfb36000
ACPI: MADT (v001 INTEL  S5000VSA 0x INTL 0x0113) @ 0xdfb35000
ACPI: SPCR (v001 INTEL  S5000VSA 0x INTL 0x0113) @ 0xdfb2f000
ACPI: HPET (v001 INTEL  S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2e000
ACPI: MCFG (v001 INTEL  S5000VSA 0x0001 INTL 0x0113) @ 0xdfb2d000
ACPI: SSDT (v002 INTEL  S5000VSA 0x4000 INTL 0x0113) @ 0xdfb2c000
ACPI: DSDT (v002 INTEL  S5000VSA 0x0008 INTL 0x0113) @ 0x
No NUMA configuration found
Faking a node at -00012000
Entering add_active_range(0, 0, 158) 0 entries of 3200 used
Entering add_active_range(0, 256, 915768) 1 entries of 3200 used
Entering add_active_range(0, 915922, 916034) 2 entries of 3200 used
Entering add_active_range(0, 916122, 916152) 3 entries of 3200 used
Entering add_active_range(0, 916250, 916268) 4 entries of 3200 used
Entering add_active_range(0, 916282, 916480) 5 entries of 3200 used
Entering add_active_range(0, 1048576, 1179648) 6 entries of 3200 used
Bootmem setup node 0 -00012000
Zone PFN ranges:
  DMA             0 ->     4096
  DMA32        4096 ->  1048576
  Normal    1048576 ->  1179648
early_node_map[7] active PFN ranges
    0:        0 ->      158
    0:      256 ->   915768
    0:   915922 ->   916034
    0:   916122 ->   916152
    0:   916250 ->   916268
    0:   916282 ->   916480
    0:  1048576 ->  1179648
On node 0 totalpages: 1047100
  DMA zone: 64 pages used for memmap
  DMA zone: 1450 pages reserved
  DMA zone: 2484 pages, LIFO batch:0
  DMA32 zone: 16320 pages used for memmap
  DMA32 zone: 895710 pages, LIFO batch:31
  Normal zone: 2048 pages used for memmap
  Normal zone: 129024 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 (Bootup-CPU)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
Processor #3
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x84] disabled)
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x85] disabled)
ACPI: LAPIC 

Re: [PATCH] [MTD] CHIPS: oops in cfi_amdstd_sync

2007-02-21 Thread Josh Boyer
On Wed, 2007-02-21 at 14:54 +, Jörn Engel wrote:
> On Tue, 20 February 2007 17:46:13 -0800, Vijay Sampath wrote:
> > 
> > The files cfi_cmdset_0002.c and cfi_cmdset_0020.c do not initialize
> > their wait queues like is done in cfi_cmdset_0001.c. This causes an
> > oops when the wait queue is accessed. I have copied the code from
> > cfi_cmdset_0001.c that is pertinent to initialization of the wait
> > queue.
> 
> Patch looks good, but I can no longer test it.  Josh may still have
> access to some commandset 20 chips.  Josh, any objections?

The patch looks good to me as well.  No access to those chips anymore,
but I have no objections.

josh



[PATCH] tty: Clarify documentation of ->write()

2007-02-21 Thread Alan
The tty driver write method is different to the usual fops device write
methods as the buffer is already in kernel space. Clarify the docs since
someone writing a driver made that mistake.

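To illustrate the documented contract, here is a sketch of a driver-side write method. It assumes the 2.6.20-era shape of the hook, int (*write)(struct tty_struct *tty, const unsigned char *buf, int count), but uses a hypothetical struct fake_tty with a small FIFO so it runs in user space. In a real driver buf is already in kernel space, so a plain memcpy() is the right copy (not copy_from_user()), and the return value is the number of characters actually accepted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical transmit FIFO owned by a tty driver. */
#define FIFO_SIZE 8

struct fake_tty {
	unsigned char fifo[FIFO_SIZE];
	int len;
};

/* Take only as many characters as fit and return that count; the
 * caller retries with the remainder later. buf is treated as kernel
 * memory, so memcpy() is used directly. */
static int fake_tty_write(struct fake_tty *tty, const unsigned char *buf, int count)
{
	int room = FIFO_SIZE - tty->len;
	int n = count < room ? count : room;

	memcpy(tty->fifo + tty->len, buf, n);
	tty->len += n;
	return n;
}
```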
Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-mm2/Documentation/tty.txt linux-2.6.20-mm2/Documentation/tty.txt
--- linux.vanilla-2.6.20-mm2/Documentation/tty.txt  2007-02-20 12:32:28.0 +
+++ linux-2.6.20-mm2/Documentation/tty.txt  2007-02-21 14:56:36.521024464 +
@@ -108,7 +108,9 @@
 structure:
 
 write()Write a block of characters to the tty device.
-   Returns the number of characters accepted.
+   Returns the number of characters accepted. The 
+   character buffer passed to this method is already
+   in kernel space.
 
 put_char() Queues a character for writing to the tty device.
If there is no room in the queue, the character is


[PATCH 26/29] nfs: teach the NFS client how to treat PG_swapcache pages

2007-02-21 Thread Peter Zijlstra
Replace all relevant occurrences of page->index and page->mapping in the NFS
client with the new page_file_index() and page_file_mapping() functions.

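The nfs_grow_file() hunk is easiest to check with numbers. The sketch below reproduces its size arithmetic using the open-coded index shift from the removed line; it assumes 4 KiB pages (PAGE_SHIFT_SKETCH is a hypothetical stand-in for PAGE_CACHE_SHIFT) and plain integers instead of struct page and struct inode.

```c
#include <assert.h>

/* Hypothetical 4 KiB pages standing in for PAGE_CACHE_SHIFT. */
#define PAGE_SHIFT_SKETCH 12

/* Size arithmetic of nfs_grow_file(): return the file size after a
 * write of count bytes at offset inside page index. A write into a
 * page wholly before the last page cannot extend the file. */
static long long grow_file(long long i_size, unsigned long index,
			   unsigned int offset, unsigned int count)
{
	long long end;

	if (i_size > 0) {
		unsigned long end_index =
			(unsigned long)((i_size - 1) >> PAGE_SHIFT_SKETCH);

		if (index < end_index)
			return i_size;
	}
	end = ((long long)index << PAGE_SHIFT_SKETCH) + offset + count;
	return end > i_size ? end : i_size;
}
```

For a 10000-byte file the last page is index 2, so a 2000-byte write at offset 1000 of that page ends at 8192 + 3000 = 11192 and grows the file, while any write into page 0 or 1 leaves the size alone.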
Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
---
 fs/nfs/file.c |4 ++--
 fs/nfs/internal.h |7 ---
 fs/nfs/pagelist.c |6 +++---
 fs/nfs/read.c |6 +++---
 fs/nfs/write.c|   35 ++-
 5 files changed, 30 insertions(+), 28 deletions(-)

Index: linux-2.6/fs/nfs/file.c
===
--- linux-2.6.orig/fs/nfs/file.c
+++ linux-2.6/fs/nfs/file.c
@@ -310,7 +310,7 @@ static void nfs_invalidate_page(struct p
if (offset != 0)
return;
/* Cancel any unstarted writes on this page */
-   nfs_wb_page_priority(page->mapping->host, page, FLUSH_INVALIDATE);
+   nfs_wb_page_priority(page_file_mapping(page)->host, page, FLUSH_INVALIDATE);
 }
 
 static int nfs_release_page(struct page *page, gfp_t gfp)
@@ -321,7 +321,7 @@ static int nfs_release_page(struct page 
 
 static int nfs_launder_page(struct page *page)
 {
-   return nfs_wb_page(page->mapping->host, page);
+   return nfs_wb_page(page_file_mapping(page)->host, page);
 }
 
 const struct address_space_operations nfs_file_aops = {
Index: linux-2.6/fs/nfs/pagelist.c
===
--- linux-2.6.orig/fs/nfs/pagelist.c
+++ linux-2.6/fs/nfs/pagelist.c
@@ -81,11 +81,11 @@ nfs_create_request(struct nfs_open_conte
 * update_nfs_request below if the region is not locked. */
req->wb_page= page;
atomic_set(&req->wb_complete, 0);
-   req->wb_index   = page->index;
+   req->wb_index   = page_file_index(page);
page_cache_get(page);
BUG_ON(PagePrivate(page));
BUG_ON(!PageLocked(page));
-   BUG_ON(page->mapping->host != inode);
+   BUG_ON(page_file_mapping(page)->host != inode);
req->wb_offset  = offset;
req->wb_pgbase  = offset;
req->wb_bytes   = count;
@@ -338,7 +338,7 @@ out:
  * @nfsi: NFS inode
  * @head: One of the NFS inode request lists
  * @dst: Destination list
- * @idx_start: lower bound of page->index to scan
+ * @idx_start: lower bound of page_file_index(page) to scan
  * @npages: idx_start + npages sets the upper bound to scan.
  *
  * Moves elements from one of the inode request lists.
Index: linux-2.6/fs/nfs/read.c
===
--- linux-2.6.orig/fs/nfs/read.c
+++ linux-2.6/fs/nfs/read.c
@@ -492,11 +492,11 @@ static const struct rpc_call_ops nfs_rea
 int nfs_readpage(struct file *file, struct page *page)
 {
struct nfs_open_context *ctx;
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = page_file_mapping(page)->host;
int error;
 
dprintk("NFS: nfs_readpage (%p [EMAIL PROTECTED])\n",
-   page, PAGE_CACHE_SIZE, page->index);
+   page, PAGE_CACHE_SIZE, page_file_index(page));
nfs_inc_stats(inode, NFSIOS_VFSREADPAGE);
nfs_add_stats(inode, NFSIOS_READPAGES, 1);
 
@@ -543,7 +543,7 @@ static int
 readpage_async_filler(void *data, struct page *page)
 {
struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = page_file_mapping(page)->host;
struct nfs_page *new;
unsigned int len;
 
Index: linux-2.6/fs/nfs/write.c
===
--- linux-2.6.orig/fs/nfs/write.c
+++ linux-2.6/fs/nfs/write.c
@@ -122,7 +122,7 @@ static struct nfs_page *nfs_page_find_re
 static struct nfs_page *nfs_page_find_request(struct page *page)
 {
struct nfs_page *req = NULL;
-   spinlock_t *req_lock = &NFS_I(page->mapping->host)->req_lock;
+   spinlock_t *req_lock = &NFS_I(page_file_mapping(page)->host)->req_lock;
 
spin_lock(req_lock);
req = nfs_page_find_request_locked(page);
@@ -133,13 +133,13 @@ static struct nfs_page *nfs_page_find_re
 /* Adjust the file length if we're writing beyond the end */
 static void nfs_grow_file(struct page *page, unsigned int offset, unsigned int count)
 {
-   struct inode *inode = page->mapping->host;
+   struct inode *inode = page_file_mapping(page)->host;
loff_t end, i_size = i_size_read(inode);
unsigned long end_index = (i_size - 1) >> PAGE_CACHE_SHIFT;
 
-   if (i_size > 0 && page->index < end_index)
+   if (i_size > 0 && page_file_index(page) < end_index)
return;
-   end = ((loff_t)page->index << PAGE_CACHE_SHIFT) + ((loff_t)offset+count);
+   end = page_offset(page) + ((loff_t)offset+count);
if (i_size >= end)
return;
nfs_inc_stats(inode, NFSIOS_EXTENDWRITE);
@@ -150,7 +150,7 @@ static void nfs_grow_file(struct page *p
 static void nfs_set_pageerror(struct 

[PATCH 19/29] netvm: skb processing

2007-02-21 Thread Peter Zijlstra
In order to make sure emergency packets receive all the memory needed to
proceed, ensure that emergency skbs are processed under PF_MEMALLOC.

Use the (new) sk_backlog_rcv() wrapper to ensure this for backlog processing.

Skip taps, since those are user-space again.

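The netif_receive_skb() changes combine two ideas: save the task flags, raise PF_MEMALLOC for an emergency skb and restore the saved word on every exit, and let emergency packets through only for a short list of protocols. A userspace sketch of both, with a hypothetical task_flags word and TASK_MEMALLOC bit; the EtherType values are the standard ones behind ETH_P_ARP, ETH_P_IP, ETH_P_IPV6 and ETH_P_8021Q.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a task flag word and a PF_MEMALLOC-like bit. */
#define TASK_MEMALLOC 0x800u

static unsigned int task_flags;

/* EtherTypes the patch lets through for emergency packets. */
static bool emergency_proto_ok(unsigned short ethertype)
{
	switch (ethertype) {
	case 0x0806: /* ARP */
	case 0x0800: /* IPv4 */
	case 0x86DD: /* IPv6 */
	case 0x8100: /* 802.1Q VLAN */
		return true;
	default:
		return false;
	}
}

/* Save the flags, raise the memalloc bit for emergency packets and
 * restore the saved word on every exit path via the out: label.
 * Returns true if the packet would be delivered. */
static bool receive(bool emergency, unsigned short ethertype)
{
	unsigned int saved = task_flags;
	bool delivered = false;

	if (emergency)
		task_flags |= TASK_MEMALLOC;

	if (emergency && !emergency_proto_ok(ethertype))
		goto out;              /* the patch's "goto drop" */

	delivered = true;              /* protocol handlers would run here */
out:
	task_flags = saved;
	return delivered;
}
```

Restoring the saved word, rather than clearing the bit, keeps a caller that already ran with the bit set in that state after the call.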
Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/net/sock.h |4 
 net/core/dev.c |   42 +-
 net/core/sock.c|   19 +++
 3 files changed, 60 insertions(+), 5 deletions(-)

Index: linux-2.6-git/net/core/dev.c
===
--- linux-2.6-git.orig/net/core/dev.c   2007-02-14 12:16:03.0 +0100
+++ linux-2.6-git/net/core/dev.c2007-02-14 12:28:33.0 +0100
@@ -1767,10 +1767,23 @@ int netif_receive_skb(struct sk_buff *sk
struct net_device *orig_dev;
int ret = NET_RX_DROP;
__be16 type;
+   unsigned long pflags = current->flags;
+
+   /* Emergency skb are special, they should
+*  - be delivered to SOCK_VMIO sockets only
+*  - stay away from userspace
+*  - have bounded memory usage
+*
+* Use PF_MEMALLOC as a poor mans memory pool - the grouping kind.
+* This saves us from propagating the allocation context down to all
+* allocation sites.
+*/
+   if (skb_emergency(skb))
+   current->flags |= PF_MEMALLOC;
 
/* if we've gotten here through NAPI, check netpoll */
if (skb->dev->poll && netpoll_rx(skb))
-   return NET_RX_DROP;
+   goto out;
 
if (!skb->tstamp.off_sec)
net_timestamp(skb);
@@ -1781,7 +1794,7 @@ int netif_receive_skb(struct sk_buff *sk
orig_dev = skb_bond(skb);
 
if (!orig_dev)
-   return NET_RX_DROP;
+   goto out;
 
__get_cpu_var(netdev_rx_stat).total++;
 
@@ -1799,6 +1812,9 @@ int netif_receive_skb(struct sk_buff *sk
}
 #endif
 
+   if (skb_emergency(skb))
+   goto skip_taps;
+
list_for_each_entry_rcu(ptype, &ptype_all, list) {
if (!ptype->dev || ptype->dev == skb->dev) {
if (pt_prev)
@@ -1807,6 +1823,7 @@ int netif_receive_skb(struct sk_buff *sk
}
}
 
+skip_taps:
 #ifdef CONFIG_NET_CLS_ACT
if (pt_prev) {
ret = deliver_skb(skb, pt_prev, orig_dev);
@@ -1819,15 +1836,27 @@ int netif_receive_skb(struct sk_buff *sk
 
if (ret == TC_ACT_SHOT || (ret == TC_ACT_STOLEN)) {
kfree_skb(skb);
-   goto out;
+   goto unlock;
}
 
skb->tc_verd = 0;
 ncls:
 #endif
 
+   if (skb_emergency(skb))
+   switch(skb->protocol) {
+   case __constant_htons(ETH_P_ARP):
+   case __constant_htons(ETH_P_IP):
+   case __constant_htons(ETH_P_IPV6):
+   case __constant_htons(ETH_P_8021Q):
+   break;
+
+   default:
+   goto drop;
+   }
+
if (handle_bridge(&skb, &pt_prev, &ret, orig_dev))
-   goto out;
+   goto unlock;
 
type = skb->protocol;
list_for_each_entry_rcu(ptype, &ptype_base[ntohs(type)&15], list) {
@@ -1842,6 +1871,7 @@ ncls:
if (pt_prev) {
ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
} else {
+drop:
kfree_skb(skb);
/* Jamal, now you will not able to escape explaining
 * me how you were going to use this. :-)
@@ -1849,8 +1879,10 @@ ncls:
ret = NET_RX_DROP;
}
 
-out:
+unlock:
rcu_read_unlock();
+out:
+   current->flags = pflags;
return ret;
 }
 
Index: linux-2.6-git/include/net/sock.h
===
--- linux-2.6-git.orig/include/net/sock.h   2007-02-14 12:32:03.0 +0100
+++ linux-2.6-git/include/net/sock.h2007-02-14 12:32:37.0 +0100
@@ -510,10 +510,14 @@ static inline void sk_add_backlog(struct
skb->next = NULL;
 }
 
+#ifndef CONFIG_NETVM
 static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 {
return sk->sk_backlog_rcv(sk, skb);
 }
+#else
+extern int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb);
+#endif
 
 #define sk_wait_event(__sk, __timeo, __condition)  \
 ({ int rc; \
Index: linux-2.6-git/net/core/sock.c
===
--- linux-2.6-git.orig/net/core/sock.c  2007-02-14 12:32:07.0 +0100
+++ linux-2.6-git/net/core/sock.c   2007-02-14 12:37:11.0 +0100
@@ -332,6 +332,25 @@ int sk_clear_vmio(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(sk_clear_vmio);
 
+#ifdef CONFIG_NETVM
+int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)

Re: [PATCH] free swap space when (re)activating page

2007-02-21 Thread Al Boldi
Rik van Riel wrote:
> Rik van Riel wrote:
> > ... because I think this is what my patch does :)
>
> Never mind, I see it now.
>
> The attached patch should be correct.

Your patch seems to improve the situation a little bit, but the numbers still 
look weird, especially for swap-in, which gets progressively slower.

RAM 512MB, SWAP 1GB
#mount -t tmpfs -o size=1G none /dev/shm
#time cat /dev/full > /dev/shm/x.dmp
15sec
#time cat /dev/shm/x.dmp > /dev/null
58sec
#time cat /dev/shm/x.dmp > /dev/null
72sec
#time cat /dev/shm/x.dmp > /dev/null
85sec
#time cat /dev/shm/x.dmp > /dev/null
93sec
#time cat /dev/shm/x.dmp > /dev/null
99sec


Thanks!

--
Al



[PATCH 10/29] net: wrap sk->sk_backlog_rcv()

2007-02-21 Thread Peter Zijlstra
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/net/sock.h   |5 +
 net/core/sock.c  |4 ++--
 net/ipv4/tcp.c   |2 +-
 net/ipv4/tcp_timer.c |2 +-
 4 files changed, 9 insertions(+), 4 deletions(-)

Index: linux-2.6-git/include/net/sock.h
===
--- linux-2.6-git.orig/include/net/sock.h   2007-02-14 11:29:55.0 +0100
+++ linux-2.6-git/include/net/sock.h2007-02-14 11:42:00.0 +0100
@@ -480,6 +480,11 @@ static inline void sk_add_backlog(struct
skb->next = NULL;
 }
 
+static inline int sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
+{
+   return sk->sk_backlog_rcv(sk, skb);
+}
+
 #define sk_wait_event(__sk, __timeo, __condition)  \
 ({ int rc; \
release_sock(__sk); \
Index: linux-2.6-git/net/core/sock.c
===
--- linux-2.6-git.orig/net/core/sock.c  2007-02-14 11:29:55.0 +0100
+++ linux-2.6-git/net/core/sock.c   2007-02-14 11:42:00.0 +0100
@@ -290,7 +290,7 @@ int sk_receive_skb(struct sock *sk, stru
 */
mutex_acquire(&sk->sk_lock.dep_map, 0, 1, _RET_IP_);
 
-   rc = sk->sk_backlog_rcv(sk, skb);
+   rc = sk_backlog_rcv(sk, skb);
 
mutex_release(&sk->sk_lock.dep_map, 1, _RET_IP_);
} else
@@ -1244,7 +1244,7 @@ static void __release_sock(struct sock *
struct sk_buff *next = skb->next;
 
skb->next = NULL;
-   sk->sk_backlog_rcv(sk, skb);
+   sk_backlog_rcv(sk, skb);
 
/*
 * We are in process context here with softirqs
Index: linux-2.6-git/net/ipv4/tcp.c
===
--- linux-2.6-git.orig/net/ipv4/tcp.c   2007-02-14 11:29:35.0 +0100
+++ linux-2.6-git/net/ipv4/tcp.c2007-02-14 11:42:00.0 +0100
@@ -1002,7 +1002,7 @@ static void tcp_prequeue_process(struct 
 * necessary */
local_bh_disable();
while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
-   sk->sk_backlog_rcv(sk, skb);
+   sk_backlog_rcv(sk, skb);
local_bh_enable();
 
/* Clear memory counter. */
Index: linux-2.6-git/net/ipv4/tcp_timer.c
===
--- linux-2.6-git.orig/net/ipv4/tcp_timer.c	2007-02-14 11:29:36.0 +0100
+++ linux-2.6-git/net/ipv4/tcp_timer.c  2007-02-14 11:42:00.0 +0100
@@ -198,7 +198,7 @@ static void tcp_delack_timer(unsigned lo
NET_INC_STATS_BH(LINUX_MIB_TCPSCHEDULERFAILED);
 
while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
-   sk->sk_backlog_rcv(sk, skb);
+   sk_backlog_rcv(sk, skb);
 
tp->ucopy.memory = 0;
}



Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-02-21 Thread Jean Delvare
Hi Matthew,

On Tue, 20 Feb 2007 15:18:13 +, Matthew Garrett wrote:
> On Sun, Feb 18, 2007 at 06:38:05PM +0100, Jean Delvare wrote:
> 
> > ACPI is broken here, not k8temp, so let's fix ACPI instead. ACPI
> > doesn't conflict with only k8temp, but with virtually all hardware
> > monitoring drivers, all I2C/SMBus drivers, and probably other types of
> > drivers too. We just can't restrict or blacklist all these drivers
> > because ACPI misbehaves.
> 
> No, the simple fact of the matter is that if you're running on an ACPI 
> platform you need to change some of your assumptions. ACPI owns the 
> hardware. The OS doesn't. To an extent this has always been true on 

The Linux device driver model assumes that it owns the hardware. If
this is not true, then should we prevent any non-ACPI driver from
loading as soon as ACPI is enabled?

> laptops and servers /anyway/ - the BIOS is free to have a wide variety 
> of SMM insanity that invalidates basic assumptions like "If I hold this 
> lock, nothing can interrupt me between this write and this read". That's 
> simply not true.

Yeah, this is correct, and just as unfortunate. It's amazingly sad that
hardware vendors as a whole are still repeating the same design
mistakes over and over again :(

> So this isn't about fixing ACPI. It's about trying to find a mechanism 
> that allows ACPI and raw hardware drivers to coexist, which is made 

Exactly what I said, you're only rewording it to make it sound nicer ;)

> somewhat harder by it not being a situation that the platform designers 
> have considered in the slightest. The suggested low-level driver for 
> io-port arbitration would certainly be a step forward in making this 
> work better.

I sure hope we can find a solution, but as you said yourself, nothing
is going to prevent SMM and similar oddities from messing up the drivers'
assumptions.

-- 
Jean Delvare


[PATCH 24/29] nfs: remove mempools

2007-02-21 Thread Peter Zijlstra
With the introduction of the shared dirty page accounting in 2.6.19, NFS should
not be able to surprise the VM with all dirty pages. Thus it should always be
able to free some memory. Hence there is no more need for mempools.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
---
 fs/nfs/read.c  |   15 +++
 fs/nfs/write.c |   27 +--
 2 files changed, 8 insertions(+), 34 deletions(-)

Index: linux-2.6-git/fs/nfs/read.c
===
--- linux-2.6-git.orig/fs/nfs/read.c	2007-02-21 12:14:54.0 +0100
+++ linux-2.6-git/fs/nfs/read.c 2007-02-21 12:15:10.0 +0100
@@ -32,14 +32,11 @@ static const struct rpc_call_ops nfs_rea
 static const struct rpc_call_ops nfs_read_full_ops;
 
 static struct kmem_cache *nfs_rdata_cachep;
-static mempool_t *nfs_rdata_mempool;
-
-#define MIN_POOL_READ  (32)
 
 struct nfs_read_data *nfs_readdata_alloc(size_t len)
 {
unsigned int pagecount = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   struct nfs_read_data *p = mempool_alloc(nfs_rdata_mempool, GFP_NOFS);
+   struct nfs_read_data *p = kmem_cache_alloc(nfs_rdata_cachep, GFP_NOFS);
 
if (p) {
memset(p, 0, sizeof(*p));
@@ -50,7 +47,7 @@ struct nfs_read_data *nfs_readdata_alloc
else {
p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_NOFS);
if (!p->pagevec) {
-   mempool_free(p, nfs_rdata_mempool);
+   kmem_cache_free(nfs_rdata_cachep, p);
p = NULL;
}
}
@@ -63,7 +60,7 @@ static void nfs_readdata_rcu_free(struct
struct nfs_read_data *p = container_of(head, struct nfs_read_data, task.u.tk_rcu);
if (p && (p->pagevec != &p->page_array[0]))
kfree(p->pagevec);
-   mempool_free(p, nfs_rdata_mempool);
+   kmem_cache_free(nfs_rdata_cachep, p);
 }
 
 static void nfs_readdata_free(struct nfs_read_data *rdata)
@@ -614,16 +611,10 @@ int __init nfs_init_readpagecache(void)
if (nfs_rdata_cachep == NULL)
return -ENOMEM;
 
-   nfs_rdata_mempool = mempool_create_slab_pool(MIN_POOL_READ,
-nfs_rdata_cachep);
-   if (nfs_rdata_mempool == NULL)
-   return -ENOMEM;
-
return 0;
 }
 
 void nfs_destroy_readpagecache(void)
 {
-   mempool_destroy(nfs_rdata_mempool);
kmem_cache_destroy(nfs_rdata_cachep);
 }
Index: linux-2.6-git/fs/nfs/write.c
===
--- linux-2.6-git.orig/fs/nfs/write.c   2007-02-21 12:14:54.0 +0100
+++ linux-2.6-git/fs/nfs/write.c	2007-02-21 12:15:10.0 +0100
@@ -29,9 +29,6 @@
 
 #define NFSDBG_FACILITYNFSDBG_PAGECACHE
 
-#define MIN_POOL_WRITE (32)
-#define MIN_POOL_COMMIT(4)
-
 /*
  * Local function declarations
  */
@@ -45,12 +42,10 @@ static const struct rpc_call_ops nfs_wri
 static const struct rpc_call_ops nfs_commit_ops;
 
 static struct kmem_cache *nfs_wdata_cachep;
-static mempool_t *nfs_wdata_mempool;
-static mempool_t *nfs_commit_mempool;
 
 struct nfs_write_data *nfs_commit_alloc(void)
 {
-   struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
+   struct nfs_write_data *p = kmem_cache_alloc(nfs_wdata_cachep, GFP_NOFS);
 
if (p) {
memset(p, 0, sizeof(*p));
@@ -64,7 +59,7 @@ void nfs_commit_rcu_free(struct rcu_head
struct nfs_write_data *p = container_of(head, struct nfs_write_data, task.u.tk_rcu);
if (p && (p->pagevec != &p->page_array[0]))
kfree(p->pagevec);
-   mempool_free(p, nfs_commit_mempool);
+   kmem_cache_free(nfs_wdata_cachep, p);
 }
 
 void nfs_commit_free(struct nfs_write_data *wdata)
@@ -75,7 +70,7 @@ void nfs_commit_free(struct nfs_write_da
 struct nfs_write_data *nfs_writedata_alloc(size_t len)
 {
unsigned int pagecount = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
-   struct nfs_write_data *p = mempool_alloc(nfs_wdata_mempool, GFP_NOFS);
+   struct nfs_write_data *p = kmem_cache_alloc(nfs_wdata_cachep, GFP_NOFS);
 
if (p) {
memset(p, 0, sizeof(*p));
@@ -86,7 +81,7 @@ struct nfs_write_data *nfs_writedata_all
else {
p->pagevec = kcalloc(pagecount, sizeof(struct page *), GFP_NOFS);
if (!p->pagevec) {
-   mempool_free(p, nfs_wdata_mempool);
+   kmem_cache_free(nfs_wdata_cachep, p);
p = NULL;
}
}
@@ -99,7 +94,7 @@ static void nfs_writedata_rcu_free(struc
struct nfs_write_data *p = container_of(head, struct nfs_write_data, task.u.tk_rcu);
if (p && (p->pagevec != 

[PATCH 05/29] mm: emergency pool

2007-02-21 Thread Peter Zijlstra
Provide a means to reserve a specific number of pages.

The emergency pool is separated from the min watermark because ALLOC_HARDER
and ALLOC_HIGH modify the watermark in a relative way and thus do not ensure
a strict minimum.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 include/linux/mmzone.h |3 +-
 mm/page_alloc.c|   52 -
 mm/vmstat.c|6 ++---
 3 files changed, 48 insertions(+), 13 deletions(-)

Index: linux-2.6-git/include/linux/mmzone.h
===
--- linux-2.6-git.orig/include/linux/mmzone.h	2007-02-12 09:40:51.0 +0100
+++ linux-2.6-git/include/linux/mmzone.h	2007-02-12 11:13:58.0 +0100
@@ -178,7 +178,7 @@ enum zone_type {
 
 struct zone {
/* Fields commonly accessed by the page allocator */
-   unsigned long   pages_min, pages_low, pages_high;
+   unsigned long   pages_emerg, pages_min, pages_low, pages_high;
/*
 * We don't know if the memory that we're going to allocate will be freeable
 * or/and it will be released eventually, so to avoid totally wasting several
@@ -562,6 +562,7 @@ int sysctl_min_unmapped_ratio_sysctl_han
struct file *, void __user *, size_t *, loff_t *);
 int sysctl_min_slab_ratio_sysctl_handler(struct ctl_table *, int,
struct file *, void __user *, size_t *, loff_t *);
+void adjust_memalloc_reserve(int pages);
 
 #include 
 /* Returns the number of the current Node. */
Index: linux-2.6-git/mm/page_alloc.c
===
--- linux-2.6-git.orig/mm/page_alloc.c  2007-02-12 11:13:35.0 +0100
+++ linux-2.6-git/mm/page_alloc.c   2007-02-12 11:14:16.0 +0100
@@ -101,6 +101,7 @@ static char * const zone_names[MAX_NR_ZO
 
 static DEFINE_SPINLOCK(min_free_lock);
 int min_free_kbytes = 1024;
+int var_free_kbytes;
 
 unsigned long __meminitdata nr_kernel_pages;
 unsigned long __meminitdata nr_all_pages;
@@ -995,7 +996,8 @@ int zone_watermark_ok(struct zone *z, in
if (alloc_flags & ALLOC_HARDER)
min -= min / 4;
 
-   if (free_pages <= min + z->lowmem_reserve[classzone_idx])
+   if (free_pages <= min + z->lowmem_reserve[classzone_idx] +
+   z->pages_emerg)
return 0;
for (o = 0; o < order; o++) {
/* At the next order, this order's pages become unavailable */
@@ -1348,8 +1350,8 @@ nofail_alloc:
 nopage:
if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
printk(KERN_WARNING "%s: page allocation failure."
-   " order:%d, mode:0x%x\n",
-   p->comm, order, gfp_mask);
+   " order:%d, mode:0x%x, alloc_flags:0x%x, pflags:0x%lx\n",
+   p->comm, order, gfp_mask, alloc_flags, p->flags);
dump_stack();
show_mem();
}
@@ -1562,9 +1564,9 @@ void show_free_areas(void)
"\n",
zone->name,
K(zone_page_state(zone, NR_FREE_PAGES)),
-   K(zone->pages_min),
-   K(zone->pages_low),
-   K(zone->pages_high),
+   K(zone->pages_emerg + zone->pages_min),
+   K(zone->pages_emerg + zone->pages_low),
+   K(zone->pages_emerg + zone->pages_high),
K(zone_page_state(zone, NR_ACTIVE)),
K(zone_page_state(zone, NR_INACTIVE)),
K(zone->present_pages),
@@ -3000,7 +3002,7 @@ static void calculate_totalreserve_pages
}
 
/* we treat pages_high as reserved pages. */
-   max += zone->pages_high;
+   max += zone->pages_high + zone->pages_emerg;
 
if (max > zone->present_pages)
max = zone->present_pages;
@@ -3057,7 +3059,8 @@ static void setup_per_zone_lowmem_reserv
  */
 static void __setup_per_zone_pages_min(void)
 {
-   unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
+   unsigned pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
+   unsigned pages_emerg = var_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
struct zone *zone;
unsigned long flags;
@@ -3069,11 +3072,13 @@ static void __setup_per_zone_pages_min(v
}
 
for_each_zone(zone) {
-   u64 tmp;
+   u64 tmp, tmp_emerg;
 
spin_lock_irqsave(&zone->lru_lock, flags);
tmp = (u64)pages_min * zone->present_pages;
do_div(tmp, lowmem_pages);
+   tmp_emerg = (u64)pages_emerg * zone->present_pages;
+   do_div(tmp_emerg, lowmem_pages);
if 

[PATCH 25/29] nfs: only use stable storage for swap

2007-02-21 Thread Peter Zijlstra
Unstable writes don't make sense for swap pages.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
Cc: Trond Myklebust <[EMAIL PROTECTED]>
---
 fs/nfs/write.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-git/fs/nfs/write.c
===
--- linux-2.6-git.orig/fs/nfs/write.c   2007-02-21 12:15:10.0 +0100
+++ linux-2.6-git/fs/nfs/write.c	2007-02-21 12:15:13.0 +0100
@@ -197,7 +197,7 @@ static int nfs_writepage_setup(struct nf
 static int wb_priority(struct writeback_control *wbc)
 {
if (wbc->for_reclaim)
-   return FLUSH_HIGHPRI;
+   return FLUSH_HIGHPRI|FLUSH_STABLE;
if (wbc->for_kupdate)
return FLUSH_LOWPRI;
return 0;



[PATCH 21/29] mm: prepare swap entry methods for use in page methods

2007-02-21 Thread Peter Zijlstra
Move around the swap entry methods in preparation for use from
page methods.

Also provide a function to obtain the swap_info_struct backing
a swap cache page.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Trond Myklebust <[EMAIL PROTECTED]>
---
 include/linux/mm.h  |8 
 include/linux/swap.h|   48 
 include/linux/swapops.h |   44 
 mm/swapfile.c   |1 +
 4 files changed, 57 insertions(+), 44 deletions(-)

Index: linux-2.6-git/include/linux/mm.h
===
--- linux-2.6-git.orig/include/linux/mm.h	2007-02-21 12:15:00.0 +0100
+++ linux-2.6-git/include/linux/mm.h	2007-02-21 12:15:01.0 +0100
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct mempolicy;
 struct anon_vma;
@@ -586,6 +587,13 @@ static inline struct address_space *page
return mapping;
 }
 
+static inline struct swap_info_struct *page_swap_info(struct page *page)
+{
+   swp_entry_t swap = { .val = page_private(page) };
+   BUG_ON(!PageSwapCache(page));
+   return get_swap_info_struct(swp_type(swap));
+}
+
 static inline int PageAnon(struct page *page)
 {
return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
Index: linux-2.6-git/include/linux/swap.h
===
--- linux-2.6-git.orig/include/linux/swap.h	2007-02-21 12:15:00.0 +0100
+++ linux-2.6-git/include/linux/swap.h  2007-02-21 12:15:01.0 +0100
@@ -79,6 +79,50 @@ typedef struct {
 } swp_entry_t;
 
 /*
+ * swapcache pages are stored in the swapper_space radix tree.  We want to
+ * get good packing density in that tree, so the index should be dense in
+ * the low-order bits.
+ *
+ * We arrange the `type' and `offset' fields so that `type' is at the five
+ * high-order bits of the swp_entry_t and `offset' is right-aligned in the
+ * remaining bits.
+ *
+ * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
+ */
+#define SWP_TYPE_SHIFT(e)  (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
+#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)
+
+/*
+ * Store a type+offset into a swp_entry_t in an arch-independent format
+ */
+static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
+{
+   swp_entry_t ret;
+
+   ret.val = (type << SWP_TYPE_SHIFT(ret)) |
+   (offset & SWP_OFFSET_MASK(ret));
+   return ret;
+}
+
+/*
+ * Extract the `type' field from a swp_entry_t.  The swp_entry_t is in
+ * arch-independent format
+ */
+static inline unsigned swp_type(swp_entry_t entry)
+{
+   return (entry.val >> SWP_TYPE_SHIFT(entry));
+}
+
+/*
+ * Extract the `offset' field from a swp_entry_t.  The swp_entry_t is in
+ * arch-independent format
+ */
+static inline pgoff_t swp_offset(swp_entry_t entry)
+{
+   return entry.val & SWP_OFFSET_MASK(entry);
+}
+
+/*
  * current->reclaim_state points to one of these when a task is running
  * memory reclaim
  */
@@ -326,6 +370,10 @@ static inline int valid_swaphandles(swp_
return 0;
 }
 
+static inline struct swap_info_struct *get_swap_info_struct(unsigned type)
+{
+   return NULL;
+}
 #define can_share_swap_page(p) (page_mapcount(p) == 1)
 
 static inline int move_to_swap_cache(struct page *page, swp_entry_t entry)
Index: linux-2.6-git/include/linux/swapops.h
===
--- linux-2.6-git.orig/include/linux/swapops.h	2007-02-21 12:15:00.0 +0100
+++ linux-2.6-git/include/linux/swapops.h	2007-02-21 12:15:01.0 +0100
@@ -1,48 +1,4 @@
 /*
- * swapcache pages are stored in the swapper_space radix tree.  We want to
- * get good packing density in that tree, so the index should be dense in
- * the low-order bits.
- *
- * We arrange the `type' and `offset' fields so that `type' is at the five
- * high-order bits of the swp_entry_t and `offset' is right-aligned in the
- * remaining bits.
- *
- * swp_entry_t's are *never* stored anywhere in their arch-dependent format.
- */
-#define SWP_TYPE_SHIFT(e)  (sizeof(e.val) * 8 - MAX_SWAPFILES_SHIFT)
-#define SWP_OFFSET_MASK(e) ((1UL << SWP_TYPE_SHIFT(e)) - 1)
-
-/*
- * Store a type+offset into a swp_entry_t in an arch-independent format
- */
-static inline swp_entry_t swp_entry(unsigned long type, pgoff_t offset)
-{
-   swp_entry_t ret;
-
-   ret.val = (type << SWP_TYPE_SHIFT(ret)) |
-   (offset & SWP_OFFSET_MASK(ret));
-   return ret;
-}
-
-/*
- * Extract the `type' field from a swp_entry_t.  The swp_entry_t is in
- * arch-independent format
- */
-static inline unsigned swp_type(swp_entry_t entry)
-{
-   return (entry.val >> SWP_TYPE_SHIFT(entry));
-}
-
-/*
- * Extract the `offset' field from a swp_entry_t.  The swp_entry_t is in
- * arch-independent format
- */

[PATCH 02/29] mm: slab allocation fairness

2007-02-21 Thread Peter Zijlstra
The slab allocator has some unfairness with respect to gfp flags: when the slab
cache is grown, the gfp flags are used to allocate more memory; however, when
slab cache is available (in partial or free slabs, per-CPU caches or otherwise),
the gfp flags are ignored.

Thus it is possible for less critical slab allocations to succeed and gobble
up precious memory when under memory pressure.

This patch solves that by using the newly introduced page allocation rank.

Page allocation rank is a scalar quantity connecting ALLOC_ and gfp flags which
represents how deep we had to reach into our reserves when allocating a page.
Rank 0 is the deepest we can reach (ALLOC_NO_WATERMARK) and 16 is the shallowest
allocation possible (ALLOC_WMARK_HIGH).

When the slab space is grown, the rank of the page allocation is stored. For
each slab allocation we test the given gfp flags against this rank, thereby
asking the question: would these flags have allowed the slab to grow?

If not, we need to test the current situation. This is done by forcing the
growth of the slab space (just testing the free page limits will not work due
to direct reclaim). If that fails, we need to fail the slab allocation.

Thus, if we grew the slab under great duress while PF_MEMALLOC was set and we
really did access the memalloc reserve, the rank would be set to 0. If the next
allocation to that slab were GFP_NOFS|__GFP_NOMEMALLOC (which ordinarily maps
to rank 4 and is always > 0), we'd want to make sure that memory pressure has
decreased enough to allow an allocation with the given gfp flags.

So in this case we try to force-grow the slab cache and, on failure, we fail
the slab allocation, thus preserving the available slab cache for more pressing
allocations.

If this newly allocated slab is trimmed on the next kmem_cache_free() (not
unlikely), this is no problem, since 1) it will free memory and 2) the sole
purpose of the allocation was to probe the allocation rank; we didn't need the
space itself.

[AIM9 results go here]

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/Kconfig |3 ++
 mm/slab.c  |   81 -
 2 files changed, 57 insertions(+), 27 deletions(-)

Index: linux-2.6/mm/slab.c
===
--- linux-2.6.orig/mm/slab.c
+++ linux-2.6/mm/slab.c
@@ -114,6 +114,7 @@
 #include   
 #include   
 #include   
+#include   "internal.h"
 
 /*
  * DEBUG   - 1 for kmem_cache_create() to honour; SLAB_DEBUG_INITIAL,
@@ -380,6 +381,7 @@ static void kmem_list3_init(struct kmem_
 
 struct kmem_cache {
 /* 1) per-cpu data, touched during every alloc/free */
+   int rank;
struct array_cache *array[NR_CPUS];
 /* 2) Cache tunables. Protected by cache_chain_mutex */
unsigned int batchcount;
@@ -1023,21 +1025,21 @@ static inline int cache_free_alien(struc
 }
 
 static inline void *alternate_node_alloc(struct kmem_cache *cachep,
-   gfp_t flags)
+   gfp_t flags, int rank)
 {
return NULL;
 }
 
 static inline void *cache_alloc_node(struct kmem_cache *cachep,
-gfp_t flags, int nodeid)
+gfp_t flags, int nodeid, int rank)
 {
return NULL;
 }
 
 #else  /* CONFIG_NUMA */
 
-static void *cache_alloc_node(struct kmem_cache *, gfp_t, int);
-static void *alternate_node_alloc(struct kmem_cache *, gfp_t);
+static void *cache_alloc_node(struct kmem_cache *, gfp_t, int, int);
+static void *alternate_node_alloc(struct kmem_cache *, gfp_t, int);
 
 static struct array_cache **alloc_alien_cache(int node, int limit)
 {
@@ -1639,6 +1641,7 @@ static void *kmem_getpages(struct kmem_c
if (!page)
return NULL;
 
+   cachep->rank = page->index;
nr_pages = (1 << cachep->gfporder);
if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
add_zone_page_state(page_zone(page),
@@ -2287,6 +2290,7 @@ kmem_cache_create (const char *name, siz
}
 #endif
 #endif
+   cachep->rank = MAX_ALLOC_RANK;
 
/*
 * Determine if the slab management is 'on' or 'off' slab.
@@ -2953,7 +2957,7 @@ bad:
 #define check_slabp(x,y) do { } while(0)
 #endif
 
-static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags)
+static void *cache_alloc_refill(struct kmem_cache *cachep, gfp_t flags, int rank)
 {
int batchcount;
struct kmem_list3 *l3;
@@ -2965,6 +2969,8 @@ static void *cache_alloc_refill(struct k
check_irq_off();
ac = cpu_cache_get(cachep);
 retry:
+   if (unlikely(rank > cachep->rank))
+   goto force_grow;
batchcount = ac->batchcount;
if (!ac->touched && batchcount > BATCHREFILL_LIMIT) {
/*
@@ -3020,14 +3026,16 @@ must_grow:
l3->free_objects -= ac->avail;
 alloc_done:
spin_unlock(&l3->list_lock);
-
if (unlikely(!ac->avail)) {
int x;
+force_grow:
x = 

[PATCH 29/29] balance_dirty_pages() vs throttle_vm_writeout() deadlock

2007-02-21 Thread Peter Zijlstra
If we have a lot of dirty memory and hit the throttle in balance_dirty_pages(),
we (potentially) generate a lot of writeback and unstable pages. If, however,
we need to reclaim a bit during this writeback, we might hit
throttle_vm_writeout(), which might delay us until the combined total of
NR_UNSTABLE_NFS + NR_WRITEBACK falls below the dirty limit.

However, unstable pages don't go away automagically; they need a push. While
balance_dirty_pages() does this push, throttle_vm_writeout() doesn't, so we can
sit here ad infinitum.

Hence I propose to remove the NR_UNSTABLE_NFS count from throttle_vm_writeout().

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 mm/page-writeback.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6-git/mm/page-writeback.c
===
--- linux-2.6-git.orig/mm/page-writeback.c	2007-02-20 15:07:43.0 +0100
+++ linux-2.6-git/mm/page-writeback.c   2007-02-20 16:42:45.0 +0100
@@ -310,8 +310,7 @@ void throttle_vm_writeout(void)
  */
 dirty_thresh += dirty_thresh / 10;  /* wh... */
 
-if (global_page_state(NR_UNSTABLE_NFS) +
-   global_page_state(NR_WRITEBACK) <= dirty_thresh)
+if (global_page_state(NR_WRITEBACK) <= dirty_thresh)
break;
 congestion_wait(WRITE, HZ/10);
 }



[PATCH 09/29] selinux: tag avc cache alloc as non-critical

2007-02-21 Thread Peter Zijlstra
Failing to allocate a cache entry will only harm performance, so this
allocation should not be allowed to dip into the emergency reserves
(__GFP_NOMEMALLOC).

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
---
 security/selinux/avc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6-git/security/selinux/avc.c
===
--- linux-2.6-git.orig/security/selinux/avc.c	2007-02-14 08:31:13.0 +0100
+++ linux-2.6-git/security/selinux/avc.c	2007-02-14 10:10:47.0 +0100
@@ -332,7 +332,7 @@ static struct avc_node *avc_alloc_node(v
 {
struct avc_node *node;
 
-   node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC);
+   node = kmem_cache_zalloc(avc_node_cachep, GFP_ATOMIC|__GFP_NOMEMALLOC);
if (!node)
goto out;
 



cat problem in tiny_tty driver (the source included)

2007-02-21 Thread Mockern
I tried to test the cat operation with the tiny_tty driver from the LDD book.

What is wrong with the cat operation here?

Here is the output of strace cat hello > /dev/my_tty1:

[EMAIL PROTECTED]:/home# strace cat hello > /dev/my_tty1
execve("/bin/cat", ["cat", "hello"], [/* 12 vars */]) = 0
brk(0)  = 0x7d000
open("/etc/ld.so.preload", O_RDONLY)= -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=5664, ...}) = 0
old_mmap(NULL, 5664, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40017000
close(3)= 0
open("/lib/libm.so.6", O_RDONLY)= 3
read(3, "\177ELF\1\1\1a\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\250B\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=480324, ...}) = 0
old_mmap(NULL, 506412, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4002
mprotect(0x40093000, 35372, PROT_NONE)  = 0
old_mmap(0x40098000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x70000) = 0x40098000
close(3)= 0
open("/lib/libcrypt.so.1", O_RDONLY)= 3
read(3, "\177ELF\1\1\1a\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\260\10\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=19940, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40019000
old_mmap(NULL, 211220, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4009c000
mprotect(0x400a1000, 190740, PROT_NONE) = 0
old_mmap(0x400a4000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x400a4000
old_mmap(0x400a9000, 157972, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x400a9000
close(3)= 0
open("/lib/libc.so.6", O_RDONLY)= 3
read(3, "\177ELF\1\1\1a\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\330p\1\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1240024, ...}) = 0
old_mmap(NULL, 1257088, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x400d
mprotect(0x401f5000, 56960, PROT_NONE)  = 0
old_mmap(0x401f8000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x12
) = 0x401f8000
old_mmap(0x40201000, 7808, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40201000
close(3)= 0
munmap(0x40017000, 5664)= 0
getuid32()  = 0
getgid32()  = 0
setgid32(0) = 0
setuid32(0) = 0
brk(0)  = 0x7d000
brk(0x9e000)= 0x9e000
brk(0)  = 0x9e000
open("hello", O_RDONLY) = 3
read(3, "123456789", 8192)  = 9
write(1, "123456789", 9)= -1 EINVAL (Invalid argument)//??
write(2, "cat: ", 5cat: )= 5
write(2, "Write Error", 11Write Error) = 11
write(2, ": Invalid argument\n", 19: Invalid argument
)= 19
close(3)= 0
io_submit(0, 0x40200164, 0 
Process 1432 detached
[EMAIL PROTECTED]:/home#



/*
 * Tiny TTY driver
 *
 * Copyright (C) 2002-2004 Greg Kroah-Hartman ([EMAIL PROTECTED])
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation, version 2 of the License.
 *
 * This driver shows how to create a minimal tty driver.  It does not rely on
 * any backing hardware, but creates a timer that emulates data being received
 * from some kind of hardware.
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


#define DRIVER_VERSION "v2.0"
#define DRIVER_AUTHOR "Greg Kroah-Hartman <[EMAIL PROTECTED]>"
#define DRIVER_DESC "Tiny TTY driver"

/* Module information */
MODULE_AUTHOR( DRIVER_AUTHOR );
MODULE_DESCRIPTION( DRIVER_DESC );
MODULE_LICENSE("GPL");

#define DELAY_TIME  HZ * 2  /* 2 seconds per character */
#define TINY_DATA_CHARACTER 't'

#define TINY_TTY_MAJOR  240 /* experimental range */
#define TINY_TTY_MINORS 4   /* only have 4 devices */

struct tiny_serial {
	struct tty_struct	*tty;		/* pointer to the tty for this device */
	int			open_count;	/* number of times this port has been opened */
	struct semaphore	sem;		/* locks this structure */
	struct timer_list	*timer;

	/* for tiocmget and tiocmset functions */
	int			msr;		/* MSR shadow */
	int			mcr;		/* MCR shadow */

	/* for ioctl fun */
	struct serial_struct	serial;
	wait_queue_head_t	wait;
	struct async_icount	icount;
};

static struct tiny_serial 

Re: how to limit flip buffer size in tty driver?

2007-02-21 Thread Alan
On Tue, 20 Feb 2007 15:50:27 +0300 (MSK)
"Mockern" <[EMAIL PROTECTED]> wrote:

> Thank you, Alan, for your response.
> 
> Could you help me with a problem which I have with my tty driver, please?
> 
> It does not work with the Linux cat command (but there are no problems
> writing and reading with select() from a user-space application!).

Looking at the trace, I am confused as to what does not work. Can you
explain the way in which it doesn't work?


Re: PCI riser cards and PCI irq routing, etc

2007-02-21 Thread Udo van den Heuvel
Krzysztof Halasa wrote:
> Udo van den Heuvel <[EMAIL PROTECTED]> writes:
> 
>> So if my non-VIA riser card can use DN 19 and also INT_A things should work?
> 
> That INT_A may be INT_A from their (motherboard) point of view, but
> the riser card doesn't know about that, it only knows INTs as seen
> at its PCI edge connector (so this INT_A here is meaningless).
> 
> Device numbers aren't rotated but rather derived from address lines
> (address/data). AD0-31 lines are the same across the whole PCI bus.
> That means device numbers are independent of POV.
> 
>> (assuming that VIA Epia EN BIOS 1.07 is enough to use this card)
> 
> My VIA EPIA-M 600 is probably older than yours, so I'd assume
> it should work as well.
> When you configure 0x13 and 0x14, both devices get IRQs - that means
> the BIOS can see both of them.

But the IRQ for the DVB-T card doesn't work.
I would need to test the DVB-T card alone to be sure it has a working IRQ.
If so, what would be the conclusion?

>> The DN is the only variable so INT lines are hardwired on the riser card?
> 
> Yep. You just need a bit of soldering.

Which IRQ rerouting would I need to try? One of three choices?
Or is there one best bet?

Udo


Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-02-21 Thread Jean Delvare
Hi Luca,

On Tue, 20 Feb 2007 16:33:56 +0100, Luca Tettamanti wrote:
> Motherboard vendors usually provide tools for $(TheOtherOS) that can
> read from all thermal  / fan / voltage / whatever sensors, so I guess
> it's possible to make the ACPI driver and the "raw" one play nice with
> each other[1].
> 
> Luca
> [1] Unless their solution is "poke at the hardware and hope that ACPI
> doesn't blow up", that is.

Without the sources it's hard to tell. And all these applications are
vendor-specific, so if they indeed have ways to avoid conflicting
accesses between ACPI and the rest of the system, these ways are likely
to be vendor-specific as well, and not documented.

Either way, this means we need the support of hardware vendors to
solve this concurrent access problem, and unfortunately I doubt this
will happen anytime soon :(

-- 
Jean Delvare


Re: 2.6.21-rc1 build error on ARM with modular IPV6

2007-02-21 Thread Michael-Luke Jones

Apologies for the brain failure: the version below should read 2.6.21-rc1.

Everything else should be correct.

Michael-Luke Jones

On 21 Feb 2007, at 10:50, M.L. Jones wrote:


NB: I'm not subscribed so please CC me in any reply! Thanks...

Hi there,

Just attempted a build of vanilla 2.6.20-rc1 and got a failure with  
our usual defconfig. Notably, we build IPV6 support as modular -  
this seems to be the source of the problem.


If any other info is required, please ask.

Thanks for your help,

Michael-Luke Jones
NSLU2-Linux Core Team

==Error==
 GEN .version
 CHK include/linux/compile.h
 UPD include/linux/compile.h
 CC  init/version.o
 LD  init/built-in.o
 LD  .tmp_vmlinux1
net/built-in.o: In function `svc_udp_recvfrom':
sysctl_net.c:(.text+0x79fb4): undefined reference to `__ipv6_addr_type'

make[1]: *** [.tmp_vmlinux1] Error 1

==Config==
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.21-rc1
# Wed Feb 21 02:32:29 2007
#
CONFIG_ARM=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
CONFIG_GENERIC_TIME=y
CONFIG_MMU=y
# CONFIG_NO_IOPORT is not set
CONFIG_GENERIC_HARDIRQS=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ZONE_DMA=y
CONFIG_VECTORS_BASE=0x
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_SYSFS_DEPRECATED=y
# CONFIG_RELAY is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_EXTRA_PASS=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_SLAB=y
# CONFIG_VM_EVENT_COUNTERS is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
# CONFIG_SLOB is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y

#
# Block layer
#
CONFIG_BLOCK=y
# CONFIG_LBD is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_LSF is not set

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
# CONFIG_IOSCHED_CFQ is not set
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_DEADLINE=y
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="deadline"

#
# System Type
#
# CONFIG_ARCH_AAEC2000 is not set
# CONFIG_ARCH_INTEGRATOR is not set
# CONFIG_ARCH_REALVIEW is not set
# CONFIG_ARCH_VERSATILE is not set
# CONFIG_ARCH_AT91 is not set
# CONFIG_ARCH_CLPS7500 is not set
# CONFIG_ARCH_CLPS711X is not set
# CONFIG_ARCH_CO285 is not set
# CONFIG_ARCH_EBSA110 is not set
# CONFIG_ARCH_EP93XX is not set
# CONFIG_ARCH_FOOTBRIDGE is not set
# CONFIG_ARCH_NETX is not set
# CONFIG_ARCH_H720X is not set
# CONFIG_ARCH_IMX is not set
# CONFIG_ARCH_IOP32X is not set
# CONFIG_ARCH_IOP33X is not set
# CONFIG_ARCH_IOP13XX is not set
CONFIG_ARCH_IXP4XX=y
# CONFIG_ARCH_IXP2000 is not set
# CONFIG_ARCH_IXP23XX is not set
# CONFIG_ARCH_L7200 is not set
# CONFIG_ARCH_NS9XXX is not set
# CONFIG_ARCH_PNX4008 is not set
# CONFIG_ARCH_PXA is not set
# CONFIG_ARCH_RPC is not set
# CONFIG_ARCH_SA1100 is not set
# CONFIG_ARCH_S3C2410 is not set
# CONFIG_ARCH_SHARK is not set
# CONFIG_ARCH_LH7A40X is not set
# CONFIG_ARCH_OMAP is not set
CONFIG_ARCH_SUPPORTS_BIG_ENDIAN=y

#
# Intel IXP4xx Implementation Options
#

#
# IXP4xx Platforms
#
CONFIG_MACH_NSLU2=y
CONFIG_MACH_AVILA=y
CONFIG_MACH_LOFT=y
# CONFIG_ARCH_ADI_COYOTE is not set
CONFIG_ARCH_IXDP425=y
# CONFIG_MACH_IXDPG425 is not set
# CONFIG_MACH_IXDP465 is not set
CONFIG_ARCH_IXCDP1100=y
# CONFIG_ARCH_PRPMC1100 is not set
CONFIG_MACH_NAS100D=y
CONFIG_ARCH_IXDP4XX=y
# CONFIG_MACH_GTWX5715 is not set

#
# IXP4xx Options
#
CONFIG_DMABOUNCE=y
# CONFIG_IXP4XX_INDIRECT_PCI is not set

#
# Processor Type
#
CONFIG_CPU_32=y
CONFIG_CPU_XSCALE=y
CONFIG_CPU_32v5=y
CONFIG_CPU_ABRT_EV5T=y
CONFIG_CPU_CACHE_VIVT=y
CONFIG_CPU_TLB_V4WBI=y
CONFIG_CPU_CP15=y
CONFIG_CPU_CP15_MMU=y

#
# Processor Features
#
CONFIG_ARM_THUMB=y
# CONFIG_CPU_BIG_ENDIAN is not set
# CONFIG_CPU_DCACHE_DISABLE is not set
# CONFIG_OUTER_CACHE is not set
# CONFIG_IWMMXT is not set
CONFIG_XSCALE_PMU=y

#
# Bus support
#
CONFIG_PCI=y

#
# PCCARD (PCMCIA/CardBus) support
#
# 

Re: [lm-sensors] Could the k8temp driver be interfering with ACPI?

2007-02-21 Thread Jean Delvare
Hi Chuck,

On Tue, 20 Feb 2007 10:08:26 -0500, Chuck Ebbert wrote:
> I blacklisted the k8temp driver (and the out-of-tree k8_edac driver
> in Fedora) and the temps were still volatile, so that's not causing
> it. Since then I've upgraded the system BIOS from F.06 to F.27 and
> the problems _may_ have gone away. My own custom 2.6.19 kernel has
> never been a problem, so I'm thinking it's one of these drivers
> loaded by Fedora that I never even compile:
> 
>   i2c_core

i2c-core doesn't touch the hardware by itself.

>   i2c_ec

Presumably autoloaded by the ACPI subsystem; I guess your ACPI
implementation includes an SMBus 2.0 controller.

>   i2c_piix4

i2c-piix4 will autoload if a supported PCI device is found on your
system. Assuming this is the same physical bus as i2c_ec is exposing,
it's no good to load both i2c-piix4 and i2c_ec at the same time.
Unfortunately i2c_ec doesn't request the I/O resources it uses so this
kind of conflict cannot be avoided currently.
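Why an unreserved I/O range makes this conflict invisible can be illustrated with a small userspace toy model of first-come I/O range reservation, in the spirit of the kernel's request_region(), which returns NULL when part of the requested range is already claimed. Everything here (`toy_request_region`, `toy_reset`, the port base 0x1100 used in the usage) is hypothetical; the model only shows that a driver which never reserves its ports leaves nothing for a second driver to collide with.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of I/O port reservation; not the kernel's implementation. */
#define MAX_REGIONS 8

struct toy_region {
	unsigned int start;
	unsigned int len;
	const char *owner;
};

static struct toy_region regions[MAX_REGIONS];
static size_t nregions;

/* Returns the new reservation, or NULL if the range overlaps an existing
 * one (mirroring how request_region() reports a busy range). */
static const struct toy_region *toy_request_region(unsigned int start,
						   unsigned int len,
						   const char *owner)
{
	size_t i;

	assert(len > 0);
	for (i = 0; i < nregions; i++) {
		const struct toy_region *r = &regions[i];

		if (start < r->start + r->len && r->start < start + len)
			return NULL;	/* busy: already claimed */
	}
	if (nregions == MAX_REGIONS)
		return NULL;
	regions[nregions].start = start;
	regions[nregions].len = len;
	regions[nregions].owner = owner;
	return &regions[nregions++];
}

/* Forget all reservations (lets one program replay several scenarios). */
static void toy_reset(void)
{
	memset(regions, 0, sizeof(regions));
	nregions = 0;
}
```

In the model, a claim over a range that nobody reserved always succeeds, so two drivers touching the same hardware only collide when both of them reserve it.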

Can you try to load the i2c-dev driver, then run the following commands
and report the results:
$ i2cdetect -l
For each bus listed:
$ i2cdetect N
(where N is the i2c bus number)

>   asus_acpi (on a Compaq???)
>   sbs

This is a new battery driver used in conjunction with i2c_ec. I guess
you have a smart battery in your laptop which is accessed through
the SMBus. I found that this driver bypasses the i2c-core locking,
which is really bad. I reported it one week ago:
http://marc.theaimsgroup.com/?l=linux-acpi=117160531631100=2
(for some reason my original post wasn't archived)
My patch wasn't applied, but the problems you describe could well be
caused by this locking issue. So I suggest that you unload the sbs
driver and see if things get better. If they do, you could try to apply
my patch and load sbs again, and see if it fixes it.

-- 
Jean Delvare


Re: Serial related oops

2007-02-21 Thread Jose Goncalves
Jose Goncalves wrote:
> New developments.
> I have upgraded to 2.6.16.41, applied a patch sent by Frederik that
> reverts the change made in http://lkml.org/lkml/2005/6/23/266 and
> enabled some more kernel debugging options, i.e., CONFIG_KALLSYMS_ALL,
> CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DEBUG_SLAB,
> CONFIG_DEBUG_MUTEXES, CONFIG_FRAME_POINTER and CONFIG_FORCED_INLINING
> (thanks to vda for pointing me to the right doc.).
> At first it seemed to work fine, but after some days of continuous
> running I got another kernel Oops!
> I attach the dmesg output and the assembly dump of serial8250_startup()
> and serial8250_shutdown().
>   

And here is also the assembly dump of serial_in(), where the NULL pointer
dereference happens.

José Gonçalves


vmlinux-2.6.16.41-mtm5-debug1: file format elf32-i386

Disassembly of section .text:

c01bfa70 <serial_in>:
c01bfa70:   55                      push   %ebp
c01bfa71:   89 e5                   mov    %esp,%ebp
c01bfa73:   53                      push   %ebx
c01bfa74:   8b 5d 08                mov    0x8(%ebp),%ebx
c01bfa77:   8b 55 0c                mov    0xc(%ebp),%edx
c01bfa7a:   0f b6 4b 12             movzbl 0x12(%ebx),%ecx
c01bfa7e:   0f b6 43 13             movzbl 0x13(%ebx),%eax
c01bfa82:   d3 e2                   shl    %cl,%edx
c01bfa84:   83 f8 02                cmp    $0x2,%eax
c01bfa87:   74 1a                   je     c01bfaa3
c01bfa89:   7f 05                   jg     c01bfa90
c01bfa8b:   48                      dec    %eax
c01bfa8c:   74 09                   je     c01bfa97
c01bfa8e:   eb 21                   jmp    c01bfab1
c01bfa90:   83 f8 03                cmp    $0x3,%eax
c01bfa93:   74 15                   je     c01bfaaa
c01bfa95:   eb 1a                   jmp    c01bfab1
c01bfa97:   8a 43 78                mov    0x78(%ebx),%al
c01bfa9a:   01 d0                   add    %edx,%eax
c01bfa9c:   8b 13                   mov    (%ebx),%edx
c01bfa9e:   48                      dec    %eax
c01bfa9f:   ee                      out    %al,(%dx)
c01bfaa0:   42                      inc    %edx
c01bfaa1:   eb 10                   jmp    c01bfab3
c01bfaa3:   03 53 04                add    0x4(%ebx),%edx
c01bfaa6:   8a 02                   mov    (%edx),%al
c01bfaa8:   eb 0a                   jmp    c01bfab4
c01bfaaa:   03 53 04                add    0x4(%ebx),%edx
c01bfaad:   8b 02                   mov    (%edx),%eax
c01bfaaf:   eb 06                   jmp    c01bfab7
c01bfab1:   03 13                   add    (%ebx),%edx
c01bfab3:   ec                      in     (%dx),%al
c01bfab4:   0f b6 c0                movzbl %al,%eax
c01bfab7:   5b                      pop    %ebx
c01bfab8:   5d                      pop    %ebp
c01bfab9:   c3                      ret
Disassembly of section .init.text:
Disassembly of section .altinstr_replacement:
Disassembly of section .exit.text:
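As a worked check of this dump: the faulting instruction at c01bfa7a is `movzbl 0x12(%ebx),%ecx`, and the oops reports a NULL pointer dereference at virtual address 0012, which is exactly what reading a byte at offset 0x12 through a NULL `%ebx` produces. The model below rebuilds the start of struct uart_port with fixed-width stand-ins. It assumes an i386 UP build where spinlock_t takes no space and pointers are 32 bits, which the `mov (%ebx)` (iobase) and `add 0x4(%ebx)` (membase) accesses suggest; it is an inference from the disassembly, not the real header, and `model_uart_port` / `fault_address` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the first fields of struct uart_port on the i386
 * build above (assumed: zero-size spinlock_t, 32-bit pointers).
 * Fixed-width types keep the offsets the same on any host. */
struct model_uart_port {
	uint32_t iobase;   /* 0x00: mov    (%ebx),%edx */
	uint32_t membase;  /* 0x04: add    0x4(%ebx),%edx (pointer on i386) */
	uint32_t irq;      /* 0x08 */
	uint32_t uartclk;  /* 0x0c */
	uint8_t  fifosize; /* 0x10 */
	uint8_t  x_char;   /* 0x11 */
	uint8_t  regshift; /* 0x12: movzbl 0x12(%ebx),%ecx (faulting load) */
	uint8_t  iotype;   /* 0x13: movzbl 0x13(%ebx),%eax (switch value) */
};

/* Address touched when a field is read through a NULL base pointer. */
static uintptr_t fault_address(size_t field_offset)
{
	const uintptr_t null_base = 0; /* %ebx == 0 at the time of the oops */

	return null_base + field_offset;
}

_Static_assert(offsetof(struct model_uart_port, regshift) == 0x12,
	       "regshift must sit at 0x12 to match the disassembly");
```

Such a small fault address is consistent with serial_in() having been handed a NULL port pointer, rather than a stale pointer into freed memory.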


Re: [PATCH] [MTD] CHIPS: oops in cfi_amdstd_sync

2007-02-21 Thread Jörn Engel
On Tue, 20 February 2007 17:46:13 -0800, Vijay Sampath wrote:
> 
> The files cfi_cmdset_0002.c and cfi_cmdset_0020.c do not initialize
> their wait queues like is done in cfi_cmdset_0001.c. This causes an
> oops when the wait queue is accessed. I have copied the code from
> cfi_cmdset_0001.c that is pertinent to initialization of the wait
> queue.

Patch looks good, but I can no longer test it.  Josh may still have
access to some commandset 20 chips.  Josh, any objections?
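The failure mode described in the patch, an oops on first use of a wait queue that was never initialised, can be illustrated with a userspace model of the circular list inside a wait queue head. This is a sketch, not kernel code: `model_list_head`, `model_init`, `model_list_empty` and `model_walk_is_safe` are hypothetical stand-ins for INIT_LIST_HEAD() and list traversal, and it assumes the uninitialised memory happens to be zero-filled. An initialised head points at itself, while a zeroed head has NULL links, so the first walk dereferences NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the kernel's circular doubly linked list head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty list points at itself. */
static void model_init(struct model_list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool model_list_empty(const struct model_list_head *h)
{
	return h->next == h;
}

/* Walks the list. With a zero-filled, never-initialised head, h->next is
 * NULL and a real traversal would dereference it (the oops in miniature);
 * the model reports that instead of crashing so it stays runnable. */
static bool model_walk_is_safe(const struct model_list_head *h)
{
	const struct model_list_head *p;

	for (p = h->next; p != h; p = p->next)
		if (p == NULL)
			return false;
	return true;
}
```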

Jörn

-- 
The only real mistake is the one from which we learn nothing.
-- John Powell




Re: [RFC PATCH(Experimental) 0/4] Freezer based Cpu-hotplug

2007-02-21 Thread Gautham R Shenoy
Rafael,
On Sat, Feb 17, 2007 at 12:24:45PM +0100, Rafael J. Wysocki wrote:
> 
> Pavel, do you think we can remove the PF_NOFREEZE from bluetooth, BTW?

By default, create_workqueue() marks the worker threads as non-freezable.
For CPU hotplug, all workqueues can be frozen except the "kthread"
workqueue (which is single-threaded, so it won't be frozen anyway).

A quick cscope scan shows that "xfslogd" and "xfsdatad" are the only
freezable workqueues. Is there any particular reason for not marking the
rest of the non-single-threaded workqueues freezable?

thanks and regards
gautham
-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


[PATCH 1/1] PXAFB: Support for backlight control

2007-02-21 Thread Rodolfo Giometti
Backlight control support for the PXA frame buffer.

Signed-off-by: Rodolfo Giometti <[EMAIL PROTECTED]>

---

Each platform should define the backlight properties in its own setup
file in "linux/arch/arm/mach-pxa/" as follows:

   static int pxafb_bl_get_brightness(struct backlight_device *bl_dev)
   {
           return read_the_backlight_brightness();   /* platform-specific */
   }

   static int pxafb_bl_update_status(struct backlight_device *bl_dev)
   {
           int perc;

           if (bl_dev->props->power != FB_BLANK_UNBLANK ||
               bl_dev->props->fb_blank != FB_BLANK_UNBLANK)
                   perc = 0;
           else
                   perc = bl_dev->props->brightness;

           write_the_backlight_brightness(perc);     /* platform-specific */

           return 0;
   }

   static struct backlight_properties wwpc1100_backlight_props = {
           .get_brightness = pxafb_bl_get_brightness,
           .update_status  = pxafb_bl_update_status,
           .max_brightness = 100,
   };
  
and then set up the fb info as follows:

   wwpc1100_pxafb_info.modes = &wwpc1100_modes;
   wwpc1100_pxafb_info.bl_props = &wwpc1100_backlight_props;
   set_pxa_fb_info(&wwpc1100_pxafb_info);
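The brightness rule in pxafb_bl_update_status() (output 0 whenever either `power` or `fb_blank` is not FB_BLANK_UNBLANK, otherwise the requested brightness) can be checked in isolation with a small userspace model. The two blanking values match <linux/fb.h>; `model_props`, `model_backlight_level`, `model_suspend` and `model_resume` are hypothetical names, and the model leaves out the semaphore and the hardware write. The suspend/resume helpers mirror what pxafb_bl_suspend() and pxafb_bl_resume() in the patch change before calling update_status().

```c
#include <assert.h>

/* Values as defined in <linux/fb.h>; only the two used here are modelled. */
enum {
	MODEL_FB_BLANK_UNBLANK   = 0,
	MODEL_FB_BLANK_POWERDOWN = 4,
};

/* Hypothetical stand-in for the backlight_properties fields read above. */
struct model_props {
	int power;
	int fb_blank;
	int brightness;
	int max_brightness;
};

/* Same decision as pxafb_bl_update_status(): dark unless fully unblanked. */
static int model_backlight_level(const struct model_props *p)
{
	assert(p->brightness >= 0 && p->brightness <= p->max_brightness);
	if (p->power != MODEL_FB_BLANK_UNBLANK ||
	    p->fb_blank != MODEL_FB_BLANK_UNBLANK)
		return 0;
	return p->brightness;
}

/* Field changes made by the suspend/resume paths before update_status(). */
static void model_suspend(struct model_props *p)
{
	p->fb_blank = MODEL_FB_BLANK_POWERDOWN;
}

static void model_resume(struct model_props *p)
{
	p->fb_blank = MODEL_FB_BLANK_UNBLANK;
}
```

Starting from the state pxafb_bl_init() sets up (brightness at max_brightness, power unblanked), suspend forces the level to 0 and resume restores the previous brightness without having to remember it separately.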
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index c1536d7..1ee589e 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -1514,6 +1514,14 @@ config FB_PXA_PARAMETERS
 
   describes the available parameters.
 
+config FB_PXA_BACKLIGHT
+   bool "Support for backlight control"
+   default y
+   depends on FB_PXA
+   select FB_BACKLIGHT
+   help
+ Say Y here if you want to control the backlight of your display.
+
 config FB_MBX
tristate "2700G LCD framebuffer support"
depends on FB && ARCH_PXA
diff --git a/drivers/video/pxafb.c b/drivers/video/pxafb.c
index b4947c8..489174a 100644
--- a/drivers/video/pxafb.c
+++ b/drivers/video/pxafb.c
@@ -9,6 +9,8 @@
  *  which in turn is
  *   Based on acornfb.c Copyright (C) Russell King.
  *
+ * Backlight support by Rodolfo Giometti <[EMAIL PROTECTED]>
+ *
  * This file is subject to the terms and conditions of the GNU General Public
  * License.  See the file COPYING in the main directory of this archive for
  * more details.
@@ -37,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -58,7 +61,6 @@
 #define LCCR0_INVALID_CONFIG_MASK (LCCR0_OUM|LCCR0_BM|LCCR0_QDM|LCCR0_DIS|LCCR0_EFM|LCCR0_IUM|LCCR0_SFM|LCCR0_LDM|LCCR0_ENB)
 #define LCCR3_INVALID_CONFIG_MASK (LCCR3_HSP|LCCR3_VSP|LCCR3_PCD|LCCR3_BPP)
 
-static void (*pxafb_backlight_power)(int);
 static void (*pxafb_lcd_power)(int, struct fb_var_screeninfo *);
 
 static int pxafb_activate_var(struct fb_var_screeninfo *var, struct pxafb_info *);
@@ -69,6 +71,71 @@ static void set_ctrlr_state(struct pxafb_info *fbi, u_int state);
 static char g_options[PXAFB_OPTIONS_SIZE] __initdata = "";
 #endif
 
+
+/*
+ * Backlight control
+ */
+#ifdef CONFIG_FB_BACKLIGHT
+static void pxafb_bl_suspend(struct pxafb_info *fbi)
+{
+   struct backlight_device *bl_dev = fbi->fb.bl_dev;
+
+   if (bl_dev) {
+   down(&bl_dev->sem);
+   bl_dev->props->fb_blank = FB_BLANK_POWERDOWN;
+   bl_dev->props->update_status(bl_dev);
+   up(&bl_dev->sem);
+   }
+}
+
+static void pxafb_bl_resume(struct pxafb_info *fbi)
+{
+   struct backlight_device *bl_dev = fbi->fb.bl_dev;
+
+   if (bl_dev) {
+   down(&bl_dev->sem);
+   bl_dev->props->fb_blank = FB_BLANK_UNBLANK;
+   bl_dev->props->update_status(bl_dev);
+   up(&bl_dev->sem);
+   }
+}
+
+static void pxafb_bl_init(struct fb_info *info, struct backlight_properties *bl_props)
+{
+   struct backlight_device *bl_dev;
+   char name[16];
+
+   snprintf(name, sizeof(name), "pxabl%d", info->node);
+
+   bl_dev = backlight_device_register(name, info->dev, info, bl_props);
+   if (IS_ERR(bl_dev)) {
+   info->bl_dev = NULL;
+   printk(KERN_WARNING "pxafb: backlight registration failed\n");
+   return;
+   }
+
+   mutex_lock(&info->bl_mutex);
+   info->bl_dev = bl_dev;
+   fb_bl_default_curve(info, 0, 0, 100);   /* level: 0 - 100 */
+   mutex_unlock(&info->bl_mutex);
+
+   down(&bl_dev->sem);
+   bl_dev->props->brightness = bl_props->max_brightness;
+   bl_dev->props->power = FB_BLANK_UNBLANK;
+   bl_dev->props->update_status(bl_dev);
+   up(&bl_dev->sem);
+
+   printk("pxafb: backlight initialized (%s)\n", name);
+}
+#else
+static inline void pxafb_bl_init(struct fb_info *info, struct backlight_properties *bl_props) {}
+static inline void pxafb_bl_suspend(struct pxafb_info *fbi) {}
+static inline void pxafb_bl_resume(struct pxafb_info *fbi) {}
+#endif /* CONFIG_FB_BACKLIGHT */
+
+/*
+ *
+ */
 static inline void pxafb_schedule_work(struct pxafb_info *fbi, u_int state)
 {
unsigned long flags;
@@ -736,14 +803,6 @@ static int pxafb_activate_var(struct fb_var_screeninfo *var, 

[PATCH] siimage: DRAC4 note

2007-02-21 Thread Alan
Revised DRAC4 warning, as Jeff suggested; this one includes more info
about why the problem occurs.

Signed-off-by: Alan Cox <[EMAIL PROTECTED]>

diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-mm2/drivers/ide/pci/siimage.c linux-2.6.20-mm2/drivers/ide/pci/siimage.c
--- linux.vanilla-2.6.20-mm2/drivers/ide/pci/siimage.c  2007-02-20 13:38:01.0 +
+++ linux-2.6.20-mm2/drivers/ide/pci/siimage.c  2007-02-21 14:30:08.187487864 +
@@ -26,6 +26,11 @@
  * If you have strange problems with nVidia chipset systems please
  * see the SI support documentation and update your system BIOS
  * if neccessary
+ *
+ *  The Dell DRAC4 has some interesting features including effectively hot
+ *  unplugging/replugging the virtual CD interface when the DRAC is reset.
+ *  This often causes drivers/ide/siimage to panic but is ok with the rather
+ *  smarter code in libata.
  */
 
 #include 


Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c

2007-02-21 Thread Gautham R Shenoy
On Wed, Feb 21, 2007 at 05:30:10PM +0300, Oleg Nesterov wrote:
> On 02/21, Srivatsa Vaddagiri wrote:
> >
> > On Tue, Feb 20, 2007 at 11:09:36PM +0300, Oleg Nesterov wrote:
> > > > Which caller are you referring to here? Maybe we can decide on the
> > > > option after we see the users of flush_workqueue() in DOWN_PREPARE.
> > > 
> > > mm/slab.c:cpuup_callback()
> > 
> > The cancel_rearming_delayed_work, if used as it is in cpuup_callback,
> > will require that we send DOWN_PREPARE before freeze_processes().
> > 
> > But ..I am wondering if we can avoid doing cancel_rearming_delayed_work
> > (and thus flush_workqueue) in CPU_DOWN_PREPARE of slab.c. Basically,
> > 
> > mm/slab.c:
> > 
> > CPU_DOWN_PREPARE:   /* All processes frozen now */
> > cancel_delayed_work(&per_cpu(reap_work, cpu).timer);
> > del_work(&per_cpu(reap_work, cpu).work);
> > break;
> > 
> > 
> > At the point of CPU_DOWN_PREPARE, keventd should be frozen and hence
> > del_work() is a matter of just deleting the work from cwq->worklist.
> 
> Agreed. Note that we don't need the new "del_work". It is always safe to
> use cancel_work_sync() if we know that the workqueue is frozen, it won't
> block. We can also do
> 
>   if (!cancel_delayed_work())
>   cancel_work_sync();
> 
> but it is ok to do cancel_work_sync() unconditionally.

True. But this might be a one-off solution for slab. If someone in the
future needs to do a flush_workqueue() from CPU_DOWN_PREPARE context,
we would need to find a new workaround.

So I'll try running CPU_DOWN_PREPARE and CPU_UP_PREPARE from a
non-frozen context to check for any potential problems. Hopefully
there shouldn't be (m)any!
> 
> Oleg.
> 
thanks and regards
gautham.

-- 
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"


Re: [BUG] PATA_PCMCIA does not work

2007-02-21 Thread Alan
Does this fix the oops?

Alan


diff -u --new-file --recursive --exclude-from /usr/src/exclude linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c
--- linux.vanilla-2.6.20-mm2/drivers/ata/pata_pcmcia.c  2007-02-20 13:37:58.0 +
+++ linux-2.6.20-mm2/drivers/ata/pata_pcmcia.c  2007-02-21 14:06:58.792707976 +
@@ -308,7 +342,6 @@
if (info->ndev) {
struct ata_host *host = dev_get_drvdata(dev);
ata_host_detach(host);
-   dev_set_drvdata(dev, NULL);
}
info->ndev = 0;
pdev->priv = NULL;





Re: [RFC PATCH(Experimental) 2/4] Revert changes to workqueue.c

2007-02-21 Thread Oleg Nesterov
On 02/21, Srivatsa Vaddagiri wrote:
>
> On Tue, Feb 20, 2007 at 11:09:36PM +0300, Oleg Nesterov wrote:
> > > Which caller are you referring to here? Maybe we can decide on the
> > > option after we see the users of flush_workqueue() in DOWN_PREPARE.
> > 
> > mm/slab.c:cpuup_callback()
> 
> The cancel_rearming_delayed_work, if used as it is in cpuup_callback,
> will require that we send DOWN_PREPARE before freeze_processes().
> 
> But ..I am wondering if we can avoid doing cancel_rearming_delayed_work
> (and thus flush_workqueue) in CPU_DOWN_PREPARE of slab.c. Basically,
> 
> mm/slab.c:
> 
>   CPU_DOWN_PREPARE:   /* All processes frozen now */
>   cancel_delayed_work(&per_cpu(reap_work, cpu).timer);
>   del_work(&per_cpu(reap_work, cpu).work);
>   break;
> 
> 
> At the point of CPU_DOWN_PREPARE, keventd should be frozen and hence
> del_work() is a matter of just deleting the work from cwq->worklist.

Agreed. Note that we don't need the new "del_work". It is always safe to
use cancel_work_sync() if we know that the workqueue is frozen, it won't
block. We can also do

if (!cancel_delayed_work())
cancel_work_sync();

but it is ok to do cancel_work_sync() unconditionally.
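A toy, single-threaded model of why the cancel cannot block while keventd is frozen: a delayed work item is either still waiting on its timer or already queued on the worklist, and with the worker frozen nothing can be executing it, so cancelling reduces to killing a timer or unlinking a list entry. None of the names below are the real workqueue API; `toy_cancel_delayed` only mimics the return convention of cancel_delayed_work() (true if the timer was still pending), and `toy_cancel_queued` mimics the effect of cancel_work_sync() on a frozen queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Where a toy delayed work item can be while the worker thread is frozen. */
enum toy_state {
	TOY_IDLE,          /* neither timer armed nor queued */
	TOY_TIMER_PENDING, /* timer armed, not yet on the worklist */
	TOY_QUEUED,        /* timer fired, item sits on cwq->worklist */
};

struct toy_delayed_work {
	enum toy_state state;
	bool frozen_worker; /* keventd frozen: nothing can be running the item */
};

/* Mimics cancel_delayed_work(): succeeds only if the timer was still armed. */
static bool toy_cancel_delayed(struct toy_delayed_work *w)
{
	if (w->state != TOY_TIMER_PENDING)
		return false;
	w->state = TOY_IDLE;
	return true;
}

/* Mimics cancel_work_sync() on a frozen queue: just unlink, never wait. */
static void toy_cancel_queued(struct toy_delayed_work *w)
{
	assert(w->frozen_worker); /* the no-wait shortcut needs a frozen worker */
	if (w->state == TOY_QUEUED)
		w->state = TOY_IDLE;
}

/* The pattern from the reply: try the timer first, then the worklist. */
static void toy_cancel(struct toy_delayed_work *w)
{
	if (!toy_cancel_delayed(w))
		toy_cancel_queued(w);
}
```

In the model, calling toy_cancel_queued() unconditionally after toy_cancel_delayed() gives the same end state, matching the remark that the second call can be made without the check.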

Oleg.



Re: [2.6 patch] drivers/infiniband/hw/cxgb3/: cleanups

2007-02-21 Thread Steve Wise
Thanks Adrian!

Acked-by: Steve Wise <[EMAIL PROTECTED]>


On Wed, 2007-02-21 at 11:52 +0100, Adrian Bunk wrote:
> On Tue, Feb 20, 2007 at 08:43:06AM -0600, Steve Wise wrote:
> > On Tue, 2007-02-20 at 01:02 +0100, Adrian Bunk wrote:
> > > This patch contains the following possible cleanups:
> > > - don't mark static functions in C files as inline - gcc should know
> > >   best whether inlining makes sense
> > > - never compile the unused cxio_dbg.c
> > > - make the following needlessly global functions static:
> > >   - cxio_hal.c: cxio_hal_clear_qp_ctx()
> > >   - iwch_provider.c: iwch_get_qp()
> > > - #if 0 the following unused global functions:
> > >   - cxio_hal.c: cxio_allocate_stag()
> > >   - cxio_resource.: cxio_hal_get_rhdl()
> > >   - cxio_resource.: cxio_hal_put_rhdl()
> > > 
> > 
> > You could just remove the code instead of #if 0...
> >...
> 
> Updated patch below.
> 
> cu
> Adrian
> 
> 
> <--  snip  -->
> 
> 
> This patch contains the following possible cleanups:
> - don't mark static functions in C files as inline - gcc should know
>   best whether inlining makes sense
> - never compile the unused cxio_dbg.c
> - make the following needlessly global functions static:
>   - cxio_hal.c: cxio_hal_clear_qp_ctx()
>   - iwch_provider.c: iwch_get_qp()
> - remove the following unused global functions:
>   - cxio_hal.c: cxio_allocate_stag()
>   - cxio_resource.: cxio_hal_get_rhdl()
>   - cxio_resource.: cxio_hal_put_rhdl()
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
>  drivers/infiniband/hw/cxgb3/Makefile|1 
>  drivers/infiniband/hw/cxgb3/cxio_hal.c  |   31 +---
>  drivers/infiniband/hw/cxgb3/cxio_hal.h  |5 ---
>  drivers/infiniband/hw/cxgb3/cxio_resource.c |   14 +
>  drivers/infiniband/hw/cxgb3/iwch_cm.c   |5 +--
>  drivers/infiniband/hw/cxgb3/iwch_provider.c |2 -
>  drivers/infiniband/hw/cxgb3/iwch_provider.h |1 
>  drivers/infiniband/hw/cxgb3/iwch_qp.c   |   29 --
>  8 files changed, 27 insertions(+), 61 deletions(-)
> 
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile.old 2007-02-17 17:21:03.0 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/Makefile 2007-02-17 17:21:08.0 +0100
> @@ -8,5 +8,4 @@
>  
>  ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
>  EXTRA_CFLAGS += -DDEBUG
> -iw_cxgb3-y += cxio_dbg.o
>  endif
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h.old   2007-02-17 17:22:53.0 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/cxio_hal.h   2007-02-17 17:25:08.0 +0100
> @@ -144,7 +144,6 @@
>  void cxio_rdev_close(struct cxio_rdev *rdev);
>  int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
>  enum t3_cq_opcode op, u32 credit);
> -int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
>  int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
>  int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
> @@ -155,8 +154,6 @@
>  int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
>   struct cxio_ucontext *uctx);
>  int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
> -int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
> -enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
>  int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
>  enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
>  u8 page_size, __be64 *pbl, u32 *pbl_size,
> @@ -172,8 +169,6 @@
>  int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
>  void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
>  void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
> -u32 cxio_hal_get_rhdl(void);
> -void cxio_hal_put_rhdl(u32 rhdl);
>  u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
>  void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
>  int __init cxio_hal_init(void);
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h.old  2007-02-17 17:25:35.0 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.h  2007-02-17 17:25:41.0 +0100
> @@ -179,7 +179,6 @@
>  
>  void iwch_qp_add_ref(struct ib_qp *qp);
>  void iwch_qp_rem_ref(struct ib_qp *qp);
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
>  
>  struct iwch_ucontext {
>   struct ib_ucontext ibucontext;
> --- linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c.old  2007-02-17 17:25:50.0 +0100
> +++ linux-2.6.20-mm1/drivers/infiniband/hw/cxgb3/iwch_provider.c  2007-02-17 17:25:57.0 +0100
> @@ -949,7 +949,7 @@
>   wake_up(&(to_iwch_qp(qp)->wait));
>  }
>  
> -struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
> +static struct ib_qp *iwch_get_qp(struct ib_device *dev, int 

[PATCH]: Remove __exit from mon_bin_exit & mon_text_exit.

2007-02-21 Thread Prarit Bhargava
Remove __exit from mon_bin_exit & mon_text_exit.  Both functions are used
in error code paths in __init functions.

Resolves MODPOST warnings similar to:

`mon_bin_exit' referenced in section `.init.text' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o
`mon_text_exit' referenced in section `.init.text' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Prarit Bhargava <[EMAIL PROTECTED]>

diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index c01dfe6..b2bedd9 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1165,7 +1165,7 @@ err_dev:
return rc;
 }
 
-void __exit mon_bin_exit(void)
+void mon_bin_exit(void)
 {
cdev_del(&mon_bin_cdev);
unregister_chrdev_region(mon_bin_dev0, MON_BIN_MAX_MINOR);
diff --git a/drivers/usb/mon/mon_text.c b/drivers/usb/mon/mon_text.c
index d38a127..494ee3b 100644
--- a/drivers/usb/mon/mon_text.c
+++ b/drivers/usb/mon/mon_text.c
@@ -520,7 +520,7 @@ int __init mon_text_init(void)
return 0;
 }
 
-void __exit mon_text_exit(void)
+void mon_text_exit(void)
 {
debugfs_remove(mon_dir);
 }
diff --git a/drivers/usb/mon/usb_mon.h b/drivers/usb/mon/usb_mon.h
index 4f949ce..efdfd89 100644
--- a/drivers/usb/mon/usb_mon.h
+++ b/drivers/usb/mon/usb_mon.h
@@ -57,9 +57,9 @@ void mon_text_del(struct mon_bus *mbus);
 // void mon_bin_add(struct mon_bus *);
 
 int __init mon_text_init(void);
-void __exit mon_text_exit(void);
+void mon_text_exit(void);
 int __init mon_bin_init(void);
-void __exit mon_bin_exit(void);
+void mon_bin_exit(void);
 
 /*
  * DMA interface.


Re: [PATCH] net/bridge/br_if.c: fix possible use-after-free in port_carrier_check()

2007-02-21 Thread Oleg Nesterov
On 02/21, Jarek Poplawski wrote:
>
> > On Wed, 21 Feb 2007 01:19:41 +0300
> > Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> > 
> > > + p = container_of(work, struct net_bridge_port, carrier_check.work);
> > >  
> > >   rtnl_lock();
> > > - p = dev->br_port;
> > > - if (!p)
> > > - goto done;
> > >   br = p->br;
> 
> It doesn't matter very much but I think this would look
> better after the first if check.

OK. I can re-send if this patch is otherwise correct.
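For readers unfamiliar with the idiom in the quoted hunk: container_of() recovers a pointer to an enclosing structure from a pointer to one of its members by subtracting the member's offset. The sketch below re-creates the macro in plain C with offsetof() (the kernel's version adds a compile-time type check) and uses simplified, hypothetical stand-ins for struct net_bridge_port and its delayed work member; the real structures have many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C version of the kernel macro, minus its type check. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct work_struct { int pending; };
struct delayed_work { struct work_struct work; unsigned long expires; };

struct net_bridge_port {
	int port_no;
	struct delayed_work carrier_check;
};

/* A work callback only receives the work_struct pointer... */
static int carrier_check_port_no(struct work_struct *work)
{
	/* ...and walks back to the port that embeds it. */
	struct net_bridge_port *p =
		container_of(work, struct net_bridge_port, carrier_check.work);

	return p->port_no;
}
```

The pointer arithmetic does nothing to keep the port alive: it is only valid while the embedding structure has not been freed.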

Oleg.



Re: [PATCH] EXPORT_SYMBOL() time functions

2007-02-21 Thread Rolf Eike Beer
Arjan van de Ven wrote:
> On Wed, 2007-02-21 at 14:12 +0100, Rolf Eike Beer wrote:
> > These functions were inlines before
> > 8b9365d753d9870bb6451504c13570b81923228f. Now EXPORT_SYMBOL() them to
> > allow them to be used in modules again.
>
> please do not add random exports without users; exports eat up kernel
> size and memory. At minimum specify which mainline modules use the
> exports..

Nothing in mainline uses them now. I just found out that the module I'm
writing no longer works because timeval_to_jiffies() disappeared. If this
is planned to go away for modules, I should consider switching to timespec.
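The conversion itself is simple enough to sketch. The functions below are a simplified userspace model that rounds a partial tick up to a whole one, so a timeout never expires early, which is the intent of the kernel helpers. They are not the kernel's timeval_to_jiffies() or timespec_to_jiffies() (those use scaled fixed-point arithmetic and clamp very large values), and HZ = 250 is only an assumed configuration; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_HZ 250u /* assumed kernel tick rate for this example */

/* timeval-style input: round up so a partial tick counts as a full tick. */
static uint64_t model_usecs_to_jiffies(uint64_t sec, uint64_t usec)
{
	assert(usec < 1000000u);
	uint64_t total_us = sec * 1000000u + usec;

	return (total_us * MODEL_HZ + 999999u) / 1000000u;
}

/* timespec-style input: same rounding, nanosecond resolution. */
static uint64_t model_nsecs_to_jiffies(uint64_t sec, uint64_t nsec)
{
	assert(nsec < 1000000000u);
	uint64_t total_ns = sec * 1000000000u + nsec;

	return (total_ns * MODEL_HZ + 999999999u) / 1000000000u;
}
```

At HZ = 250 one tick is 4 ms, so 4000 us maps to exactly 1 jiffy while 4001 us already needs 2.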

Eike




Re: [PATCH] net/bridge/br_if.c: fix possible use-after-free in port_carrier_check()

2007-02-21 Thread Oleg Nesterov
On 02/20, Stephen Hemminger wrote:
>
> On Wed, 21 Feb 2007 01:19:41 +0300
> Oleg Nesterov <[EMAIL PROTECTED]> wrote:
> 
> >  static void release_nbp(struct kobject *kobj)
> >  {
> > struct net_bridge_port *p
> > = container_of(kobj, struct net_bridge_port, kobj);
> > +
> > +   dev_put(p->dev);
> > kfree(p);
> >  }
> >  
> > @@ -127,12 +129,6 @@ static struct kobj_type brport_ktype = {
> >  
> >  static void destroy_nbp(struct net_bridge_port *p)
> >  {
> > -   struct net_device *dev = p->dev;
> > -
> > -   p->br = NULL;
> > -   p->dev = NULL;
> > -   dev_put(dev);
> > -
> > kobject_put(&p->kobj);
> >  }
> 
> Moving this around is problematic.
> The ordering here was chosen to be RCU friendly so that
> p->dev indicates the port is in process of being deleted but traffic
> may still be using old reference, but new traffic should not use it.

But isn't it still RCU friendly? destroy_nbp() is an RCU callback which
ends up calling release_nbp() if it drops the last reference to ->kobj.
This means that dev_put() may be done a bit later, but not earlier.
And RCU can only guarantee "not before"; any RCU callback could be
delayed unpredictably.
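The ordering argument can be illustrated with a toy reference count. In the patched code, destroy_nbp() only drops a reference, and the device reference is released from the final release function, so dev_put() can run later than before but never while someone still holds the port. Everything below (`toy_port`, `toy_release`, `toy_port_put`, the `dev_refs` counter) is a hypothetical userspace model of that kobject release pattern, not the bridge code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a port holds one reference on its device. */
struct toy_port {
	int refs;      /* models the kobject reference count */
	int *dev_refs; /* models the net_device refcount that dev_put() drops */
	bool freed;
};

/* Models release_nbp() after the patch: drop the device, then free. */
static void toy_release(struct toy_port *p)
{
	(*p->dev_refs)--; /* dev_put(p->dev) */
	p->freed = true;  /* kfree(p) */
}

/* Models kobject_put(): only the last put runs the release function. */
static void toy_port_put(struct toy_port *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0)
		toy_release(p);
}
```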

Stephen, I know nothing about net/, and

> Probably the best thing to do is pull the whole delayed work queue
> and auto port speed stuff. When STP is moved to user space then
> it can do the ethtool op there.

I can't understand a single word in the paragraph above :)

But the bug (the stable tree has it too) is real. If this patch is
really wrong, could you please take care of it?

Oleg.



Re: Serial related oops

2007-02-21 Thread Jose Goncalves
New developments.
I have upgraded to 2.6.16.41, applied a patch sent by Frederik that
reverts the change made in http://lkml.org/lkml/2005/6/23/266 and
enabled some more kernel debugging options, i.e., CONFIG_KALLSYMS_ALL,
CONFIG_DEBUG_KERNEL, CONFIG_DETECT_SOFTLOCKUP, CONFIG_DEBUG_SLAB,
CONFIG_DEBUG_MUTEXES, CONFIG_FRAME_POINTER and CONFIG_FORCED_INLINING
(thanks to vda for pointing me to the right doc.).
At first it seemed to work fine, but after some days of continuous
running I got another kernel Oops!
I attach the dmesg output and the assembly dump of serial8250_startup()
and serial8250_shutdown().

Regards,
José Gonçalves

<1>[18840.304048] Unable to handle kernel NULL pointer dereference at virtual address 0012
<1>[18840.313046]  printing eip:
<4>[18840.321687] c01bfa7a
<1>[18840.321714] *pde = 
<0>[18840.331287] Oops:  [#1]
<4>[18840.340687] Modules linked in:
<0>[18840.349749] CPU:0
<4>[18840.349767] EIP:0060:[]Not tainted VLI
<4>[18840.349782] EFLAGS: 00010202   (2.6.16.41-mtm5-debug1 #1) 
<0>[18840.377277] EIP is at serial_in+0xa/0x4a
<0>[18840.387221] eax: 0060   ebx:    ecx:    edx: 
<0>[18840.397805] esi:    edi: 0040   ebp: c728fe1c   esp: c728fe18
<0>[18840.408579] ds: 007b   es: 007b   ss: 0068
<0>[18840.419624] Process gp_position (pid: 11629, threadinfo=c728e000 task=c7443a90)
<0>[18840.420509] Stack: <0>  c01c0f88   c031fef0 0005 0202 
<0>[18840.445655]c7161a1c c031fef0 c124b510 c728fe60 c01bd97d c031fef0 c124b510 c124b510 
<0>[18840.460540] c773dbcc c728fe7c c01befe7 c124b510  ffed c773dbcc 
<0>[18840.475892] Call Trace:
<0>[18840.490039]  [] show_stack_log_lvl+0xa5/0xad
<0>[18840.504944]  [] show_registers+0x106/0x16f
<0>[18840.520104]  [] die+0xb6/0x127
<0>[18840.535497]  [] do_page_fault+0x380/0x4b3
<0>[18840.550828]  [] error_code+0x4f/0x60
<0>[18840.566645]  [] serial8250_startup+0x28f/0x2a9
<0>[18840.582471] Code: 38 43 78 75 02 b2 01 89 d0 eb 10 8b 41 70 39 43 70 0f 94 c0 0f b6 c0 eb 02 31 c0 5b 5d c3 90 90 90 55 89 e5 53 8b 5d 08 8b 55 0c <0f> b6 4b 12 0f b6 43 13 d3 e2 83 f8 02 74 1a 7f 05 48 74 09 eb 
<4>[18840.680471]  BUG: gp_position/11629, lock held at task exit time!
<4>[18840.702808]  [c124b528] {uart_register_driver}
<4>[18840.722346] .. held by:   gp_position:11629 [c7443a90, 117]
<4>[18840.742113] ... acquired at:   uart_get+0x28/0xde

vmlinux-2.6.16.41-mtm5-debug1: file format elf32-i386

Disassembly of section .text:

c01c0cf9 :
c01c0cf9:   55                      push   %ebp
c01c0cfa:   89 e5                   mov    %esp,%ebp
c01c0cfc:   57                      push   %edi
c01c0cfd:   56                      push   %esi
c01c0cfe:   53                      push   %ebx
c01c0cff:   53                      push   %ebx
c01c0d00:   8b 5d 08                mov    0x8(%ebp),%ebx
c01c0d03:   8b 43 60                mov    0x60(%ebx),%eax
c01c0d06:   c6 83 a7 00 00 00 00    movb   $0x0,0xa7(%ebx)
c01c0d0d:   89 c2                   mov    %eax,%edx
c01c0d0f:   c1 e2 04                shl    $0x4,%edx
c01c0d12:   83 f8 0a                cmp    $0xa,%eax
c01c0d15:   8b 92 ac 37 25 c0       mov    0xc02537ac(%edx),%edx
c01c0d1b:   66 89 93 9c 00 00 00    mov    %dx,0x9c(%ebx)
c01c0d22:   75 66                   jne    c01c0d8a

c01c0d24:   c6 83 a4 00 00 00 00    movb   $0x0,0xa4(%ebx)
c01c0d2b:   68 bf 00 00 00          push   $0xbf
c01c0d30:   6a 03                   push   $0x3
c01c0d32:   53                      push   %ebx
c01c0d33:   e8 82 ed ff ff          call   c01bfaba
c01c0d38:   6a 10                   push   $0x10
c01c0d3a:   6a 02                   push   $0x2
c01c0d3c:   53                      push   %ebx
c01c0d3d:   e8 78 ed ff ff          call   c01bfaba
c01c0d42:   6a 00                   push   $0x0
c01c0d44:   6a 01                   push   $0x1
c01c0d46:   53                      push   %ebx
c01c0d47:   e8 6e ed ff ff          call   c01bfaba
c01c0d4c:   83 c4 24                add    $0x24,%esp
c01c0d4f:   6a 00                   push   $0x0
c01c0d51:   6a 03                   push   $0x3
c01c0d53:   53                      push   %ebx
c01c0d54:   e8 61 ed ff ff          call   c01bfaba
c01c0d59:   6a 00                   push   $0x0
c01c0d5b:   6a 0c                   push   $0xc
c01c0d5d:   53                      push   %ebx
c01c0d5e:   e8 a7 ed ff ff          call   c01bfb0a
c01c0d63:   68 bf 00 00 00          push   $0xbf
c01c0d68:   6a 03                   push   $0x3
c01c0d6a:   53                      push   %ebx
c01c0d6b:   e8 4a ed ff ff          call   c01bfaba
c01c0d70:   83 c4 24                add    $0x24,%esp
c01c0d73:   6a 10                   push   $0x10
c01c0d75:   6a 02   push   

Re: [PATCH 2/2] aio: propogate post-EIOCBQUEUED errors to completion event

2007-02-21 Thread Suparna Bhattacharya
On Mon, Feb 19, 2007 at 01:38:35PM -0800, Zach Brown wrote:
> aio: propogate post-EIOCBQUEUED errors to completion event
> 
> This addresses an oops reported by Leonid Ananiev <[EMAIL PROTECTED]>
> as archived at http://lkml.org/lkml/2007/2/8/337.
> 
> O_DIRECT kicks off bios and returns -EIOCBQUEUED to indicate its intention to
> call aio_complete() once the bios complete.   As we return from submission we
> must preserve the -EIOCBQUEUED return code so that fs/aio.c knows to let the
> bio completion call aio_complete().  This stops us from returning errors after
> O_DIRECT submission.
> 
> But we have a few places that are very interested in generating errors after
> bio submission.
> 
> The most critical of these is invalidating the page cache after a write.  This
> avoids exposing stale data to buffered operations that are performed after the
> O_DIRECT write succeeds.  We must do this after submission because the user
> buffer might have been an mmap()ed buffer of the region being written to.  The
> get_user_pages() in the O_DIRECT completion path could have faulted in stale
> data.
> 
> So this patch introduces a helper, aio_propogate_error(), which queues
> post-submission errors in the iocb so that they are given to the user
> completion event when aio_complete() is finally called.
> 
> To get this working we change the aio_complete() path so that the ring
> insertion is performed as the final iocb reference is dropped.  This gives the
> submission path time to queue its pending error before it drops its reference.
> This increases the space in the iocb as it has to record the two result codes
> from aio_complete() and the pending error from the submission path.

This is an interesting trick, but I'd like to consider carefully whether the
added complexity is worth it. Could you list the various other cases you have
in mind which would want to use it?

For the invalidate_inode_pages2_range() case, I wonder if the right
place to issue this is after the direct IO write has completed and before
aio_complete() is issued (somewhat like the way we do bio_check_pages_dirty
for DIO reads), rather than after submission when the IO may still not have
hit the disk. This would also make the behaviour uniform for synchronous and
async cases.

BTW, am I right in interpreting that with your change aio_complete() may
trigger an io_getevents() wakeup before the corresponding event is placed
on the ring buffer?
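The ordering in question can be framed with a hypothetical userspace model of the scheme in the patch description. The fields echo the patch (ki_users, ki_res, ki_res2, ki_pending_err), but struct iocb_model, complete_req(), put_req() and propagate_error() are invented for illustration and are not fs/aio.c.

```c
/*
 * Hypothetical model, not fs/aio.c: completion only records the result
 * codes, the submitter may queue a pending error, and the ring event is
 * written when the last reference is dropped.
 */
#include <stdbool.h>

struct event_model {
    long res;
    long res2;
};

struct iocb_model {
    int  users;                    /* models ki_users */
    bool completed;                /* aio_complete() has been called */
    long res, res2;                /* models ki_res / ki_res2 */
    int  pending_err;              /* models ki_pending_err */
    struct event_model *ring_slot; /* where the event lands */
    bool inserted;                 /* event has been written to the ring */
};

/* Submission path queues an error to report instead of res. */
static void propagate_error(struct iocb_model *iocb, int err)
{
    iocb->pending_err = err;
}

/* Drop a reference; the final put performs the ring insertion. */
static void put_req(struct iocb_model *iocb)
{
    if (--iocb->users > 0)
        return;
    if (iocb->completed) {
        /* A pending error, if any, overrides the completion result. */
        iocb->ring_slot->res = iocb->pending_err ? iocb->pending_err : iocb->res;
        iocb->ring_slot->res2 = iocb->res2;
        iocb->inserted = true;
    }
}

/* Completion path: record results, then drop the completion reference. */
static void complete_req(struct iocb_model *iocb, long res, long res2)
{
    iocb->completed = true;
    iocb->res = res;
    iocb->res2 = res2;
    put_req(iocb);
}
```

In this model nothing reaches the ring until the last reference is dropped, so a waiter woken inside complete_req() would find no event yet.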

Regards
Suparna

> 
> This was tested by running O_DIRECT aio-stress concurrently with buffered reads
> while triggering EIO in invalidate_inode_pages2_range() with the help of a
> debugfs bool hack.  Previously the kernel would oops as fs/aio.c and bio
> completion both called aio_complete().  With this patch aio-stress sees -EIO.
> 
> Signed-off-by: Zach Brown <[EMAIL PROTECTED]>
> ---
> 
>  fs/aio.c|   49 +-
>  include/linux/aio.h |   30 +
>  mm/filemap.c|4 +--
>  3 files changed, 57 insertions(+), 26 deletions(-)
> 
> --- a/fs/aio.c   Mon Feb 19 13:12:20 2007 -0800
> +++ b/fs/aio.c   Mon Feb 19 13:16:00 2007 -0800
> @@ -193,8 +193,7 @@ static int aio_setup_ring(struct kioctx 
>   kunmap_atomic((void *)((unsigned long)__event & PAGE_MASK), km); \
>  } while(0)
> 
> -static void aio_ring_insert_entry(struct kioctx *ctx, struct kiocb *iocb,
> -   long res, long res2)
> +static void aio_ring_insert_entry(struct kioctx *ctx, struct kiocb *iocb)
>  {
>   struct aio_ring_info*info;
>   struct aio_ring *ring;
> @@ -213,12 +212,12 @@ static void aio_ring_insert_entry(struct
> 
>   event->obj = (u64)(unsigned long)iocb->ki_obj.user;
>   event->data = iocb->ki_user_data;
> - event->res = res;
> - event->res2 = res2;
> -
> - dprintk("aio_complete: %p[%lu]: %p: %p %Lx %lx %lx\n",
> + event->res = iocb->ki_pending_err ? iocb->ki_pending_err : iocb->ki_res;
> + event->res2 = iocb->ki_res2;
> +
> + dprintk("aio_complete: %p[%lu]: %p: %p %Lx %d %lx %lx\n",
>   ctx, tail, iocb, iocb->ki_obj.user, iocb->ki_user_data,
> - res, res2);
> + iocb->ki_pending_err, iocb->ki_res, iocb->ki_res2);
> 
>   /* after flagging the request as done, we
>* must never even look at it again
> @@ -459,6 +458,7 @@ static struct kiocb fastcall *__aio_get_
>   req->ki_cancel = NULL;
>   req->ki_retry = NULL;
>   req->ki_dtor = NULL;
> + req->ki_pending_err = 0;
>   req->private = NULL;
>   req->ki_iovec = NULL;
>   INIT_LIST_HEAD(>ki_run_list);
> @@ -548,10 +548,14 @@ static int __aio_put_req(struct kioctx *
> 
>   assert_spin_locked(>ctx_lock);
> 
> - req->ki_users --;
> + req->ki_users--;
>   BUG_ON(req->ki_users < 0);
>   if (likely(req->ki_users))
>   return 0;
> +
> + if (kiocbIsInserted(req))
> + aio_ring_insert_entry(ctx, req);
> +
>  

Re: libata FUA revisited

2007-02-21 Thread Robert Hancock

Tejun Heo wrote:

Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives
don't support non-NCQ FUA writes..


To me, using the NCQ FUA bit on such drives doesn't seem to be a good
idea.  Maybe I'm just too chicken but it's not like we can gain a lot
from doing FUA at this point.  Are there a lot of drives which support
NCQ but not FUA opcodes?


Well, it's hard to say whether "lots" have this issue, but the ones I 
have in my machine, Seagate 7200.7 NCQ 160GB (ST3160827AS) and 7200.10 
320GB (ST3320620AS), both support NCQ and don't support non-NCQ FUA, and 
those (especially the latter) seem to be very popular models.


Likely Seagate didn't implement that command since they figured nobody 
would use that if they had NCQ..
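The conservative policy discussed above could be expressed as a small helper over the drive's IDENTIFY data. This is a sketch, not libata code: pick_fua_mode() and enum fua_mode are hypothetical, and the word/bit positions (word 76 bit 8 for NCQ, word 84 bit 6 for WRITE DMA FUA EXT) are assumed to match the ATA spec and libata's ata_id_has_ncq()/ata_id_has_fua(); check them before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

enum fua_mode { FUA_NONE, FUA_EXT, FUA_NCQ };

/* IDENTIFY word 76 bit 8: NCQ supported (assumed, see lead-in). */
static bool id_has_ncq(const uint16_t *id)
{
    return id[76] & (1u << 8);
}

/* IDENTIFY word 84 bit 6: WRITE DMA FUA EXT supported (assumed). */
static bool id_has_fua_ext(const uint16_t *id)
{
    return id[84] & (1u << 6);
}

static enum fua_mode pick_fua_mode(const uint16_t *id)
{
    /* NCQ without the non-NCQ FUA opcodes: do not use FUA at all. */
    if (!id_has_fua_ext(id))
        return FUA_NONE;
    return id_has_ncq(id) ? FUA_NCQ : FUA_EXT;
}
```

Under this policy, a drive like the Seagate models mentioned above (NCQ, but no non-NCQ FUA) would get FUA_NONE.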


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/



2.6.20-git and 2.6.21-rc1, failed to boot on sata_via

2007-02-21 Thread Jean-Luc Coulon (f5ibh)

Hi,

I have an ASUS A8V motherboard with a VIA chipset:

[EMAIL PROTECTED] % lspci
00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]


Two SATA drives are attached to this chipset, using LVM over RAID1.

I'm running the x86-64 architecture on an AMD64 machine.

Since 2.6.20-git, the system refuses to boot; I have the
same problem with 2.6.21-rc1.


I've opened a bug, see: http://bugzilla.kernel.org/show_bug.cgi?id=8025
but I'm not sure it was the appropriate way.

At boot time, I get the following messages:

ACPI: PCI Interrupt :00:0f.0[B] GSI 20(level, low) -> IRQ 20
sata_via :00:0f.0: failed on iomap PCI BAR 0
sata_via :00:0f.0: out of memory
ACPI: PCI interrupt for device :00:0f.0 is disabled
sata_via: probe of :00:0f.0 failed with error -12

Then the RAID is not assembled and the system won't boot.

These are (some of) the options enabled in my config file:

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=m
CONFIG_BLK_DEV_IDE=m

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=m
CONFIG_IDEDISK_MULTI_MODE=y
CONFIG_BLK_DEV_IDECD=m
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
...
#
# IDE chipset support/bugfixes
#
# CONFIG_IDE_GENERIC is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_IT8213 is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=m
# CONFIG_BLK_DEV_TC86C001 is not set
# CONFIG_IDE_ARM is not set
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_IDEDMA_IVB=y
CONFIG_IDEDMA_AUTO=y



#
# Serial ATA (prod) and Parallel ATA (experimental) drivers
#
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
# CONFIG_SATA_AHCI is not set
# CONFIG_SATA_SVW is not set
# CONFIG_ATA_PIIX is not set
# CONFIG_SATA_MV is not set
# CONFIG_SATA_NV is not set
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
CONFIG_SATA_PROMISE=m
# CONFIG_SATA_SX4 is not set
# CONFIG_SATA_SIL is not set
# CONFIG_SATA_SIL24 is not set
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_ULI is not set
CONFIG_SATA_VIA=m
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
CONFIG_SATA_ACPI=y
...


I found a very similar problem in the archive, which seems to
have been fixed in the meantime, in this thread:

http://www.ussg.iu.edu/hypermail/linux/kernel/0702.1/1580.html

[I'm not subscribed to the list]


Regards

Jean-Luc




Re: [Ecryptfs-devel] [PATCH] ecryptfs lower_file largefile issue

2007-02-21 Thread Dmitriy Monakhov
Michael Halcrow <[EMAIL PROTECTED]> writes:

> On Wed, Feb 21, 2007 at 01:07:22PM +0300, Dmitriy Monakhov wrote:
>> Where is largefile issue in ecryptfs.
>
> Thanks for your thorough work on resolving such issues. We will
> integrate your patches and testcases into the next release as soon as
> we get a chance to review and test.
BTW, I previously sent some patches for ecryptfs to lkml but forgot to add
[EMAIL PROTECTED] to CC.
Links to the patches:
"ecryptfs lower_file handling issues" http://lkml.org/lkml/2007/2/19/147
"ecryptfs_read_super path_lookup" errh fix http://lkml.org/lkml/2007/2/19/171

>
> Mike
> .___.
>  Michael A. Halcrow  
>Security Software Engineer, IBM Linux Technology Center   
> GnuPG Fingerprint: 419C 5B1E 948A FA73 A54C  20F5 DB40 8531 6DCA 8769
>
> "To prohibit sharing software is to cut the bonds of society."   
>  - Richard Stallman



[RFC] [PATCH 2/2] s390: SCSI dump kernel and userspace application

2007-02-21 Thread Michael Holzheu
Userspace part of the s390 SCSI dumper.

Acked-by: Martin Schwidefsky <[EMAIL PROTECTED]>
Signed-off-by: Michael Holzheu <[EMAIL PROTECTED]>

---
 arch/s390/Makefile|4 
 arch/s390/zfcpdump/Makefile   |5 
 arch/s390/zfcpdump/defconfig.zfcpdump |  467 
 arch/s390/zfcpdump/initramfs.txt  |6 
 arch/s390/zfcpdump/zfcpdump.c |  953 ++
 arch/s390/zfcpdump/zfcpdump.h |  214 +++
 6 files changed, 1649 insertions(+)

Index: git-linux-2.6/arch/s390/Makefile
===
--- git-linux-2.6.orig/arch/s390/Makefile   2007-02-21 13:09:01.0 +0100
+++ git-linux-2.6/arch/s390/Makefile   2007-02-21 13:09:06.0 +0100
@@ -94,6 +94,7 @@ drivers-$(CONFIG_MATHEMU) += arch/$(ARCH
 drivers-$(CONFIG_OPROFILE) += arch/s390/oprofile/
 
 boot   := arch/$(ARCH)/boot
+zfcpdump   := arch/s390/zfcpdump
 
 all: image
 
@@ -103,6 +104,9 @@ install: vmlinux
 image: vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
 
+zfcpdump:
+   $(Q)$(MAKE) $(build)=$(zfcpdump) $(zfcpdump)/$@
+
 archclean:
$(Q)$(MAKE) $(clean)=$(boot)
 
Index: git-linux-2.6/arch/s390/zfcpdump/Makefile
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ git-linux-2.6/arch/s390/zfcpdump/Makefile   2007-02-21 13:09:06.0 +0100
@@ -0,0 +1,5 @@
+targets := zfcpdump
+
+$(obj)/zfcpdump: arch/s390/zfcpdump/zfcpdump.c arch/s390/zfcpdump/zfcpdump.h
+   $(CC) -Wall -o $(obj)/zfcpdump $(obj)/zfcpdump.c -static
+
Index: git-linux-2.6/arch/s390/zfcpdump/defconfig.zfcpdump
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ git-linux-2.6/arch/s390/zfcpdump/defconfig.zfcpdump 2007-02-21 13:09:06.0 +0100
@@ -0,0 +1,467 @@
+#
+# Automatically generated make config: don't edit
+# Linux kernel version: 2.6.20
+# Tue Feb 20 12:54:03 2007
+#
+CONFIG_MMU=y
+CONFIG_LOCKDEP_SUPPORT=y
+CONFIG_STACKTRACE_SUPPORT=y
+CONFIG_RWSEM_XCHGADD_ALGORITHM=y
+# CONFIG_ARCH_HAS_ILOG2_U32 is not set
+# CONFIG_ARCH_HAS_ILOG2_U64 is not set
+CONFIG_GENERIC_HWEIGHT=y
+CONFIG_GENERIC_TIME=y
+CONFIG_S390=y
+CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
+
+#
+# Code maturity level options
+#
+# CONFIG_EXPERIMENTAL is not set
+CONFIG_LOCK_KERNEL=y
+CONFIG_INIT_ENV_ARG_LIMIT=32
+
+#
+# General setup
+#
+CONFIG_LOCALVERSION=""
+CONFIG_LOCALVERSION_AUTO=y
+CONFIG_SWAP=y
+# CONFIG_SYSVIPC is not set
+# CONFIG_BSD_PROCESS_ACCT is not set
+# CONFIG_TASKSTATS is not set
+# CONFIG_UTS_NS is not set
+# CONFIG_AUDIT is not set
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+# CONFIG_CPUSETS is not set
+# CONFIG_SYSFS_DEPRECATED is not set
+# CONFIG_RELAY is not set
+CONFIG_INITRAMFS_SOURCE="arch/s390/zfcpdump/initramfs.txt"
+CONFIG_INITRAMFS_ROOT_UID=0
+CONFIG_INITRAMFS_ROOT_GID=0
+CONFIG_SYSCTL=y
+# CONFIG_EMBEDDED is not set
+CONFIG_SYSCTL_SYSCALL=y
+CONFIG_KALLSYMS=y
+# CONFIG_KALLSYMS_ALL is not set
+# CONFIG_KALLSYMS_EXTRA_PASS is not set
+CONFIG_HOTPLUG=y
+CONFIG_PRINTK=y
+CONFIG_BUG=y
+CONFIG_ELF_CORE=y
+CONFIG_BASE_FULL=y
+CONFIG_FUTEX=y
+CONFIG_EPOLL=y
+CONFIG_SHMEM=y
+CONFIG_SLAB=y
+CONFIG_VM_EVENT_COUNTERS=y
+CONFIG_RT_MUTEXES=y
+# CONFIG_TINY_SHMEM is not set
+CONFIG_BASE_SMALL=0
+# CONFIG_SLOB is not set
+
+#
+# Loadable module support
+#
+# CONFIG_MODULES is not set
+
+#
+# Block layer
+#
+CONFIG_BLOCK=y
+# CONFIG_BLK_DEV_IO_TRACE is not set
+
+#
+# IO Schedulers
+#
+CONFIG_IOSCHED_NOOP=y
+CONFIG_IOSCHED_AS=y
+CONFIG_IOSCHED_DEADLINE=y
+CONFIG_IOSCHED_CFQ=y
+# CONFIG_DEFAULT_AS is not set
+CONFIG_DEFAULT_DEADLINE=y
+# CONFIG_DEFAULT_CFQ is not set
+# CONFIG_DEFAULT_NOOP is not set
+CONFIG_DEFAULT_IOSCHED="deadline"
+
+#
+# Base setup
+#
+
+#
+# Processor type and features
+#
+CONFIG_64BIT=y
+CONFIG_SMP=y
+CONFIG_NR_CPUS=32
+# CONFIG_HOTPLUG_CPU is not set
+CONFIG_DEFAULT_MIGRATION_COST=100
+# CONFIG_COMPAT is not set
+CONFIG_AUDIT_ARCH=y
+CONFIG_S390_SWITCH_AMODE=y
+CONFIG_S390_EXEC_PROTECT=y
+
+#
+# Code generation options
+#
+# CONFIG_MARCH_G5 is not set
+CONFIG_MARCH_Z900=y
+# CONFIG_MARCH_Z990 is not set
+# CONFIG_MARCH_Z9_109 is not set
+CONFIG_PACK_STACK=y
+# CONFIG_SMALL_STACK is not set
+CONFIG_CHECK_STACK=y
+CONFIG_STACK_GUARD=256
+# CONFIG_WARN_STACK is not set
+CONFIG_ARCH_POPULATES_NODE_MAP=y
+CONFIG_FLATMEM=y
+CONFIG_FLAT_NODE_MEM_MAP=y
+# CONFIG_SPARSEMEM_STATIC is not set
+CONFIG_SPLIT_PTLOCK_CPUS=4
+CONFIG_RESOURCES_64BIT=y
+CONFIG_HOLES_IN_ZONE=y
+
+#
+# I/O subsystem configuration
+#
+CONFIG_MACHCHK_WARNING=y
+CONFIG_QDIO=y
+# CONFIG_QDIO_DEBUG is not set
+
+#
+# Misc
+#
+CONFIG_PREEMPT=y
+CONFIG_IPL=y
+# CONFIG_IPL_TAPE is not set
+CONFIG_IPL_VM=y
+CONFIG_BINFMT_ELF=y
+# CONFIG_BINFMT_MISC is not set
+# CONFIG_PROCESS_DEBUG is not set
+CONFIG_PFAULT=y
+# CONFIG_SHARED_KERNEL is not set
+# CONFIG_CMM is 

[RFC] [PATCH 1/2] s390: SCSI dump kernel and userspace application

2007-02-21 Thread Michael Holzheu
Kernel part of the s390 SCSI dumper: zcore character device driver.

Acked-by: Martin Schwidefsky <[EMAIL PROTECTED]>
Signed-off-by: Michael Holzheu <[EMAIL PROTECTED]>

---
 Documentation/s390/zfcpdump.txt |   41 +
 arch/s390/Kconfig   |7 
 arch/s390/kernel/early.c|3 
 arch/s390/kernel/head64.S   |   43 +
 arch/s390/kernel/ipl.c  |   24 -
 arch/s390/kernel/setup.c|3 
 arch/s390/kernel/smp.c  |1 
 drivers/s390/char/Makefile  |2 
 drivers/s390/char/sclp.h|2 
 drivers/s390/char/sclp_sdias.c  |  248 +++
 drivers/s390/char/zcore.c   |  885 
 drivers/s390/cio/cio.c  |1 
 include/asm-s390/ipl.h  |  116 +
 include/asm-s390/processor.h|5 
 include/asm-s390/sclp.h |2 
 include/asm-s390/setup.h|   75 ---
 16 files changed, 1360 insertions(+), 98 deletions(-)

Index: git-linux-2.6/Documentation/s390/zfcpdump.txt
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ git-linux-2.6/Documentation/s390/zfcpdump.txt   2007-02-21 11:00:15.0 +0100
@@ -0,0 +1,41 @@
+s390 SCSI dump tool (zfcpdump)
+
+System z machines (z900 or higher) provide hardware support for creating system
+dumps on SCSI disks. The dump process is initiated by booting a dump tool, which
+has to create a dump of the current (probably crashed) Linux image. In order to
+not overwrite memory of the crashed Linux with data of the dump tool, the
+hardware saves some memory plus the register sets of the boot cpu before the
+dump tool is loaded. There exists an SCLP hardware interface to obtain the saved
+memory afterwards. Currently 32 MB are saved.
+
+This zfcpdump implementation consists of a small Linux dump kernel together with
+a userspace dump tool, which are loaded together into the saved memory region
+below 32 MB. zfcpdump is installed on a SCSI disk using zipl (as contained in the
+s390-tools package) to make the device bootable. The operator of a Linux system
+can then trigger a SCSI dump by booting the SCSI disk on which zfcpdump resides.
+
+The kernel part of zfcpdump is implemented as a character device driver named
+"zcore", which exports memory and registers of the crashed Linux in an s390
+standalone dump format. It can be used in the same way as e.g. /dev/mem. The
+dump format defines a 4K header followed by plain uncompressed memory. The
+register sets are stored in the prefix pages of the different cpus. To build a
+dump-enabled kernel with the zcore driver, the kernel config option
+"S390_ZFCPDUMP" has to be set. When reading from /dev/zcore, the first part of
+memory, which has been saved by the hardware, is read by the driver via the SCLP
+hardware interface. The second part is just copied from the non-overwritten real
+memory.
+
+The userspace application of zfcpdump resides in an initramfs. It reads from
+/dev/zcore and writes the system dump to a file on a SCSI disk.
+
+To build zfcpdump you have to do the following:
+- Use /arch/s390/zfcpdump/defconfig.zfcpdump as kernel configuration file, which
+  enables the S390_ZFCPDUMP option and sets all other required kernel options.
+- Issue "make zfcpdump" from the toplevel directory of the linux tree to
+  build the userspace application. Note that the zfcpdump application has a
+  dependency on glibc and libz.
+- Issue "make image" to build the zfcpdump image with initramfs.
+
+The zfcpdump-enabled kernel image must be copied to
+/usr/share/zfcpdump/zfcpdump.image, where the zipl tool looks for the dump
+kernel when preparing a SCSI dump disk.
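Because the documentation describes the /dev/zcore stream as a 4K header followed by plain uncompressed memory, a reader can locate memory by a fixed offset. The sketch below assumes memory address 0 starts immediately after the header and does not parse the header itself; dump_offset() and dump_read_mem() are hypothetical helpers, not part of zfcpdump.c. It uses pread(); if a device did not support positioned reads, lseek() plus read() would be the fallback.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define DUMP_HDR_SIZE 4096  /* "a 4K header" per the documentation */

/* File offset of a given memory address inside the dump stream. */
static off_t dump_offset(uint64_t addr)
{
    return (off_t)(DUMP_HDR_SIZE + addr);
}

/* Read len bytes of dumped memory at addr; returns 0 or -errno. */
static int dump_read_mem(int fd, uint64_t addr, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          dump_offset(addr) + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -errno;
        }
        if (n == 0)
            return -EIO;        /* dump shorter than expected */
        done += (size_t)n;
    }
    return 0;
}
```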
Index: git-linux-2.6/arch/s390/Kconfig
===
--- git-linux-2.6.orig/arch/s390/Kconfig   2007-02-21 10:22:03.0 +0100
+++ git-linux-2.6/arch/s390/Kconfig 2007-02-21 11:00:15.0 +0100
@@ -512,6 +512,13 @@ config KEXEC
  current kernel, and to start another kernel.  It is like a reboot
  but is independent of hardware/microcode support.
 
+config S390_ZFCPDUMP
+   bool "zfcp dump kernel"
+   default n
+   help
+ Select this option if you want to build a zfcp dump enabled kernel.
+ Do NOT select this option for normal kernels!
+
 endmenu
 
 source "net/Kconfig"
Index: git-linux-2.6/arch/s390/kernel/early.c
===
--- git-linux-2.6.orig/arch/s390/kernel/early.c 2007-02-21 10:22:04.0 +0100
+++ git-linux-2.6/arch/s390/kernel/early.c  2007-02-21 10:56:03.0 +0100
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -129,7 +130,7 @@ static noinline __init void detect_machi
 {
struct cpuinfo_S390 *cpuinfo = _lowcore.cpu_data;
 
-   asm volatile("stidp %0" : "=m" (S390_lowcore.cpu_data.cpu_id));
+   

[RFC] [PATCH 0/2] s390: SCSI dump kernel and userspace application

2007-02-21 Thread Michael Holzheu
s390 machines (z900 or higher) provide hardware support for creating Linux
dumps on SCSI disks. Our current implementation consists of a userspace
application running on a special Linux dump kernel, which exploits the
s390 hardware support. Since both parts (kernel and userspace) belong
together, we would like to put the application code in the Linux source tree
under "arch/s390". Currently we have a dependency on the libc of the
target system. Klibc would be a nice solution for that. Are there currently
any plans for upstream klibc?

* Patch 1 implements the kernel support (zcore character device)
* Patch 2 implements the userspace part, which resides in arch/s390

For more information please refer to Documentation/s390/zfcpdump.txt,
which is contained in patch 1.


Re: [patch 00/21] 2.6.19-stable review

2007-02-21 Thread Stefan Richter
I wrote:
> I hope you can extract the patch from this MIME attachment.

Probably not unless I attach it for real.
-- 
Stefan Richter
-=-=-=== --=- =-=-=
http://arcgraph.de/sr/
Subject: [PATCH] Missing critical phys_to_virt in lib/swiotlb.c
From: David Moore <[EMAIL PROTECTED]>
Date: Sun, 04 Feb 2007 13:39:40 -0500

From: David Moore <[EMAIL PROTECTED]>

Adds missing call to phys_to_virt() in the
lib/swiotlb.c:swiotlb_sync_sg() function.  Without this change, a kernel
panic will always occur whenever a SWIOTLB bounce buffer from a
scatter-gather list gets synced.

Signed-off-by: David Moore <[EMAIL PROTECTED]>
Signed-off-by: Stefan Richter <[EMAIL PROTECTED]>
---

This is a fraction of patch "[IA64] swiotlb bug fixes" in 2.6.20-git#,
commit cde14bbfb3aa79b479db35bd29e6c083513d8614. Contrary to what its
heading suggests, it is also important for EM64T.

Example crashes caused by swiotlb_sync_sg:
http://lists.opensuse.org/opensuse-bugs/2006-12/msg02943.html
http://qa.mandriva.com/show_bug.cgi?id=28224
http://www.pchdtv.com/forum/viewtopic.php?t=2063=a959a14a4c2db0eebaab7b0df56103ce

--- linux-2.6.19.x86_64/lib/swiotlb.c.orig  2007-02-04 13:18:41.0 -0500
+++ linux-2.6.19.x86_64/lib/swiotlb.c   2007-02-04 13:19:43.0 -0500
@@ -750,7 +750,7 @@ swiotlb_sync_sg(struct device *hwdev, st
 
for (i = 0; i < nelems; i++, sg++)
if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg))
-   sync_single(hwdev, (void *) sg->dma_address,
+   sync_single(hwdev, phys_to_virt(sg->dma_address),
sg->dma_length, dir, target);
 }
 





Re: [patch 00/21] 2.6.19-stable review

2007-02-21 Thread Stefan Richter
Greg KH wrote:
> This is the start of the stable review cycle for the 2.6.19.5 release.
> 
> This will probably be the last release of the 2.6.19-stable series, so
> if there are patches that you feel should be applied to that tree,
> please let me know.

There is one here: "Missing critical phys_to_virt in lib/swiotlb.c".
http://lkml.org/lkml/2007/2/4/116
It fixes a DMA related bug which was seen with a variety of drivers
especially on EM64T machines with more than 3GB RAM. I hope you can
extract the patch from this MIME attachment.

Adrian, AFAICS it applies as-is to 2.6.16.y too. I don't have a machine
to test personally, but it is quite obvious.

The mentioned bigger patch has been merged by Linus between 2.6.20 and
2.6.21-rc1.
-- 
Stefan Richter
-=-=-=== --=- =-=-=
http://arcgraph.de/sr/


Re: [2.6 patch] drivers/hid/hid-debug.c should #include

2007-02-21 Thread Jiri Kosina
On Wed, 21 Feb 2007, Adrian Bunk wrote:

> Every file should include the headers containing the prototypes for
> its global functions.

Applied, thanks.

-- 
Jiri Kosina


[PATCH] allow oom_adj of saintly processes

2007-02-21 Thread Joshua N Pritikin
If the badness of a process is zero, then oom_adj > 0 has no effect,
because shifting zero left still yields zero. This patch makes sure that
the oom_adj shift actually increases the badness points appropriately.
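The arithmetic behind the fix, in isolation: adjust_points() is a hypothetical helper that mirrors only the oomkilladj step of badness() from the patch below, with the rest of the scoring omitted.

```c
/*
 * Model of the oomkilladj step only (hypothetical helper, not
 * mm/oom_kill.c). Without the "points = 1" line, 0 << n stays 0, so a
 * positive oom_adj could not raise the score of a zero-badness task.
 */
static unsigned long adjust_points(unsigned long points, int oomkilladj)
{
    if (oomkilladj > 0) {
        if (!points)
            points = 1;            /* the fix from the patch */
        points <<= oomkilladj;
    } else if (oomkilladj < 0) {
        points >>= -oomkilladj;
    }
    return points;
}
```

For example, adjust_points(0, 3) yields 8 with the fix, whereas without the `points = 1` line it would stay 0. Negative adjustments are unchanged.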

I am not subscribed. Please CC me with any comments. Thanks.

Signed-off-by: Joshua N. Pritikin <[EMAIL PROTECTED]>

--- linux/mm/oom_kill.c.orig2007-02-18 14:58:35.0 +0530
+++ linux/mm/oom_kill.c 2007-02-18 14:57:52.0 +0530
@@ -147,9 +147,11 @@ unsigned long badness(struct task_struct
 * Adjust the score by oomkilladj.
 */
if (p->oomkilladj) {
-   if (p->oomkilladj > 0)
+   if (p->oomkilladj > 0) {
+   if (!points)
+   points = 1;
points <<= p->oomkilladj;
-   else
+   } else
points >>= -(p->oomkilladj);
}
 


Re: [PATCH] EXPORT_SYMBOL() time functions

2007-02-21 Thread Arjan van de Ven
On Wed, 2007-02-21 at 14:12 +0100, Rolf Eike Beer wrote:
> These functions were inlines before 8b9365d753d9870bb6451504c13570b81923228f.
> Now EXPORT_SYMBOL() them to allow them to be used in modules again.


please do not add random exports without users; exports eat up kernel
size and memory. At minimum specify which mainline modules use the
exports..




Re: utrace regressions (was: -mm merge plans for 2.6.21)

2007-02-21 Thread Alexey Dobriyan
On Sat, Feb 17, 2007 at 06:35:31PM -0800, Roland McGrath wrote:
> > Looking at mainline x86_64 ptrace code I think hole for u_debugreg[4]
> > and [5] is also needed.
>
> It's not.  The utrace_regset for the debugregs already has that behavior
> for those two words, so mapping all 8 uarea words to the regset is fine.

Sorry, I don't get it. Choosing a segment from x86_64_uarea is done before
calling regset->set and regset->get, as well as before zero-filling. No
segment for u_debugreg[4] and [5] means -EIO before the segment handlers
get a chance to be called.

Do you want to consolidate these two?

	{offsetof(struct user, u_debugreg[0]), offsetof(struct user, u_debugreg[4]), 3, 0},
	{offsetof(struct user, u_debugreg[6]), offsetof(struct user, u_debugreg[8]), 3, 6 * sizeof(long)},
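A userspace model of the lookup described above: a table of [start, end) segments is consulted first, and an offset not covered by any entry yields -EIO before any regset handler is reached. struct user_model, DBG_OFF() and lookup_uarea() are hypothetical stand-ins, not the real struct user or the utrace code.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in layout; only the debug register array matters here. */
struct user_model {
    long regs[27];
    long u_debugreg[8];
};

#define DBG_OFF(n) (offsetof(struct user_model, u_debugreg) + (n) * sizeof(long))

struct uarea_segment {
    size_t start, end;   /* byte range [start, end) in struct user_model */
    int regset;          /* which regset handles it */
    size_t pos;          /* byte position inside that regset */
};

/* Mirrors the two entries quoted above: words 4 and 5 are a hole. */
static const struct uarea_segment segments[] = {
    { DBG_OFF(0), DBG_OFF(4), 3, 0 },
    { DBG_OFF(6), DBG_OFF(8), 3, 6 * sizeof(long) },
};

/* Returns the regset position for addr, or -EIO if no segment covers it. */
static long lookup_uarea(size_t addr)
{
    for (size_t i = 0; i < sizeof(segments) / sizeof(segments[0]); i++) {
        const struct uarea_segment *s = &segments[i];
        if (addr >= s->start && addr < s->end)
            return (long)(s->pos + (addr - s->start));
    }
    return -EIO;   /* rejected before any regset handler runs */
}
```

Merging the two entries into a single u_debugreg[0]..u_debugreg[8] segment would route words 4 and 5 to the regset, which, per Roland, already handles them.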



Re: Linux 2.6.21-rc1

2007-02-21 Thread Thomas Gleixner
On Tue, 2007-02-20 at 20:53 -0800, Linus Torvalds wrote:
> But there's a ton of architecture updates (arm, mips, powerpc, x86, you 
> name it), ACPI updates, and lots of driver work. And just a lot of 
> cleanups.
> 
> Have fun,

Yup. Fun starts in drivers/net/e1000

e1000 is not working anymore. ifup fails permanently.
 ADDRCONF(NETDEV_UP): eth0: link is not ready
nothing else

bisect identifies:

d2ed16356ff4fb9de23fbc5e5d582ce580390106 is first bad commit
commit d2ed16356ff4fb9de23fbc5e5d582ce580390106
Author: Kok, Auke <[EMAIL PROTECTED]>
Date:   Fri Feb 16 14:39:26 2007 -0800

e1000: fix shared interrupt warning message

Signed-off-by: Jesse Brandeburg <[EMAIL PROTECTED]>
Signed-off-by: Auke Kok <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
Signed-off-by: Jeff Garzik <[EMAIL PROTECTED]>

Reverting this patch on top of -rc1 helps as well.

tglx


lspci output:

04:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
Subsystem: Intel Corporation Unknown device 30a5
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-


Re: Linux 2.6.21-rc1

2007-02-21 Thread Faik Uygur
Hi,

On Wed, 21 Feb 2007 06:53, Linus Torvalds wrote:
> Ok, the merge window for 2.6.21 has closed, and -rc1 is out there.

  CHK include/linux/version.h
  CHK include/linux/utsrelease.h
  CHK include/linux/compile.h
  CC [M]  drivers/char/ip2/ip2main.o
In file included from drivers/char/ip2/ip2main.c:285:
drivers/char/ip2/i2lib.c: In function `iiSendPendingMail_t':
drivers/char/ip2/i2lib.c:83: sorry, unimplemented: inlining failed in call to 'iiSendPendingMail': function body not available
drivers/char/ip2/i2lib.c:157: sorry, unimplemented: called from here
make[3]: *** [drivers/char/ip2/ip2main.o] Error 1
make[2]: *** [drivers/char/ip2] Error 2
make[1]: *** [drivers/char] Error 2
make: *** [drivers] Error 2

With cleanup changes in commit 40565f1962c5be9b9e285e05af01ab7771534868 
compilation fails.

Regards,
- Faik


Re: [Ecryptfs-devel] [PATCH] ecryptfs lower_file largefile issue

2007-02-21 Thread Michael Halcrow
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

On Wed, Feb 21, 2007 at 01:07:22PM +0300, Dmitriy Monakhov wrote:
> Where is largefile issue in ecryptfs.

Thanks for your thorough work on resolving such issues. We will
integrate your patches and testcases into the next release as soon as
we get a chance to review and test.

Mike
.___.
 Michael A. Halcrow  
   Security Software Engineer, IBM Linux Technology Center   
GnuPG Fingerprint: 419C 5B1E 948A FA73 A54C  20F5 DB40 8531 6DCA 8769

"To prohibit sharing software is to cut the bonds of society."   
 - Richard Stallman

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iQEVAwUBRdxDsdtAhTFtyodpAQPT0wf/amu/FboyQXHXwnn7pUEdPqqkxXpbZGrR
L8hknVilgVA8pvgK/RrUNuakrHRWZTBokbSbiUA/xgpHlUTHa6T3VE/JOe5pLzG+
vDEo8+Ya+yjRAMJ/970oLz1T4sUXvalgJlYRaKKqKEsZmFIQwMKoK7lmc+Th+p2J
R/ZfJ3olWmskPMlLHhN4j9EEVoAGQLUqiH+bBkx4AqLJNVq9S69fFBwGFVk4t+/y
Z6KJNhJsjHnOj6ADjSlObXmB5JFIWsUl4P0fouELB4lxTYh1C42S0hy5Pl60Sq1o
MrmXIvkAdm1klYwbhaF174zu2XDzDJICXq4475v3WWVxT65HYkanMQ==
=3zMo
-----END PGP SIGNATURE-----


[PATCH] EXPORT_SYMBOL() time functions

2007-02-21 Thread Rolf Eike Beer
These functions were inline before commit 8b9365d753d9870bb6451504c13570b81923228f.
Export them with EXPORT_SYMBOL() so that modules can use them again.

Signed-off-by: Rolf Eike Beer <[EMAIL PROTECTED]>

---

Sent once again, this time without a PGP signature so that importing into git is easier.

commit 0a543599f4a9ea02b587bda26e0e11ae94774f61
tree aa815eab413d2575925b0964a1fa01d41439b26b
parent 6b8afc66b9d6893d3fa292b75769b58539836ff3
author Rolf Eike Beer <[EMAIL PROTECTED]> Wed, 21 Feb 2007 14:10:12 +0100
committer Rolf Eike Beer <[EMAIL PROTECTED]> Wed, 21 Feb 2007 14:10:12 +0100

 kernel/time.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/time.c b/kernel/time.c
index c6c80ea..0b351b2 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -635,6 +635,7 @@ timeval_to_jiffies(const struct timeval *value)
(((u64)usec * USEC_CONVERSION + USEC_ROUND) >>
 (USEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
 }
+EXPORT_SYMBOL(timeval_to_jiffies);
 
 void jiffies_to_timeval(const unsigned long jiffies, struct timeval *value)
 {
@@ -649,6 +650,7 @@ void jiffies_to_timeval(const unsigned long jiffies, struct timeval *value)
tv_usec /= NSEC_PER_USEC;
value->tv_usec = tv_usec;
 }
+EXPORT_SYMBOL(jiffies_to_timeval);
 
 /*
  * Convert jiffies/jiffies_64 to clock_t and back.
@@ -723,6 +725,7 @@ u64 nsec_to_clock_t(u64 x)
 #endif
return x;
 }
+EXPORT_SYMBOL(nsec_to_clock_t);
 
 #if (BITS_PER_LONG < 64)
 u64 get_jiffies_64(void)


[PATCH 4/4] NOMMU: Make it possible for RomFS to use MTD devices directly

2007-02-21 Thread David Howells
From: David Howells <[EMAIL PROTECTED]>

Change RomFS so that it can use MTD devices directly - without the intercession
of the block layer - as well as using block devices.

This permits RomFS:

 (1) to use the MTD direct mapping facility available under NOMMU conditions if
 the underlying device is directly accessible by the CPU;

 (2) and thus to be used when the block layer is disabled.

RomFS can be configured with support just for MTD devices, just for Block
devices or for both.  If RomFS is configured for both, then it will treat
mtdblock device files as MTD backing stores, not block layer backing stores.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 fs/Kconfig|   24 ++
 fs/romfs/Makefile |9 +
 fs/romfs/inode.c  |  649 -
 fs/romfs/internal.h   |   47 
 fs/romfs/mmap-nommu.c |   75 ++
 fs/romfs/storage.c|  261 
 fs/romfs/super.c  |  647 +
 7 files changed, 1060 insertions(+), 652 deletions(-)

diff --git a/fs/Kconfig b/fs/Kconfig
index 3c4886b..adaec7b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -478,6 +478,8 @@ config MINIX_FS
  partition (the one containing the directory /) cannot be compiled as
  a module.
 
+endif
+
 config ROMFS_FS
tristate "ROM file system support"
---help---
@@ -494,7 +496,27 @@ config ROMFS_FS
  If you don't know whether you need it, then you don't need it:
  answer N.
 
-endif
+config ROMFS_ON_BLOCK
+   bool "Block device-backed ROM file system support"
+   depends on ROMFS_FS && BLOCK
+   help
+ This permits ROMFS to use block devices buffered through the page
+ cache as the medium from which to retrieve data.  It does not allow
+ direct mapping of the medium.
+
+ If unsure, answer Y.
+
+config ROMFS_ON_MTD
+   bool "MTD-backed ROM file system support"
+   depends on ROMFS_FS && MTD
+   help
+ This permits ROMFS to use MTD based devices directly, without the
+ intercession of the block layer (which may have been disabled).  It
+ also allows direct mapping of MTD devices through romfs files under
+ NOMMU conditions if the underlying device is directly addressable by
+ the CPU.
+
+ If unsure, answer Y.
 
 config INOTIFY
bool "Inotify file change notification support"
diff --git a/fs/romfs/Makefile b/fs/romfs/Makefile
index c95b21c..420beb7 100644
--- a/fs/romfs/Makefile
+++ b/fs/romfs/Makefile
@@ -1,7 +1,12 @@
 #
-# Makefile for the linux romfs filesystem routines.
+# Makefile for the linux RomFS filesystem routines.
 #
 
 obj-$(CONFIG_ROMFS_FS) += romfs.o
 
-romfs-objs := inode.o
+romfs-y := storage.o super.o
+
+ifneq ($(CONFIG_MMU),y)
+romfs-$(CONFIG_ROMFS_ON_MTD) += mmap-nommu.o
+endif
+
diff --git a/fs/romfs/inode.c b/fs/romfs/inode.c
deleted file mode 100644
index fd60101..000
--- a/fs/romfs/inode.c
+++ /dev/null
@@ -1,649 +0,0 @@
-/*
- * ROMFS file system, Linux implementation
- *
- * Copyright (C) 1997-1999  Janos Farkas <[EMAIL PROTECTED]>
- *
- * Using parts of the minix filesystem
- * Copyright (C) 1991, 1992  Linus Torvalds
- *
- * and parts of the affs filesystem additionally
- * Copyright (C) 1993  Ray Burr
- * Copyright (C) 1996  Hans-Joachim Widmaier
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- *
- * Changes
- * Changed for 2.1.19 modules
- * Jan 1997  Initial release
- * Jun 1997  2.1.43+ changes
- * Proper page locking in readpage
- * Changed to work with 2.1.45+ fs
- * Jul 1997  Fixed follow_link
- * 2.1.47
- * lookup shouldn't return -ENOENT
- * from Horst von Brand:
- *   fail on wrong checksum
- *   double unlock_super was possible
- *   correct namelen for statfs
- * spotted by Bill Hawes:
- *   readlink shouldn't iput()
- * Jun 1998  2.1.106 from Avery Pennarun: glibc scandir()
- *   exposed a problem in readdir
- * 2.1.107 code-freeze spellchecker run
- * Aug 1998  2.1.118+ VFS changes
- * Sep 1998  2.1.122 another VFS change (follow_link)
- * Apr 1999  2.2.7   no more EBADF checking in
- *

[PATCH 3/4] NOMMU: Generalise the handling of MTD-specific superblocks

2007-02-21 Thread David Howells
From: David Howells <[EMAIL PROTECTED]>

Generalise the handling of MTD-specific superblocks so that JFFS2 and ROMFS can
both share it.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 drivers/mtd/Makefile  |2 
 drivers/mtd/mtdsuper.c|  231 +
 fs/jffs2/super.c  |  194 --
 include/linux/fs.h|1 
 include/linux/mtd/super.h |   30 ++
 5 files changed, 282 insertions(+), 176 deletions(-)

diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index a75b82a..f9f0ffd 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -4,7 +4,7 @@
 # $Id: Makefile.common,v 1.7 2005/07/11 10:39:27 gleixner Exp $
 
 # Core functionality.
-mtd-y  := mtdcore.o mtdbdi.o
+mtd-y  := mtdcore.o mtdbdi.o mtdsuper.o
 mtd-$(CONFIG_MTD_PARTITIONS)   += mtdpart.o
 obj-$(CONFIG_MTD)  += $(mtd-y)
 
diff --git a/drivers/mtd/mtdsuper.c b/drivers/mtd/mtdsuper.c
new file mode 100644
index 000..7ffab96
--- /dev/null
+++ b/drivers/mtd/mtdsuper.c
@@ -0,0 +1,231 @@
+/* MTD-based superblock management
+ *
+ * Copyright (C) 2006 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells ([EMAIL PROTECTED])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ * compare superblocks to see if they're equivalent
+ * - they are if the underlying MTD device is the same
+ */
+static int get_sb_mtd_compare(struct super_block *sb, void *_mtd)
+{
+   struct mtd_info *mtd = _mtd;
+
+   if (sb->s_mtd == mtd) {
+   DEBUG(2, "MTDSB: Match on device %d (\"%s\")\n",
+ mtd->index, mtd->name);
+   return 1;
+   }
+
+   DEBUG(2, "MTDSB: No match, device %d (\"%s\"), device %d (\"%s\")\n",
+ sb->s_mtd->index, sb->s_mtd->name, mtd->index, mtd->name);
+   return 0;
+}
+
+/*
+ * mark the superblock by the MTD device it is using
+ * - set the device number to be the correct MTD block device for persistence
+ *   of NFS exports
+ */
+static int get_sb_mtd_set(struct super_block *sb, void *_mtd)
+{
+   struct mtd_info *mtd = _mtd;
+
+   sb->s_mtd = mtd;
+   sb->s_dev = MKDEV(MTD_BLOCK_MAJOR, mtd->index);
+   return 0;
+}
+
+/*
+ * get a superblock on an MTD-backed filesystem
+ */
+static int get_sb_mtd_aux(struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *data,
+ struct mtd_info *mtd,
+ int (*fill_super)(struct super_block *, void *, int),
+ struct vfsmount *mnt)
+{
+   struct super_block *sb;
+   int ret;
+
+   sb = sget(fs_type, get_sb_mtd_compare, get_sb_mtd_set, mtd);
+   if (IS_ERR(sb))
+   goto out_error;
+
+   if (sb->s_root)
+   goto already_mounted;
+
+   /* fresh new superblock */
+   DEBUG(1, "MTDSB: New superblock for device %d (\"%s\")\n",
+ mtd->index, mtd->name);
+
+   ret = fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
+   if (ret < 0) {
+   up_write(>s_umount);
+   deactivate_super(sb);
+   return ret;
+   }
+
+   /* go */
+   sb->s_flags |= MS_ACTIVE;
+   return simple_set_mnt(mnt, sb);
+
+   /* new mountpoint for an already mounted superblock */
+already_mounted:
+   DEBUG(1, "MTDSB: Device %d (\"%s\") is already mounted\n",
+ mtd->index, mtd->name);
+   ret = simple_set_mnt(mnt, sb);
+   goto out_put;
+
+out_error:
+   ret = PTR_ERR(sb);
+out_put:
+   put_mtd_device(mtd);
+   return ret;
+}
+
+/*
+ * get a superblock on an MTD-backed filesystem by MTD device number
+ */
+static int get_sb_mtd_nr(struct file_system_type *fs_type, int flags,
+const char *dev_name, void *data, int mtdnr,
+int (*fill_super)(struct super_block *, void *, int),
+struct vfsmount *mnt)
+{
+   struct mtd_info *mtd;
+
+   mtd = get_mtd_device(NULL, mtdnr);
+   if (!mtd) {
+   DEBUG(0, "MTDSB: Device #%u doesn't appear to exist\n", mtdnr);
+   return -EINVAL;
+   }
+
+   return get_sb_mtd_aux(fs_type, flags, dev_name, data, mtd, fill_super,
+ mnt);
+}
+
+/*
+ * set up an MTD-based superblock
+ */
+int get_sb_mtd(struct file_system_type *fs_type, int flags,
+  const char *dev_name, void *data,
+  int (*fill_super)(struct super_block *, void *, int),
+  struct vfsmount *mnt)
+{
+   struct nameidata nd;
+   int mtdnr, ret;
+
+   if (!dev_name)
+   return -EINVAL;
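The compare/set pair in mtdsuper.c above is what lets a second mount of the same MTD device share the existing superblock (the `already_mounted` path in get_sb_mtd_aux()). A userspace toy model of that find-or-create pattern follows, assuming a fixed-size table in place of the kernel's superblock list. `toy_sget()`, `struct sb` and `struct mtd_dev` are hypothetical, and the model ignores locking, s_umount, reference counting and the put_mtd_device() call entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for mtd_info and super_block. */
struct mtd_dev {
	int index;
};

struct sb {
	struct mtd_dev *s_mtd;
	int in_use;
};

#define MAX_SB 8
static struct sb sb_table[MAX_SB];

/* Same contract as get_sb_mtd_compare(): non-zero means "same device". */
static int sb_compare(struct sb *s, void *data)
{
	return s->s_mtd == (struct mtd_dev *)data;
}

/* Same contract as get_sb_mtd_set(): tag a fresh superblock with its device. */
static int sb_set(struct sb *s, void *data)
{
	s->s_mtd = data;
	return 0;
}

/* Toy sget(): return a matching superblock if one exists, else set up a new one. */
static struct sb *toy_sget(int (*test)(struct sb *, void *),
			   int (*set)(struct sb *, void *), void *data)
{
	int i;

	assert(test != NULL && set != NULL);
	for (i = 0; i < MAX_SB; i++)
		if (sb_table[i].in_use && test(&sb_table[i], data))
			return &sb_table[i];	/* already mounted: share it */

	for (i = 0; i < MAX_SB; i++) {
		if (!sb_table[i].in_use) {
			if (set(&sb_table[i], data) != 0)
				return NULL;
			sb_table[i].in_use = 1;
			return &sb_table[i];	/* fresh superblock */
		}
	}
	return NULL;			/* no free slot */
}
```

Mounting device `a` twice yields the same superblock, while device `b` gets its own, mirroring how the compare callback keys superblocks on `sb->s_mtd`.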

[PATCH 1/4] NOMMU: Present backing device capabilities for MTD chardevs

2007-02-21 Thread David Howells
From: David Howells <[EMAIL PROTECTED]>

Present backing device capabilities for MTD character device files to allow
NOMMU mmap to do direct mapping where possible.

Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 drivers/mtd/Makefile |2 +
 drivers/mtd/chips/map_ram.c  |   17 +++
 drivers/mtd/chips/map_rom.c  |   17 +++
 drivers/mtd/devices/mtdram.c |   14 +
 drivers/mtd/internal.h   |   17 +++
 drivers/mtd/mtdbdi.c |   43 +
 drivers/mtd/mtdchar.c|   63 +-
 drivers/mtd/mtdcore.c|   15 ++
 drivers/mtd/mtdpart.c|   15 ++
 include/linux/mtd/mtd.h  |   14 +
 10 files changed, 215 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/Makefile b/drivers/mtd/Makefile
index c130e62..a75b82a 100644
--- a/drivers/mtd/Makefile
+++ b/drivers/mtd/Makefile
@@ -4,7 +4,7 @@
 # $Id: Makefile.common,v 1.7 2005/07/11 10:39:27 gleixner Exp $
 
 # Core functionality.
-mtd-y  := mtdcore.o
+mtd-y  := mtdcore.o mtdbdi.o
 mtd-$(CONFIG_MTD_PARTITIONS)   += mtdpart.o
 obj-$(CONFIG_MTD)  += $(mtd-y)
 
diff --git a/drivers/mtd/chips/map_ram.c b/drivers/mtd/chips/map_ram.c
index 5cb6d52..611b035 100644
--- a/drivers/mtd/chips/map_ram.c
+++ b/drivers/mtd/chips/map_ram.c
@@ -22,6 +22,8 @@ static int mapram_write (struct mtd_info *, loff_t, size_t, size_t *, const u_ch
 static int mapram_erase (struct mtd_info *, struct erase_info *);
 static void mapram_nop (struct mtd_info *);
 static struct mtd_info *map_ram_probe(struct map_info *map);
+static unsigned long mapram_unmapped_area(struct mtd_info *, unsigned long,
+ unsigned long, unsigned long);
 
 
 static struct mtd_chip_driver mapram_chipdrv = {
@@ -65,6 +67,7 @@ static struct mtd_info *map_ram_probe(struct map_info *map)
mtd->type = MTD_RAM;
mtd->size = map->size;
mtd->erase = mapram_erase;
+   mtd->get_unmapped_area = mapram_unmapped_area;
mtd->read = mapram_read;
mtd->write = mapram_write;
mtd->sync = mapram_nop;
@@ -80,6 +83,20 @@ static struct mtd_info *map_ram_probe(struct map_info *map)
 }
 
 
+/*
+ * Allow NOMMU mmap() to directly map the device (if not NULL)
+ * - return the address to which the offset maps
+ * - return -ENOSYS to indicate refusal to do the mapping
+ */
+static unsigned long mapram_unmapped_area(struct mtd_info *mtd,
+ unsigned long len,
+ unsigned long offset,
+ unsigned long flags)
+{
+   struct map_info *map = mtd->priv;
+   return (unsigned long) map->virt + offset;
+}
+
 static int mapram_read (struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen, u_char *buf)
 {
struct map_info *map = mtd->priv;
diff --git a/drivers/mtd/chips/map_rom.c b/drivers/mtd/chips/map_rom.c
index cb27f85..359f61e 100644
--- a/drivers/mtd/chips/map_rom.c
+++ b/drivers/mtd/chips/map_rom.c
@@ -20,6 +20,8 @@ static int maprom_read (struct mtd_info *, loff_t, size_t, size_t *, u_char *);
 static int maprom_write (struct mtd_info *, loff_t, size_t, size_t *, const u_char *);
 static void maprom_nop (struct mtd_info *);
 static struct mtd_info *map_rom_probe(struct map_info *map);
+static unsigned long maprom_unmapped_area(struct mtd_info *, unsigned long,
+ unsigned long, unsigned long);
 
 static struct mtd_chip_driver maprom_chipdrv = {
.probe  = map_rom_probe,
@@ -40,6 +42,7 @@ static struct mtd_info *map_rom_probe(struct map_info *map)
mtd->name = map->name;
mtd->type = MTD_ROM;
mtd->size = map->size;
+   mtd->get_unmapped_area = maprom_unmapped_area;
mtd->read = maprom_read;
mtd->write = maprom_write;
mtd->sync = maprom_nop;
@@ -52,6 +55,20 @@ static struct mtd_info *map_rom_probe(struct map_info *map)
 }
 
 
+/*
+ * Allow NOMMU mmap() to directly map the device (if not NULL)
+ * - return the address to which the offset maps
+ * - return -ENOSYS to indicate refusal to do the mapping
+ */
+static unsigned long maprom_unmapped_area(struct mtd_info *mtd,
+ unsigned long len,
+ unsigned long offset,
+ unsigned long flags)
+{
+   struct map_info *map = mtd->priv;
+   return (unsigned long) map->virt + offset;
+}
+
 static int maprom_read (struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen, u_char *buf)
 {
struct map_info *map = mtd->priv;
diff --git a/drivers/mtd/devices/mtdram.c b/drivers/mtd/devices/mtdram.c
index e427c82..438cdb9 100644
--- a/drivers/mtd/devices/mtdram.c
+++ b/drivers/mtd/devices/mtdram.c
@@ -62,6 +62,19 @@ static void ram_unpoint(struct 

[PATCH 2/4] NOMMU: Add support for direct mapping through mtdconcat if possible

2007-02-21 Thread David Howells
From: David Howells <[EMAIL PROTECTED]>

Add support for direct mapping through mtdconcat, if possible, by attaching the
same backing_dev_info structure to the master.

It has some restrictions:

 (1) It won't permit direct mapping of concatenated devices that have differing
 BDIs.

 (2) It doesn't support maps that span the 'gap' between devices, although it
 possibly could if the devices spanned across return compatible
 (i.e. contiguous) addresses from their get_unmapped_area() ops.
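
The offset walk in concat_get_unmapped_area() below, including restriction (2), can be modelled in userspace like this. It is a sketch only: `struct subdev`, its `virt` field (0 standing in for "no get_unmapped_area op") and `concat_lookup()` are hypothetical substitutes for `struct mtd_info` and the per-chip hooks such as mapram_unmapped_area(), which return `map->virt + offset`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an MTD subdevice: only the fields used here. */
struct subdev {
	unsigned long size;
	unsigned long virt;	/* base address if directly mappable, else 0 */
};

/*
 * Walk the subdevices, rebasing the offset as we go, and refuse mappings
 * that would span a subdevice boundary.  Returns the mapped address, or a
 * negative errno cast to unsigned long, as the kernel hook does.
 */
static unsigned long concat_lookup(const struct subdev *sub, int n,
				   unsigned long len, unsigned long offset)
{
	int i;

	assert(sub != NULL && n > 0);
	for (i = 0; i < n; i++) {
		if (offset >= sub[i].size) {
			offset -= sub[i].size;	/* skip to the next subdevice */
			continue;
		}
		if (offset + len > sub[i].size)
			return (unsigned long)-EINVAL;	/* would span the gap */
		if (sub[i].virt)
			return sub[i].virt + offset;	/* like map->virt + offset */
		break;
	}
	return (unsigned long)-ENOSYS;	/* no direct mapping possible */
}
```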

Signed-off-by: Gavin Lambert <[EMAIL PROTECTED]>
Signed-Off-By: David Howells <[EMAIL PROTECTED]>
---

 drivers/mtd/mtdconcat.c |   47 +++
 1 files changed, 47 insertions(+), 0 deletions(-)

diff --git a/drivers/mtd/mtdconcat.c b/drivers/mtd/mtdconcat.c
index 880580c..730689b 100644
--- a/drivers/mtd/mtdconcat.c
+++ b/drivers/mtd/mtdconcat.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -686,6 +687,40 @@ static int concat_block_markbad(struct mtd_info *mtd, loff_t ofs)
 }
 
 /*
+ * try to support NOMMU mmaps on concatenated devices
+ * - we don't support subdev spanning as we can't guarantee it'll work
+ */
+static unsigned long concat_get_unmapped_area(struct mtd_info *mtd,
+ unsigned long len,
+ unsigned long offset,
+ unsigned long flags)
+{
+   struct mtd_concat *concat = CONCAT(mtd);
+   int i;
+
+   for (i = 0; i < concat->num_subdev; i++) {
+   struct mtd_info *subdev = concat->subdev[i];
+
+   if (offset >= subdev->size) {
+   offset -= subdev->size;
+   continue;
+   }
+
+   /* we've found the subdev over which the mapping will reside */
+   if (offset + len > subdev->size)
+   return (unsigned long) -EINVAL;
+
+   if (subdev->get_unmapped_area)
+   return subdev->get_unmapped_area(subdev, len, offset,
+flags);
+
+   break;
+   }
+
+   return (unsigned long) -ENOSYS;
+}
+
+/*
  * This function constructs a virtual MTD device by concatenating
  * num_devs MTD devices. A pointer to the new device object is
  * stored to *new_dev upon success. This function does _not_
@@ -740,6 +775,8 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[],   /* subdevices to c
 
concat->mtd.ecc_stats.badblocks = subdev[0]->ecc_stats.badblocks;
 
+   concat->mtd.backing_dev_info = subdev[0]->backing_dev_info;
+
concat->subdev[0] = subdev[0];
 
for (i = 1; i < num_devs; i++) {
@@ -766,6 +803,15 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[],  /* subdevices to c
concat->mtd.flags |=
subdev[i]->flags & MTD_WRITEABLE;
}
+
+   /* only permit direct mapping if the BDIs are all the same
+* - copy-mapping is still permitted
+*/
+   if (concat->mtd.backing_dev_info !=
+   subdev[i]->backing_dev_info)
+   concat->mtd.backing_dev_info =
+   _backing_dev_info;
+
concat->mtd.size += subdev[i]->size;
concat->mtd.ecc_stats.badblocks +=
subdev[i]->ecc_stats.badblocks;
@@ -796,6 +842,7 @@ struct mtd_info *mtd_concat_create(struct mtd_info *subdev[],   /* subdevices to c
concat->mtd.unlock = concat_unlock;
concat->mtd.suspend = concat_suspend;
concat->mtd.resume = concat_resume;
+   concat->mtd.get_unmapped_area = concat_get_unmapped_area;
 
/*
 * Combine the erase block size info of the subdevices:


Re: GPL vs non-GPL device drivers

2007-02-21 Thread Helge Hafting

Jan-Benedict Glaw wrote:
> On Tue, 2007-02-20 15:36:56 +0100, Helge Hafting <[EMAIL PROTECTED]> wrote:
>> If you have a need for "secret" source code, stuff most of it
>> in userspace.  Make the drivers truly minimal; perhaps their
>> open/closed status won't matter that much when the bulk
>> of the code and the cleverness is kept safe in userspace.
>>
>> Note that keeping drivers small this way is the recommended
>> way of working anyway. It isn't merely a way to keep your
>> code away from the GPL - you always want a small kernel.
>
> Keeping the legal stuff out of sight for a second, this'll solve the
> "problem" for the embedded developer, but surely not for the Linux
> community. Would you ever expect that eg. the thin GPL layer used by
> ATI/NVidia would be merged iff the rest would run in userland?

A thin layer - no.  To get merged, a driver would have to
be of good quality.  I didn't think like that - I was thinking of
embedded developers that sometimes implement the
entire embedded application inside their device driver.

Which is crazy from a linux design standpoint, but sometimes
reasonable for an embedded developer when the sole purpose of
the embedded computer is to control the single "device" and
perhaps a little display with a couple of buttons.
The "app" part might not be that much
bigger than the device driver itself - it is then tempting to
cut some corners and put all in one place.

Of course this kind of "driver" ends up containing everything,
and nobody wants to GPL that.  Separating driver and app(s)
properly lets them keep a proprietary app, and a driver or two
that might be small and simple enough to be released.

> It's just a workaround for the
> linking-the-object-file-into-the-kernel-image problem, but after all,
> it doesn't lead to a working driver being freely available.
>
> MfG, JBG

Good point.  Fortunately, most devices are much simpler than
a card with accelerated 3D graphics.

Helge Hafting






Re: patch x86_64-fix-2.6.18-regression-ptrace_oldsetoptions-should-be-accepted.patch queued to -stable tree

2007-02-21 Thread Blaisorblade
On Wednesday 21 February 2007 00:41, [EMAIL PROTECTED] wrote:
> This is a note to let you know that we have just queued up the patch titled
>
>  Subject: x86_64: fix 2.6.18 regression - PTRACE_OLDSETOPTIONS should
> be accepted
>
> to the 2.6.18-stable tree.  Its filename is

Since you are still maintaining 2.6.18, I've just sent another patch for that, 
i.e. the backport of commit 14679eb3c50897889ba62f9a37e3bcd8a205b5e7.
Could you still merge it into this release, especially since this is the last
2.6.18-stable you are doing?
Also, this patch should be merged into 2.6.20 as well, but I saw no mail about
that, so I wanted to make sure it's heading there too.

> x86_64-fix-2.6.18-regression-ptrace_oldsetoptions-should-be-accepted.patch
>
> A git repo of this tree can be found at
>
> http://www.kernel.org/git/?p=linux/kernel/git/gregkh/stable-queue.git;a=summary

Hmm, this should be (note the missing gregkh in the path):
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade



Re: [ANNOUNCE] DualFS: File System with Meta-data and Data Separation

2007-02-21 Thread Jörn Engel
On Wed, 21 February 2007 05:36:22 +0100, Juan Piernas Canovas wrote:
> >
> >I don't see how you can guarantee 50% free segments.  Can you explain
> >that bit?
> It is quite simple. If 50% of your segments are busy, and the other 50% 
> are free, and the file system needs a new segment, the cleaner starts 
> freeing some of busy ones. If the cleaner is unable to free one segment at 
> least, your file system gets "full" (and it returns a nice ENOSPC error). 
> This solution wastes the half of your storage device, but it is 
> deadlock-free. Obviously, there are better approaches.

Ah, ok.  It is deadlock free, if the maximal height of your tree is 2.
It is not 100% deadlock free if the height is 3 or more.

Also, I strongly suspect that your tree is higher than 2.  A medium
sized directory will have data blocks, indirect blocks and the inode
proper, which gives you a height of 3.  Your inodes need to get accessed
somehow and unless they have fixed positions like in ext2, you need a
further tree structure of some sort, so you're more likely looking at a
height of 5.

With a height of 5, you would need to keep 80% of your metadata free.
That is starting to get wasteful.
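
One reading that reproduces both figures (height 2 needs 50% free, height 5 needs 80% free) assumes the worst case where relocating one busy segment's worth of live blocks forces a fresh copy of every ancestor on the h - 1 levels above it. Each busy segment then needs h - 1 free ones, so free / total >= (h - 1) / h. This derivation is an assumption for illustration, not necessarily the argument behind the numbers above; `reserve_percent()` is a hypothetical helper.

```c
#include <assert.h>

/*
 * Percentage of segments that must stay free for a deadlock-free cleaner,
 * under the worst-case assumption that each busy segment may need
 * (height - 1) free segments: free / total >= (height - 1) / height.
 * Integer arithmetic, rounded up to the next whole percent.
 */
static unsigned int reserve_percent(unsigned int height)
{
	assert(height >= 1);
	return (100 * (height - 1) + height - 1) / height;
}
```

Under this rule a tree of height 3, as in the medium-sized directory example, would already need about two thirds of the segments free.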

So I suspect that my proposed alternate cleaner mechanism or the even
better "hole plugging" mechanism proposed in the paper a few posts above
would be a better path to follow.

> >A fine principle to work with.  Surprisingly, what is the worst case for
> >you is the common case for LogFS, so maybe I'm more interested in it
> >than most people.  Or maybe I'm just more paranoid.
> 
> No, you are right. It is the common case for LogFS because it has data and 
> meta-data blocks in the same address space, but that is not the case of 
> DualFS. Anyway, I'm very interested in your work because any solution to 
> the problem of the GC will be also applicable to DualFS. So, keep up with 
> it. ;-)

Actually, no.  It is the common case for LogFS because it is designed
for flash media.  Unlike hard disks, flash lifetime is limited by the
amount of data written to it.  Therefore, having a cleaner run when the
filesystem is idle would cause unnecessary writes and reduce lifetime.

As a result, the LogFS cleaner runs as lazily as possible and the
filesystem tries hard not to mix data with different lifetimes in one
segment.  LogFS tries to avoid the cleaner like the plague.  But if it
ever needs to run it, the deadlock scenario is very close and I need to
be very aware of it. :)

In a way, the DualFS approach does not change rules for the
log-structured filesystem at all.  If you had designed your filesystem
in such a way that you simply used two existing filesystems and wrote
Actual Data (AD) to one, Metadata (MD) to another, what is MD to DualFS
is plain data to one of your underlying filesystems.  It can cause a bit
of confusion, because I tend to call MD "data" and you tend to call AD
"data", but that is about all.

Jörn

-- 
But this is not to say that the main benefit of Linux and other GPL
software is lower-cost. Control is the main benefit--cost is secondary.
-- Bruce Perens


Re: PCI riser cards and PCI irq routing, etc

2007-02-21 Thread Krzysztof Halasa
Udo van den Heuvel <[EMAIL PROTECTED]> writes:

> So if my non-VIA riser card can use DN 19 and also INT_A things should work?

That INT_A may be INT_A from their (motherboard) point of view, but
the riser card doesn't know about that, it only knows INTs as seen
at its PCI edge connector (so this INT_A here is meaningless).

Device numbers aren't rotated but rather derived from address lines
(address/data). AD0-31 lines are the same across the whole PCI bus.
That means device numbers are independent of POV.
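
In contrast to the device number, the INT line a card ends up on is usually rotated from slot to slot. The conventional rotation, the one used for devices behind a PCI-to-PCI bridge, is sketched below; whether a particular motherboard or riser card wires its slots this way is an assumption to check against its documentation. `swizzle_pin()` is a hypothetical helper, with pins numbered 1 (INTA#) to 4 (INTD#).

```c
#include <assert.h>

/*
 * Conventional PCI interrupt rotation ("swizzle"): pin `pin` of the
 * device in slot `slot` is assumed to arrive on upstream line
 * ((pin - 1 + slot) % 4) + 1.  Board-level wiring may differ.
 */
static int swizzle_pin(int slot, int pin)
{
	assert(pin >= 1 && pin <= 4 && slot >= 0);
	return ((pin - 1 + slot) % 4) + 1;
}
```

Under this convention a card's INTA# in device 0x13 would arrive on the upstream INTD# line, while the same pin in device 0x14 would arrive on INTA#.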

> (assuming that VIA Epia EN BIOS 1.07 is enough to use this card)

My VIA EPIA-M 600 is probably older than yours, so I'd assume
it should work as well.
When you configure 0x13 and 0x14, both devices get IRQs - that means
the BIOS can see both of them.

> The DN is the only variable so INT lines are hardwired on the riser card?

Yep. You just need a bit of soldering.
-- 
Krzysztof Halasa


[PATCH] unionfs: fix memory leak when calling krealloc

2007-02-21 Thread Pekka J Enberg
From: Pekka Enberg <[EMAIL PROTECTED]>

We must not overwrite the same pointer that is passed to krealloc() 
because it can return NULL without freeing the buffer.  Fixes a memory 
leak introduced by me.

Cc: Josef Sipek <[EMAIL PROTECTED]>
Signed-off-by: Pekka Enberg <[EMAIL PROTECTED]>
---
 fs/unionfs/copyup.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: 2.6/fs/unionfs/copyup.c
===
--- 2.6.orig/fs/unionfs/copyup.c2007-02-21 14:15:30.0 +0200
+++ 2.6/fs/unionfs/copyup.c 2007-02-21 14:16:19.0 +0200
@@ -658,11 +658,14 @@
 
/* grow path table */
if (count == num_dentry) {
-   path = krealloc(path, kmalloc_size * 2, GFP_KERNEL);
-   if (!path) {
+   void *p;
+
+   p = krealloc(path, kmalloc_size * 2, GFP_KERNEL);
+   if (!p) {
hidden_dentry = ERR_PTR(-ENOMEM);
goto out;
}
+   path = p;
kmalloc_size = ksize(path);
num_dentry = kmalloc_size / sizeof(struct dentry *);
}
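
The same pitfall exists with userspace realloc(), which also returns NULL on failure and leaves the original buffer allocated. A minimal sketch of the pattern the patch adopts, assuming plain C with realloc() standing in for krealloc() and a hypothetical `grow_table()` helper that doubles a pointer table:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Double the capacity of a pointer table.  The result of realloc() goes
 * into a temporary first: on failure *table is left untouched, so the
 * caller still owns (and can free) the old buffer.
 * Returns 0 on success, -1 on allocation failure.
 */
static int grow_table(void ***table, size_t *nmemb)
{
	void **p;

	assert(table != NULL && nmemb != NULL && *nmemb > 0);
	p = realloc(*table, *nmemb * 2 * sizeof(**table));
	if (!p)
		return -1;	/* old buffer still valid and still owned */
	*table = p;
	*nmemb *= 2;
	return 0;
}
```

Writing `*table = realloc(*table, ...)` directly would lose the only reference to the old buffer whenever the allocation fails, which is exactly the leak the patch fixes.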


Re: 2.6.20-git13 kernel BUG at /mnt/md0/devel/linux-git/kernel/time/tick-sched.c:168

2007-02-21 Thread Michal Piotrowski
Michal Piotrowski wrote:
> On 17/02/07, Alex Riesen <[EMAIL PROTECTED]> wrote:
>> Thomas Gleixner, Sat, Feb 17, 2007 16:14:17 +0100:
>> > On Sat, 2007-02-17 at 15:47 +0100, Alex Riesen wrote:
>> > > > > 164 if (need_resched())
>> > > > > 165 goto end;
>> > > > > 166
>> > > > > 167 cpu = smp_processor_id();
>> > > > > 168 BUG_ON(local_softirq_pending());
>> > > >
>> > > > Hmm, the BUG_ON is inside of an interrupt disabled region, so we should
>> > > > have bailed out early in the need_resched() check above (because we are
>> > > > in the idle task context according to the stack trace).
>> > > >
>> > > > Is this reproducible ?
>> > >
>> > > Seen this too (Ubuntu, P4/ht-SMT, SATA, typed from screen):
>> >
>> > Can you please apply the patch below, so we can at least see, which
>> > softirq is pending. This should trigger independently of hrtimers and
>> > dynticks. You can keep it compiled in and disable it at the kernel
>> > commandline with "nohz=off" and / or "highres=off"
>>
>> It did, only one time:
>>
>> Idle: local softirq pending: 0020<6>USB Universal Host Controller
>> Interface driver v3.0
>>
> 
> sudo cat /var/log/messages | grep Idle
> Feb 17 17:35:34 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<6>hdd: ATAPI 48X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache,
> UDMA(33)
> Feb 17 19:20:01 bitis-gabonica kernel: Idle: local softirq pending:
> 0020<3>Idle: local softirq pending: 0020<3>Idle: local softirq
> pending: 0020<7>PM: Removing info for No Bus:vcs7
> 
> cat /proc/interrupts
>            CPU0       CPU1
>   0:     232838          0   IO-APIC-edge      timer
>   1:        401          0   IO-APIC-edge      i8042
>   7:          0          0   IO-APIC-edge      parport0
>   8:          1          0   IO-APIC-edge      rtc
>   9:          1          0   IO-APIC-fasteoi   acpi
>  12:          4          0   IO-APIC-edge      i8042
>  14:        319          0   IO-APIC-edge      ide0
>  15:       1612          0   IO-APIC-edge      ide1
>  16:      16494          0   IO-APIC-fasteoi   uhci_hcd:usb3, libata
>  17:       4670          0   IO-APIC-fasteoi   uhci_hcd:usb1, uhci_hcd:usb4
>  18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb2
>  19:          2          0   IO-APIC-fasteoi   ehci_hcd:usb5
>  20:       2822          0   IO-APIC-fasteoi   Intel ICH5
>  21:       2881          0   IO-APIC-fasteoi   eth1
>  22:         58          0   IO-APIC-fasteoi   eth0
> NMI:          0          0
> LOC:     232562     232846
> ERR:          0
> MIS:          0
> 
> I can confirm that this is an ICH5 SATA controller problem.

Here is something interesting:

cat /var/log/messages | tail -n 300 | grep NOHZ
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 20:09:39 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 21:10:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 20 23:20:01 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:10:28 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:10:46 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
^

Feb 21 05:10:57 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:11:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:02 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:12:23 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:12:29 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:27 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:13:49 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:13:54 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:14:48 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:15:11 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:13 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:15:16 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:16:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:16:31 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:03 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:06 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:09 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:12 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
Feb 21 05:17:17 bitis-gabonica kernel: NOHZ: local_softirq_pending 20
Feb 21 05:17:18 bitis-gabonica kernel: NOHZ: local_softirq_pending 02
   
23 times in 7 minutes.


Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group (PL)
(http://www.stardust.webpages.pl/ltg/)
LTG - Linux Testers Group (EN)
(http://www.stardust.webpages.pl/linux_testers_group_en/)

Re: [PATCH 2/3] Don't change transfer speed while requests are in flight

2007-02-21 Thread Alan
> +  * If we are in an interrupt, it should be safe to issue
> +  * SETFEATURES manually, since there shouldn't be any requests in
> +  * flight.

There may be error recovery going on from a timeout on another processor.
I don't see how your code protects against that (and the old code is
broken too).



x86_64: PTRACE_[GS]ET_THREAD_AREA should be accepted

2007-02-21 Thread Paolo 'Blaisorblade' Giarrusso
This patch backports a fix for a 2.6.18 regression from 2.6.19.

As was already done for PTRACE_OLDSETOPTIONS, PTRACE_[GS]ET_THREAD_AREA must
be passed through to sys_ptrace(). This was fixed in 2.6.19, so this patch is
for 2.6.18-stable only. It was tested with UML/32bit as the API consumer, both
before and after applying it.

Cc: Andi Kleen <[EMAIL PROTECTED]>
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <[EMAIL PROTECTED]>
Index: linux-2.6.18/arch/x86_64/ia32/ptrace32.c
===
--- linux-2.6.18.orig/arch/x86_64/ia32/ptrace32.c
+++ linux-2.6.18/arch/x86_64/ia32/ptrace32.c
@@ -241,6 +241,8 @@ asmlinkage long sys32_ptrace(long reques
 	case PTRACE_SYSCALL:
 	case PTRACE_OLDSETOPTIONS:
 	case PTRACE_SETOPTIONS:
+	case PTRACE_SET_THREAD_AREA:
+	case PTRACE_GET_THREAD_AREA:
 		return sys_ptrace(request, pid, addr, data); 
 
 	default:




Re: [PATCH 1/1] LinuxPPS: Pulse per Second support for Linux

2007-02-21 Thread Rodolfo Giometti
On Mon, Feb 19, 2007 at 06:56:20PM -0800, H. Peter Anvin wrote:

> It's not a precondition for a file descriptor, either.  There are plenty 
> of ioctl-only device drivers in existence.
> 
> Furthermore, a file descriptor doesn't imply a device entry.  Consider 
> pipe(2), for example.
> 
> As far as the kernel is concerned, a file handle is a nice, uniform 
> system for providing communication between the kernel and user space. 
> It doesn't matter if one can read() or write() on it; it's perfectly 
> normal to support only a subset of the normal operations.

The problem is that sometimes you cannot have a file descriptor at
all. Think of a PPS source connected to a CPU's GPIO pin. There is
no filedes to use, and defining one just for a PPS source, or for a
class of PPS sources, is in my opinion nonsense.

The RFC simply doesn't consider that a PPS source can exist
__without__ a filedes connected to it; it assumes a single filedes is
__always__ connected to a single PPS source.

Ciao,

Rodolfo

-- 

GNU/Linux Solutions  e-mail:[EMAIL PROTECTED]
Linux Device Driver [EMAIL PROTECTED]
Embedded Systems[EMAIL PROTECTED]
UNIX programming phone: +39 349 2432127


Re: [PATCH] videobuf_qbuf: fix? possible videobuf_queue->stream corruption and lockup

2007-02-21 Thread Adrian Bunk
On Tue, Jan 23, 2007 at 09:10:08PM -0200, Mauro Carvalho Chehab wrote:
> On Tue, 2007-01-23 at 20:57 +0300, Oleg Nesterov wrote:
> > I am pretty sure the bug is real, but the patch may be wrong; please review.
> > 
> > We are doing ->buf_prepare(buf) before adding buf to q->stream list. This
> > means that videobuf_qbuf() should not try to re-add a STATE_PREPARED buffer.
> > 
> > Signed-off-by: Oleg Nesterov <[EMAIL PROTECTED]>
> Signed-off-by: Mauro Carvalho Chehab <[EMAIL PROTECTED]>
> 
> Chris/Adrian,
> 
> IMO, this should also be applied to the -stable trees.
>...

Thanks, applied to 2.6.16 (a trivial backport was required since the 
dprintk() was added after 2.6.16).

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



[-mm patch] i386 mpparse.c: remove an unused variable

2007-02-21 Thread Adrian Bunk
On Sat, Feb 17, 2007 at 09:51:46PM -0800, Andrew Morton wrote:
>...
> Changes since 2.6.20-mm1:
>...
> +i386-irq-kill-irq-compression.patch
>...
>  x86 updates
>...

This patch removes a no longer used variable.

Spotted by the GNU C compiler.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.20-mm2/arch/i386/kernel/mpparse.c.old	2007-02-20 23:41:11.0 +0100
+++ linux-2.6.20-mm2/arch/i386/kernel/mpparse.c	2007-02-20 23:41:25.0 +0100
@@ -1046,7 +1046,6 @@
 	int ioapic = -1;
 	int ioapic_pin = 0;
 	int idx, bit = 0;
-	static int pci_irq = 16;
 
 	/* Don't set up the ACPI SCI because it's already set up */
 	if (acpi_gbl_FADT.sci_interrupt == gsi)



[2.6 patch] drivers/hid/hid-debug.c should #include

2007-02-21 Thread Adrian Bunk
Every file should include the headers containing the prototypes for
its global functions.

Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>

---
--- linux-2.6.20-mm2/drivers/hid/hid-debug.c.old	2007-02-20 23:07:32.0 +0100
+++ linux-2.6.20-mm2/drivers/hid/hid-debug.c	2007-02-21 00:00:41.0 +0100
@@ -29,6 +29,7 @@
  */
 
 #include 
+#include 
 
 struct hid_usage_entry {
 	unsigned  page;


